The name token is used in nearly all MontiCore languages. The restrictions on this token make it harder for users to describe their problem in their own language.
e.g. CDs:
class Käse {       // "ä" not allowed
  bool flüßig;     // "ü" and "ß"
}
or ODs:
object Époisses: Käse {   // "É"
  flüßig = false;
}
and so on.
Limitation for General Languages
Other languages define names much more broadly. A MontiCore grammar for such a language is either more restrictive and cannot parse all valid instances, or it redefines the name token and becomes hard to combine with other MontiCore languages.
Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.
αρετη is explicitly mentioned in the Java specification as an allowed identifier.
Allow Unicode characters for the name token in MCBasics.mc4. This allows the developer to create models closer to her native language, and ensures that general languages such as Java & XML can be parsed without overwriting the name token.
For continuity: redefining the content of "Name" would be disruptive, so that should not happen.
However, it makes sense to allow alternative definitions (extensions) of "Name", e.g. "UnicodeName", with strong integration into
(a) parsing and (b) the symbol infrastructure, e.g. symbol X = UnicodeName ... should also work
(while still allowing the restricted "Name" to be used in the same models, but in different places).
Furthermore, to clarify: is it enough to define one fixed new nonterminal (like "UnicodeName"), or do we need a general "Name" extension/redefinition mechanism that allows such nonterminals to be introduced on the fly? (I hope not?)
PS: Some languages (SysML) allow spaces within names, but delimit such names with "...", as in "Driver Seat".
Is your feature request related to a problem? Please describe.
Current Definition of "Name" Token
Only letters a-z are allowed.
monticore/monticore-grammar/src/main/grammars/de/monticore/MCBasics.mc4
Lines 20 to 22 in 585f2d6
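To make the restriction concrete, here is a minimal Java sketch of an ASCII-only name check. The pattern is an assumption for illustration (a conventional ASCII identifier shape); it is not copied from the referenced lines of MCBasics.mc4, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class AsciiNameCheck {

    // ASSUMPTION: approximates an ASCII-only name token.
    // The real token definition lives in MCBasics.mc4 and may differ.
    static final Pattern ASCII_NAME =
        Pattern.compile("[a-zA-Z_$][a-zA-Z0-9_$]*");

    // Returns true only if the whole candidate matches the ASCII pattern.
    static boolean isAsciiName(String candidate) {
        return candidate != null && ASCII_NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Cheese", "Käse", "flüßig", "Époisses"};
        for (String name : samples) {
            System.out.println(name + " -> " + isAsciiName(name));
        }
    }
}
```

Under this assumed pattern, "Cheese" is accepted, while "Käse", "flüßig" and "Époisses" are rejected because they contain non-ASCII letters.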
Limitation for UML-Languages
The name token is used in nearly all MontiCore languages. The restrictions on this token make it harder for users to describe their problem in their own language.
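e.g. CDs:
class Käse {       // "ä" not allowed
  bool flüßig;     // "ü" and "ß"
}
or ODs:
object Époisses: Käse {   // "É"
  flüßig = false;
}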
and so on.
Limitation for General Languages
Other languages define names much more broadly. A MontiCore grammar for such a language is either more restrictive and cannot parse all valid instances, or it redefines the name token and becomes hard to combine with other MontiCore languages.
Java:
https://docs.oracle.com/javase/specs/jls/se23/html/jls-3.html#jls-3.8
αρετη is explicitly mentioned in the Java specification as an allowed identifier.
XML:
XML also allows Unicode characters in identifiers. As a consequence, the MontiCore XML language overrides the name token:
https://github.com/MontiCore/xml/blob/ed432849540eab55c952aabfa748b923c541b55c/src/main/grammars/de/monticore/lang/XMLBasis.mc4#L22-L47
Describe the solution you'd like?
Allow Unicode characters for the name token in MCBasics.mc4. This allows the developer to create models closer to her native language, and ensures that general languages such as Java & XML can be parsed without overwriting the name token.
There is a Unicode identifier standard, which can serve as a language-independent basis: https://www.unicode.org/reports/tr31/
The Java runtime also knows the Unicode identifier standard: https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/lang/Character.html#isUnicodeIdentifierStart(int)
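As a rough illustration of how such a name could be validated, the following Java sketch checks a candidate string with Character.isUnicodeIdentifierStart and Character.isUnicodeIdentifierPart. The class and method names are hypothetical, and this is only a sketch of one possible check, not a proposal for how MontiCore's lexer would implement it.

```java
final class UnicodeNameCheck {

    // Returns true if the first code point may start a Unicode identifier
    // and every following code point may be part of one.
    static boolean isUnicodeName(String candidate) {
        if (candidate == null || candidate.isEmpty()) {
            return false;
        }
        int first = candidate.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        // Iterate by code point so that supplementary characters
        // (surrogate pairs) are handled as single units.
        int i = Character.charCount(first);
        while (i < candidate.length()) {
            int cp = candidate.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"Käse", "flüßig", "Époisses", "αρετη", "Driver Seat", "1abc"};
        for (String name : samples) {
            System.out.println(name + " -> " + isUnicodeName(name));
        }
    }
}
```

With this check, "Käse", "flüßig", "Époisses" and "αρετη" are accepted, while "Driver Seat" (contains a space) and "1abc" (starts with a digit) are rejected. Note that these methods are not identical to Java's own identifier rules (Character.isJavaIdentifierStart and Character.isJavaIdentifierPart), which additionally admit characters such as currency symbols.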