\chapter{Lexical analysis}
\section{Reserved words}
The following are reserved words. They may not be used as
identifiers. In this document the alphabetic reserved words are always
shown in boldface.
\begin{quote}
\raggedright
\tt
abstraction abstype and andalso
as case datatype else end exception do fn fun functor handle if in
infix infixr let local nonfix of op open overload
raise rec sharing sig signature
struct structure then type val while with withtype orelse

\verb"{ } [ ] , ; ( ) -> * | : ... = => # _"
\end{quote}
\section{Special constants}
An integer constant is any non-empty sequence of digits, possibly preceded
by a negation symbol (\verb|~|).

A real constant is an integer constant, possibly followed by a point
(\verb|.|) and one or more digits, possibly followed by an exponent
symbol (\verb|E|) and an integer constant; at least one of the optional
parts must occur, hence no integer constant is a real constant.
Examples: \verb|0.7|, \verb|~3.32E5|, \verb|3E~7|. Non-examples:
\verb|23|, \verb|.3|, \verb|4.E5|, \verb|1E2.0|.
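
The two definitions above can be read as regular expressions. The following
Python sketch is only an illustration and is not part of the language
definition: the names \verb|INT_RE|, \verb|REAL_RE| and \verb|classify| are
hypothetical, and it assumes that ``digit'' means the ASCII characters 0 to 9.

```python
import re

# Integer constant: optional negation symbol (~), then one or more digits.
INT = r"~?[0-9]+"
INT_RE = re.compile(INT)

# Real constant: an integer constant followed by a fraction, an exponent,
# or both; at least one of the two optional parts must be present.
REAL_RE = re.compile(rf"{INT}(?:\.[0-9]+(?:E{INT})?|E{INT})")


def classify(text):
    """Return 'real', 'int', or None for a candidate special constant."""
    if REAL_RE.fullmatch(text):
        return "real"
    if INT_RE.fullmatch(text):
        return "int"
    return None
```

Under these assumptions, \verb|classify| accepts the three examples as real
constants, reports \verb|23| as an integer constant, and rejects \verb|.3|,
\verb|4.E5| and \verb|1E2.0|.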

A string constant is a sequence, between quotes (\verb|"|), of zero or more
printable characters, spaces, or escape sequences. Each escape sequence
is introduced by the escape character \verb|\|, and stands for a character
sequence. The allowed escape sequences are as follows (all other
uses of \verb|\| being incorrect):
\begin{tabular}{l p{3.9in}}
\verb|\n| & A single character interpreted by the system as end-of-line.\\
\verb|\t| & Tab. \\
\verb|\^c| & The control character c, for any appropriate c.\\
\verb|\ddd| & The single character with ASCII code ddd (3 decimal digits).\\
\verb|\"| & The double-quote character (\verb'"'). \\
\verb|\\| & The backslash character (\verb"\").\\
\verb|\f___f\| & This sequence is ignored, where f\_\_\_f stands for a
sequence of one or more formatting characters (a subset of the
non-printable characters including at least space, tab, newline,
formfeed). This allows one to write long strings on more than one
line, by writing \verb"\" at the end of one line and at the start of the
next.
\end{tabular}
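
The table of escape sequences can be turned into a small decoder. The Python
sketch below is illustrative only, and several details are assumptions rather
than statements of the definition: the function name
\verb|decode_string_body| is hypothetical, ``appropriate c'' is taken to mean
the characters \verb|@| to \verb|_| (giving codes 0 to 31), codes
\verb|ddd| are accepted up to 255, and the formatting characters are exactly
space, tab, newline and formfeed.

```python
FORMATTING = " \t\n\f"  # assumed set of formatting characters


def decode_string_body(body):
    """Decode the text between the quotes of a string constant."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        rest = body[i + 1:]          # everything after the backslash
        first = rest[:1]
        if first == "n":
            out.append("\n")
            i += 2
        elif first == "t":
            out.append("\t")
            i += 2
        elif first in ('"', "\\"):
            out.append(first)
            i += 2
        elif first == "^" and len(rest) >= 2 and "@" <= rest[1] <= "_":
            out.append(chr(ord(rest[1]) - 64))   # assumed control mapping
            i += 3
        elif len(rest) >= 3 and rest[:3].isdigit() and int(rest[:3]) <= 255:
            out.append(chr(int(rest[:3])))
            i += 4
        elif first != "" and first in FORMATTING:
            # \f___f\ : skip the formatting characters and the closing backslash.
            j = 0
            while j < len(rest) and rest[j] in FORMATTING:
                j += 1
            if j == len(rest) or rest[j] != "\\":
                raise ValueError("unterminated formatting sequence")
            i += j + 2
        else:
            raise ValueError("illegal escape sequence")
    return "".join(out)
```

Any use of the escape character that the table does not list raises
\verb|ValueError|, matching the remark that all other uses are incorrect.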

\section{Identifiers}

An identifier is either {\em alphanumeric}: any sequence of letters,
digits, primes (\verb"'"), and underbars (\verb"_") starting with a letter or a
prime, or {\em symbolic}: any non-empty sequence of the following symbols
\begin{quote}
\verb"! % & $ + - / : < = > ? @ \ ~ ^ | # * `"
\end{quote}
In either case, however, reserved words are excluded. This means
that, for example, \verb"_" and \verb"|" are not identifiers, but
\verb"also_ran" and \verb"|=|" are identifiers.
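
As a rough check of these rules, the following Python sketch tests whether a
string is an identifier. It is an illustration, not a definitive recognizer:
the names \verb|RESERVED|, \verb|ALNUM_RE|, \verb|SYMBOLIC_RE| and
\verb|is_identifier| are hypothetical, letters are assumed to be ASCII, and
the symbol set is assumed to contain the caret.

```python
import re

# Reserved words, copied from the list in the section on reserved words.
RESERVED = set("""
abstraction abstype and andalso as case datatype else end exception do
fn fun functor handle if in infix infixr let local nonfix of op open
overload raise rec sharing sig signature struct structure then type val
while with withtype orelse
{ } [ ] , ; ( ) -> * | : ... = => # _
""".split())

# Alphanumeric: starts with a letter or prime, then letters, digits,
# primes and underbars.
ALNUM_RE = re.compile(r"[A-Za-z'][A-Za-z0-9'_]*")

# Symbolic: a non-empty run of the listed symbol characters.
SYMBOLIC_RE = re.compile(r"[!%&$+\-/:<=>?@\\~^|#*`]+")


def is_identifier(text):
    """True if text is an identifier, with reserved words excluded."""
    if text in RESERVED:
        return False
    return bool(ALNUM_RE.fullmatch(text) or SYMBOLIC_RE.fullmatch(text))
```

With these definitions, \verb|is_identifier| rejects \verb"_" and \verb"|"
and accepts \verb"also_ran" and \verb"|=|", as in the text.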

Identifiers are used to stand for 9 different classes of objects,
which occupy 6 different name spaces, as follows:
\begin{enumerate}
\item value variables ({\it var}), value constructors ({\it con}), \\
exception constructors ({\it exncon})
\item type variables ({\it tyvar})
\item type constructors ({\it tycon})
\item record labels ({\it lab})
\item structures ({\it str}), functors ({\it fct})
\item signatures ({\it sgn})
\end{enumerate}
Thus an identifier cannot, in the same scope, stand for both a
value variable and a constructor, but an identifier can
be bound simultaneously to a type constructor and a signature.

To remove some ambiguity, it is recommended that constructors start
with an uppercase letter, and variables start with a lowercase
letter; but this is a convention, not an enforced rule (it is
confounded, for example, by symbolic identifiers).

A type variable ({\it tyvar}) may be any alphanumeric identifier starting
with a prime. The other eight classes ({\it var, con, tycon, \ldots})
are represented by identifiers not starting with a prime. The class
{\it lab} is also extended to include the numeric labels 1, 2, 3, \ldots.

Type variables are therefore disjoint from the other classes.
Otherwise, the class of an occurrence of an identifier is determined
from context.

Spaces or parentheses are sometimes needed
to separate symbolic identifiers and reserved words. Two examples are

\begin{tabular}{c c c c c}
\verb"a:= !b" &or& \verb"a:=(!b)" &but not& \verb"a:=!b"\\
\verb"~ :int->int" &or& \verb"(~):int->int" &but not& \verb"~:int->int"
\end{tabular}
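
The reason for the two examples is that a lexer takes the longest possible
run of symbol characters as one token. The Python sketch below shows this
under simplifying assumptions: \verb|TOKEN_RE| and \verb|tokens| are
hypothetical names, and strings, comments and real constants are not handled.

```python
import re

TOKEN_RE = re.compile(r"""
    [A-Za-z'][A-Za-z0-9'_]*       # alphanumeric identifier or reserved word
  | [!%&$+\-/:<=>?@\\~^|#*`]+     # longest run of symbol characters
  | [0-9]+                        # digits
  | [(){}\[\],;.]                 # punctuation that never joins an identifier
""", re.VERBOSE)


def tokens(text):
    """Split text into tokens, skipping characters that match nothing."""
    return TOKEN_RE.findall(text)


print(tokens("a:= !b"))     # ['a', ':=', '!', 'b']
print(tokens("a:=!b"))      # ['a', ':=!', 'b']
print(tokens("~:int->int")) # ['~:', 'int', '->', 'int']
```

Without the space or parentheses, \verb|:=!| and \verb|~:| are each read as a
single symbolic identifier.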

These punctuation characters cannot be constituents of identifiers
and therefore never need spaces around them:
\begin{quotation}
\verb| " ( ) , . ; [ ] { } |
\end{quotation}

\section{Comments}
A comment is a character sequence (outside of a string)
within comment brackets \verb|(* *)| in which comment brackets are properly
nested.
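
Because comment brackets nest, a scanner needs a depth counter and not just
a search for the first closing bracket. The Python sketch below illustrates
this. The name \verb|strip_comments| is hypothetical, and the string handling
is a simplifying assumption that covers only what is needed to avoid treating
\verb|(*| inside a string constant as a comment.

```python
def strip_comments(source):
    """Remove properly nested (* ... *) comments that lie outside strings."""
    out = []
    depth = 0            # current comment nesting depth
    in_string = False    # only ever True while depth == 0
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if in_string:
            if ch == "\\" and i + 1 < n and source[i + 1] in " \t\n\f":
                # Formatting sequence: copy through its closing backslash.
                end = source.find("\\", i + 1)
                end = n - 1 if end == -1 else end
                out.append(source[i:end + 1])
                i = end + 1
            elif ch == "\\":
                out.append(two)      # other escapes, including \" and \\
                i += 2
            else:
                if ch == '"':
                    in_string = False
                out.append(ch)
                i += 1
        elif two == "(*":
            depth += 1
            i += 2
        elif two == "*)" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1                   # inside a comment: drop the character
        else:
            if ch == '"':
                in_string = True
            out.append(ch)
            i += 1
    if depth > 0:
        raise ValueError("unterminated comment")
    return "".join(out)
```

For instance, \verb|strip_comments| removes the whole of
\verb|(* outer (* inner *) still outer *)| in one pass, and leaves a string
constant such as \verb|"(* text *)"| unchanged.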

\section{The bare syntax}
The Standard ML bare language is obtained by stripping the full
language of any {\em derived} forms (those that may be defined in
terms of other constructs in the language), and of any constructs
related to the module system. The bare language will be explained
in Chapters \ref{eval} and \ref{types},
and successive chapters describe augmentations
of it that yield the full language.

Figure~\ref{bare} shows the syntax of the bare language. The notation
\begin{quotation}
phrase x \rep{k} x phrase
\end{quotation}
indicates the repetition of the {\em phrase} at least $k$ times,
separated by the punctuation character $x$.