% researchv10no/cmd/sml/doc/refman/lex.tex, revision 1.1.1.1

\chapter{Lexical analysis}
\section{Reserved words}
The following are reserved words.  They may not be used as
identifiers.  In this document the alphabetic reserved words are always
shown in boldface.
\begin{quote}
\raggedright
\tt
abstraction abstype and andalso
as case datatype else end exception do fn fun functor handle if in
infix infixr let local nonfix of op open overload
raise rec sharing sig signature
struct structure then type val while with withtype orelse

\verb"{  }  [  ]  ,  ;  (  )  ->  *  |  :  ...  =  =>  #  _"
\end{quote}
\section{Special constants}
An integer constant is any non-empty sequence of digits, possibly preceded
by a negation symbol (\verb|~|).

A real constant is an integer constant, possibly followed by a point (.)
and one or more digits, possibly followed by an exponent symbol (E) and
an integer constant; at least one of the optional parts must occur,
hence no integer constant is a real constant.  Examples: \verb|0.7|,
\verb|~3.32E5|, \verb|3E~7|.  Non-examples: \verb|23| (an integer constant),
\verb|.3| (no digits before the point), \verb|4.E5| (no digits after the
point), \verb|1E2.0| (the exponent is not an integer constant).
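
The rules for integer and real constants can be checked mechanically.  The
following Python sketch is illustrative only and is not part of any SML
implementation; the names \verb|is_int_constant| and \verb|is_real_constant|
are hypothetical, and it assumes ASCII digits and an upper-case \verb|E|.  It
accepts the examples above and rejects the non-examples.

\begin{verbatim}
import re

# Integer constant: an optional negation symbol ~ followed by one or
# more ASCII digits.
INT = r"~?[0-9]+"

# Real constant: an integer constant, an optional fraction ".digits",
# and an optional exponent "E" followed by an integer constant.
REAL = re.compile(rf"{INT}(\.[0-9]+)?(E{INT})?")


def is_int_constant(s: str) -> bool:
    return re.fullmatch(INT, s) is not None


def is_real_constant(s: str) -> bool:
    m = REAL.fullmatch(s)
    # At least one optional part must be present, so that no integer
    # constant is also a real constant.
    return m is not None and (m.group(1) is not None or m.group(2) is not None)


for s in ["0.7", "~3.32E5", "3E~7", "23", ".3", "4.E5", "1E2.0"]:
    print(s, is_int_constant(s), is_real_constant(s))
\end{verbatim}

Reusing the pattern \verb|INT| inside \verb|REAL| mirrors the wording of the
rule, in which a real constant begins with an integer constant.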

A string constant is a sequence of zero or more printable characters,
spaces, or escape sequences, enclosed in double quotes (\verb|"|).  Each
escape sequence is introduced by the escape character \verb|\| and
stands for a character sequence.  The allowed escape sequences are
listed below; any other use of \verb|\| is an error.
\begin{tabular}{l p{3.9in}}
\verb|\n| & A single character interpreted by the system as end-of-line.\\
\verb|\t| & Tab. \\
\verb|\^c| & The control character c, for any appropriate c.\\
\verb|\ddd| &  The single character with ASCII code ddd (3 decimal digits).\\
\verb|\"| & The double-quote character (\verb'"'). \\
\verb|\\| &  The backslash character (\verb"\").\\
\verb|\f___f\| & This sequence is ignored, where f\_\_\_f stands for a
sequence of one or more formatting characters (a subset of the
non-printable characters including at least space, tab, newline, and
formfeed).  This makes it possible to continue a long string over
several lines: write \verb"\" at the end of one line and again at the
start of the next.
\end{tabular}
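
To make the table concrete, here is a Python sketch of a decoder for the body
of a string constant (the text between the quotes).  It is illustrative only;
the name \verb|decode_string_body| is hypothetical, and two details the table
leaves open are assumptions: the formatting characters are taken to be exactly
space, tab, newline and formfeed, and \verb|\^c| is accepted for c from
\verb|@| to \verb|_|, denoting the character whose code is 64 less than that
of c (the usual ASCII control-character convention).

\begin{verbatim}
DIGITS = "0123456789"
# Assumption: exactly the formatting characters the table guarantees.
FORMATTING = " \t\n\f"
# Escapes that stand for a single fixed character.
SIMPLE = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def decode_string_body(body: str) -> str:
    """Decode the characters between the quotes of a string constant."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("escape character at end of string")
        c = body[i + 1]
        if c in SIMPLE:
            out.append(SIMPLE[c])
            i += 2
        elif c == "^":
            # \^c: control character, assumed valid for c in '@'..'_'.
            if i + 2 == len(body) or not "@" <= body[i + 2] <= "_":
                raise ValueError("bad control-character escape")
            out.append(chr(ord(body[i + 2]) - 64))
            i += 3
        elif c in DIGITS:
            # \ddd: exactly three decimal digits (no range check here).
            ddd = body[i + 1:i + 4]
            if len(ddd) != 3 or any(d not in DIGITS for d in ddd):
                raise ValueError("\\ddd needs three decimal digits")
            out.append(chr(int(ddd)))
            i += 4
        elif c in FORMATTING:
            # \f___f\: skip the formatting characters and the closing \.
            j = i + 1
            while j < len(body) and body[j] in FORMATTING:
                j += 1
            if j == len(body) or body[j] != "\\":
                raise ValueError("unterminated \\f___f\\ sequence")
            i = j + 1
        else:
            raise ValueError("illegal escape sequence \\" + c)
    return "".join(out)
\end{verbatim}

Every branch either consumes a complete escape sequence or raises an error,
matching the rule that any other use of \verb|\| is incorrect.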

\section{Identifiers}

An identifier is either {\em alphanumeric}: any sequence of letters,
digits, primes (\verb"'"), and underbars (\verb"_") starting with a letter or a
prime, or {\em symbolic}: any non-empty sequence of the following symbols
\begin{quote}
\verb"! % & $ + - / : < = > ? @ \ ~ ^ | # * `"
\end{quote}
In either case, however, reserved words are excluded.  For example,
\verb"_" and \verb"|" are not identifiers, but
\verb"also_ran" and \verb"|=|" are.
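
As an illustration only, the Python sketch below applies these rules to a
single token.  The function name \verb|identifier_kind| is hypothetical,
letters are assumed to be ASCII, and the reserved-word set is copied from the
list at the start of this chapter.

\begin{verbatim}
import string

# Reserved words, copied from the list at the start of this chapter.
RESERVED = set("""
abstraction abstype and andalso as case datatype else end exception do
fn fun functor handle if in infix infixr let local nonfix of op open
overload raise rec sharing sig signature struct structure then type val
while with withtype orelse
{ } [ ] , ; ( ) -> * | : ... = => # _
""".split())

ALNUM_CHARS = set(string.ascii_letters + string.digits + "'_")
SYMBOL_CHARS = set("!%&$+-/:<=>?@\\~^|#*`")


def identifier_kind(tok: str):
    """Return "alphanumeric", "symbolic", or None if tok is not an identifier."""
    if not tok or tok in RESERVED:
        return None
    starts_ok = tok[0] in string.ascii_letters or tok[0] == "'"
    if starts_ok and set(tok) <= ALNUM_CHARS:
        return "alphanumeric"
    if set(tok) <= SYMBOL_CHARS:
        return "symbolic"
    return None
\end{verbatim}

For instance, \verb|identifier_kind("|=|")| yields \verb|"symbolic"|, while
\verb|identifier_kind("|")| yields \verb|None| because \verb"|" is reserved.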

Identifiers are used to stand for nine different classes of objects,
which occupy six different name spaces, as follows:
\begin{enumerate}
\item value variables ({\it var}), value constructors ({\it con}), \\
exception constructors ({\it exncon})
\item type variables ({\it tyvar})
\item type constructors ({\it tycon})
\item record labels ({\it lab})
\item structures ({\it str}), functors ({\it fct})
\item signatures ({\it sgn})
\end{enumerate}
Thus, within a single scope an identifier cannot stand for both a
value variable and a constructor, but it can
be bound simultaneously to a type constructor and a signature.

To reduce ambiguity, we recommend that constructors start
with an uppercase letter and variables with a lowercase
letter.  This is a convention, not an enforced rule; symbolic
identifiers, for example, cannot follow it, since they contain no letters.

A type variable ({\it tyvar}) may be any alphanumeric identifier starting
with a prime.  The other eight classes ({\it var, con, tycon}, \ldots)
are represented by identifiers not starting with a prime.  The class
{\it lab} is also extended to include the numeric labels 1, 2, 3, \ldots.

Type variables are therefore lexically distinct from all the other classes.
For the remaining classes, the class of an occurrence of an identifier is
determined by its context.
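
A minimal Python sketch of the two lexical tests just described, for
illustration only: \verb|is_tyvar| assumes its argument is already a valid
identifier, and \verb|is_numeric_label| reads ``1, 2, 3, \ldots'' as any
numeral not starting with 0 (an assumption; the text gives only the pattern).
Both names are hypothetical.

\begin{verbatim}
NONZERO = "123456789"
DIGITS = "0123456789"


def is_tyvar(ident: str) -> bool:
    """A valid identifier is a type variable exactly when it starts
    with a prime."""
    return ident.startswith("'")


def is_numeric_label(tok: str) -> bool:
    """Numeric record labels 1, 2, 3, ...: assumed to be a numeral
    with no leading zero."""
    return bool(tok) and tok[0] in NONZERO and all(c in DIGITS for c in tok)
\end{verbatim}

Neither test can decide among the other eight classes; as stated above, that
is left to the context of each occurrence.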

Because the longest possible sequence of symbolic characters is read as a
single token, spaces or parentheses are sometimes needed
to separate symbolic identifiers and reserved words.  For example, one writes

\begin{tabular}{c c c c c}
\verb"a:= !b" &or& \verb"a:=(!b)" &but not& \verb"a:=!b"\\
\verb"~ :int->int" &or& \verb"(~):int->int" &but not& \verb"~:int->int"
\end{tabular}
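
The effect of this longest-match rule can be seen in a deliberately simplified
Python tokenizer.  It is a sketch, not the actual lexer: the name
\verb|tokens| is hypothetical, it groups only runs of symbolic characters and
runs of alphanumeric characters, treats every other non-space character as a
one-character token, and ignores constants, strings and comments.

\begin{verbatim}
import string

SYMBOL_CHARS = set("!%&$+-/:<=>?@\\~^|#*`")
ALNUM_CHARS = set(string.ascii_letters + string.digits + "'_")


def tokens(src: str) -> list:
    """Split src into tokens, taking the longest run of symbolic or
    alphanumeric characters each time."""
    result = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
            continue
        j = i + 1
        for cls in (SYMBOL_CHARS, ALNUM_CHARS):
            if c in cls:
                while j < len(src) and src[j] in cls:
                    j += 1
                break
        result.append(src[i:j])
        i = j
    return result


print(tokens("a:=!b"))       # ['a', ':=!', 'b']
print(tokens("a:= !b"))      # ['a', ':=', '!', 'b']
print(tokens("~:int->int"))  # ['~:', 'int', '->', 'int']
\end{verbatim}

With the space or the parentheses, \verb|:=| and \verb|!| come out as
separate tokens, as intended; without them, \verb|:=!| is a single
symbolic identifier.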

These punctuation characters cannot be constituents of identifiers
and therefore never need spaces around them (except that \verb|(|
immediately followed by \verb|*| always opens a comment):
\begin{quotation}
\verb| " ( ) , . ; [ ] { } |
\end{quotation}

\section{Comments}
A comment is any character sequence, outside a string constant,
enclosed in the comment brackets \verb|(*| and \verb|*)|.  Comment
brackets must be properly nested within a comment, so one comment may
enclose another.
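
Because comments nest, a comment cannot be recognized by a regular expression
alone; a depth counter is enough.  The Python sketch below is illustrative
only (the name \verb|strip_comments| is hypothetical): it replaces each
outermost comment by a single space, leaves \verb|(*| inside string constants
alone, and, as a simplification, does not look for strings inside comments.

\begin{verbatim}
def strip_comments(src: str) -> str:
    """Replace each outermost (* ... *) comment in src by one space."""
    out = []
    depth = 0          # current comment nesting depth
    in_string = False  # inside a string constant (tracked at depth 0 only)
    i = 0
    while i < len(src):
        two = src[i:i + 2]
        if in_string:
            out.append(src[i])
            if src[i] == "\\" and i + 1 < len(src):
                # Copy the escaped character so \" does not end the string.
                out.append(src[i + 1])
                i += 2
                continue
            if src[i] == '"':
                in_string = False
            i += 1
        elif two == "(*":
            depth += 1
            i += 2
        elif depth > 0 and two == "*)":
            depth -= 1
            i += 2
            if depth == 0:
                out.append(" ")
        elif depth > 0:
            i += 1  # skip a character inside a comment
        else:
            if src[i] == '"':
                in_string = True
            out.append(src[i])
            i += 1
    if depth > 0:
        raise ValueError("unterminated comment")
    return "".join(out)
\end{verbatim}

Replacing a comment by a space rather than by nothing keeps
\verb|x(*c*)y| from turning into the single identifier \verb|xy|.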

\section{The bare syntax}
The Standard ML bare language is obtained by stripping the full
language of all {\em derived} forms (those that can be defined in
terms of other constructs of the language) and of all constructs
related to the module system.  The bare language is explained
in Chapters \ref{eval} and \ref{types};
subsequent chapters describe the augmentations
of it that yield the full language.

Figure~\ref{bare} shows the syntax of the bare language.  The notation
\begin{quotation}
phrase x \rep{k} x phrase
\end{quotation}
indicates at least $k$ repetitions of {\em phrase}, separated by the
punctuation character $x$.
