1: .\" Copyright (c) 1980 Regents of the University of California.
2: .\" All rights reserved. The Berkeley software License Agreement
3: .\" specifies the terms and conditions for redistribution.
4: .\"
5: .\" @(#)ch7.n 6.2 (Berkeley) 5/14/86
6: .\"
7: .\" $Header: ch7.n,v 1.3 83/07/01 11:22:58 layer Exp $
8: .Lc The\ Lisp\ Reader 7
9: .sh 2 Introduction \n(ch 1
10: .pp
11: The
12: .i read
13: function is responsible for converting
14: a stream of
15: characters into a Lisp expression.
16: .i Read
17: is table driven and the table it uses is called a
18: .i readtable.
19: The
20: .i print
21: function does the
22: inverse of
23: .i read ;
24: it converts a Lisp expression into a stream of
25: characters.
26: Typically the conversion is done in such
27: a way that if that stream of characters were read by
28: .i read ,
29: the
30: result would be an expression equal to the one
31: .i print
32: was given.
33: .i Print
34: must also refer to the readtable in order to determine
35: how to format its output.
36: The
37: .i explode
38: function, which returns a list of characters rather than
39: printing them, must also refer to the readtable.
40: .pp
41: A readtable is created
42: with the
43: .i makereadtable
44: function, modified with the
45: .i setsyntax
46: function and interrogated with the
47: .i getsyntax
48: function.
49: The structure of a readtable is hidden from the user - a
50: readtable should
51: only be manipulated with the three functions mentioned above.
52: .pp
53: There is one distinguished readtable called the
54: .i current
55: .i readtable
56: whose value determines what
57: .i read ,
58: .i print
59: and
60: .i explode
61: do.
62: The current readtable is the value of the symbol
63: .i readtable .
64: Thus it is possible to rapidly change
65: the current syntax by lambda binding
66: a different readtable to the symbol
67: .i readtable.
68: When the binding is undone, the syntax reverts to its old form.
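.pp
As a rough sketch of this technique, the hypothetical function below
(the name read-ignoring-commas is only illustrative) reads one expression
with the comma treated as a separator.
It assumes that
.i makereadtable
called with nil returns a copy of the current readtable.
Because
.i setsyntax
changes only the readtable that is current when it is called,
the original readtable should be unaffected once the lambda binding is undone.
.Eb
; Hypothetical example: read one expression with a temporary readtable.
(defun read-ignoring-commas nil
   ((lambda (readtable)               ; rebind the current readtable
       (setsyntax '\e, 'vseparator)   ; modifies only the copy
       (read))
    (makereadtable nil)))             ; assumed: nil means copy the current readtable
.Ee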
69: .sh +0 Syntax\ Classes
70: .pp
71: The readtable describes how each of the 128 ascii characters should
72: be treated by the reader and printer.
73: Each character belongs to a
74: .i syntax
75: .i class
76: which has three properties:
77: .ip character\ class\ -
78: Tells what the reader should do when it sees this character.
79: There are a large number of character classes.
80: They are described below.
81: .ip separator\ -
82: Most types of tokens the reader constructs are one character
83: long.
84: Four token types have an arbitrary length: number (1234),
85: symbol print name (franz),
86: escaped symbol print name (|franz|), and string ("franz").
87: The reader can easily determine when it has
88: come to the
89: end of one of the last two types: it just looks for the
90: matching delimiter (| or ").
91: When the reader is reading a number or symbol print name, it
92: stops reading when it comes to a character with the
93: .i separator
94: property.
95: The separator character is pushed back into the input stream and will
96: be the first character read when the reader is called again.
97: .ip escape\ -
98: Tells the printer when to put escapes in front of, or around, a symbol
99: whose print name contains this character.
100: There are three possibilities: always escape a symbol with this character
101: in it, only escape a symbol if this is the only character in the symbol,
102: and only escape a symbol if this is the first character in the symbol.
103: [note: The printer will always escape a symbol which, if printed out, would
104: look like a valid number.]
105: .pp
106: When the Lisp system is built, Lisp code is added to a C-coded kernel
107: and the result becomes the standard lisp system.
108: The readtable present in the C-coded kernel, called the
109: .i raw
110: .i readtable ,
111: contains the bare necessities for reading in Lisp code.
112: During the
113: construction of the complete Lisp system,
114: a copy is made of the raw readtable and
115: then the copy is modified by adding macro characters.
116: The result is what is called the
117: .i standard
118: .i readtable .
119: When a new readtable is created with
120: .i makereadtable,
121: a copy is made of either the
122: raw readtable
123: or the current readtable (which is likely to be the standard readtable).
124: .sh +0 Reader\ Operations
125: .pp
126: The reader has a very simple algorithm.
127: It is either
128: .i scanning
129: for a token,
130: .i collecting
131: a token,
132: or
133: .i processing
134: a token.
135: Scanning involves reading characters and throwing
136: away those which don't start tokens (such as blanks and tabs).
137: Collecting means gathering the characters which make up a
138: token into a buffer.
139: Processing may involve creating symbols, strings, lists,
140: fixnums, bignums or flonums or calling a user written function called
141: a character macro.
142: .pp
143: The components of the syntax class determine when the reader
144: switches between the scanning, collecting and processing states.
145: The reader will continue scanning as long as the character class
146: of the characters it reads is
147: .i cseparator.
148: When it reads a character whose character class is not
149: .i cseparator
150: it stores that character in its buffer and begins the collecting phase.
151: .pp
152: If the character class of that first character is
153: .i ccharacter ,
154: .i cnumber ,
155: .i cperiod ,
156: or
157: .i csign ,
158: then it will continue collecting until it runs into a character whose
159: syntax class has the
160: .i separator
161: property.
162: (That last character will be pushed back into the input buffer and will
163: be the first character read next time.)
164: Now the reader goes into the processing phase, checking to see if the
165: token it read is a number or symbol.
166: It is important to note that after
167: the first character is collected the component of the syntax class which
168: tells the reader to stop
169: collecting is the
170: .i separator
171: property, not the character class.
172: .pp
173: If the character class of the character which stopped the scanning is not
174: .i ccharacter ,
175: .i cnumber ,
176: .i cperiod ,
177: or
178: .i csign ,
179: then the reader processes that character immediately.
180: The character classes
181: .i csingle-macro ,
182: .i csingle-splicing-macro ,
183: and
184: .i csingle-infix-macro
185: will act like
186: .i ccharacter
187: if the following character is not a
188: .i separator.
189: The processing which is done for a given character class
190: is described in detail in the next section.
191: .sh +0 Character\ Classes
192: .de Cc
193: .sp 2v
194: .tl '\fI\\$1\fP''raw readtable:\\$2'
195: .tl '''standard readtable:\\$3'
196: ..
197: .pc
198: .Cc ccharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~
199: .pc %
200: A normal character.
201: .Cc cnumber 0-9 0-9
202: This type is a digit.
203: The syntax for an integer (fixnum or bignum) is a string of
204: .i cnumber
205: characters optionally followed by a
206: .i cperiod.
207: If the digits are not followed by a
208: .i cperiod ,
209: then they are interpreted in base
210: .i ibase
211: which must be eight or ten.
212: The syntax for a floating point number is
213: either zero or more
214: .i cnumber 's
215: followed by a
216: .i cperiod
217: and then followed by one or more
218: .i cnumber 's.
219: A floating point number
220: may also be an integer or floating point number followed
221: by 'e' or 'd', an optional '+' or '\-'
222: and then zero or more
223: .i cnumber 's.
224: .Cc csign +\- +\-
225: A leading sign for a number.
226: No other characters should be given this class.
227: .Cc cleft-paren ( (
228: A left parenthesis.
229: Tells the reader to begin forming a list.
230: .Cc cright-paren ) )
231: A right parenthesis.
232: Tells the reader that it has reached the end of a list.
233: .Cc cleft-bracket [ [
234: A left bracket.
235: Tells the reader that it should begin forming a list.
236: See the description of
237: .i cright-bracket
238: for the difference between cleft-bracket and cleft-paren.
239: .Cc cright-bracket ] ]
240: A right bracket.
241: A
242: .i cright-bracket
243: finishes the formation of the current
244: list and all enclosing lists until it finds one which
245: begins with a
246: .i cleft-bracket
247: or until it reaches the
248: top level list.
249: .Cc cperiod . .
250: The period is used to separate the elements of a cons cell
251: [e.g. (a\ .\ (b\ .\ nil)) is the same as (a\ b)].
252: .i cperiod
253: is also used in numbers as described above.
254: .Cc cseparator ^I-^M\ esc\ space ^I-^M\ esc\ space
255: Separates tokens. When the reader is scanning, these characters
256: are passed over.
257: Note: there is a difference between the
258: .i cseparator
259: character class and the
260: .i separator
261: property of a syntax class.
262: .Cc csingle-quote \\' \\'
263: This causes
264: .i read
265: to be called recursively and the list
266: (quote <value read>) to be returned.
267: .Cc csymbol-delimiter | |
268: This causes the reader to begin collecting characters and to stop only
269: when another identical
270: .i csymbol-delimiter
271: is seen.
272: The only way to escape a
273: .i csymbol-delimiter
274: within a symbol name is with a
275: .i cescape
276: character.
277: The collected characters are converted into a string which becomes
278: the print name of a symbol.
279: If a symbol with an identical print name already exists, then the
280: allocation is not done, rather the existing symbol is used.
281: .Cc cescape \e \e
282: This causes the next character read in to be treated as a
283: .b vcharacter .
284: A character whose syntax class is
285: .b vcharacter
286: has a character class
287: .i ccharacter
288: and does not have
289: the
290: .i separator
291: property so it will not separate symbols.
292: .Cc cstring-delimiter """" """"
293: This is the same as
294: .i csymbol-delimiter
295: except the result is returned as a string instead of a symbol.
296: .Cc csingle-character-symbol none none
297: This returns a symbol whose print name is the single character
298: which has been collected.
299: .Cc cmacro none `,
300: The reader calls the macro function associated with this character and
301: the current readtable, passing it no arguments.
302: The result of the macro is added to the structure the reader is building,
303: just as if that form were directly read by the reader.
304: More details on macros are provided below.
305: .Cc csplicing-macro none #;
306: A
307: .i csplicing-macro
308: differs from a
309: .i cmacro
310: in the way the result is incorporated in the structure the reader is
311: building.
312: A
313: .i csplicing-macro
314: must return a list of forms (possibly empty).
315: The reader acts as
316: if it read each element of
317: the list itself without
318: the surrounding parentheses.
319: .Cc csingle-macro none none
320: This causes the reader to check the next character.
321: If it is a
322: .i cseparator
323: then this acts like a
324: .i cmacro.
325: Otherwise, it acts like a
326: .i ccharacter.
327: .Cc csingle-splicing-macro none none
328: This is triggered like a
329: .i csingle-macro
330: however the result is spliced in like a
331: .i csplicing-macro.
332: .Cc cinfix-macro none none
333: This differs from a
334: .i cmacro
335: in that the macro function is passed a form representing what the reader
336: has read so far.
337: The result of the macro replaces what the reader had read so far.
338: .Cc csingle-infix-macro none none
339: This differs from the
340: .i cinfix-macro
341: in that the macro will only be triggered if the character following the
342: .i csingle-infix-macro
343: character is a
344: .i cseparator .
345: .Cc cillegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout
346: These characters cause the reader to signal an error if read.
347: .sh +0 Syntax\ Classes
348: .pp
349: The readtable maps each character into a syntax class.
350: The syntax class contains three pieces of information:
351: the character class, whether this is a separator, and the escape
352: properties.
353: The first two properties are used by the reader, the last by
354: the printer (and
355: .i explode ).
356: The initial lisp system has the following syntax classes defined.
357: The user may add syntax classes with
358: .i add-syntax-class .
359: For each syntax class, we list the properties of the class and
360: which characters have this syntax class by default.
361: More information about each syntax class can be found under the
362: description of the syntax class's character class.
363: .de Sy
364: .sp 1v
365: .(b
366: .tl '\fB\\$1\fP''raw readtable:\\$2'
367: .tl '\fI\\$4\fP''standard readtable:\\$3'
368: .tl '\fI\\$5\fP'''
369: .if \n(.$>5 .tl '\fI\\$6\fP'''
370: .)b
371: ..
372: .pc
373: .Sy vcharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~ ccharacter
374: .pc %
375: .Sy vnumber 0-9 0-9 cnumber
376: .Sy vsign +- +- csign
377: .Sy vleft-paren ( ( cleft-paren escape-always separator
378: .Sy vright-paren ) ) cright-paren escape-always separator
379: .Sy vleft-bracket [ [ cleft-bracket escape-always separator
380: .Sy vright-bracket ] ] cright-bracket escape-always separator
381: .Sy vperiod . . cperiod escape-when-unique
382: .Sy vseparator ^I-^M\ esc\ space ^I-^M\ esc\ space cseparator escape-always separator
383: .Sy vsingle-quote \\' \\' csingle-quote escape-always separator
384: .Sy vsymbol-delimiter | | csymbol-delimiter escape-always
385: .Sy vescape \e \e cescape escape-always
386: .Sy vstring-delimiter """" """" cstring-delimiter escape-always
387: .Sy vsingle-character-symbol none none csingle-character-symbol separator
388: .Sy vmacro none `, cmacro escape-always separator
389: .Sy vsplicing-macro none #; csplicing-macro escape-always separator
390: .Sy vsingle-macro none none csingle-macro escape-when-unique
391: .Sy vsingle-splicing-macro none none csingle-splicing-macro escape-when-unique
392: .Sy vinfix-macro none none cinfix-macro escape-always separator
393: .Sy vsingle-infix-macro none none csingle-infix-macro escape-when-unique
394: .Sy villegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout cillegal escape-always separator
395: .sh +0 Character\ Macros
396: .pp
397: Character macros are
398: user written functions which are executed during the reading process.
399: The value returned by a character macro may or may not be used by
400: the reader, depending on the type of macro and the value returned.
401: Character macros are always attached to a single character with
402: the
403: .i setsyntax
404: function.
405: .sh +1 Types
406: There are three types of character macros: normal, splicing and infix.
407: These types differ in the arguments they are given or in what is done
408: with the result they return.
409: .sh +1 Normal
410: .pp
411: A normal macro
412: is passed no arguments.
413: The value returned by a normal macro is simply used by
414: the reader as if it had read the value itself.
415: Here is an example of a macro which returns the abbreviation
416: for a given state.
417: .Eb
418: \->\fI(de\kAfun stateabbrev nil
419: \h'|\nAu'(cdr (assq (read) '((california . ca) (pennsylvania . pa)))))\fP
420: stateabbrev
421: \-> \fI(setsyntax '\e! 'vmacro 'stateabbrev)\fP
422: t
423: \-> \fI'( ! california ! wyoming ! pennsylvania)\fP
424: (ca nil pa)
425: .Ee
426: Notice what happened to
427: \fI ! wyoming\fP.
428: Since it wasn't in the table, the associated function
429: returned nil.
430: The creator of the macro may have wanted to leave the
431: list alone, in such a case, but couldn't with this
432: type of reader macro.
433: The splicing macro, described next, allows a character macro function
434: to return a value that is ignored.
435: .sh +0 Splicing
436: .pp
437: The value returned from a splicing macro must be a list or nil.
438: If the value is nil, then the value is ignored, otherwise the reader
439: acts as if it read each object in the list.
440: Usually the list only contains one element.
441: If the reader is reading at the top level (i.e. not collecting elements
442: of a list),
443: then it is illegal for a splicing macro to return more than one
444: element in the list.
445: The major advantage of a splicing macro over a normal macro is the
446: ability of the splicing macro to return nothing.
447: The comment character (usually ;) is a splicing macro bound to a
448: function which reads to the end of the line and always returns nil.
449: Here is the previous example written as a splicing macro.
450: .Eb
451: \-> \fI(de\kAfun stateabbrev nil
452: \h'|\nAu'(\kC(lam\kBbda (value)
453: \h'|\nBu'(cond \kA(value (list value))
454: \h'|\nAu'(t nil)))
455: \h'|\nCu'(cdr (assq (read) '((california . ca) (pennsylvania . pa))))))\fP
456: \-> \fI(setsyntax '\e! 'vsplicing-macro 'stateabbrev)\fP
457: \-> \fI'(!pennsylvania ! foo !california)\fP
458: (pa ca)
459: \-> \fI'!foo !bar !pennsylvania\fP
460: pa
461: \->
462: .Ee
463: .sh +0 Infix
464: .pp
465: Infix macros are passed a
466: .i tconc
467: structure representing what has been read so far.
468: Briefly, a
469: tconc
470: structure is a single list cell whose car points to
471: a list and whose cdr points to the last list cell in that list.
472: The interpretation by the reader of the value
473: returned by an infix macro depends on
474: whether the macro is called while the reader is constructing a
475: list or whether it is called at the top level of the reader.
476: If the macro is called while a list is
477: being constructed, then the value returned should be a tconc
478: structure.
479: The car of that structure replaces the list of elements that the
480: reader has been collecting.
481: If the macro is called at top level, then it will be passed the
482: value nil, and the value it returns should either be nil
483: or a tconc structure.
484: If the macro returns nil, then the value is ignored and the reader
485: continues to read.
486: If the macro returns a tconc structure of one element (i.e. whose car
487: is a list of one element), then that single element is returned
488: as the value of
489: .i read.
490: If the macro returns a tconc structure of more than one element,
491: then that list of elements is returned as the value of read.
492: .Eb
493: \-> \fI(de\kAfun plusop (x)
494: \h'|\nAu'(cond \kB((null x) (tconc nil '\e+))
495: \h'|\nBu'(t (lconc nil (list 'plus (caar x) (read))))))\fP
496:
497: plusop
498: \-> \fI(setsyntax '\e+ 'vinfix-macro 'plusop)\fP
499: t
500: \-> \fI'(a + b)\fP
501: (plus a b)
502: \-> \fI'+\fP
503: |+|
504: \->
505: .Ee
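.pp
To make the tconc structure passed to infix macros more concrete,
the following sketch builds one with the
.i tconc
function used in the example above.
The variable name tc is only illustrative, and the values shown are what
one would expect from the description of a tconc structure
(car is the list, cdr is its last cell) rather than a captured session.
.Eb
\-> \fI(setq tc (tconc nil 'a))\fP
((a) a)
\-> \fI(tconc tc 'b)\fP
((a b) b)
\-> \fI(car tc)\fP
(a b)
.Ee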
506: .sh -1 Invocations
507: .pp
508: There are three different circumstances in which you would like
509: a macro function to be triggered.
510: .ip \fIAlways\ -\fP
511: Whenever the macro character is seen, the macro should be invoked.
512: This is accomplished by using the character classes
513: .i cmacro ,
514: .i csplicing-macro ,
515: or
516: .i cinfix-macro ,
517: and by using the
518: .i separator
519: property.
520: The syntax classes
521: .b vmacro ,
522: .b vsplicing-macro ,
523: and
524: .b vinfix-macro
525: are defined this way.
526: .ip \fIWhen\ first\ -\fP
527: The macro should only be triggered when the macro character is the first
528: character found after the scanning process.
529: A syntax class for a
530: .i when
531: .i first
532: macro would
533: be defined
534: using
535: .i cmacro ,
536: .i csplicing-macro ,
537: or
538: .i cinfix-macro
539: and not including the
540: .i separator
541: property.
542: .ip \fIWhen\ unique\ -\fP
543: The macro should only be triggered when the macro character is the only
544: character collected in the token collection
545: phase of the reader,
546: i.e. the macro character is preceded by zero or more
547: .i cseparator s
548: and followed by a
549: .i separator.
550: A syntax class for a
551: .i when
552: .i unique
553: macro would
554: be defined using
555: .i csingle-macro ,
556: .i csingle-splicing-macro ,
557: or
558: .i csingle-infix-macro
559: and not including the
560: .i separator
561: property.
562: The syntax classes so defined are
563: .b vsingle-macro ,
564: .b vsingle-splicing-macro ,
565: and
566: .b vsingle-infix-macro .
567: .sh -1 Functions
568: .Lf setsyntax 's_symbol\ 's_synclass\ ['ls_func]
569: .Wh
570: ls_func is the name of a function or a lambda body.
571: .Re
572: t
573: .Se
574: S_symbol should be a symbol whose print name is only one character.
575: The syntax class for
576: that character is
577: set to s_synclass in the current readtable.
578: If s_synclass is a class that requires a character macro, then
579: ls_func must be supplied.
580: .No
581: The symbolic syntax codes are new to Opus 38.
582: For compatibility, s_synclass can be one of the fixnum syntax codes
583: which appeared in older versions of the
584: .Fr
585: Manual.
586: This compatibility is only temporary: existing code which uses the
587: fixnum syntax codes should be converted.
588: .Lf getsyntax 's_symbol
589: .Re
590: the syntax class of the first character
591: of s_symbol's print name.
592: s_symbol's print name must be exactly one character long.
593: .No
594: This function is new to Opus 38.
595: It supersedes \fI(status\ syntax)\fP which no longer exists.
596: .Lf add-syntax-class 's_synclass\ 'l_properties
597: .Re
598: s_synclass
599: .Se
600: Defines the syntax class s_synclass to have properties l_properties.
601: The list l_properties should contain one of the character classes mentioned
602: above.
603: l_properties may contain one of the escape properties:
604: .i escape-always ,
605: .i escape-when-unique ,
606: or
607: .i escape-when-first .
608: l_properties may contain the
609: .i separator
610: property.
611: After a syntax class has been defined with
612: .i add-syntax-class ,
613: the
614: .i setsyntax
615: function can be used to give characters that syntax class.
616: .Eb
617: ; Define a non-separating macro character.
618: ; This type of macro character is used in UCI-Lisp, and
619: ; it corresponds to a FIRST MACRO in Interlisp
620: \-> \fI(add-syntax-class 'vuci-macro '(cmacro escape-when-first))\fP
621: vuci-macro
622: \->
623: .Ee
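.pp
Continuing that example as a sketch, the new class could then be attached
to a character and checked with
.i getsyntax .
Here stateabbrev is assumed to be the normal macro version defined earlier
in this chapter, and the responses shown follow from the descriptions of
.i setsyntax
and
.i getsyntax
above rather than from a captured session.
.Eb
\-> \fI(setsyntax '\e! 'vuci-macro 'stateabbrev)\fP
t
\-> \fI(getsyntax '\e!)\fP
vuci-macro
.Ee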