.\" @(#)m4 6.1 (Berkeley) 5/8/86
.\"
.EH 'PS1:17-%''The M4 Macro Processor'
.OH 'The M4 Macro Processor''PS1:17-%'
.if n .ls 2
.tr _\(em
.tr *\(**
.de UC
\&\\$3\s-1\\$1\\s0\&\\$2
..
.de IT
.if n .ul
\&\\$3\f2\\$1\fP\&\\$2
..
.de UL
.if n .ul
\&\\$3\f3\\$1\fP\&\\$2
..
.de P1
.DS I 3n
.if n .ls 2
.nf
.if n .ta 5 10 15 20 25 30 35 40 45 50 55 60
.if t .ta .4i .8i 1.2i 1.6i 2i 2.4i 2.8i 3.2i 3.6i 4i 4.4i 4.8i 5.2i 5.6i
.if t .tr -\(mi|\(bv'\(fm^\(no*\(**
.tr `\(ga'\(aa
.if t .tr _\(ul
.ft 3
.lg 0
..
.de P2
.ps \\n(PS
.vs \\n(VSp
.ft R
.if n .ls 2
.tr --||''^^!!
.if t .tr _\(em
.fi
.lg
.DE
.if t .tr _\(em
..
.hw semi-colon
.hw estab-lished
.hy 14
. \"2=not last lines; 4= no -xx; 8=no xx-
. \"special chars in programs
. \" start of text
.\".RP
.....TR 59
.....TM 77-1273-6 39199 39199-11
.ND "July 1, 1977"
.TL
The M4 Macro Processor
.AU "MH 2C-518" 6021
Brian W. Kernighan
.AU "MH 2C-517" 3770
Dennis M. Ritchie
.AI
.MH
.AB
.PP
M4 is a macro processor available on
.UX
and
.UC GCOS .
Its primary use has been as a
front end for Ratfor for those
cases where parameterless macros
are not adequately powerful.
It has also been used for languages as disparate as C and Cobol.
M4 is particularly suited for functional languages like Fortran, PL/I and C
since macros are specified in a functional notation.
.PP
M4 provides features seldom found even in much larger
macro processors,
including
.IP " \(bu"
arguments
.IP " \(bu"
condition testing
.IP " \(bu"
arithmetic capabilities
.IP " \(bu"
string and substring functions
.IP " \(bu"
file manipulation
.LP
.PP
This paper is a user's manual for M4.
.AE
.CS 6 0 6 0 0 1
.if t .2C
.SH
Introduction
.PP
A macro processor is a useful way to enhance a programming language,
to make it more palatable
or more readable,
or to tailor it to a particular application.
The
.UL #define
statement in C
and the analogous
.UL define
in Ratfor
are examples of the basic facility provided by
any macro processor _
replacement of text by other text.
.PP
The M4 macro processor is an extension of a macro processor called M3
which was written by D. M. Ritchie
for the AP-3 minicomputer;
M3 was in turn based on a macro processor implemented for [1].
Readers unfamiliar with the basic ideas of macro processing
may wish to read some of the discussion there.
.PP
M4 is a suitable front end for Ratfor and C,
and has also been used successfully with Cobol.
Besides the straightforward replacement of one string of text by another,
it provides
macros with arguments,
conditional macro expansion,
arithmetic,
file manipulation,
and some specialized string processing functions.
.PP
The basic operation of M4
is to copy its input to its output.
As the input is read, however, each alphanumeric ``token''
(that is, string of letters and digits) is checked.
If it is the name of a macro,
then the name of the macro is replaced by its defining text,
and the resulting string is pushed back onto the
input to be rescanned.
Macros may be called with arguments, in which case the arguments are collected
and substituted into the right places in the defining text
before it is rescanned.
.PP
M4 provides a collection of about twenty built-in
macros
which perform various useful operations;
in addition, the user can define new macros.
Built-ins and user-defined macros work exactly the same way, except that
some of the built-in macros have side effects
on the state of the process.
.SH
Usage
.PP
On
.UC UNIX ,
use
.P1
m4 [files]
.P2
Each argument file is processed in order;
if there are no arguments, or if an argument
is `\-',
the standard input is read at that point.
The processed text is written on the standard output,
which may be captured for subsequent processing with
.P1
m4 [files] >outputfile
.P2
On
.UC GCOS ,
usage is identical, but the program is called
.UL \&./m4 .
.SH
Defining Macros
.PP
The primary built-in function of M4
is
.UL define ,
which is used to define new macros.
The input
.P1
define(name, stuff)
.P2
causes the string
.UL name
to be defined as
.UL stuff .
All subsequent occurrences of
.UL name
will be replaced by
.UL stuff .
.UL name
must be alphanumeric and must begin with a letter
(the underscore \(ul counts as a letter).
.UL stuff
is any text that contains balanced parentheses;
it may stretch over multiple lines.
.PP
Thus, as a typical example,
.P1
define(N, 100)
 ...
if (i > N)
.P2
defines
.UL N
to be 100, and uses this ``symbolic constant'' in a later
.UL if
statement.
.PP
The left parenthesis must immediately follow the word
.UL define ,
to signal that
.UL define
has arguments.
If a macro or built-in name is not followed immediately by `(',
it is assumed to have no arguments.
This is the situation for
.UL N
above;
it is actually a macro with no arguments,
and thus when it is used there need be no (...) following it.
.PP
You should also notice that a macro name is only recognized as such
if it appears surrounded by non-alphanumerics.
For example, in
.P1
define(N, 100)
 ...
if (NNN > 100)
.P2
the variable
.UL NNN
is absolutely unrelated to the defined macro
.UL N ,
even though it contains a lot of
.UL N 's.
.PP
Things may be defined in terms of other things.
For example,
.P1
define(N, 100)
define(M, N)
.P2
defines both M and N to be 100.
.PP
What happens if
.UL N
is redefined?
Or, to say it another way, is
.UL M
defined as
.UL N
or as 100?
In M4,
the latter is true _
.UL M
is 100, so even if
.UL N
subsequently changes,
.UL M
does not.
.PP
This behavior arises because
M4 expands macro names into their defining text as soon as it possibly can.
Here, that means that when the string
.UL N
is seen as the arguments of
.UL define
are being collected, it is immediately replaced by 100;
it's just as if you had said
.P1
define(M, 100)
.P2
in the first place.
.PP
If this isn't what you really want, there are two ways out of it.
The first, which is specific to this situation,
is to interchange the order of the definitions:
.P1
define(M, N)
define(N, 100)
.P2
Now
.UL M
is defined to be the string
.UL N ,
so when you ask for
.UL M
later, you'll always get the value of
.UL N
at that time
(because the
.UL M
will be replaced by
.UL N
which will be replaced by 100).
.SH
Quoting
.PP
The more general solution is to delay the expansion of
the arguments of
.UL define
by
.ul
quoting
them.
Any text surrounded by the single quotes \(ga and \(aa
is not expanded immediately, but has the quotes stripped off.
If you say
.P1
define(N, 100)
define(M, `N')
.P2
the quotes around the
.UL N
are stripped off as the argument is being collected,
but they have served their purpose, and
.UL M
is defined as
the string
.UL N ,
not 100.
The general rule is that M4 always strips off
one level of single quotes whenever it evaluates
something.
This is true even outside of
macros.
If you want the word
.UL define
to appear in the output,
you have to quote it in the input,
as in
.P1
`define' = 1;
.P2
.PP
As another instance of the same thing, which is a bit more surprising,
consider redefining
.UL N :
.P1
define(N, 100)
 ...
define(N, 200)
.P2
Perhaps regrettably, the
.UL N
in the second definition is
evaluated as soon as it's seen;
that is, it is
replaced by
100, so it's as if you had written
.P1
define(100, 200)
.P2
This statement is ignored by M4, since you can only define things that look
like names, but it obviously doesn't have the effect you wanted.
To really redefine
.UL N ,
you must delay the evaluation by quoting:
.P1
define(N, 100)
 ...
define(`N', 200)
.P2
In M4,
it is often wise to quote the first argument of a macro.
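.PP
To see the two cases side by side, here is a small sketch;
the names
.UL M1
and
.UL M2
are invented for the illustration.
.P1
define(N, 100)
define(M1, N)
define(M2, `N')
define(`N', 200)
M1 M2
.P2
Assuming the rules described above, the last line should come out as
.P1
100 200
.P2
since
.UL M1
was fixed at 100 when it was defined, while
.UL M2
still stands for whatever
.UL N
is at the time it is used.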
.PP
If \` and \' are not convenient for some reason,
the quote characters can be changed with the built-in
.UL changequote :
.P1
changequote([, ])
.P2
makes the new quote characters the left and right brackets.
You can restore the original characters with just
.P1
changequote
.P2
.PP
There are two additional built-ins related to
.UL define .
.UL undefine
removes the definition of some macro or built-in:
.P1
undefine(`N')
.P2
removes the definition of
.UL N .
(Why are the quotes absolutely necessary?)
Built-ins can be removed with
.UL undefine ,
as in
.P1
undefine(`define')
.P2
but once you remove one, you can never get it back.
.PP
The built-in
.UL ifdef
provides a way to determine if a macro is currently defined.
In particular, M4 has pre-defined the names
.UL unix
and
.UL gcos
on the corresponding systems, so you can
tell which one you're using:
.P1
ifdef(`unix', `define(wordsize,16)' )
ifdef(`gcos', `define(wordsize,36)' )
.P2
makes a definition appropriate for the particular machine.
Don't forget the quotes!
.PP
.UL ifdef
actually permits three arguments;
if the name is undefined, the value of
.UL ifdef
is then the third argument, as in
.P1
ifdef(`unix', on UNIX, not on UNIX)
.P2
.SH
Arguments
.PP
So far we have discussed the simplest form of macro processing _
replacing one string by another (fixed) string.
User-defined macros may also have arguments, so different invocations
can have different results.
Within the replacement text for a macro
(the second argument of its
.UL define )
any occurrence of
.UL $n
will be replaced by the
.UL n th
argument when the macro
is actually used.
Thus, the macro
.UL bump ,
defined as
.P1
define(bump, $1 = $1 + 1)
.P2
generates code to increment its argument by 1:
.P1
bump(x)
.P2
is
.P1
x = x + 1
.P2
.PP
A macro can have as many arguments as you want,
but only the first nine are accessible,
through
.UL $1
to
.UL $9 .
(The macro name itself is
.UL $0 ,
although that is less commonly used.)
Arguments that are not supplied are replaced by null strings,
so
we can define a macro
.UL cat
which simply concatenates its arguments, like this:
.P1
define(cat, $1$2$3$4$5$6$7$8$9)
.P2
Thus
.P1
cat(x, y, z)
.P2
is equivalent to
.P1
xyz
.P2
.UL $4
through
.UL $9
are null, since no corresponding arguments were provided.
.PP
.PP
Leading unquoted blanks, tabs, or newlines that occur during argument collection
are discarded.
All other white space is retained.
Thus
.P1
define(a,   b   c)
.P2
defines
.UL a
to be
.UL b\ \ \ c .
.PP
Arguments are separated by commas, but parentheses are counted properly,
so a comma ``protected'' by parentheses does not terminate an argument.
That is, in
.P1
define(a, (b,c))
.P2
there are only two arguments;
the second is literally
.UL (b,c) .
And of course a bare comma or parenthesis can be inserted by quoting it.
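.PP
As a sketch of that last point, suppose a macro
.UL pair
(a name made up for this example) is defined and called as follows:
.P1
define(pair, `($1; $2)')
pair(`x,y', z)
.P2
The quoted comma does not separate arguments, so
.UL $1
is
.UL x,y
and the result should be
.P1
(x,y; z)
.P2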
.SH
Arithmetic Built-ins
.PP
M4 provides two built-in functions for doing arithmetic
on integers (only).
The simplest is
.UL incr ,
which increments its numeric argument by 1.
Thus to handle the common programming situation
where you want a variable to be defined as ``one more than N'',
write
.P1
define(N, 100)
define(N1, `incr(N)')
.P2
Then
.UL N1
is defined as one more than the current value of
.UL N .
.PP
The more general mechanism for arithmetic is a built-in
called
.UL eval ,
which is capable of arbitrary arithmetic on integers.
It provides the operators
(in decreasing order of precedence)
.DS
unary + and \(mi
** or ^ (exponentiation)
* / % (modulus)
+ \(mi
== != < <= > >=
! (not)
& or && (logical and)
\(or or \(or\(or (logical or)
.DE
Parentheses may be used to group operations where needed.
All the operands of
an expression given to
.UL eval
must ultimately be numeric.
The numeric value of a true relation
(like 1>0)
is 1, and false is 0.
The precision in
.UL eval
is
32 bits on
.UC UNIX
and 36 bits on
.UC GCOS .
.PP
As a simple example, suppose we want
.UL M
to be
.UL 2**N+1 .
Then
.P1
define(N, 3)
define(M, `eval(2**N+1)')
.P2
As a matter of principle, it is advisable
to quote the defining text for a macro
unless it is very simple indeed
(say just a number);
it usually gives the result you want,
and is a good habit to get into.
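.PP
As a sketch of why the quotes matter here, assume
.UL N
is later redefined:
.P1
define(N, 3)
define(M, `eval(2**N+1)')
M
define(`N', 4)
M
.P2
Because the defining text was quoted, the
.UL eval
is carried out each time
.UL M
is used, so the first
.UL M
should give 9 and the second 17.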
.SH
File Manipulation
.PP
You can include a new file in the input at any time by
the built-in function
.UL include :
.P1
include(filename)
.P2
inserts the contents of
.UL filename
in place of the
.UL include
command.
The contents of the file are often a set of definitions.
The value
of
.UL include
(that is, its replacement text)
is the contents of the file;
this can be captured in definitions, etc.
.PP
It is a fatal error if the file named in
.UL include
cannot be accessed.
To get some control over this situation, the alternate form
.UL sinclude
can be used;
.UL sinclude
(``silent include'')
says nothing and continues if it can't access the file.
.PP
It is also possible to divert the output of M4 to temporary files during processing,
and output the collected material upon command.
M4 maintains nine of these diversions, numbered 1 through 9.
If you say
.P1
divert(n)
.P2
all subsequent output is put onto the end of a temporary file
referred to as
.UL n .
Diverting to this file is stopped by another
.UL divert
command;
in particular,
.UL divert
or
.UL divert(0)
resumes the normal output process.
.PP
Diverted text is normally output all at once
at the end of processing,
with the diversions output in numeric order.
It is possible, however, to bring back diversions
at any time,
that is, to append them to the current diversion.
.P1
undivert
.P2
brings back all diversions in numeric order, and
.UL undivert
with arguments brings back the selected diversions
in the order given.
The act of undiverting discards the diverted stuff,
as does diverting into a diversion
whose number is not between 0 and 9 inclusive.
.PP
The value of
.UL undivert
is
.ul
not
the diverted stuff.
Furthermore, the diverted material is
.ul
not
rescanned for macros.
.PP
The built-in
.UL divnum
returns the number of the currently active diversion.
This is zero during normal processing.
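.PP
A minimal sketch of a diversion in use;
the two lines of text are placeholders.
.P1
divert(1)
this line is held in diversion 1
divert
this line goes directly to the output
undivert(1)
.P2
Under the rules above, the second line of text should appear in the output first,
and the held line only where
.UL undivert(1)
brings it back.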
.SH
System Command
.PP
You can run any program in the local operating system
with the
.UL syscmd
built-in.
For example,
.P1
syscmd(date)
.P2
on
.UC UNIX
runs the
.UL date
command.
Normally
.UL syscmd
would be used to create a file
for a subsequent
.UL include .
.PP
To facilitate making unique file names, the built-in
.UL maketemp
is provided, with specifications identical to the system function
.ul
mktemp:
a string of XXXXX in the argument is replaced
by the process id of the current process.
.SH
Conditionals
.PP
There is a built-in called
.UL ifelse
which enables you to perform arbitrary conditional testing.
In the simplest form,
.P1
ifelse(a, b, c, d)
.P2
compares the two strings
.UL a
and
.UL b .
If these are identical,
.UL ifelse
returns
the string
.UL c ;
otherwise it returns
.UL d .
Thus we might define a macro called
.UL compare
which compares two strings and returns ``yes'' if they are the same
and ``no'' if they are different.
.P1
define(compare, `ifelse($1, $2, yes, no)')
.P2
Note the quotes,
which prevent too-early evaluation of
.UL ifelse .
.PP
If the fourth argument is missing, it is treated as empty.
.PP
.UL ifelse
can actually have any number of arguments,
and thus provides a limited form of multi-way decision capability.
In the input
.P1
ifelse(a, b, c, d, e, f, g)
.P2
if the string
.UL a
matches the string
.UL b ,
the result is
.UL c .
Otherwise, if
.UL d
is the same as
.UL e ,
the result is
.UL f .
Otherwise the result is
.UL g .
If the final argument
is omitted, the result is null,
so
.P1
ifelse(a, b, c)
.P2
is
.UL c
if
.UL a
matches
.UL b ,
and null otherwise.
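.PP
As a sketch of the multi-way form, a macro
.UL size
(a name invented for this example) might be defined as
.P1
define(size, `ifelse($1, 1, one, $1, 2, two, many)')
size(1) size(2) size(7)
.P2
The last line should come out as
.P1
one two many
.P2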
.SH
String Manipulation
.PP
The built-in
.UL len
returns the length of the string that makes up its argument.
Thus
.P1
len(abcdef)
.P2
is 6, and
.UL len((a,b))
is 5.
.PP
The built-in
.UL substr
can be used to produce substrings of strings.
.UL substr(s,\ i,\ n)
returns the substring of
.UL s
that starts at the
.UL i th
position
(origin zero),
and is
.UL n
characters long.
If
.UL n
is omitted, the rest of the string is returned,
so
.P1
substr(`now is the time', 1)
.P2
is
.P1
ow is the time
.P2
If
.UL i
or
.UL n
is out of range, various sensible things happen.
.PP
.UL index(s1,\ s2)
returns the index (position) in
.UL s1
where the string
.UL s2
occurs, or \-1
if it doesn't occur.
As with
.UL substr ,
the origin for strings is 0.
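.PP
For instance, taking the string used above as a sketch,
.P1
substr(`now is the time', 4, 2)
index(`now is the time', is)
.P2
should produce
.P1
is
4
.P2
since the word
.UL is
begins at position 4 when counting from zero.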
.PP
The built-in
.UL translit
performs character transliteration.
.P1
translit(s, f, t)
.P2
modifies
.UL s
by replacing any character found in
.UL f
by the corresponding character of
.UL t .
That is,
.P1
translit(s, aeiou, 12345)
.P2
replaces the vowels by the corresponding digits.
If
.UL t
is shorter than
.UL f ,
characters which don't have an entry in
.UL t
are deleted; as a limiting case,
if
.UL t
is not present at all,
characters from
.UL f
are deleted from
.UL s .
So
.P1
translit(s, aeiou)
.P2
deletes vowels from
.UL s .
.PP
There is also a built-in called
.UL dnl
which deletes all characters that follow it up to
and including the next newline;
it is useful mainly for throwing away
empty lines that otherwise tend to clutter up M4 output.
For example, if you say
.P1
define(N, 100)
define(M, 200)
define(L, 300)
.P2
the newline at the end of each line is not part of the definition,
so it is copied into the output, where it may not be wanted.
If you add
.UL dnl
to each of these lines, the newlines will disappear.
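.PP
Written out as a sketch, that version of the three definitions would be
.P1
define(N, 100)dnl
define(M, 200)dnl
define(L, 300)dnl
.P2
which should leave no blank lines in the output.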
.PP
Another way to achieve this, due to J. E. Weythman,
is
.P1
divert(-1)
	define(...)
	...
divert
.P2
.SH
Printing
.PP
The built-in
.UL errprint
writes its arguments out on the standard error file.
Thus you can say
.P1
errprint(`fatal error')
.P2
.PP
.UL dumpdef
is a debugging aid which
dumps the current definitions of defined terms.
If there are no arguments, you get everything;
otherwise you get the ones you name as arguments.
Don't forget to quote the names!
.SH
Summary of Built-ins
.PP
Each entry is preceded by the
page number where it is described.
.DS
.tr '\'`\`
.ta .25i
3 changequote(L, R)
1 define(name, replacement)
4 divert(number)
4 divnum
5 dnl
5 dumpdef(`name', `name', ...)
5 errprint(s, s, ...)
4 eval(numeric expression)
3 ifdef(`name', this if true, this if false)
5 ifelse(a, b, c, d)
4 include(file)
3 incr(number)
5 index(s1, s2)
5 len(string)
4 maketemp(...XXXXX...)
4 sinclude(file)
5 substr(string, position, number)
4 syscmd(s)
5 translit(str, from, to)
3 undefine(`name')
4 undivert(number,number,...)
.DE
.SH
Acknowledgements
.PP
We are indebted to Rick Becker, John Chambers,
Doug McIlroy,
and especially Jim Weythman,
whose pioneering use of M4 has led to several valuable improvements.
We are also deeply grateful to Weythman for several substantial contributions
to the code.
.SG
.SH
References
.LP
.IP [1]
B. W. Kernighan and P. J. Plauger,
.ul
Software Tools,
Addison-Wesley, Inc., 1976.