Annotation of researchv10dc/vol2/grap/paper.ms, revision 1.1.1.1

1.1       root        1: .so ../ADM/mac
                      2: .XX grap 109 "Grap \(em A Language for Typesetting Graphs"
                      3: .EQ
                      4: delim $$
                      5: .EN
                      6: .so macros
                      7: .ds g \f2grap\fP
                      8: .ds G \f2Grap\fP
                      9: .TL
                     10: Grap \(em A Language for Typesetting Graphs
                     11: .br
                     12: Tutorial and User Manual
                     13: .AU
                     14: Jon L. Bentley
                     15: Brian W. Kernighan
                     16: .AI
                     17: .MH
                     18: .AB
                     19: \*G
                     20: is a language for describing plots of data.
                     21: This graph of the 1984
                     22: age distribution in the United States
                     23: .grap agepop1.g
                     24: is produced by the
                     25: \*g
                     26: commands
                     27: .P1
                     28: .get agepop1.g
                     29: .P2
                     30: (Each line in the data file
                     31: .UL agepop.d
                     32: contains an age and the number of Americans of that
                     33: age alive in 1984; the file is sorted by age.)
                     34: .PP
                     35: The
                     36: \*g
                     37: preprocessor works with
                     38: .I pic |reference(latest pic)
                     39: and
                     40: .I troff |reference(latest troff reference).
                     41: Most of its input is passed
                     42: through untouched, but statements between
                     43: .UL .G1
                     44: and
                     45: .UL .G2
                     46: are translated into
                     47: .I pic
                     48: commands that draw graphs.
                     49: .AE
                     50: .NH
                     51: Introduction
                     52: .PP
                     53: \*G
                     54: is a language for describing graphical
                     55: displays of data.
                     56: It provides such services as automatic scaling and
                     57: labeling of axes, and
                     58: .UL for
                     59: statements,
                     60: .UL if
                     61: statements, and macros to facilitate user
                     62: programmability.
                     63: \*G
                     64: is intended primarily for including graphs in
                     65: documents prepared on the
                     66: .UX
                     67: operating system, and is only marginally
                     68: useful for elementary tasks in data analysis.
                     69: .PP
                     70: Section 2 of this document is a tutorial introduction to
                     71: \*g;
                     72: readers who find it slow going may wish to skim ahead.
                     73: The examples in Section 3 illustrate
                     74: the various kinds of graphs that
                     75: \*g
                     76: can produce and some common
                     77: \*g
                     78: idioms.
                     79: Mundane matters about using
                     80: \*g
                     81: are discussed in Section 4,
                     82: and Section 5 contains a brief reference manual.
                     83: .PP
                     84: We have tried to illustrate good principles of
                     85: statistics and graphical design in the
                     86: graphs we present.
                     87: In several places, though, good taste has lost to
                     88: the necessity of illustrating
                     89: \*g
                     90: capabilities.
                     91: Readers interested in statistical
                     92: integrity and taste should
                     93: consult the literature, for example |reference(chambers graphs)
                     94: |reference(tufte graphs) |reference(cleveland elements).
                     95: .NH
                     96: Tutorial
                     97: .PP
                     98: The following is a simple
                     99: \*g
                    100: program\(dg
                    101: .FS
                    102: \(dg Throughout
                    103: this document we will show only the first five
                    104: lines and the last line of data files;
                    105: omitted lines are indicated by ``...''.
                    106: .FE
                    107: .P1
                    108: \&.G1
                    109: .d 400mtimes.d
                    110: \&.G2
                    111: .P2
                    112: The single number on each line
                    113: is the winning time in seconds for the
                    114: men's 400 meter run,
                    115: from the first modern Olympic Games (1896)
                    116: to the twenty-first (1988).
                    117: If the file
                    118: .UL olymp.g
                    119: contains the text above,
                    120: then typing the command
                    121: .P1
                    122: grap olymp.g | pic | troff > junk
                    123: .P2
                    124: creates a
                    125: .I troff
                    126: output file
                    127: .UL junk
                    128: that contains the
                    129: picture
                    130: .grap 4001.g
                    131: The graph shows the decrease
                    132: in winning times from 54.2
                    133: seconds to 43.87 seconds.
                    134: If the times are
                    135: contained in the file
                    136: .UL 400mtimes.d ,
                    137: we could
                    138: produce the same graph with the
                    139: shorter program
                    140: .P1
                    141: .get 4001.g
                    142: .P2
                    143: Writing
                    144: .UL copy
                    145: .UL \&"fname"
                    146: in a
                    147: \*g
                    148: program is equivalent to including the
                    149: contents of file
                    150: .UL fname
                    151: at that point in the file.
                    152: (In the interests of compatibility with other programs,
                    153: .UL include
                    154: is a synonym for
                    155: .UL copy .)
                    156: .PP
                    157: Each line in the file
                    158: .UL 400mpairs.d
                    159: contains two numbers, the
                    160: year of the Olympics and the winning time:
                    161: .P1
                    162: .d 400mpairs.d
                    163: .P2
                    164: If we plot this data with the program
                    165: .P1
                    166: .get 4002.g
                    167: .P2
                    168: the bottom ($x$) axis represents the year of the Olympics.
                    169: .grap 4002.g
                    170: The ``holes'' in $x$-values reflect the fact
                    171: that the 1916, 1940, and 1944 Olympics
                    172: were cancelled due to war.
                    173: Because the previous data
                    174: (in
                    175: .UL 400mtimes.d )
                    176: had just one number per
                    177: line,
                    178: \*g
                    179: viewed it as a ``time series'' and
                    180: supplied $x$-values of $1, ~ 2, ~ 3, ...$
                    181: before plotting
                    182: the data as $y$-values.
                    183: The input to the
                    184: second program has two values per line,
                    185: so they are interpreted as $( x , y )$ pairs.
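The rule just described can be sketched in a few lines of Python. This is only an illustration of the behavior stated above, not the preprocessor's own code; the function name read_points is hypothetical, and the sample values are the three winning times quoted in this tutorial rather than the full contents of either data file.

```python
def read_points(lines):
    """Turn data lines into (x, y) points.

    Assumption (taken from the tutorial text): a line holding one number
    is a time-series value whose x is its 1-based position among the data
    lines; a line holding two numbers is an (x, y) pair.
    """
    points = []
    position = 0
    for line in lines:
        fields = [float(field) for field in line.split()]
        if not fields:
            continue  # skip blank lines
        position += 1
        if len(fields) == 1:
            points.append((float(position), fields[0]))
        elif len(fields) == 2:
            points.append((fields[0], fields[1]))
        else:
            raise ValueError("this sketch handles one or two numbers per line")
    return points


# One number per line: x-values 1, 2, 3 are supplied.
series = read_points(["54.2", "47.6", "43.87"])

# Two numbers per line: each line is already an (x, y) pair.
pairs = read_points(["1896 54.2", "1924 47.6", "1988 43.87"])
```

The same y-values come out in both cases; only the source of the x-values differs.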
                    186: .PP
                    187: Rather than a scatter plot of points, we might prefer to
                    188: see the winning times connected by a solid
                    189: line.
                    190: The program
                    191: .P1
                    192: .get 4003.g
                    193: .P2
                    194: produces the graph
                    195: .grap 4003.g
                    196: Eric Liddell of Great Britain
                    197: won his gold medal
                    198: in Paris in 1924 with a time of 47.6 seconds.
                    199: (Remember ``Chariots
                    200: of Fire''?)
                    201: .PP
                    202: We can make the graph more attractive
                    203: by modifying its frame
                    204: and adding labels.
                    205: .P1
                    206: .get 4004.g
                    207: .P2
                    208: The
                    209: .UL frame
                    210: command describes
                    211: the graph's bounding box:
                    212: the overall frame (which has four sides)
                    213: is invisible; it is 2 inches high and 3 inches
                    214: wide (which happen to be the
                    215: default height and width),
                    216: and the left and bottom
                    217: sides are solid (they could have been
                    218: dashed or dotted instead).
                    219: The labels appear on the left and bottom, as requested.
                    220: .grap 4004.g
                    221: .PP
                    222: To set the range of each axis,
                    223: \*g
                    224: examines the data and pads both
                    225: dimensions
                    226: by seven percent at each end.
                    227: The
                    228: .UL coord
                    229: (``coordinates'') command
                    230: allows you to specify the range of one or both axes explicitly;
                    231: it also turns off automatic padding.
                    232: .P1
                    233: .get 4005.g
                    234: .P2
                    235: The $y$-axis now ranges from 42 to 56 seconds
                    236: (a little more than before),
                    237: and the $x$-axis from 1894 to 1990
                    238: (a little less).
                    239: .grap 4005.g
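The effect of the automatic padding can be checked with a short calculation. The Python sketch below assumes that the seven percent is measured against the extent of the data (largest value minus smallest), and that the extremes of the data are the years 1896 and 1988 and the times 43.87 and 54.2 seconds quoted earlier; padded_range is a hypothetical helper, not part of the preprocessor.

```python
def padded_range(values, margin=0.07):
    """Return (low, high) after widening the data extent at each end.

    Assumption: the margin is a fraction of (max - min), applied once
    below the minimum and once above the maximum.
    """
    low, high = min(values), max(values)
    pad = margin * (high - low)
    return low - pad, high + pad


x_low, x_high = padded_range([1896, 1988])   # roughly 1889.56 to 1994.44
y_low, y_high = padded_range([43.87, 54.2])  # roughly 43.15 to 54.92
```

Under these assumptions the explicit range of 42 to 56 is slightly wider than the padded one, and 1894 to 1990 is slightly narrower, which agrees with the remarks above.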
                    240: .PP
                    241: The ticks in the preceding graphs were generated
                    242: by
                    243: \*g
                    244: guessing at reasonable values.
                    245: If you would rather provide your own,
                    246: you may
                    247: use the
                    248: .UL ticks
                    249: command,
                    250: which comes in the flavors illustrated below.
                    251: .P1
                    252: .get 4006.g
                    253: .P2
                    254: The first
                    255: .UL ticks
                    256: command deals with the left axis:
                    257: it puts the ticks facing out at
                    258: the numbers in the list.
                    259: \*G
                    260: puts labels only at values
                    261: with strings,
                    262: except that when no labels at all are
                    263: given, each number serves as its own label,
                    264: as in the second
                    265: .UL ticks
                    266: command.
                    267: That command
                    268: is for the bottom axis:
                    269: it puts the ticks facing in at steps of 20
                    270: from 1900 to 1980.
                    271: The command
                    272: .UL "ticks off"
                    273: turns off all ticks.
                    274: \*G
                    275: does its best to place labels appropriately, but
                    276: it sometimes needs your help:
                    277: the
                    278: .UL "left .2"
                    279: clause moves the left label 0.2 inches further left to
                    280: avoid the new ticks.
                    281: .grap 4006.g
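The two rules for ticks stated above (positions generated from a starting value to an ending value in fixed steps, and the treatment of labels) can be sketched in Python as follows. Both function names are hypothetical, and the sketch makes no claim about how the preprocessor formats numbers beyond what the text says.

```python
import math


def tick_positions(start, stop, step):
    """Positions from start to stop inclusive, in increments of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    # The small tolerance keeps the last tick when the division is inexact.
    count = math.floor((stop - start) / step + 1e-9) + 1
    return [start + i * step for i in range(max(count, 0))]


def tick_labels(ticks):
    """ticks is a list of (value, label_or_None) pairs.

    If no tick has a label, every number serves as its own label;
    otherwise only the ticks that were given strings are labeled.
    """
    if all(label is None for _, label in ticks):
        return [(value, str(value)) for value, _ in ticks]
    return [(value, label if label is not None else "") for value, label in ticks]


bottom = tick_positions(1900, 1980, 20)
left = tick_labels([(44, "44"), (46, None), (48, "48")])
```

Here bottom holds the five years from 1900 to 1980, and in left the tick at 46 is drawn without a label.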
                    282: .PP
                    283: The file
                    284: .UL 400wpairs.d
                    285: contains the times for
                    286: the women's 400 meter race, which has been run
                    287: only since 1964.
                    288: .P1
                    289: .d 400wpairs.d
                    290: .P2
                    291: To add these times to the graph,
                    292: we use
                    293: .P1
                    294: .get 4007.g
                    295: .P2
                    296: The
                    297: .UL new
                    298: command tells
                    299: \*g
                    300: to end
                    301: the old curve and to start a new curve
                    302: (which in this case will be drawn
                    303: with a dotted line).
                    304: Text is placed on the graph by
                    305: commands of the form
                    306: .P1
                    307: "string" at xvalue, yvalue
                    308: .P2
                    309: The
                    310: .UL size
                    311: clauses following the quoted strings tell
                    312: \*g
                    313: to shrink the characters by three points (absolute point sizes
                    314: may also be specified).
                    315: Strings are usually centered at the specified position,
                    316: but can be adjusted by clauses to be illustrated shortly.
                    317: .grap 4007.g
                    318: .PP
                    319: The file
                    320: .UL phone.d
                    321: records the number of telephones in the United States from
                    322: 1900 to 1970.
                    323: .P1
                    324: .d phone.d
                    325: .P2
                    326: Each line gives a year and the number of telephones
                    327: present in that year
                    328: (in millions, truncated to one decimal place).
                    329: The simple
                    330: \*g
                    331: program
                    332: .P1
                    333: .get phone1.g
                    334: .P2
                    335: produces the simple graph
                    336: .grap phone1.g
                    337: .PP
                    338: The number of telephones appears to
                    339: grow exponentially;
                    340: to study that we will plot the data with
                    341: a logarithmic $y$-axis by adding
                    342: .UL log
                    343: .UL y
                    344: to the
                    345: .UL coord
                    346: command.
                    347: We will also add cosmetic changes of labels, more ticks,
                    348: and a solid line to replace the unconnected dots.
                    349: .P1
                    350: .get phone2.g
                    351: .P2
                    352: The third
                    353: .UL ticks
                    354: command provides a string that is used to print the tick
                    355: labels.
                    356: .UC C
                    357: programmers will recognize it as a
                    358: .UL printf
                    359: format string; others may view the
                    360: .CW %g
                    361: as the place to put
                    362: the number and anything else (in this case just an apostrophe) as
                    363: literal text to appear in the labels.
                    364: To suppress
                    365: labels, use the empty format string ("").
                    366: The program produces
                    367: .grap phone2.g
                    368: The number of telephones grew rapidly
                    369: in the first decade of this century,
                    370: and then settled down to an exponential growth rate upset only
                    371: by a decrease in the Great Depression and a post-war growth
                    372: spurt
                    373: to return the curve to its pre-Depression line.
                    374: .PP
                    375: Our presentation so far has been to
                    376: start with a simple
                    377: \*g
                    378: program that illustrates the data, and then refine it.
                    379: Later in this document we will ignore the design
                    380: phase, and present rather complex graphs in
                    381: their final form.
                    382: Beware.
                    383: .PP
                    384: All the examples so far have placed data on the
                    385: graph implicitly by
                    386: .UL copy ing
                    387: a file of numbers
                    388: (either a time series with one number per line or
                    389: pairs of numbers).
                    390: It is also possible to draw points and lines explicitly.
                    391: The
                    392: \*g
                    393: commands to draw on a graph
                    394: are illustrated in the following
                    395: fragment.
                    396: .P1
                    397: .get geom.g
                    398: .P2
                    399: .PP
                    400: The
                    401: .UL grid
                    402: command is similar to the
                    403: .UL ticks
                    404: command, except that grid lines extend
                    405: across the frame.
                    406: The next few commands plot text at specified positions.
                    407: The plotting characters (such as
                    408: .UL bullet )
                    409: are implemented as predefined
                    410: macros \(em more on that shortly.
                    411: Unlike arbitrary characters,
                    412: the visual centers of the markers
                    413: are near their plotting centers.
                    414: The
                    415: .UL circle
                    416: command draws a circle centered at the specified location.
                    417: A radius in inches may be specified;
                    418: if no radius is given, then the circle will be the
                    419: small circle shown at the center of the graph.
                    420: The
                    421: .UL line
                    422: and
                    423: .UL arrow
                    424: commands draw the obvious objects shown at the upper left.
                    425: .grap geom.g
                    426: .PP
                    427: This figure also illustrates the combined use of the
                    428: .UL draw
                    429: and
                    430: .UL next
                    431: commands.
                    432: Saying
                    433: .UL draw
                    434: .UL A
                    435: .UL solid
                    436: defines the style
                    437: for a connected sequence of line fragments to be called
                    438: .UL A .
                    439: Subsequent commands of
                    440: .UL next
                    441: .UL A
                    442: .UL at
                    443: .I point
                    444: add
                    445: .I point
                    446: to the end of
                    447: .UL A .
                    448: There are two such sequences active in the above
                    449: example
                    450: .UL A "" (
                    451: and
                    452: .UL B );
                    453: note that their
                    454: .UL next
                    455: commands are intermixed.
                    456: Because the predefined string
                    457: .UL delta
                    458: follows the specification of
                    459: .UL B ,
                    460: that string is plotted at each point in the sequence.
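One way to picture the bookkeeping behind several active sequences is a table keyed by sequence name. The Python sketch below is a model of the behavior described above, not the preprocessor's implementation; the class name, the coordinates, and the line style chosen for B are all made up for the illustration.

```python
class Graph:
    """Keeps several named point sequences that can grow in any order."""

    def __init__(self):
        self.sequences = {}

    def draw(self, name, style, mark=None):
        # Declare a sequence: remember its line style and optional mark.
        self.sequences[name] = {"style": style, "mark": mark, "points": []}

    def next(self, name, point):
        # Append a point to the end of the named sequence.
        self.sequences[name]["points"].append(point)


g = Graph()
g.draw("A", "solid")
g.draw("B", "dashed", mark="delta")  # hypothetical style for B
g.next("A", (1, 3))
g.next("B", (3, 1))                  # calls for A and B are intermixed
g.next("A", (2, 10))
g.next("B", (5, 5))
```

Because each call names its sequence, the order in which the two curves receive points does not matter; each keeps its own points in the order given.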
                    461: .PP
                    462: \*G
                    463: has numeric variables (implemented as double-precision
                    464: floating point numbers) and
                    465: the usual collection of arithmetic operators and
                    466: mathematical functions; see the reference section
                    467: for details.
                    468: .PP
                    469: \*G
                    470: provides the same rudimentary macro facility that
                    471: .I pic
                    472: does:
                    473: .P1
                    474: define \f2name\fP  { \f2replacement text\fP }
                    475: .P2
                    476: defines
                    477: .IT name
                    478: to be the
                    479: .IT "replacement text" .
                    480: The replacement may be any text that contains balanced open and closing braces
                    481: .UL "{ }" .
                    482: (Alternatively, the
                    483: .IT "replacement text"
                    484: may be quoted by
                    485: any single character that does not appear in the replacement;
                    486: the string is terminated by the next occurrence of that character.)
                    487: Any subsequent occurrence of
                    488: .IT name
                    489: will be replaced by
                    490: .IT "replacement text" .
                    491: .EQ
                    492: delim %%
                    493: .EN
                    494: .PP
                    495: The replacement text of a macro definition may
                    496: contain occurrences of
                    497: .UL $1 ,
                    498: .UL $2 ,
                    499: etc.;
                    500: these will be replaced by the corresponding actual
                    501: arguments when the macro is invoked.
                    502: The invocation for a macro with arguments is
                    503: .P1
                    504: name(arg1, arg2, ...)
                    505: .P2
                    506: Non-existent arguments are replaced by null
                    507: strings.
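Argument substitution of this kind is easy to model. The Python sketch below is an illustration only: the function name expand is hypothetical, and it assumes that an argument reference is a dollar sign followed by a single digit from 1 to 9.

```python
import re


def expand(body, args):
    """Replace $1, $2, ... in body with the corresponding argument.

    Arguments that were not supplied are replaced by empty strings.
    Assumption: references are a dollar sign plus one digit, 1 to 9.
    """
    def substitute(match):
        index = int(match.group(1))
        return args[index - 1] if index <= len(args) else ""

    return re.sub(r"\$([1-9])", substitute, body)


full = expand("bullet at $1, $2", ["1900", "3.5"])
short = expand("$1 and $3", ["a", "b"])
```

In the second call the third argument does not exist, so nothing is substituted for it.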
                    508: .EQ
                    509: delim $$
                    510: .EN
                    511: .PP
                    512: The following
                    513: \*g
                    514: program uses macros and arithmetic to plot
                    515: crude approximations to
                    516: the square and square root functions.
                    517: .P1
                    518: .get macarith.g
                    519: .P2
                    520: The macro
                    521: .UL root
                    522: uses the
                    523: .UL ^
                    524: exponentiation operator.
                    525: (Because
                    526: \*g
                    527: has the square root function
                    528: .UL sqrt ,
                    529: that macro is in fact superfluous.)
                    530: The program produces
                    531: .grap macarith.g
                    532: .PP
                    533: The
                    534: .UL copy
                    535: command has a
                    536: .UL thru
                    537: parameter that allows each line of a file to
                    538: be treated as though it were a macro call, with
                    539: the first field serving as
                    540: the first argument,
                    541: and so on.
                    542: This is the typical
                    543: \*g
                    544: mechanism for plotting files that are not stored as
                    545: time series or as $(x,y)$ pairs.
                    546: We will illustrate its use on the file
                    547: .UL states.d ,
                    548: which contains data on the fifty states.
                    549: .P1
                    550: .d states.d
                    551: .P2
                    552: The first field is the postal abbreviation of the state's
                    553: name (Alaska, Wyoming, Vermont, ...), the second field
                    554: is the number of Representatives to Congress from the state
                    555: after the 1981 reapportionment, and the third field is
                    556: the population of the state as measured in the 1980 Census.
                    557: The states appear in increasing order of
                    558: population.
                    559: .PP
                    560: We will first plot this data as
                    561: population, representative pairs.
                    562: (In the
                    563: .UL coord
                    564: statement,
                    565: .UL "log log"
                    566: is a synonym for
                    567: .UL "log x log y" .)
                    568: .P1
                    569: .get states1.g
                    570: .P2
                    571: Although the population is given in persons,
                    572: the
                    573: .UL PlotState
                    574: macro
                    575: plots the population in millions by dividing
                    576: the third input field
                    577: by one million (written in exponential notation
                    578: as
                    579: .UL 1e6 ,
                    580: for $1 times 10 sup 6$).
                    581: .grap states1.g
                    582: Using
                    583: .UL circle
                    584: as a plotting symbol displays
                    585: overlapping points that are obscured when
                    586: the data is plotted with bullets.
                    587: The representation of a state is roughly proportional
                    588: to its population, except in the very small states.
                    589: .PP
                    590: Our next plot will use the state's rank
                    591: in population as the $x$-coordinate and two
                    592: different $y$-coordinates: population and number of
                    593: representatives.
                    594: We will use two
                    595: .UL coord
                    596: commands to define the two coordinate systems
                    597: .UL pop
                    598: and
                    599: .UL rep .
                    600: We then explicitly give the coordinate system
                    601: whenever we refer to a point,
                    602: both in constructing axes and plotting data.
                    603: .P1
                    604: .get states2.g
                    605: .P2
                    606: The
                    607: .UL copy
                    608: statement in the program uses an
                    609: .I "immediate macro"
                    610: enclosed in curly brackets and thus avoids having to
                    611: name a macro for this task.
                    612: Because the program assumes that the states are
                    613: sorted in increasing order of population, it
                    614: generates
                    615: .UL thisrank
                    616: internally as a
                    617: \*g
                    618: variable.
                    619: The program produces
                    620: .grap states2.g
                    621: .PP
                    622: The plotting symbols were chosen for contrast in
                    623: both shape and shading.
                    624: This graph also indicates that representation is proportional
                    625: to population.
                    626: Once we see this graph, though, we should realize that we don't
                    627: really need two coordinate systems: we can relate the two by
                    628: dividing the population of the U.S. \(em about 226,000,000 \(em by
                    629: the number of representatives \(em 435 \(em to see that each
                    630: representative should count as 520,000 people.
                    631: If the purpose of this graph were to tell a story about
                    632: American politics rather than to illustrate
                    633: multiple coordinate systems,
                    634: it should be redrawn with a single coordinate
                    635: system.
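The arithmetic behind that conversion is short enough to write out. The Python lines below use only the two figures quoted above; the variable names and the helper function are hypothetical.

```python
population = 226_000_000   # approximate U.S. population quoted above
representatives = 435      # number of representatives

people_per_representative = population / representatives
rounded = round(people_per_representative, -4)   # to the nearest ten thousand


def as_population(seats):
    """Express a number of representatives on the population scale."""
    return seats * people_per_representative
```

The quotient is a little over 519,500, which rounds to the 520,000 used in the text; multiplying a state's seats by it places both quantities on one axis.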
                    636: .PP
                    637: Many graphs plot both observed data and a function
                    638: that (theoretically) describes the data.
                    639: There are many ways to draw a function
                    640: in \*g:
                    641: a series of
                    642: .UL next
                    643: commands is tedious but works, as does writing a
                    644: simple program to write a data file that is subsequently
                    645: read and plotted by \*g.
                    646: The
                    647: .UL for
                    648: statement often provides a better solution.
                    649: This
                    650: \*g
                    651: program
                    652: .P1
                    653: .get sin1.g
                    654: .P2
                    655: produces
                    656: .grap sin1.g
                    657: .a
                    658: The
                    659: .UL for
                    660: statement uses the same syntax as the
                    661: .UL ticks
                    662: statement, but the
                    663: .UL from
                    664: keyword can be replaced by
                    665: .UL = '', ``
                    666: which will look more familiar to programmers.
                    667: It varies the index variable over the specified range
                    668: and for each value executes all statements inside the delimiter
                    669: characters, which use the same rules as macro
                    670: delimiters.
                    671: It is, of course, useful for many tasks beyond plotting functions.
                    672: .EQ
                    673: delim %%
                    674: .EN
                    675: .PP
                    676: The
                    677: .UL if
                    678: statement provides a simple mechanism for conditional execution.
                    679: If a file contains data on both cities and states (and lines
                    680: describing states have ``S'' in the first field), it could be plotted
                    681: by statements like
                    682: .P1
                    683: if "$1" == "S" then {
                    684:        PlotState($2,$3,$4)
                    685: } else {
                    686:        PlotCity($2,$3,$4,$5,$6)
                    687: }
                    688: .P2
                    689: The
                    690: .UL else
                    691: clause
                    692: is optional; delimiters use the same rules as macros and
                    693: .UL for
                    694: statements.
                    695: .EQ
                    696: delim $$
                    697: .EN
                    698: .NH
                    699: A Collection of Examples
                    700: .PP
                    701: The previous section covered the
                    702: \*g
                    703: commands that are used in common graphs.
                    704: In this section we'll spend less time on
                    705: language features, and survey a wider variety of
                    706: graphs.
                    707: These examples are intended more for browsing and
                    708: reference than for straight-through reading.
                    709: Be prepared to refer to the manual in Section 5 when you stumble over a new
                    710: \*g
                    711: feature.
                    712: .PP
                    713: The file
                    714: .UL cars.d
                    715: contains the mileage (miles per gallon) and the weight
                    716: (pounds) for 74 models of automobiles sold in the United States
                    717: in the 1979 model year.
                    718: .P1
                    719: .d cars.d
                    720: .P2
                    721: The trivial
                    722: \*g
                    723: program
                    724: .P1
                    725: .get cars1.g
                    726: .P2
                    727: produces
                    728: .grap cars1.g
                    729: This graph shows that weights bottom out somewhat
                    730: below 2000
                    731: pounds and that heavier cars get worse mileage;
                    732: it is hard to say much more about the relationship
                    733: between weight and mileage.
                    734: .PP
                    735: The next graph provides labels, uses circles
                    736: to expose data hidden in the clouds of bullets,
                    737: and re-expresses the $x$-axis in gallons per mile.
                    738: It also changes the point size and vertical spacing
                    739: to a size appropriate for camera-ready journal articles
                    740: and books; the size changes should be made outside the
                    741: \*g
                    742: program.
                    743: The
                    744: .UL \&.ft
                    745: command changes to a Helvetica font, which
                    746: some people prefer for graphs.
                    747: .P1
                    748: .get cars2.g
                    749: .P2
                    750: \*G
                    751: supports logarithmic re-expression of data with the
                    752: .UL log
                    753: clause in the
                    754: .UL coord
                    755: statement; any other re-expression of data must be done
                    756: with
                    757: \*g
                    758: arithmetic, as above.
                    759: .br
                    760: .grap cars2.g
                    761: This graph shows that
                    762: gallons per mile is roughly proportional to weight.
                    763: (The two outliers near 4000 pounds are the Cadillac
                    764: Seville and the Oldsmobile 98.)
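.PP
The re-expression itself is just a reciprocal.
A minimal Python sketch, using made-up cars rather than the contents of
the data file, shows the arithmetic; the function name is hypothetical.

```python
def gallons_per_mile(mpg):
    """Re-express fuel economy: miles per gallon -> gallons per mile."""
    if mpg <= 0:
        raise ValueError("mileage must be positive")
    return 1.0 / mpg

# Made-up (miles per gallon, weight in pounds) pairs, for illustration only.
cars = [(40.0, 2000), (20.0, 4000)]
reexpressed = [(gallons_per_mile(mpg), weight) for mpg, weight in cars]
```

In this invented data, doubling the weight doubles the gallons per mile,
which is the kind of proportionality the graph suggests.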
                    765: .PP
                    766: In
                    767: .I "Visual Display of Quantitative Information" ,
                    768: Tufte proposes the ``dot-dash-plot'' as a means for maximizing
                    769: data ink (showing the two-dimensional distribution and
                    770: the two one-dimensional marginal distributions) while minimizing
                    771: what he calls ``chart junk'' \(em ink wasted on borders
                    772: and non-data labels.
                    773: His preference is easy to express in \*g:
                    774: .P1
                    775: .get cars3.g
                    776: .P2
                    777: Although the resulting graph is visually attractive, we do not find
                    778: it as useful for interpreting the data.
                    779: .grap cars3.g
                    780: Tufte's graph does point out two facts that are
                    781: not obvious in the previous graphs:
                    782: there is a gap in car weights near 3000 pounds (exhibited
                    783: by the hole in the $y$-axis ticks), and the gallons per
                    784: mile axis is regularly structured (the ticks
                    785: are the reciprocals of an almost dense sequence of integers).
                    786: The reader may decide whether those insights are worth
                    787: the decrease in clarity.
                    788: .PP
                    789: Throughout the twentieth century, horses, cars and people
                    790: have gotten faster;
                    791: let's study those improvements.
                    792: For horses, we'll consider the winning times
                    793: of the Kentucky Derby from 1909 to 1988, in
                    794: the file
                    795: .UL speedhorse.d :
                    796: .P1
                    797: .d speedhorse.d
                    798: .P2
                    799: The program
                    800: .P1
                    801: .get speedhorse1.g
                    802: .P2
                    803: produces the graph
                    804: .grap speedhorse1.g
                    805: Each race is recorded with a bullet and
                    806: record times are marked by horizontal lines.
                    807: Secretariat is the only horse to have run the
                    808: one-and-a-quarter-mile
                    809: race in under two minutes; he won in 1973 in
                    810: 1:59.4.
                    811: .PP
                    812: For automobiles we will study the
                    813: world land speed record (even though those vehicles
                    814: are by now just low-flying airplanes).
                    815: The file
                    816: .UL speedcar.d
                    817: lists years in which speed records were set and the record
                    818: set in that year, in miles per hour averaged over a one-mile
                    819: course.
                    820: .P1
                    821: .d speedcar.d
                    822: .P2
                    823: We will plot the data with the following
                    824: \*g
                    825: program, which uses nested braces in the
                    826: .UL copy
                    827: and
                    828: .UL if
                    829: statements.
                    830: .P1
                    831: .get speedcar1.g
                    832: .P2
                    833: .PP
                    834: Each record line is drawn after the
                    835: .I next
                    836: record is read, because
                    837: the program must know when the record was broken to draw
                    838: its line.
                    839: The
                    840: .UL if
                    841: statement handles the first record, and the extra
                    842: .UL line
                    843: command extends the last record out to the current date.
                    844: .grap speedcar1.g
                    845: The horizontal lines reflect the nature of world records: they
                    846: last until they are broken.
                    847: The records could also have been plotted by a scatterplot
                    848: in which each point represents the setting of a record,
                    849: but it would be misleading to connect adjacent
                    850: points with line segments
                    851: (which we inappropriately did in the graphs
                    852: of the Olympic 400 meter run).
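.PP
The bookkeeping behind the horizontal lines can be sketched in Python.
This is an illustration of the idea under stated assumptions (the
function name and the sample records are made up), not the grap program
itself: a record's line is emitted only when the next record arrives,
and the last record is extended to a chosen final year.

```python
def record_segments(records, end_year):
    """Turn (year, speed) records into horizontal segments.

    Each segment is (start_year, stop_year, speed): a record is drawn from
    the year it was set to the year the next record replaced it.
    """
    segments = []
    previous = None
    for year, speed in records:
        # A record's line can only be drawn once the next record is known.
        if previous is not None:
            segments.append((previous[0], year, previous[1]))
        previous = (year, speed)
    # The last record still stands, so extend it to end_year.
    if previous is not None:
        segments.append((previous[0], end_year, previous[1]))
    return segments

# Made-up records, for illustration only.
segments = record_segments([(1900, 40.0), (1910, 130.0), (1930, 230.0)], 1990)
```

The test on the first record and the extra segment after the loop play
the roles described above for the if statement and the extra line command.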
                    853: .PP
                    854: The following graph shows the world record times for the
                    855: one mile run;
                    856: because its
                    857: \*g
                    858: program is so similar to its automotive counterpart,
                    859: we won't show the program or data.
                    860: .grap speedman1.g
                    861: The three graphs show three different kinds of
                    862: changes.
                    863: Although horses are getting faster, they appear to
                    864: be approaching a barrier near two minutes.
                    865: Cars show great jumps as new technologies are introduced
                    866: followed by a plateau as limits of the
                    867: technology are reached.
                    868: Milers have shown a fairly consistent
                    869: linear improvement
                    870: over this century, but there must be an
                    871: asymptote down there somewhere.
                    872: .PP
                    873: The next file gives the median heights of boys
                    874: in the United States aged 2 to 18, together with
                    875: the fifth and ninety-fifth percentiles.
                    876: .P1
                    877: .d boyhts.d
                    878: .P2
                    879: The heights are given in centimeters (1 foot = 30.48 centimeters).
                    880: The trivial program
                    881: .P1
                    882: .get boyhts1.g
                    883: .P2
                    884: displays the data as
                    885: .grap boyhts1.g
                    886: Because there are four numbers on each input line, the first is
                    887: taken as an $x$-value and the remaining three are plotted
                    888: as $y$-values.
                    889: .PP
                    890: The three curves appear to be roughly straight
                    891: (at least up to age 16),
                    892: so it makes sense to fit a line
                    893: through them.
                    894: We will use the standard least squares regression
                    895: in which
                    896: .EQ
                    897: slope ~=~ {
                    898: {n SIGMA x y ~ - ~ SIGMA x SIGMA y }
                    899: over
                    900: {n SIGMA x sup 2 ~ - ~ ( SIGMA x ) sup 2 }
                    901: }
                    902: .EN
                    903: (where the summations range over all $n$ $x$ and $y$ values
                    904: in the data set) and the $y$-intercept is
                    905: .EQ
                    906: {SIGMA y ~ - ~ slope times SIGMA x} over n
                    907: .EN
                    908: The following
                    909: \*g
                    910: program boldly (and rather foolishly) implements that formula.
                    911: .P1
                    912: .get boyhts3.g
                    913: .P2
                    914: It plots the fifth and ninety-fifth percentiles as the ends of a bar through
                    915: the median, which is plotted as a bullet.
                    916: All heights are converted to feet before plotting and calculating
                    917: the regression line.
                    918: .grap boyhts3.g
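.PP
For comparison, the two formulas translate directly into Python.
The sketch below is a plain rendering of the slope and intercept
expressions given earlier; the function name and the sample points are
made up for illustration.

```python
def least_squares(points):
    """Return (slope, intercept) of the least squares line through points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    denominator = n * sxx - sx * sx
    if denominator == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denominator
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Made-up points lying exactly on y = 2x + 1, for illustration only.
slope, intercept = least_squares([(0, 1), (1, 3), (2, 5), (3, 7)])
```

For these four points the sums are 6, 16, 34 and 14, so the slope is
(136 - 96)/(56 - 36) = 2 and the intercept is (16 - 12)/4 = 1.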
                    919: .PP
                    920: \*G
                    921: .UL print
                    922: statements write on
                    923: .UL stderr
                    924: as they are processed by \*g;
                    925: their single argument can be either an expression or a string.
                    926: The
                    927: .UL print
                    928: statements (which are commented out in
                    929: the above
                    930: \*g
                    931: program) at one time
                    932: showed that the regression line is
                    933: .EQ
                    934: Height ~ in ~ Feet ~ = ~ 2.61 ~ + ~ .19 times Age
                    935: .EN
                    936: Thus for most American
                    937: boys between 3 and 16, you may safely assume
                    938: that they started out life at 2 feet 7 inches and grew at the
                    939: rate of two and a quarter inches per year.
                    940: .PP
                    941: This program probably misapplies \*g;
                    942: if you really want to perform least squares regressions on
                    943: data, you should usually use a simple
                    944: .I awk
                    945: program like
                    946: .P1
                    947: .get regress.awk
                    948: .P2
                    949: (Be warned, though, that this program is not numerically
                    950: robust.)
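.PP
One common way to reduce the roundoff in such calculations is to
subtract the means before forming the sums.
The Python sketch below illustrates that variant; it is our suggestion
under the assumption that the weakness is the usual one for sum-based
formulas (subtracting two large, nearly equal quantities), and it is not
a description of the awk program above.
The function name and data are hypothetical.

```python
def least_squares_centered(points):
    """Least squares line computed from deviations about the means."""
    n = len(points)
    if n == 0:
        raise ValueError("no data")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Subtracting the means first keeps the sums small, so less precision
    # is lost than when two large products are subtracted at the end.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up points on y = 2x + 1 with a large offset in x, for illustration.
slope, intercept = least_squares_centered(
    [(1e8 + i, 2 * (1e8 + i) + 1) for i in range(4)])
```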
                    951: .PP
                    952: While we're on the subject of fitting straight lines to data,
                    953: we'll redraw three graphs from J. W. Tukey's
                    954: .I "Exploratory Data Analysis" .
                    955: The file
                    956: .UL usapop.d
                    957: records the population of the United States
                    958: in millions at ten-year intervals.
                    959: .P1
                    960: .d usapop.d
                    961: .P2
                    962: Tukey's first two graphs indicate that the later population
                    963: growth was linear while the early growth was exponential.
                    964: The following
                    965: \*g
                    966: program plots them as a pair, using
                    967: .UL graph
                    968: commands to place internally unrelated graphs adjacent to
                    969: one another.
                    970: .P1
                    971: .get usapop1.g
                    972: .P2
                    973: The statements defining each graph are indented for clarity.
                    974: The second graph has the northern point of its frame 0.05
                    975: inch below the southern point of the frame of the first graph;
                    976: the
                    977: .UL with
                    978: clause is passed directly through to
                    979: .I pic
                    980: without being evaluated for macros or expressions.
                    981: The names of both graphs begin with capital letters to
                    982: conform to
                    983: .I pic
                    984: syntax for labels.
                    985: .grap usapop1.g
                    986: .PP
                    987: Polynomial functions lie between the linear and exponential
                    988: functions; Tukey shows how a seventh-degree polynomial provides
                    989: a better (and longer) fit to the early population growth.
                    990: .P1
                    991: .get usapop2.g
                    992: .P2
                    993: This program re-expresses the $x$-axis with
                    994: \*g
                    995: arithmetic and uses an
                    996: .UL if
                    997: statement to graph only part of the data file.
                    998: It produces
                    999: .grap usapop2.g
                   1000: .nr k \n%
                   1001: The
                   1002: .I eqn
                   1003: .UL "space 0"
                   1004: clause is necessary to keep
                   1005: .I eqn
                   1006: from adding extra space that would interfere
                   1007: with positions computed by \*g;
                   1008: see Section 4.
                   1009: .PP
                   1010: The file
                   1011: .UL army.d
                   1012: contains four related time series
                   1013: describing the United States Army.
                   1014: .P1
                   1015: .d army.d
                   1016: .P2
                   1017: The first field is the year; the next four fields give
                   1018: the number of male officers, female officers, enlisted males
                   1019: and enlisted females, each in thousands.
                   1020: (Actually, there were no female enlisted personnel in the
                   1021: Army until 1943; the value 1 in 1940 and 1942 is just
                   1022: a placeholder, since
                   1023: \*g
                   1024: has no mechanism for handling missing data.)
                   1025: The following
                   1026: \*g
                   1027: program draws the four series with four different sets of
                   1028: .UL draw
                   1029: and
                   1030: .UL next
                   1031: commands.
                   1032: .P1
                   1033: .get army1.g
                   1034: .P2
                   1035: The program labels the lines by
                   1036: .UL copy ing
                   1037: immediate data;
                   1038: the program is therefore shorter to write and easier to change.
                   1039: The delimiter string
                   1040: .UL XXX
                   1041: in the
                   1042: .UL until
                   1043: clause could be deleted in this graph: the
                   1044: .UL \&.G2
                   1045: line also denotes the end of data.
                   1046: Even though that string is enclosed in quotes,
                   1047: it may not contain spaces.
                   1048: The $y$-positions of the labels are the
                   1049: result of several iterations.
                   1050: .grap army1.g
                   1051: .PP
                   1052: This data can tell many stories: the buildup during the
                   1053: Second World War is obvious, as is the exodus after the
                   1054: war; increases during Korea and Vietnam are
                   1055: also apparent.
                   1056: We will consider a different story: the ratio of
                   1057: enlisted men to the three other classes of personnel.
                   1058: There are several ways to plot this data
                   1059: (the most obvious graph uses three time series showing how
                   1060: the ratios change over time, and is
                   1061: left as an exercise for the reader).
                   1062: .PP
                   1063: We will instead construct a graph that gives little insight into this
                   1064: data, but illustrates a general method that is quite useful
                   1065: in conjunction with \*g.
                   1066: The graph is a ``scatterplot vector'' that shows how one
                   1067: variable (the number of enlisted men) varies as a function of
                   1068: the other three.
                   1069: Breaking with tradition, we first show the final graphs, all
                   1070: of which have logarithmic scales.
                   1071: .grap army2.g
                   1072: The number of enlisted men is almost linearly
                   1073: related to the number of male officers, it is somewhat related to the number
                   1074: of female officers, and it varies widely as a function of the number
                   1075: of enlisted women.
                   1076: .PP
                   1077: Much more interesting than the graph itself is the method we used to
                   1078: produce it.
                   1079: We wrote a miniature ``compiler'' that accepts as
                   1080: its ``source language'' a description of a scatterplot vector and
                   1081: produces as ``object code'' a
                   1082: \*g
                   1083: program to draw the graph.
                   1084: The source program for the above example is
                   1085: .P1
                   1086: .get army2.v
                   1087: .P2
                   1088: The program lists several
                   1089: global attributes of the graph, the
                   1090: $y$-variable to be plotted, and as many $x$-variables as
                   1091: are desired; with each variable is its field in the file
                   1092: and a descriptive string.
                   1093: The language is ``compiled'' by the following
                   1094: .I awk
                   1095: program.
                   1096: .P1
                   1097: .get scatvec.awk
                   1098: .P2
                   1099: Running this program on the above description produces the following
                   1100: output, which is typically piped directly to \*g.
                   1101: .P1
                   1102: .get army2.g
                   1103: .P2
                   1104: The generated program uses the
                   1105: .I pic
                   1106: trick of re-using the same name
                   1107: .UL A ) (
                   1108: for several objects.
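.PP
The flavor of such a generator can be conveyed by a short Python sketch.
It is not the awk program above and does not reproduce its output; the
function name, its arguments and the layout choices (frame size, a gap of
one tenth of an inch, logarithmic scales on both axes) are assumptions
made for illustration.

```python
def scatter_vector(datafile, y_var, x_vars, size=1.5):
    """Emit grap text: one small graph of y_var against each x variable.

    y_var and each entry of x_vars is a (field_number, label) pair.
    """
    y_field, y_label = y_var
    lines = [".G1"]
    for i, (x_field, x_label) in enumerate(x_vars):
        if i == 0:
            lines.append("graph A")
        else:
            # Re-use the name A: place this frame to the right of the last.
            lines.append("graph A with .Frame.w at A.Frame.e +(.1,0)")
        lines.append("\tframe ht %g wid %g" % (size, size))
        lines.append('\tlabel bot "%s"' % x_label)
        if i == 0:
            lines.append('\tlabel left "%s"' % y_label)
        lines.append("\tcoord log log")
        lines.append('\tcopy "%s" thru { bullet at $%d,$%d }'
                     % (datafile, x_field, y_field))
    lines.append(".G2")
    return "\n".join(lines) + "\n"

# Field numbers follow the description of army.d given earlier.
program = scatter_vector(
    "army.d", (4, "Enlisted Men"),
    [(2, "Male Officers"), (3, "Female Officers"), (5, "Enlisted Women")])
```

The point of the design is that the description stays a few lines long
while the repetitive text is produced mechanically.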
                   1109: .PP
                   1110: Although the program above is merely a toy,
                   1111: ``minicompilers'' can produce useful preprocessors
                   1112: for \*g.
                   1113: The
                   1114: .UL scatmat
                   1115: program, for instance, is a 90-line
                   1116: .I awk
                   1117: program that reads a simple input language and produces as
                   1118: output a
                   1119: \*g
                   1120: program to produce a ``scatterplot matrix'', which
                   1121: is a handy graphical device for spotting pairwise interactions
                   1122: among several variables.
                   1123: If
                   1124: \*g
                   1125: lacks a feature you desire, consider building
                   1126: a simple preprocessor to provide it.
                   1127: An alternative is to define
                   1128: macros for the task; which approach is best depends
                   1129: strongly on the job you wish to accomplish.
                   1130: .PP
                   1131: The next graph uses iterators to make a graph without
                   1132: reading data from a file.
                   1133: Rather, its ``data'' is a
                   1134: function of two variables
                   1135: that describes a
                   1136: derivative field and a function of one variable
                   1137: that describes one solution to the differential
                   1138: equation.
                   1139: .P1
                   1140: .get ode1.g
                   1141: .P2
                   1142: The left label uses
                   1143: .I eqn
                   1144: text between the $font CW "$$"$ delimiters.
                   1145: The variable
                   1146: .UL scale
                   1147: ensures that all lines in the direction field are the same
                   1148: length.
                   1149: The
                   1150: .UL in
                   1151: clauses in the
                   1152: .UL ticks
                   1153: statements specify that the ticks go in zero inches
                   1154: to avoid overprinting.
                   1155: The variables
                   1156: .UL tx
                   1157: and
                   1158: .UL ty
                   1159: are so named because
                   1160: .UL x
                   1161: and
                   1162: .UL y
                   1163: are reserved words for the
                   1164: .UL coord
                   1165: statement.
                   1166: .grap ode1.g
                   1167: .PP
                   1168: Programmers familiar with floating point arithmetic may be
                   1169: surprised that the above graph is correct.
                   1170: Because of roundoff error, iteration
                   1171: .UL "from 0 to 1 by .05" '' ``
                   1172: usually produces the values
                   1173: $0, ~ .05, ~ .10, ~ ..., ~ .95$, omitting the final value 1.
                   1174: \*G
                   1175: uses a ``fuzzy test''
                   1176: in the
                   1177: .UL for
                   1178: statement to avoid that problem, which may in turn introduce
                   1179: other problems.
                   1180: Such problems may be avoided by iterating over an integer range
                   1181: and incrementing a non-integer value within the loop.
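.PP
Both ideas can be sketched in Python.
The tolerance used here is an assumption chosen for illustration; the
actual test inside the preprocessor is not specified in this paper, and
the function names are hypothetical.
Both functions assume a positive step.

```python
def fuzzy_range(start, stop, step, fuzz=1e-6):
    """Values start, start+step, ... with a tolerant end-of-range test."""
    values = []
    x = start
    # Allow x to overshoot stop by a tiny fraction of one step.
    while x <= stop + fuzz * step:
        values.append(x)
        x += step
    return values

def integer_range(start, stop, step):
    """The same values, driven by an integer counter instead."""
    n = int(round((stop - start) / step))
    values = []
    x = start
    for _ in range(n + 1):   # the loop count is exact
        values.append(x)
        x += step            # the non-integer value is incremented inside
    return values

fuzzy = fuzzy_range(0, 1, .05)
counted = integer_range(0, 1, .05)
```

Either way the loop runs twenty-one times, whatever the accumulated
roundoff in the final value happens to be.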
                   1182: .PP
                   1183: Most of the data we have seen so far is inherently
                   1184: two (or more) dimensional.
                   1185: As an example of one-dimensional data, we will return to
                   1186: the populations of the fifty states, which
                   1187: are the third field in the file
                   1188: .UL states.d
                   1189: introduced earlier;
                   1190: the file is sorted in increasing order of population.
                   1191: Our first graph takes the most space, but
                   1192: it also gives the most information.
                   1193: .P1
                   1194: .get states8.g
                   1195: .P2
                   1196: The
                   1197: .UL L
                   1198: macro (for Label)
                   1199: with input parameter $X$ evaluates to the number
                   1200: $2 sup X / 1,000,000$ followed by the string "$X$"
                   1201: (the
                   1202: .UL ticks
                   1203: command expects a number followed by a string label).
                   1204: .grap states8.g
                   1205: The dotted line is the least squares regression
                   1206: .EQ
                   1207: log sub 10 ~ Population ~ = ~ 7.214 ~ - ~ .03 times Rank
                   1208: .EN
                   1209: which gives 15.3 million as the population of the
                   1210: largest state and .515 million as the population
                   1211: of the smallest state.
                   1212: It says that
                   1213: population drops by a factor of two every ten states
                   1214: (compare the top and left scales).
                   1215: As sloppy as the exponential fit is, though, it is a much better
                   1216: fit to this data
                   1217: than a Zipf's Law curve is (drawing that curve is left as
                   1218: an exercise for the reader).
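.PP
The arithmetic behind these figures is easy to check.
The Python sketch below evaluates the fitted line using the rounded
coefficients printed above, with rank 1 taken to be the largest state;
because the coefficients are rounded, the results agree with the quoted
figures only approximately.
The function name is hypothetical.

```python
def predicted_population(rank):
    """Population predicted by log10(population) = 7.214 - .03 * rank."""
    return 10 ** (7.214 - .03 * rank)

largest = predicted_population(1)      # roughly 15.3 million
smallest = predicted_population(50)    # roughly half a million
# Moving ten places down the ranking multiplies the prediction by 10**-0.3.
ratio = predicted_population(11) / predicted_population(1)
```

Since 10 to the power -0.3 is very nearly one half, the fitted population
halves every ten states.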
                   1219: .PP
                   1220: The next graph is a more standard representation of
                   1221: one-dimensional data.
                   1222: .P1
                   1223: .get states3.g
                   1224: .P2
                   1225: The markers were chosen to be
                   1226: .UL vticks
                   1227: because they denote only an $x$-value.
                   1228: .grap states3.g
                   1229: .PP
                   1230: The next one-dimensional graph uses the state's name as
                   1231: its marker; to reduce overprinting the graph is ``jittered''
                   1232: by using a random number as a $y$-value.
                   1233: .P1
                   1234: .get states4.g
                   1235: .P2
                   1236: The function
                   1237: .UL rand()
                   1238: returns a pseudo-random real number chosen uniformly over the interval [0,1).
                   1239: .grap states4.g
                   1240: This graph is too cluttered; circles would have been
                   1241: a better choice as a plotting symbol (bullets, once again, would
                   1242: hide data).
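.PP
Jittering amounts to pairing each value with a random height.
A minimal Python sketch of the idea follows; the function name, the seed
and the sample values are made up, and Python's generator stands in for
the built-in function.

```python
import random

def jitter(x_values, seed=None):
    """Pair each x with a random y in [0, 1) to spread out the markers."""
    rng = random.Random(seed)
    return [(x, rng.random()) for x in x_values]

# Made-up populations in millions, for illustration only.
points = jitter([0.5, 0.6, 0.6, 4.1], seed=1)
```

Equal values, such as the two states at 0.6 here, are very unlikely to
receive the same height and so are less likely to overprint.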
                   1243: .PP
                   1244: Histograms are a standard way of presenting one-dimensional
                   1245: data in two-dimensional form.
                   1246: Our first step in building a histogram of the population
                   1247: data is the following
                   1248: .I awk
                   1249: program, which counts how many states are in each ``bin''
                   1250: of a million people.
                   1251: .P1
                   1252: .get states5.awk
                   1253: .P2
                   1254: The variable
                   1255: .UL bzs
                   1256: tells where bin zero starts; although it is zero in this
                   1257: graph, it might be 95 in a histogram
                   1258: of human body temperatures in degrees Fahrenheit.
                   1259: The program produces the following output in
                   1260: .UL states2.d :
                   1261: .P1
                   1262: .d states2.d
                   1263: .P2
                   1264: There are 12 states with population between 0 and 999,999,
                   1265: 5 states with population between 1,000,000 and 1,999,999,
                   1266: and so on.
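.PP
The binning step can be sketched in Python as follows.
This illustrates the technique and is not a translation of the awk
program; the function name and the sample populations are made up, and
reporting empty bins is our assumption.

```python
def bin_counts(values, bin_width, bzs=0):
    """Count how many values fall in each bin of width bin_width.

    bzs is where bin zero starts; bin k covers the half-open interval
    from bzs + k*bin_width up to bzs + (k+1)*bin_width.
    """
    counts = {}
    for v in values:
        k = int((v - bzs) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    if not counts:
        return []
    # Report empty bins too, so the histogram has no gaps.
    return [(k, counts.get(k, 0)) for k in range(min(counts), max(counts) + 1)]

# Made-up populations, for illustration only.
histogram = bin_counts([400000, 900000, 1500000, 3200000], 1000000)
```

Here bin 0 holds two states, bin 1 holds one, bin 2 is empty and bin 3
holds one.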
                   1267: .PP
                   1268: This
                   1269: \*g
                   1270: program uses three
                   1271: .UL line
                   1272: commands to plot each rectangle in the histogram.
                   1273: .P1
                   1274: .get states5.g
                   1275: .P2
                   1276: It produces
                   1277: .grap states5.g
                   1278: .PP
                   1279: The same file can be plotted in a
                   1280: more attractive (and more useful) form by
                   1281: .P1
                   1282: .get states6.g
                   1283: .P2
                   1284: which produces
                   1285: one of Bill Cleveland's ``dot charts'' or ``lolliplots'':
                   1286: .grap states6.g
                   1287: (We use
                   1288: .UL \e(bu ,
                   1289: the
                   1290: .I troff
                   1291: character for a bullet, rather than the built-in string, to
                   1292: get a larger size.)
                   1293: .PP
                   1294: Other histograms are possible.
                   1295: The following
                   1296: .I awk
                   1297: program
                   1298: .P1
                   1299: .get states7.awk
                   1300: .P2
                   1301: produces the file
                   1302: .UL states3.d
                   1303: .P1
                   1304: .d states3.d
                   1305: .P2
                   1306: which lists each state's abbreviation, bin number, and
                   1307: height within the bin.
                   1308: The
                   1309: \*g
                   1310: program
                   1311: .P1
                   1312: .get states7.g
                   1313: .P2
                   1314: reads that file to make the following histogram, in which
                   1315: the state names are used to display the heights of the bins.
                   1316: In each bin, the states occur in increasing order of
                   1317: population from bottom to top.
                   1318: .grap states7.g
                   1319: .PP
                   1320: The next data set is a run-time profile of an early version of \*g,
                   1321: created by compiling the program with the
                   1322: .UL -p
                   1323: option and running
                   1324: .UL prof
                   1325: after the program executed.
                   1326: .P1
                   1327: .d prof1.d
                   1328: .P2
                   1329: Although there were more than fifty procedures in the program, the
                   1330: top four time-hogs accounted for more than half of the run time.
                   1331: This file is difficult for
                   1332: \*g
                   1333: to deal with:
                   1334: even though
                   1335: .UL if
                   1336: statements would allow us to extract lines 2 through 11
                   1337: of the file, we could not remove the leading
                   1338: .CW _ 
                   1339: from a routine name or access the last field in a record.
                   1340: We will therefore process it with
                   1341: the following
                   1342: .I awk
                   1343: program.
                   1344: .P1
                   1345: .get prof1.awk
                   1346: .P2
                   1347: The program produces
                   1348: .P1
                   1349: .d prof2.d
                   1350: .P2
                   1351: We could even use the
                   1352: .I sh
                   1353: statement to execute the
                   1354: .I awk
                   1355: program from within \*g, which would make the latter entirely
                   1356: self-contained (see the reference manual for details).
                   1357: .PP
                   1358: We will display the data with this program.
                   1359: .P1
                   1360: .get prof1.g
                   1361: .P2
                   1362: Observe that the program knows nothing about the range of the data.
                   1363: It uses default ticks and a
                   1364: .UL frame
                   1365: statement with a computed height to achieve
                   1366: total data independence.
                   1367: .grap prof1.g
                   1368: This bar chart highlights the fact that most of the time spent by
                   1369: \*g
                   1370: is devoted to input and output.
                   1371: .PP
                   1372: J. W. Tukey's box and whisker plots
                   1373: represent the median, quartiles, and extremes of a
                   1374: one-dimensional distribution.
                   1375: The following
                   1376: \*g
                   1377: program defines a macro to draw a box plot, and then
                   1378: uses that shape to compare the distribution of heights of
                   1379: volcanoes with the distribution of heights of States of the Union.
                   1380: .P1
                   1381: .get box1.g
                   1382: .P2
                   1383: Boxes are one of many shapes used for the graphical
                   1384: representation of several quantities.
                   1385: If you use such shapes frequently then you should
                   1386: make a library file of their macros to
                   1387: .UL copy
                   1388: into your
                   1389: \*g
                   1390: programs.
                   1391: The above program produces
                   1392: .grap box1.g
                   1393: Even though the extreme heights are the same, state heights
                   1394: have a lower median and a greater spread.
                   1395: .PP
                   1396: Someday you may use
                   1397: \*g
                   1398: to prepare overhead transparencies, only to find that
                   1399: everything comes out too small.
                   1400: The following program illustrates some ways to get larger
                   1401: graphs.
                   1402: .P1
                   1403: .zzz slide1.g
                   1404: .P2
                   1405: The
                   1406: .UL ps
                   1407: and
                   1408: .UL vs
                   1409: commands preceding the graph set the text size to 14 points and
                   1410: the vertical spacing to 18 points; the two quantities are
                   1411: reset by the commands following the
                   1412: .UL .G2 .
                   1413: Such size changes should be made outside the
                   1414: \*g
                   1415: program, as mentioned earlier.
                   1416: The
                   1417: .UL 4
                   1418: following the
                   1419: .UL .G1
                   1420: stretches the graph (including
                   1421: \*g's
                   1422: estimate of the accompanying text) to be four inches wide;
                   1423: it is an alternative to altering the
                   1424: .UL frame
                   1425: command.
                   1426: The macro
                   1427: .UL blob
                   1428: is a plotting symbol that is much larger than
                   1429: .UL bullet ;
                   1430: the different name ensures that later references to
                   1431: .UL bullet
                   1432: are unaffected.
                   1433: The
                   1434: .I troff
                   1435: commands within the
                   1436: .UL blob
                   1437: string move the character down one-tenth of an em
                   1438: to center its plotting position (determined experimentally)
                   1439: and then reset the vertical position.
                   1440: The program produces this trivial (but large) graph.
                   1441: .br
                   1442: .grap slide1.g
                   1443: .NH
                   1444: Using Grap
                   1445: .PP
                   1446: Following are a few day-to-day matters about using \*g.
                   1447: .NH 2
                   1448: Errors
                   1449: .PP
                   1450: \*G
                   1451: attempts to pinpoint input errors; for example,
                   1452: the input
                   1453: .P1
                   1454: \&.G1
                   1455: i = i + 1
                   1456: .P2
                   1457: results in this message on
                   1458: .UL stderr :
                   1459: .P1
                   1460: grap: syntax error near line 1, file -
                   1461:  context is
                   1462:        i = i >>>  + <<<  1
                   1463: .P2
                   1464: The error was noticed
                   1465: at the
                   1466: .UL + .
                   1467: Unfortunately, pinpointing is not the same as explaining:
                   1468: the real error is that the variable
                   1469: .UL i
                   1470: was not initialized.
                   1471: .PP
                   1472: The ``words''
                   1473: .UL x
                   1474: and
                   1475: .UL y
                   1476: are reserved (for the
                   1477: .UL coord
                   1478: statement);
                   1479: you will get an equally inexplicable syntax error message if you use them
                   1480: as variable names.
                   1481: (This design is bad, but not nearly so bad as
                   1482: having the
                   1483: .UL log
                   1484: and
                   1485: .UL exp
                   1486: functions use base 10.)
                   1487: .PP
                   1488: \*G
                   1489: tries to load a file of standard macro definitions
                   1490: .UL /usr/lib/grap.defines ) (
                   1491: for terms like
                   1492: .UL bullet ,
                   1493: .UL plus ,
                   1494: etc.
                   1495: It doesn't complain if that file isn't found,
                   1496: but if you later use one of these words,
                   1497: you'll get a syntax error message.
                   1498: .PP
                   1499: Certain constructs suggested by analogy to
                   1500: .I pic
                   1501: do not work.
                   1502: For example,
                   1503: .UL .GS
                   1504: and
                   1505: .UL .GE
                   1506: would have been nicer than
                   1507: .UL .G1
                   1508: and
                   1509: .UL .G2 ,
                   1510: but they were already taken.
                   1511: The
                   1512: .I pic
                   1513: construct
                   1514: .P1
                   1515: \&.PS <file
                   1516: .P2
                   1517: has been superseded by 
                   1518: \*g's
                   1519: .UL copy
                   1520: command (which in turn has been retrofitted into
                   1521: .I pic ).
                   1522: .NH 2
                   1523: \fITroff\fP issues
                   1524: .PP
                   1525: You may use
                   1526: .I troff
                   1527: commands like
                   1528: .UL .ps
                   1529: or
                   1530: .UL .ft
                   1531: to change text sizes and fonts within a graph,
                   1532: or use balanced
                   1533: .UL \es
                   1534: and
                   1535: .UL \ef
                   1536: commands within a string.
                   1537: Do not, however,
                   1538: add space
                   1539: .UL .sp ) (
                   1540: or change the line spacing
                   1541: .UL .vs , (
                   1542: .UL .ls )
                   1543: within a graph.
                   1544: Some defined terms like
                   1545: .UL bullet
                   1546: contain embedded size changes;
                   1547: further qualifying them with
                   1548: \*g
                   1549: .UL size
                   1550: commands may not always work.
                   1551: .PP
                   1552: Because
                   1553: \*g
                   1554: is built on top of
                   1555: .I pic ,
                   1556: the following quote from the
                   1557: .I pic
                   1558: manual is relevant:
                   1559: ``There is a subtle problem with complicated equations inside
                   1560: .I pic
                   1561: pictures \(em they come out wrong if
                   1562: .I eqn
                   1563: has to leave extra vertical space for the equation.
                   1564: If your equation involves more than subscripts and superscripts,
                   1565: you must add to the beginning of each such equation the extra information
                   1566: .UL "space 0" ''.
                   1567: This feature was illustrated in the graph of the
                   1568: United States population in Section 3.
                   1569: .NH 2
                   1570: Alternatives
                   1571: .PP
                   1572: Besides
                   1573: \*g
                   1574: and your local draftsperson, what other choices are there?
                   1575: .PP
                   1576: The S system |reference(slanguage chambers) provides
                   1577: a host of tools for statistical analysis,
                   1578: but somewhat fewer tools than
                   1579: \*g
                   1580: for producing document-quality graphs.
                   1581: S produces graphs on the screen of a DMD 5620 terminal much more quickly than
                   1582: \*g
                   1583: (often in seconds rather than minutes), but it
                   1584: takes somewhat longer to learn (at least for us).
                   1585: If you expect to do a lot of interactive data analysis, then
                   1586: S is probably the right tool for you.
                   1587: S may be used to generate 
                   1588: .I pic
                   1589: commands.
                   1590: .PP
                   1591: The standard UNIX program
                   1592: .I graph
                   1593: provides many of the basic features of
                   1594: \*g,
                   1595: though with quite a bit less control over details, particularly
                   1596: text.
                   1597: It produces output only in the
                   1598: .UX
                   1599: .I plot (5)
                   1600: language,
                   1601: which may be processed by a variety of filters
                   1602: for a variety of output devices.
                   1603: .PP
                   1604: The original
                   1605: .UX
                   1606: typesetter graphics programs are
                   1607: .I pic
                   1608: and
                   1609: .I ideal ;
                   1610: you may be able to do as well without using
                   1611: \*g
                   1612: as an intermediary.
                   1613: In particular,
                   1614: .I ideal
                   1615: provides shading and clipping,
                   1616: which are useful
                   1617: in presentation-quality bar charts and the like, but are
                   1618: well beyond the capabilities of 
                   1619: .I pic .
                   1620: .EQ
                   1621: delim $$
                   1622: .EN
                   1623: .NH
                   1624: References
                   1625: .LP
                   1626: |reference_placement
                   1627: .NH
                   1628: Reference Manual
                   1629: .PP
                   1630: In the following, 
                   1631: .I italic
                   1632: terms are syntactic categories,
                   1633: .UL typewriter
                   1634: terms are literals,
                   1635: parenthesized constructs are optional, and ... indicates repetition.
                   1636: In most cases, the order of statements,
                   1637: constructs and attributes is immaterial.
                   1638: .P1
                   1639: .IT "grap program" :
                   1640:        .G1 \f2(width in inches)\fP
                   1641:        \f2grap statement\fP
                   1642:        ...
                   1643:        .G2
                   1644: .P2
                   1645: A width on the
                   1646: .UL .G1
                   1647: line overrides the computed width, as in
                   1648: .I pic .
                   1649: .P1
                   1650: .IT "grap statement" :
                   1651: .I
                   1652:          frame \(or label \(or coord \(or ticks \(or grid \(or plot \(or line \(or circle \(or draw \(or new \(or next
                   1653:        \(or graph \(or numberlist \(or copy \(or for \(or if \(or sh \(or pic \(or assignment \(or print
                   1654: .ft
                   1655: .P2
                   1656: .PP
                   1657: The
                   1658: .UL frame
                   1659: statement defines the frame that surrounds the graph:
                   1660: .P1
                   1661: .IT frame :
                   1662:        frame \f2(\fPht \f2expr)\fP \f2(\fPwid \f2expr)\fP \f2((side) linedesc)\fP \f2...\fP
                   1663: .IT side :
                   1664:        top \(or bot \(or left \(or right
                   1665: .IT linedesc :
                   1666:        solid \(or invis \(or dotted \f2(expr)\fP \(or dashed \f2(expr)\fP
                   1667: .P2
                   1668: Height and width default to 2 and 3 inches;
                   1669: sides default to solid.
                   1670: If
                   1671: .I side
                   1672: is omitted, the
                   1673: .I linedesc
                   1674: applies to the entire frame.
                   1675: The optional expressions after
                   1676: .UL dotted
                   1677: and
                   1678: .UL dashed
                   1679: change the spacing exactly as in
                   1680: .I pic .
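                         .PP
                         For instance, the following statement
                         (a sketch; the dimensions are arbitrary)
                         .P1
                         frame ht 1.5 wid 2.5 top invis right invis
                         .P2
                         should produce a frame 1.5 inches high and 2.5 inches wide
                         in which only the left and bottom sides are drawn.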
                   1681: .PP
                   1682: The
                   1683: .UL label
                   1684: statement places a label on a specified side:
                   1685: .P1
                   1686: .IT label :
                   1687:        label \f2side\fP \f2strlist\fP \f2...\fP \f2shift\fP
                   1688: .IT shift :
                   1689:        left\f2 \(or \fPright\f2 \(or \fPup\f2 \(or \fPdown \f2expr ...\fP
                   1690: .IT strlist :
                   1691:        \f2str ... (\fPrjust\f2 \(or \fPljust\f2 \(or \fPabove\f2 \(or \fPbelow\f2) ... (\fPsize \f2(\fP\(+-\f2) expr) ...\fP
                   1692: .IT str :
                   1693:        "\f2...\fP"
                   1694: .P2
                   1695: Lists of text strings are stacked vertically.
                   1696: In any context, string lists may contain clauses
                   1697: to adjust the position or change the point size.
                   1698: Each clause applies to the string preceding it
                   1699: and all following strings.
                   1700: Labels may also have a
                   1701: .UL width
                   1702: attribute, to override
                   1703: \*g's
                   1704: default computation.
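                         .PP
                         As an illustration (the label text is made up for this example),
                         .P1
                         label left "Rainfall" "(inches)" size -2 left .2
                         .P2
                         should stack two strings on the left side,
                         set the second one two points smaller than the first,
                         and shift the label further to the left.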
                   1705: .PP
                   1706: Normally the coordinate system is defined by the data,
                   1707: with 7 percent extra on each side.
                   1708: (To change that to 5 percent, assign 0.05 to the
                   1709: \*g
                   1710: variable
                   1711: .UL margin ,
                   1712: which is reset to 0.07 at each
                   1713: .UL .G1
                   1714: statement.)
                   1715: The
                   1716: .UL coord
                   1717: statement defines an overriding system:
                   1718: .P1
                   1719: .IT coord :
                   1720:        coord \f2(name)\fP \f2(\fPx \f2expr,expr)\fP \f2(\fPy \f2expr,expr)\fP \f2(\fPlog x \(or log y \(or log log\f2) \fP
                   1721: .P2
                   1722: Coordinate systems can be named;
                   1723: ranges, logarithmic scaling, etc., are done separately for each.
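                         .PP
                         For example (the ranges here are invented for illustration),
                         .P1
                         coord x 0,100 y 1,1000 log y
                         .P2
                         should fix the horizontal range at 0 to 100 and the vertical range
                         at 1 to 1000, with the vertical axis scaled logarithmically.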
                   1724: .PP
                   1725: The
                   1726: .UL ticks
                   1727: statement places tick marks on one side of the frame:
                   1728: .P1
                   1729: .IT ticks :
                   1730:        ticks \f2side\fP \f2(\fPin \(or out \f2(expr))\fP \f2(shift)  (tick-locations)\fP
                   1731: .IT tick-locations :
                   1732:          at \f2(name) expr (str)\fP, \f2expr (str)\fP, \f2...\fP
                   1733:        \(or from \f2(name) expr\fP to \f2expr\fP \f2(\fPby \f2(op) expr)\fP \f2(str)\fP
                   1734: .P2
                   1735: If no ticks are specified, they will be provided automatically;
                   1736: .UL ticks
                   1737: .UL off
                   1738: suppresses automatic ticks.
                   1739: The optional expression after
                   1740: .UL in
                   1741: or
                   1742: .UL out
                   1743: specifies the length of the ticks in inches.
                   1744: The optional name refers to a coordinate system.
                   1745: If
                   1746: .IT str
                   1747: contains
                   1748: format specifiers like
                   1749: .UL %f
                   1750: or
                   1751: .UL %g ,
                   1752: they are interpreted as by
                   1753: .UL printf .
                   1754: If no
                   1755: .IT str
                   1756: is supplied, the tick labels will be the values of the
                   1757: expressions.
                   1758: .PP
                   1759: If the
                   1760: .UL by
                   1761: clause is omitted, steps are of size 1.
                   1762: If the
                   1763: .UL by
                   1764: expression is preceded by one of
                   1765: .UL + ,
                   1766: .UL - ,
                   1767: .UL *
                   1768: or
                   1769: .UL / ,
                   1770: the step is scaled by that operator,
                   1771: e.g.,
                   1772: .UL *10
                   1773: means that each step is 10 times the previous one.
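                         .PP
                         The additive and multiplicative forms of the iterator might be used
                         as follows (a sketch with illustrative ranges):
                         .P1
                         ticks bot out from 0 to 100 by 25
                         ticks left in from 1 to 1000 by *10
                         .P2
                         The first statement should place ticks at 0, 25, 50, 75 and 100;
                         the second at 1, 10, 100 and 1000.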
                   1774: .PP
                   1775: The
                   1776: .UL grid
                   1777: statement produces grid lines along (i.e., perpendicular to)
                   1778: the named side.
                   1779: .P1
                   1780: .IT grid :
                   1781:        grid \f2side (linedesc) (shift)  (tick-locations)\fP
                   1782: .P2
                   1783: Grids are labeled by the same mechanism as
                   1784: .UL ticks .
                   1785: It is possible to draw grids without ticks by placing the phrase
                   1786: .UL ticks
                   1787: .UL off
                   1788: after the side name and before the iterator.
                   1789: .PP
                   1790: Plot
                   1791: statements place text at a point:
                   1792: .P1
                   1793: .IT plot :
                   1794:        \f2strlist\fP at \f2point\fP
                   1795:        plot \f2expr (str)\fP at \f2point\fP
                   1796: .IT point :
                   1797:        \f2(name) expr,expr\fP
                   1798: .P2
                   1799: As in the
                   1800: .UL label
                   1801: statement, the string list may contain
                   1802: position and size modifiers.
                   1803: The
                   1804: .UL plot
                   1805: statement uses the optional format string as in C's
                   1806: .UL printf
                   1807: statement \(em it may contain a
                   1808: .UL %f
                   1809: or
                   1810: .UL %g .
                   1811: The optional name refers to a coordinate system.
                   1812: .PP
                   1813: The
                   1814: .UL line
                   1815: statement draws a line or arrow from here to there:
                   1816: .P1
                   1817: .IT line :
                   1818:        \f2(\fPline \(or arrow\f2)\fP from \f2point\fP to \f2point (linedesc)\fP
                   1819: .P2
                   1820: The
                   1821: .UL circle
                   1822: statement draws a circle:
                   1823: .P1
                   1824: .IT circle :
                   1825:        circle at \f2point (\fPradius \f2expr)\fP
                   1826: .P2
                   1827: The radius is in inches; the default size is small.
                   1828: .PP
                   1829: The 
                   1830: .UL draw
                   1831: statement defines a sequence of lines:
                   1832: .P1
                   1833: .IT draw :
                   1834:        draw \f2(name) linedesc (str)\fP
                   1835: .P2
                   1836: Subsequent data for the named sequence
                   1837: will be plotted as a line of the specified style,
                   1838: with the optional
                   1839: .IT str
                   1840: plotted at each point.
                   1841: The
                   1842: .UL next
                   1843: statement continues a sequence:
                   1844: .P1
                   1845: .IT next :
                   1846:        next \f2(name)\fP at \f2point (linedesc)\fP
                   1847: .P2
                   1848: If a line description is specified, it overrides the default
                   1849: display mode for the line segment ending at
                   1850: .I point .
                   1851: The
                   1852: .UL new
                   1853: statement starts a new sequence; it has the same format as the
                   1854: .UL draw
                   1855: statement.
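                         .PP
                         A minimal sketch of a sequence (the points are arbitrary):
                         .P1
                         draw dashed
                         next at 1, 2
                         next at 2, 5
                         next at 3, 4 solid
                         .P2
                         The segment from 1,2 to 2,5 should be dashed, the default for this sequence;
                         the segment ending at 3,4 should be solid.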
                   1856: .PP
                   1857: A line consisting of a set of numbers
                   1858: is treated as a family of points
                   1859: $x$, $y sub 1$, $y sub 2$, etc.,
                   1860: to be plotted at the single
                   1861: $x$ value.
                   1862: .P1
                   1863: .IT numberlist :
                   1864:        \f2number\fP ...
                   1865: .P2
                   1866: If there is only one number it is treated as
                   1867: a $y$ value, and $x$ values of 1, 2, 3, ...
                   1868: are supplied automatically.
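                         .PP
                         For example, the three input lines (made-up data)
                         .P1
                         1 10 12
                         2 15 11
                         3 22 9
                         .P2
                         describe two points at each of the horizontal positions 1, 2 and 3.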
                   1869: .PP
                   1870: \*G 
                   1871: provides arithmetic with the operators
                   1872: .UL + ,
                   1873: .UL - ,
                   1874: .UL * ,
                   1875: .UL / ,
                   1876: and
                   1877: .UL ^ .
                   1878: Variables may be assigned to;
                   1879: assignments are expressions.
                   1880: Built-in functions include
                   1881: .UL log ,
                   1882: .UL exp
                   1883: (both base 10 \(em beware!),
                   1884: .UL int
                   1885: (truncates towards zero),
                   1886: .UL sin ,
                   1887: .UL cos 
                   1888: (both use radians),
                   1889: .UL atan2(dy,dx) ,
                   1890: .UL sqrt ,
                   1891: .UL min
                   1892: (two arguments only),
                   1893: .UL max
                   1894: (ditto),
                   1895: and
                   1896: .UL rand()
                   1897: (returns a random real number in the interval [0,1)).
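                         .PP
                         Because of the base-10 convention, assignments such as these
                         (the variable names are arbitrary)
                         .P1
                         decades = log(1000)
                         hundred = exp(2)
                         .P2
                         should leave
                         .UL decades
                         equal to 3 and
                         .UL hundred
                         equal to 100.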
                   1898: .PP
                   1899: The
                   1900: .UL for
                   1901: statement provides a modest looping facility:
                   1902: .P1
                   1903: .IT for :
                   1904:        for \f2var\fP from \f2expr\fP to \f2expr (\fPby \f2(op) expr)\fP do { \f2anything\fP }
                   1905: .P2
                   1906: The braced text may contain internally balanced braces.
                   1907: Alternatively, any other character may appear immediately after the word
                   1908: .UL do ,
                   1909: and the string is terminated by the next occurrence of that character.
                   1910: The text
                   1911: .IT anything
                   1912: (which may contain newlines) is repeated as 
                   1913: .IT var
                   1914: takes on values from
                   1915: .IT expr1
                   1916: to
                   1917: .IT expr2 .
                   1918: As with tick iterators, the
                   1919: .UL by
                   1920: clause is optional, and may proceed arithmetically or multiplicatively.
                   1921: In a
                   1922: .UL for
                   1923: statement,
                   1924: the
                   1925: .UL from
                   1926: may be replaced by
                   1927: .UL = ''. ``
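                         .PP
                         A short sketch of a loop (the body is an arbitrary choice):
                         .P1
                         for i from 1 to 5 do {
                                 circle at i, i*i
                         }
                         .P2
                         This should draw five circles, at the points 1,1 through 5,25.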
                   1928: .PP
                   1929: The
                   1930: .UL if-then-else
                   1931: statement provides conditional evaluation:
                   1932: .P1
                   1933: .IT if :
                   1934:        if \f2expr\fP then { \f2anything\fP } else { \f2anything\fP }
                   1935: .P2
                   1936: The
                   1937: .UL else
                   1938: clause
                   1939: is optional.
                   1940: Relational and logical operators include
                   1941: .UL == ,
                   1942: .UL != ,
                   1943: .UL > ,
                   1944: .UL >= ,
                   1945: .UL < ,
                   1946: .UL <= ,
                   1947: .UL ! ,
                   1948: .UL || ,
                   1949: and
                   1950: .UL && .
                   1951: Strings may be compared with the operators
                   1952: .UL ==
                   1953: and
                   1954: .UL != .
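                         .PP
                         For example, assuming a variable
                         .UL v
                         has been assigned earlier (the name and the threshold are illustrative),
                         .P1
                         if v >= 50 then { "high" at 1, v } else { "low" at 1, v }
                         .P2
                         should plot one of two strings, depending on the value of
                         .UL v .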
                   1955: .PP
                   1956: It is possible to convert numeric expressions to formatted strings:
                   1957: .P1
                   1958: sprintf("\f2format\fP", \f2expr\fP, \f2expr\fP, ...)
                   1959: .P2
                   1960: is equivalent to a quoted string in any context.
                   1961: Variants of
                   1962: .UL %f
                   1963: and
                   1964: .UL %g
                   1965: are the only sensible format conversions.
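                         .PP
                         For instance, assuming a previously assigned variable
                         .UL v
                         (a name chosen for illustration),
                         .P1
                         sprintf("%.1f", v) at 1, v
                         .P2
                         should plot the value of
                         .UL v ,
                         formatted with one decimal place, at the point 1,v.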
                   1966: .PP
                   1967: \*G
                   1968: provides the same macro processor that
                   1969: .I pic
                   1970: does:
                   1971: .P1
                   1972: define \f2macro-name\fP { \f2anything\fP }
                   1973: .P2
                   1974: .EQ
                   1975: delim %%
                   1976: .EN
                   1977: Subsequent occurrences of the macro name will be replaced
                   1978: by the string, with arguments of the form \f(CW$\fIn\fR
                   1979: replaced by corresponding actual arguments.
                   1980: Macro definitions persist across
                   1981: .UL .G2
                   1982: boundaries, as do values of variables.
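                         .PP
                         As a sketch (the macro name
                         .UL dot
                         and its body are invented for illustration), the lines
                         .P1
                         define dot { circle at $1, $2 radius 0.05 }
                         dot(1, 2)
                         dot(3, 4)
                         .P2
                         should draw small circles at the points 1,2 and 3,4.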
                   1983: .EQ
                   1984: delim $$
                   1985: .EN
                   1986: .PP
                   1987: The
                   1988: .UL copy
                   1989: statement is somewhat overloaded:
                   1990: .P1
                   1991: copy "\f2filename\fP"
                   1992: .P2
                   1993: includes the contents of the named file at that point;
                   1994: .P1
                   1995: copy "\f2filename\fP" thru \f2macro-name\fP
                   1996: .P2
                   1997: copies the file through the macro; and
                   1998: .P1
                   1999: copy thru \f2macro-name\fP
                   2000: .P2
                   2001: copies subsequent lines through the macro;
                   2002: each number or quoted string is treated as an argument.
                   2003: In each case, copying continues until end of file or the next
                   2004: .UL .G2 .
                   2005: The optional clause
                   2006: .UL until
                   2007: .IT str
                   2008: causes copying to terminate when a line whose
                   2009: first field is
                   2010: .IT str
                   2011: occurs.
                   2012: In all cases, the macro can be specified inline rather than by name:
                   2013: .P1
                   2014: copy thru { \f2macro body\fP }
                   2015: .P2
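                         .EQ
                         delim %%
                         .EN
                         .PP
                         For illustration only, the inline form can be combined with
                         .UL until
                         to plot data that follows the statement directly.
                         The terminator
                         .UL END ,
                         the data values, and the order of the clauses
                         are assumptions made for this sketch:
                         .P1
                         copy thru { bullet at $1, $2 } until "END"
                         1 10
                         2 20
                         3 15
                         END
                         .P2
                         Each data line supplies two numeric arguments, so this sketch
                         should place one bullet per line and stop at the line whose first field is
                         .UL END .
                         .EQ
                         delim $$
                         .EN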
                   2016: .PP
                   2017: The
                   2018: .UL sh
                   2019: command passes text through to the UNIX shell.
                   2020: .P1
                   2021: .IT sh :
                   2022:        sh { \f2anything\fP }
                   2023: .P2
                   2024: The body of the command is scanned for macros.
                   2025: The built-in macro
                   2026: .UL pid
                   2027: is a string consisting of the process identification number;
                   2028: it can be used to generate unique file names.
                   2029: .PP
                   2030: The
                   2031: .UL pic
                   2032: command passes text through to
                   2033: .I pic 
                   2034: with the 
                   2035: .UL pic '' ``
                   2036: removed; variables and macros are not evaluated.
                   2037: Lines beginning with a period (that are not numbers)
                   2038: are passed through literally, under the assumption that they
                   2039: are
                   2040: .I troff
                   2041: commands.
                   2042: .PP
                   2043: The
                   2044: .UL graph
                   2045: statement
                   2046: .P1
                   2047: .IT graph :
                   2048:        graph \f2Picname (pic-text)\fP
                   2049: .P2
                   2050: defines a new graph named
                   2051: .I Picname ,
                   2052: resetting all coordinate systems.
                   2053: If any
                   2054: .UL graph
                   2055: commands are used in a
                   2056: \*g
                   2057: program, then the statement after the
                   2058: .UL \&.G1
                   2059: must be a
                   2060: .UL graph
                   2061: command.
                   2062: The
                   2063: .I pic-text
                   2064: can be used to position this graph relative
                   2065: to previous graphs by referring to their
                   2066: .UL Frame s,
                   2067: as in
                   2068: .P1
                   2069:        graph First
                   2070:         ...
                   2071:        graph Second with .Frame.w at First.Frame.e + (0.1,0)
                   2072: .P2
                   2073: Macros and expressions in
                   2074: .I pic-text
                   2075: are not evaluated.
                   2076: .I Picname s
                   2077: must begin with a capital letter to satisfy 
                   2078: .I pic
                   2079: syntax.
                   2080: .PP
                   2081: The
                   2082: .UL print
                   2083: statement
                   2084: .P1
                   2085: .IT print :
                   2086:        print \f2(expr\fP \(or \f2str)\fP
                   2087: .P2
                   2088: writes on
                   2089: .UL stderr
                   2090: as
                   2091: \*g
                   2092: processes its input; it is sometimes useful for debugging.
                   2093: .PP
                   2094: Many reserved words have synonyms, such as
                   2095: .UL thru
                   2096: for
                   2097: .UL through ,
                   2098: .UL tick
                   2099: for
                   2100: .UL ticks ,
                   2101: and
                   2102: .UL bot
                   2103: for
                   2104: .UL bottom .
                   2105: .PP
                   2106: The
                   2107: .UL #
                   2108: introduces a comment, which ends at the end of the line.
                   2109: Statements may be continued over several lines by preceding each
                   2110: newline with a
                   2111: backslash character.
                   2112: Multiple statements may appear on a single line separated
                   2113: by semicolons.
                   2114: \*G
                   2115: ignores any line that is entirely blank, including those
                   2116: processed by
                   2117: .UL "copy thru"
                   2118: commands.
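                         .PP
                         A brief sketch of two of these lexical conventions follows;
                         the frame dimensions and label text are illustrative
                         and are not taken from any example above:
                         .P1
                         # set the frame size, then label the bottom axis
                         frame ht 2 wid 3; label bot "Age"
                         .P2
                         The first line is a comment; the second holds two statements
                         separated by a semicolon.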
                   2119: .PP
                   2120: When
                   2121: \*g
                   2122: is first executed it reads standard macro definitions
                   2123: from the file
                   2124: .UL /usr/lib/grap.defines .
                   2125: The definitions include
                   2126: .UL bullet ,
                   2127: .UL plus ,
                   2128: .UL box ,
                   2129: .UL star ,
                   2130: .UL dot ,
                   2131: .UL times ,
                   2132: .UL htick ,
                   2133: .UL vtick ,
                   2134: .UL square ,
                   2135: and
                   2136: .UL delta .
