Annotation of researchv10dc/vol2/sam/sam.ms, revision 1.1

1.1     ! root        1: .so ../ADM/mac
        !             2: .XX sam 399 "The Text Editor Sam"
        !             3: .nr dP 2
        !             4: .nr dV 3p
        !             5: .de Cs
        !             6: .br
        !             7: .fi
        !             8: .ft 2
        !             9: .ps -2
        !            10: .vs -2
        !            11: ..
        !            12: .de Ce
        !            13: .br
        !            14: .nf
        !            15: .ft 1
        !            16: .ps
        !            17: .vs
        !            18: .sp
        !            19: ..
        !            20: .TL
        !            21: The Text Editor Sam\(dg
        !            22: .AU
        !            23: Rob Pike
        !            24: .AI
        !            25: .MH
        !            26: .AB
        !            27: .PP
        !            28: .I Sam
        !            29: is an interactive multi-file text editor intended for
        !            30: bitmap displays.
        !            31: A textual command language
        !            32: supplements the mouse-driven, cut-and-paste interface
        !            33: to make complex or
        !            34: repetitive editing tasks easy to specify.
        !            35: The language is characterized by the composition of regular expressions
        !            36: to describe the structure of the text being modified.
        !            37: The treatment of files as a database, with changes logged
        !            38: as atomic transactions, guides the implementation and
        !            39: makes a general `undo' mechanism straightforward.
        !            40: .PP
        !            41: .I Sam
        !            42: is implemented as two processes connected by a low-bandwidth stream,
        !            43: one process handling the display and the other the editing
        !            44: algorithms.  Therefore it can run with the display process
        !            45: in a bitmap terminal and the editor on a local host,
        !            46: with both processes on a bitmap-equipped host, or with
        !            47: the display process in the terminal and the editor in a
        !            48: remote host.
        !            49: By suppressing the display process,
        !            50: it can even run without a bitmap terminal.
        !            51: .AE
        !            52: .2C
        !            53: .FS
        !            54: \(dg This article originally appeared as |reference(spe sam pike).
        !            55: It is reprinted by permission of John Wiley & Sons, Ltd.
        !            56: .FE
        !            57: .NH
        !            58: Introduction
        !            59: .LP
        !            60: .I Sam
        !            61: is an interactive text editor that combines cut-and-paste interactive editing with
        !            62: an unusual command language based on the composition of regular expressions.
        !            63: It is written as two programs: one, the `host part,' runs on a
        !            64: .UX
        !            65: system
        !            66: and implements the command language and provides file access; the other, the
        !            67: `terminal part,' runs asynchronously
        !            68: on a machine with a mouse and bitmap display
        !            69: and supports the display and interactive editing.
        !            70: The host part may be even run in isolation on an ordinary terminal
        !            71: to edit text using the command
        !            72: language, much like a traditional line editor,
        !            73: without assistance from a mouse or display.
        !            74: Most often,
        !            75: the terminal part runs on a Blit|reference(blit bstj) terminal
        !            76: (actually on a Teletype DMD 5620, the production version of the Blit), whose
        !            77: host connection is an ordinary 9600 bps RS-232 link;
        !            78: on the Sun computer the host and display processes run on a single machine,
        !            79: connected by a pipe.
        !            80: .PP
        !            81: .I Sam
        !            82: edits uninterpreted
        !            83: ASCII text.
        !            84: It has no facilities for multiple fonts, graphics or tables,
        !            85: unlike MacWrite,|reference(macwrite) Bravo,|reference(bravo)
        !            86: Tioga|reference(tioga)
        !            87: or Lara.|reference(lara)
        !            88: Also unlike them, it has a rich command language.
        !            89: (Throughout this paper, the phrase
        !            90: .I
        !            91: command language
        !            92: .R
        !            93: refers to
        !            94: textual commands; commands activated from the mouse form the
        !            95: .I mouse
        !            96: .I language. )
        !            97: .I Sam
        !            98: developed as an editor for use by programmers, and tries to join
        !            99: the styles of the
        !           100: .UX
        !           101: text editor
        !           102: .I ed
        !           103: |reference(v7manual)|reference(kernighan pike)
        !           104: with that of interactive cut-and-paste editors by
        !           105: providing a comfortable mouse-driven interface
        !           106: to a program with a solid command language driven by regular expressions.
        !           107: The command language developed more than the mouse language, and
        !           108: acquired a notation for describing the structure of files
        !           109: more richly than as a sequence of lines,
        !           110: using a dataflow-like syntax for specifying changes.
        !           111: .PP
        !           112: The interactive style was influenced by
        !           113: .I jim ,
        !           114: |reference(bstj blit)
        !           115: an early cut-and-paste editor for the Blit, and by
        !           116: .I mux ,
        !           117: |reference(latest volume1)
        !           118: the Blit window system.
        !           119: .I Mux
        !           120: merges the original Blit window system,
        !           121: .I mpx ,
        !           122: |reference(bstj blit)
        !           123: with cut-and-paste editing, forming something like a
        !           124: multiplexed version of
        !           125: .I jim
        !           126: that edits the output of (and input to) command sessions rather than files.
        !           127: .PP
        !           128: The first part of this paper describes the command language, then the mouse
        !           129: language, and explains how they interact.
        !           130: That is followed by a description of the implementation,
        !           131: first of the host part, then of the terminal part.
        !           132: A principle that influenced the design of
        !           133: .I sam
        !           134: is that it should have no explicit limits, such as upper limits on
        !           135: file size or line length.
        !           136: A secondary consideration is that it be efficient.
        !           137: To honor these two goals together requires a method for efficiently
        !           138: manipulating
        !           139: huge strings (files) without breaking them into lines,
        !           140: perhaps while making thousands of changes
        !           141: under control of the command language.
        !           142: .I Sam 's
        !           143: method is to
        !           144: treat the file as a transaction database, implementing changes as atomic
        !           145: updates.  These updates may be unwound easily to `undo' changes.
        !           146: Efficiency is achieved through a collection of caches that minimizes
        !           147: disc traffic and data motion, both within the two parts of the program
        !           148: and between them.
        !           149: .PP
        !           150: The terminal part of
        !           151: .I sam
        !           152: is fairly straightforward.
        !           153: More interesting is how the two halves of the editor stay
        !           154: synchronized when either half may initiate a change.
        !           155: This is achieved through a data structure that organizes the
        !           156: communications and is maintained in parallel by both halves.
        !           157: .PP
        !           158: The last part of the paper chronicles the writing of
        !           159: .I sam
        !           160: and discusses the lessons that were learned through its development and use.
        !           161: .PP
        !           162: The paper is long, but is composed largely of two papers of reasonable length:
        !           163: a description of the user interface of
        !           164: .I sam
        !           165: and a discussion of its implementation.
        !           166: They are combined because the implementation is strongly influenced by
        !           167: the user interface, and vice versa.
        !           168: .NH
        !           169: The Interface
        !           170: .LP
        !           171: .I Sam
        !           172: is a text editor for multiple files.
        !           173: File names may be provided when it is invoked:
        !           174: .P1 2n
        !           175: sam file1 file2 ...
        !           176: .P2
        !           177: and there are commands
        !           178: to add new files and discard unneeded ones.
        !           179: Files are not read until necessary
        !           180: to complete some command.
        !           181: Editing operations apply to an internal copy
        !           182: made when the file is read; the
        !           183: .UX
        !           184: file associated with the copy
        !           185: is changed only by an explicit command.
        !           186: To simplify the discussion, the internal copy is here called a
        !           187: .I file ,
        !           188: while the disc-resident original is called a
        !           189: .I
        !           190: disc file.
        !           191: .R
        !           192: .PP
        !           193: .I Sam
        !           194: is usually connected to a bitmap display that presents a cut-and-paste
        !           195: editor driven by the mouse.
        !           196: In this mode, the command language is still available:
        !           197: text typed in a special window, called the
        !           198: .I sam
        !           199: .I window,
        !           200: is interpreted
        !           201: as commands to be executed in the current file.
        !           202: Cut-and-paste editing may be used in any window \(em even in the
        !           203: .I sam
        !           204: window to construct commands.
        !           205: The other mode of operation, invoked by starting
        !           206: .I sam
        !           207: with the option
        !           208: .CW -d
        !           209: (for `no download'),
        !           210: does not use the mouse or bitmap display, but still permits
        !           211: editing using the textual command language, even on an ordinary terminal,
        !           212: interactively or from a script.
        !           213: .PP
        !           214: The following sections describe first the command language (under
        !           215: .CW sam\ -d
        !           216: and in the
        !           217: .I sam
        !           218: window), and then the mouse interface.
        !           219: These two languages are nearly independent, but connect through the
        !           220: .I current
        !           221: .I text,
        !           222: described below.
        !           223: .NH 2
        !           224: The Command Language
        !           225: .LP
        !           226: A file consists of its contents, which are an array of characters
        !           227: (that is, a string); the
        !           228: .I name
        !           229: of the associated disc file; the
        !           230: .I
        !           231: modified bit
        !           232: .R
        !           233: that states whether the contents match those of
        !           234: the disc file;
        !           235: and a substring of the contents, called the
        !           236: .I
        !           237: current text
        !           238: .R
        !           239: or
        !           240: .I dot
        !           241: (see Figures 1 and 2).
        !           242: If the current text is a null string, dot falls between characters.
        !           243: The
        !           244: .I value
        !           245: of dot is the location of the current text; the
        !           246: .I contents
        !           247: of dot are the characters it contains.
        !           248: .I Sam
        !           249: imparts to the text no two-dimensional interpretation such as columns
        !           250: or fields; text is always one-dimensional.
        !           251: Even the idea of a `line' of text as understood by most
        !           252: .UX
        !           253: programs
        !           254: \(em a sequence of characters terminated by a newline character \(em
        !           255: is only weakly supported.
        !           256: .PP
        !           257: The
        !           258: .I
        !           259: current file
        !           260: .R
        !           261: is the file to which editing commands refer.
        !           262: The current text is therefore dot in the current file.
        !           263: If a command doesn't explicitly name a particular file or piece of text,
        !           264: the command is assumed to apply to the current text.
        !           265: For the moment, ignore the presence of multiple files and consider
        !           266: editing a single file.
        !           267: .1C
        !           268: .KF L
        !           269: .IB fig1.ps 3.10i 5.65i c
        !           270: .Cs
        !           271: Figure 1. A typical
        !           272: .I sam
        !           273: screen, with the editing menu presented.
        !           274: The
        !           275: .I sam
        !           276: (command language) window is in the middle, with file windows above and below.
        !           277: (The user interface makes it easy to create these abutting windows.)
        !           278: The partially obscured window is a third file window.
        !           279: The uppermost window is that to which typing and mouse operations apply,
        !           280: as indicated by its heavy border.
        !           281: Each window has its current text highlighted in reverse video.
        !           282: The
        !           283: .I sam
        !           284: window's current text is the null string on the last visible line,
        !           285: indicated by a vertical bar.
        !           286: See also Figure 2.
        !           287: .Ce
        !           288: .KE
        !           289: .2C
        !           290: .PP
        !           291: Commands have one-letter names.
        !           292: Except for non-editing commands such as writing
        !           293: the file to disc, most commands make some change
        !           294: to the text in dot and leave dot set to the text resulting from the change.
        !           295: For example, the delete command,
        !           296: .CW d ,
        !           297: deletes the text in dot, replacing it by the null string and setting dot
        !           298: to the result.
        !           299: The change command,
        !           300: .CW c ,
        !           301: replaces dot by text delimited by an arbitrary punctuation character,
        !           302: conventionally
        !           303: a slash.  Thus,
        !           304: .P1 2n
        !           305: c/Peter/
        !           306: .P2
        !           307: replaces the text in dot by the string
        !           308: .CW Peter .
        !           309: Similarly,
        !           310: .P1 2n
        !           311: a/Peter/
        !           312: .P2
        !           313: (append) adds the string after dot, and
        !           314: .P1 2n
        !           315: i/Peter/
        !           316: .P2
        !           317: (insert) inserts before dot.
        !           318: All three leave dot set to the new text,
        !           319: .CW Peter .
        !           320: .PP
        !           321: Newlines are part of the syntax of commands:
        !           322: the newline character lexically terminates a command.
        !           323: Within the inserted text, however, newlines are never implicit.
        !           324: But since it is often convenient to insert multiple lines of text,
        !           325: .I sam
        !           326: has a special
        !           327: syntax for that case:
        !           328: .P1 2n
        !           329: a
        !           330: some lines of text
        !           331: to be inserted in the file,
        !           332: terminated by a period
        !           333: on a line by itself
        !           334: \&.
        !           335: .P2
        !           336: In the one-line syntax, a newline character may be specified by a C-like
        !           337: escape, so
        !           338: .P1 2n
        !           339: c/\en/
        !           340: .P2
        !           341: replaces dot by a single newline character.
        !           342: .PP
        !           343: .I Sam
        !           344: also has a substitute command,
        !           345: .CW s :
        !           346: .P1 2n
        !           347: s/\f2expression\fP/\f2replacement\fP/
        !           348: .P2
        !           349: substitutes the replacement text for the first match, in dot,
        !           350: of the regular expression.
        !           351: Thus, if dot is the string
        !           352: .CW Peter ,
        !           353: the command
        !           354: .P1 2n
        !           355: s/t/st/
        !           356: .P2
        !           357: changes it to
        !           358: .CW Pester .
        !           359: In general,
        !           360: .CW s
        !           361: is unnecessary, but it was inherited from
        !           362: .I ed
        !           363: and it has some convenient variations.
        !           364: For instance, the replacement text may include the matched text,
        !           365: specified by
        !           366: .CW & :
        !           367: .P1 2n
        !           368: s/Peter/Oh, &, &, &, &!/
        !           369: .P2
        !           370: .PP
        !           371: There are also three commands that apply programs
        !           372: to text:
        !           373: .P1 2n
        !           374: < \f2Unix program\fP
        !           375: .P2
        !           376: replaces dot by the output of the
        !           377: .UX
        !           378: program.
        !           379: Similarly, the
        !           380: .CW >
        !           381: command
        !           382: runs the program with dot as its standard input, and
        !           383: .CW |
        !           384: does both.  For example,
        !           385: .P1 2n
        !           386: | sort
        !           387: .P2
        !           388: replaces dot by the result of applying the standard sorting utility to it.
        !           389: Again, newlines have no special significance for these
        !           390: .I sam
        !           391: commands.
        !           392: The text acted upon and resulting from these commands is not necessarily
        !           393: bounded by newlines, although for connection with
        !           394: .UX
        !           395: programs,
        !           396: newlines may be necessary to obey conventions.
        !           397: .PP
        !           398: One more command:
        !           399: .CW p
        !           400: prints the contents of dot.
        !           401: Table I summarizes
        !           402: .I sam 's
        !           403: commands.
        !           404: .1C
        !           405: .KF
        !           406: .Tm Table of Commands  s
        !           407: .TS
        !           408: center;
        !           409: c s
        !           410: lfCW l.
        !           411: Table I. \fISam\fP commands
        !           412: .sp .4
        !           413: .ft CW
        !           414: _
        !           415: .ft
        !           416: .sp .4
        !           417: \f1Text commands\fP    
        !           418: .sp .4
        !           419: _
        !           420: .sp .4
        !           421: a/\f2text\fP/  Append text after dot
        !           422: c/\f2text\fP/  Change text in dot
        !           423: i/\f2text\fP/  Insert text before dot
        !           424: d      Delete text in dot
        !           425: s/\f2regexp\fP/\f2text\fP/     Substitute text for match of regular expression in dot
        !           426: m \f2address\fP        Move text in dot after address
        !           427: t \f2address\fP        Copy text in dot after address
        !           428: .sp .4
        !           429: _
        !           430: .sp .4
        !           431: \f1Display commands\fP 
        !           432: .sp .4
        !           433: _
        !           434: .sp .2
        !           435: p      Print contents of dot
        !           436: \&=    Print value (line numbers and character numbers) of dot
        !           437: .sp .4
        !           438: _
        !           439: .sp .4
        !           440: \f1File commands\fP
        !           441: .sp .4
        !           442: _
        !           443: .sp .2
        !           444: b \f2file-list\fP      Set current file to first file in list that \fIsam\fP has in menu
        !           445: B \f2file-list\fP      Same as \f(CWb\fP, but load new files
        !           446: n      Print menu lines of all files
        !           447: D \f2file-list\fP      Delete named files from \fIsam\fP
        !           448: .sp .4
        !           449: _
        !           450: .sp .4
        !           451: \f1I/O commands\fP     
        !           452: .sp .4
        !           453: _
        !           454: .sp .2
        !           455: e \f2filename\fP       Replace file with named disc file
        !           456: r \f2filename\fP       Replace dot by contents of named disc file
        !           457: w \f2filename\fP       Write file to named disc file
        !           458: f \f2filename\fP       Set file name and print new menu line
        !           459: < \f2Unix-command\fP   Replace dot by standard output of command
        !           460: > \f2Unix-command\fP   Send dot to standard input of command
        !           461: | \f2Unix-command\fP   Replace dot by result of command applied to dot
        !           462: ! \f2Unix-command\fP   Run the command
        !           463: .sp .4
        !           464: _
        !           465: .sp .4
        !           466: \f1Loops and conditionals\fP   
        !           467: .sp .4
        !           468: _
        !           469: .sp .2
        !           470: x/\f2regexp\fP/ \f2command\fP  For each match of regexp, set dot and run command
        !           471: y/\f2regexp\fP/ \f2command\fP  Between adjacent matches of regexp, set dot and run command
        !           472: X/\f2regexp\fP/ \f2command\fP  Run command in each file whose menu line matches regexp
        !           473: Y/\f2regexp\fP/ \f2command\fP  Run command in each file whose menu line does not match
        !           474: g/\f2regexp\fP/ \f2command\fP  If dot contains a match of regexp, run command
        !           475: v/\f2regexp\fP/ \f2command\fP  If dot does not contain a match of regexp, run command
        !           476: .sp .4
        !           477: _
        !           478: .sp .4
        !           479: \f1Miscellany\fP       
        !           480: .sp .4
        !           481: _
        !           482: .sp .2
        !           483: k      Set address mark to value of dot
        !           484: q      Quit
        !           485: u \f2n\fP      Undo last \f2n\fP (default 1) changes
        !           486: { }    Braces group commands
        !           487: .sp .3
        !           488: .ft CW
        !           489: _
        !           490: .ft
        !           491: .TE
        !           492: .sp
        !           493: .Tm  n         S
        !           494: .Tm  k         S
        !           495: .Tm  q         S
        !           496: .Tm  p         S
        !           497: .KE
        !           498: .2C
        !           499: .PP
        !           500: The value of dot may be changed by
        !           501: specifying an
        !           502: .I address
        !           503: for the command.
        !           504: The simplest address is a line number:
        !           505: .P1 2n
        !           506: 3
        !           507: .P2
        !           508: refers to the third line of the file, so
        !           509: .P1 2n
        !           510: 3d
        !           511: .P2
        !           512: deletes the third line of the file, and implicitly renumbers
        !           513: the lines so the old line 4 is now numbered 3.
        !           514: (This is one of the few places where
        !           515: .I sam
        !           516: deals with lines directly.)
        !           517: Line
        !           518: .CW 0
        !           519: is the null string at the beginning of the file.
        !           520: If a command consists of only an address, a
        !           521: .CW p
        !           522: command is assumed, so typing an unadorned
        !           523: .CW 3
        !           524: prints line 3 on the terminal.
        !           525: There are a couple of other basic addresses:
        !           526: a period addresses dot itself; and
        !           527: a dollar sign
        !           528: .CW $ ) (
        !           529: addresses the null string at the end of the file.
        !           530: .PP
        !           531: An address is always a single substring of the file.
        !           532: Thus, the address
        !           533: .CW 3
        !           534: addresses the characters
        !           535: after the second newline of
        !           536: the file through the third newline of the file.
        !           537: A
        !           538: .I
        !           539: compound address
        !           540: .R
        !           541: is constructed by the comma operator
        !           542: .P1 2n
        !           543: \f2address1\fP,\f2address2\fP
        !           544: .P2
        !           545: and addresses the substring of the file from the beginning of
        !           546: .I address1
        !           547: to the end of
        !           548: .I address2 .
        !           549: For example, the command
        !           550: .CW 3,5p
        !           551: prints the third through fifth lines of the file and
        !           552: .CW .,$d
        !           553: deletes the text from the beginning of dot to the end of the file.
        !           554: .PP
        !           555: These addresses are all absolute positions in the file, but
        !           556: .I sam
        !           557: also has relative addresses, indicated by
        !           558: .CW +
        !           559: or
        !           560: .CW - .
        !           561: For example,
        !           562: .P1 2n
        !           563: $-3
        !           564: .P2
        !           565: is the third line before the end of the file and
        !           566: .P1 2n
        !           567: \&.+1
        !           568: .P2
        !           569: is the line after dot.
        !           570: If no address appears to the left of the
        !           571: .CW +
        !           572: or
        !           573: .CW - ,
        !           574: dot is assumed;
        !           575: if nothing appears to the right,
        !           576: .CW 1
        !           577: is assumed.
        !           578: Therefore,
        !           579: .CW .+1
        !           580: may be abbreviated to just a plus sign.
        !           581: .PP
        !           582: The
        !           583: .CW +
        !           584: operator acts relative to the end of its first argument, while the
        !           585: .CW -
        !           586: operator acts relative to the beginning.  Thus
        !           587: .CW .+1
        !           588: addresses the first line after dot,
        !           589: .CW .-
        !           590: addresses the first line before dot, and
        !           591: .CW +-
        !           592: refers to the line containing the end of dot.  (Dot may span multiple lines, and
        !           593: .CW +
        !           594: selects the line after the end of dot, then
        !           595: .CW -
        !           596: backs up one line.)
        !           597: .PP
        !           598: The final type of address is a regular expression, which addresses the
        !           599: text matched by the expression.  The expression is enclosed in slashes, as in
        !           600: .P1 2n
        !           601: /\f2expression\fP/
        !           602: .P2
        !           603: The expressions are the same as those in the
        !           604: .UX
        !           605: program
        !           606: .I egrep ,
        !           607: |reference(v7manual)|reference(kernighan pike)
        !           608: and include closures, alternations, and so on.
        !           609: They find the
        !           610: .I
        !           611: leftmost longest
        !           612: .R
        !           613: string that matches the expression, that is,
        !           614: the first match after the point where the search is started,
        !           615: and if more than one match begins at the same spot, the longest such match.
        !           616: (I assume familiarity with the syntax for regular expressions in
        !           617: .UX
        !           618: programs.|reference(bsdmanual))
        !           619: For example,
        !           620: .P1 2n
        !           621: /x/
        !           622: .P2
        !           623: matches the next
        !           624: .CW x
        !           625: character in the file,
        !           626: .P1 2n
        !           627: /xx*/
        !           628: .P2
        !           629: matches the next run of one or more
        !           630: .CW x 's,
        !           631: and
        !           632: .P1 2n
        !           633: /x|Peter/
        !           634: .P2
        !           635: matches the next
        !           636: .CW x
        !           637: or
        !           638: .CW Peter .
        !           639: For compatibility with other
        !           640: .UX
        !           641: programs, the `any character' operator,
        !           642: a period,
        !           643: does not match a newline, so
        !           644: .P1 2n
        !           645: /.*/
        !           646: .P2
        !           647: matches the text from dot to the end of the line, but excludes the newline
        !           648: and so will not match across
        !           649: the line boundary.
        !           650: .PP
        !           651: Regular expressions are always relative addresses.
        !           652: The direction is forwards by default,
        !           653: so
        !           654: .CW /Peter/
        !           655: is really an abbreviation for
        !           656: .CW +/Peter/ .
        !           657: The search can be reversed with a minus sign, so
        !           658: .P1 2n
        !           659: .CW -/Peter/
        !           660: .P2
        !           661: finds the first
        !           662: .CW Peter
        !           663: before dot.
        !           664: Regular expressions may be used with other address forms, so
        !           665: .CW 0+/Peter/
        !           666: finds the first
        !           667: .CW Peter
        !           668: in the file and
        !           669: .CW $-/Peter/
        !           670: finds the last.
        !           671: Table II summarizes
        !           672: .I sam 's
        !           673: addresses.
        !           674: .1C
        !           675: .KF
        !           676: .TS
        !           677: center;
        !           678: c s
        !           679: lfCW l.
        !           680: Table II. \fISam\fP addresses
        !           681: .sp .4
        !           682: .ft CW
        !           683: _
        !           684: .ft
        !           685: .sp .4
        !           686: \f1Simple addresses\fP 
        !           687: .sp .4
        !           688: _
        !           689: .sp .2
        !           690: #\f2n\fP       The empty string after character \f2n\fP
        !           691: \f2n\fP        Line \f2n\fP.
        !           692: /\f2regexp\fP/ The first following match of the regular expression
        !           693: -/\f2regexp\fP/        The first previous match of the regular expression
        !           694: $      The null string at the end of the file
        !           695: \&.    Dot
        !           696: \&'    The address mark, set by \f(CWk\fP command
        !           697: "\f2regexp\fP" Dot in the file whose menu line matches regexp
        !           698: .sp .4
        !           699: _
        !           700: .sp .4
        !           701: \f1Compound addresses\fP       
        !           702: .sp .4
        !           703: _
        !           704: .sp .2
        !           705: \f2a1\fP+\f2a2\fP      The address \f2a2\fP evaluated starting at right of \f2a1\fP
        !           706: \f2a1\fP-\f2a2\fP      \f2a2\fP evaluated in the reverse direction starting at left of \f2a1\fP
        !           707: \f2a1\fP,\f2a2\fP      From the left of \f2a1\fP to the right of \f2a2\fP (default \f(CW0,$\fP)
        !           708: \f2a1\fP;\f2a2\fP      Like \f(CW,\fP but sets dot after evaluating \f2a1\fP
        !           709: .sp .4
        !           710: _
        !           711: .sp .4
        !           712: .T&
        !           713: c s.
        !           714: T{
        !           715: The operators
        !           716: .CW +
        !           717: and
        !           718: .CW -
        !           719: are high precedence, while
        !           720: .CW ,
        !           721: and
        !           722: .CW ;
        !           723: are low precedence.
        !           724: In both
        !           725: .CW +
        !           726: and
        !           727: .CW -
        !           728: forms,
        !           729: .I a2
        !           730: defaults to 1 and
        !           731: .I a1
        !           732: defaults to dot.
        !           733: If both
        !           734: .I a1
        !           735: and
        !           736: .I a2
        !           737: are present,
        !           738: .CW +
        !           739: may be elided.
        !           740: T}
        !           741: .sp .5
        !           742: .ft CW
        !           743: _
        !           744: .ft
        !           745: .TE
        !           746: .sp
        !           747: .KE
        !           748: .2C
        !           749: .PP
        !           750: The language discussed so far will not seem novel
        !           751: to people who use
        !           752: .UX
        !           753: text editors
        !           754: such as
        !           755: .I ed
        !           756: or
        !           757: .I vi .
        !           758: |reference(bsdmanual)
        !           759: Moreover, the kinds of editing operations these commands allow, with the exception
        !           760: of regular expressions and line numbers,
        !           761: are clearly more conveniently handled by a mouse-based interface.
        !           762: Indeed,
        !           763: .I sam 's
        !           764: mouse language (discussed at length below) is the means by which
        !           765: simple changes are usually made.
        !           766: For large or repetitive changes, however, a textual language
        !           767: outperforms a manual interface.
        !           768: .PP
        !           769: Imagine that, instead of deleting just one occurrence of the string
        !           770: .CW Peter ,
        !           771: we wanted to eliminate every
        !           772: .CW Peter .
        !           773: What's needed is an iterator that runs a command for each occurrence of some
        !           774: text.
        !           775: .I Sam 's
        !           776: iterator is called
        !           777: .CW x ,
        !           778: for extract:
        !           779: .P1 2n
        !           780: x/\f2expression\fP/ \f2command\fP
        !           781: .P2
        !           782: finds all matches in dot of the specified expression, and for each
        !           783: such match, sets dot to the text matched and runs the command.
        !           784: So to delete all the
        !           785: .CW Peters:
        !           786: .P1 2n
        !           787: 0,$ x/Peter/ d
        !           788: .P2
        !           789: (Blanks in these examples are to improve readability;
        !           790: .I sam
        !           791: neither requires nor interprets them.)
        !           792: This searches the entire file
        !           793: .CW 0,$ ) (
        !           794: for occurrences of the string
        !           795: .CW Peter ,
        !           796: and runs the
        !           797: .CW d
        !           798: command with dot set to each such occurrence.
        !           799: (By contrast, the comparable
        !           800: .I ed
        !           801: command would delete all
        !           802: .I lines
        !           803: containing
        !           804: .CW Peter ;
        !           805: .I sam
        !           806: deletes only the
        !           807: .CW Peters .)
        !           808: The address
        !           809: .CW 0,$
        !           810: is commonly used, and may be abbreviated to just a comma.
        !           811: As another example,
        !           812: .P1 2n
        !           813: , x/Peter/ p
        !           814: .P2
        !           815: prints a list of
        !           816: .CW Peters,
        !           817: one for each appearance in the file, with no intervening text (not even newlines
        !           818: to separate the instances).
        !           819: .PP
        !           820: Of course, the text extracted by
        !           821: .CW x
        !           822: may be selected by a regular expression,
        !           823: which complicates deciding what set of matches is chosen \(em
        !           824: matches may overlap.  This is resolved by generating the matches
        !           825: starting from the beginning of dot using the leftmost-longest rule,
        !           826: and searching for each match starting from the end of the previous one.
        !           827: Regular expressions may also match null strings, but a null match
        !           828: adjacent to a non-null match is never selected; at least one character
        !           829: must intervene.
        !           830: For example,
        !           831: .P1 2n
        !           832: , c/AAA/
        !           833: x/B*/ c/-/
        !           834: , p
        !           835: .P2
        !           836: produces as output
        !           837: .P1 2n
        !           838: -A-A-A-
        !           839: .P2
        !           840: because the pattern
        !           841: .CW B*
        !           842: matches the null strings separating the
        !           843: .CW A 's.
        !           844: .PP
        !           845: The
        !           846: .CW x
        !           847: command has a complement,
        !           848: .CW y ,
        !           849: with similar syntax, that executes the command with dot set to the text
        !           850: .I between
        !           851: the matches of the expression.
        !           852: For example,
        !           853: .P1 2n
        !           854: , c/AAA/
        !           855: y/A/ c/-/
        !           856: , p
        !           857: .P2
        !           858: produces the same result as the example above.
        !           859: .PP
        !           860: The
        !           861: .CW x
        !           862: and
        !           863: .CW y
        !           864: commands are looping constructs, and
        !           865: .I sam
        !           866: has a pair of conditional commands to go with them.
        !           867: They have similar syntax:
        !           868: .P1 2n
        !           869: g/\f2expression\fP/ \f2command\fP
        !           870: .P2
        !           871: (guard)
        !           872: runs the command exactly once if dot contains a match of the expression.
        !           873: This is different from
        !           874: .CW x ,
        !           875: which runs the command for
        !           876: .I each
        !           877: match:
        !           878: .CW x
        !           879: loops;
        !           880: .CW g
        !           881: merely tests, without changing the value of dot.
        !           882: Thus,
        !           883: .P1 2n
        !           884: , x/Peter/ d
        !           885: .P2
        !           886: deletes all occurrences of
        !           887: .CW Peter ,
        !           888: but
        !           889: .P1 2n
        !           890: , g/Peter/ d
        !           891: .P2
        !           892: deletes the whole file (reduces it to a null string) if
        !           893: .CW Peter
        !           894: occurs anywhere in the text.
        !           895: The complementary conditional is
        !           896: .CW v ,
        !           897: which runs the command if there is
        !           898: .I no
        !           899: match of the expression.
        !           900: .PP
        !           901: These control-structure-like commands may be composed to construct more
        !           902: involved operations.  For example, to print those lines of text that
        !           903: contain the string
        !           904: .CW Peter :
        !           905: .P1 2n
        !           906: , x/.*\en/ g/Peter/ p
        !           907: .P2
        !           908: The
        !           909: .CW x
        !           910: breaks the file into lines, the
        !           911: .CW g
        !           912: selects those lines containing
        !           913: .CW Peter ,
        !           914: and the
        !           915: .CW p
        !           916: prints them.
        !           917: This command gives an address for the
        !           918: .CW x
        !           919: command (the whole file), but because
        !           920: .CW g
        !           921: does not have an explicit address, it applies to the value of
        !           922: dot produced by the
        !           923: .CW x
        !           924: command, that is, to each line.
        !           925: All commands in
        !           926: .I sam
        !           927: except for the command to write a file to disc use dot for the
        !           928: default address.
        !           929: .PP
        !           930: Composition may be continued indefinitely.
        !           931: .P1 2n
        !           932: , x/.*\en/ g/Peter/ v/SaltPeter/ p
        !           933: .P2
        !           934: prints those lines containing
        !           935: .CW Peter
        !           936: but
        !           937: .I not
        !           938: those containing
        !           939: .CW SaltPeter .
        !           940: .NH 2
        !           941: Structural Regular Expressions
        !           942: .LP
        !           943: Unlike other
        !           944: .UX
        !           945: text editors,
        !           946: including the non-interactive ones such as
        !           947: .I sed
        !           948: and
        !           949: .I awk ,
        !           950: |reference(kernighan pike)
        !           951: .I sam
        !           952: is good for manipulating files with multi-line `records.'
        !           953: An example is an on-line phone book composed of records,
        !           954: separated by blank lines, of the form
        !           955: .P1 2n
        !           956: Herbert Tic
        !           957: 44 Turnip Ave., Endive, NJ
        !           958: 201-5555642
        !           959: 
        !           960: Norbert Twinge
        !           961: 16 Potato St., Cabbagetown, NJ
        !           962: 201-5553145
        !           963: 
        !           964: \&...
        !           965: .P2
        !           966: The format may be encoded as a regular expression:
        !           967: .P1 2n
        !           968: (.+\en)+
        !           969: .P2
        !           970: that is, a sequence of one or more non-blank lines.
        !           971: The command to print Mr. Tic's entire record is then
        !           972: .P1 2n
        !           973: , x/(.+\en)+/ g/^Herbert Tic$/ p
        !           974: .P2
        !           975: and that to extract just the phone number is
        !           976: .P1 2n
        !           977: , x/(.+\en)+/ g/^Herbert Tic$/
        !           978:        x/^[0-9]*-[0-9]*\en/ p
        !           979: .P2
        !           980: The latter command breaks the file into records,
        !           981: chooses Mr. Tic's record,
        !           982: extracts the phone number from the record,
        !           983: and finally prints the number.
        !           984: .PP
        !           985: A more involved problem is that of
        !           986: renaming a particular variable, say
        !           987: .CW n ,
        !           988: to
        !           989: .CW num
        !           990: in a C program.
        !           991: The obvious first attempt,
        !           992: .P1 2n
        !           993: , x/n/ c/num/
        !           994: .P2
        !           995: is badly flawed: it changes not only the variable
        !           996: .CW n
        !           997: but any letter
        !           998: .CW n
        !           999: that appears.
        !          1000: We need to extract all the variables, and select those that match
        !          1001: .CW n
        !          1002: and only
        !          1003: .CW n :
        !          1004: .P1 2n
        !          1005: , x/[A-Za-z_][A-Za-z_0-9]*/ g/n/ v/../ c/num/
        !          1006: .P2
        !          1007: The pattern
        !          1008: .CW [A-Za-z_][A-Za-z_0-9]*
        !          1009: matches C identifiers.
        !          1010: Next
        !          1011: .CW g/n/
        !          1012: selects those containing an
        !          1013: .CW n .
        !          1014: Then
        !          1015: .CW v/../
        !          1016: rejects those containing two (or more) characters, and finally
        !          1017: .CW c/num/
        !          1018: changes the remainder (identifiers
        !          1019: .CW n )
        !          1020: to
        !          1021: .CW num .
        !          1022: This version clearly works much better, but there may still be problems.
        !          1023: For example, in C character and string constants, the sequence
        !          1024: .CW \en
        !          1025: is interpreted as a newline character, and we don't want to change it to
        !          1026: .CW \enum.
        !          1027: This problem can be forestalled with a
        !          1028: .CW y
        !          1029: command
        !          1030: (the following two examples are split onto multiple lines for readability):
        !          1031: .P1 2n
        !          1032: , y/\e\en/ x/[A-Za-z_][A-Za-z_0-9]*/
        !          1033:        g/n/ v/../ c/num/
        !          1034: .P2
        !          1035: (the second
        !          1036: .CW \e
        !          1037: is necessary because of lexical conventions in regular expressions),
        !          1038: or we could even reject character constants and strings outright:
        !          1039: .P1 2n
        !          1040: , y/'[^']*'/ y/"[^"]*"/
        !          1041:        x/[A-Za-z_][A-Za-z_0-9]*/
        !          1042:                g/n/ v/../ c/num/
        !          1043: .P2
        !          1044: The
        !          1045: .CW y
        !          1046: commands in this version exclude from consideration all character constants
        !          1047: and strings.
        !          1048: The only remaining problem is to deal with the possible occurrence of
        !          1049: .CW \e'
        !          1050: or
        !          1051: .CW \e"
        !          1052: within these sequences, but it's easy to see how to resolve this difficulty.
        !          1053: .PP
        !          1054: The point of these composed commands is successive refinement.
        !          1055: A simple version of the command is tried, and if it's not good enough,
        !          1056: it can be honed by adding a clause or two.
        !          1057: (Mistakes can be undone; see below.
        !          1058: Also, the mouse language makes it unnecessary to retype the command each time.)
        !          1059: The resulting chains of commands are somewhat reminiscent of
        !          1060: shell pipelines.|reference(kernighan pike)
        !          1061: Unlike pipelines, though, which pass along modified
        !          1062: .I data ,
        !          1063: .I sam
        !          1064: commands pass a
        !          1065: .I view
        !          1066: of the data.
        !          1067: The text at each step of the command is the same, but which pieces
        !          1068: are selected is refined step by step until the correct piece is
        !          1069: available to the final step of the command line, which ultimately makes the change.
        !          1070: .PP
        !          1071: In other
        !          1072: .UX
        !          1073: programs, regular expressions are used only for selection,
        !          1074: as in the
        !          1075: .I sam
        !          1076: .CW g
        !          1077: command, never for extraction as in the
        !          1078: .CW x
        !          1079: or
        !          1080: .CW y
        !          1081: command.
        !          1082: For example, patterns in
        !          1083: .I awk
        !          1084: |reference(kernighan pike)
        !          1085: are used to select lines to be operated on, but cannot be used
        !          1086: to describe the format of the input text, or to handle newline-free text.
        !          1087: The use of regular expressions to describe the structure of a piece
        !          1088: of text rather than its contents, as in the
        !          1089: .CW x
        !          1090: command, 
        !          1091: has been given a name:
        !          1092: .I
        !          1093: structural regular expressions.
        !          1094: .R
        !          1095: When they are composed, as in the above example,
        !          1096: they are pleasantly expressive.
        !          1097: Their use is discussed at greater length elsewhere.|reference(pike structural)
        !          1098: .NH 2
        !          1099: Multiple files
        !          1100: .LP
        !          1101: .I Sam
        !          1102: has a few other commands, mostly relating to input and output.
        !          1103: .P1 2n
        !          1104: e discfilename
        !          1105: .P2
        !          1106: replaces the contents and name of the current file with those of the named
        !          1107: disc file;
        !          1108: .P1 2n
        !          1109: w discfilename
        !          1110: .P2
        !          1111: writes the contents to the named disc file; and
        !          1112: .P1 2n
        !          1113: r discfilename
        !          1114: .P2
        !          1115: replaces dot with the contents of the named disc file.
        !          1116: All these commands use the current file's name if none is specified.
        !          1117: Finally,
        !          1118: .P1 2n
        !          1119: f discfilename
        !          1120: .P2
        !          1121: changes the name associated with the file and displays the result:
        !          1122: .P1 2n
        !          1123: \&'-. discfilename
        !          1124: .P2
        !          1125: This output is called the file's
        !          1126: .I
        !          1127: menu line,
        !          1128: .R
        !          1129: because it is the contents of the file's line in the button 3 menu (described
        !          1130: in the
        !          1131: next section).
        !          1132: The first three characters are a concise notation for the state of the file.
        !          1133: The apostrophe signifies that the file is modified.
        !          1134: The minus sign indicates the number of windows
        !          1135: open on the file (see the next section):
        !          1136: .CW -
        !          1137: means none,
        !          1138: .CW +
        !          1139: means one, and
        !          1140: .CW *
        !          1141: means more than one.
        !          1142: Finally, the period indicates that this is the current file.
        !          1143: These characters are useful for controlling the
        !          1144: .CW X
        !          1145: command, described shortly.
        !          1146: .PP
        !          1147: .I Sam
        !          1148: may be started with a set of disc files (such as all the source for
        !          1149: a program) by invoking it with a list of file names as arguments, and
        !          1150: more may be added or deleted on demand.
        !          1151: .P1 2n
        !          1152: B discfile1 discfile2 ...
        !          1153: .P2
        !          1154: adds the named files to
        !          1155: .I sam 's
        !          1156: list, and
        !          1157: .P1 2n
        !          1158: D discfile1 discfile2 ...
        !          1159: .P2
        !          1160: removes them from
        !          1161: .I sam 's
        !          1162: memory (without effect on associated disc files).
        !          1163: Both these commands have a syntax for using the shell|reference(kernighan pike)
        !          1164: (the
        !          1165: .UX
        !          1166: command interpreter) to generate the lists:
        !          1167: .P1 2n
        !          1168: B <echo *.c
        !          1169: .P2
        !          1170: will add all C source files, and
        !          1171: .P1 2n
        !          1172: B <grep -l variable *.c
        !          1173: .P2
        !          1174: will add all C source files referencing a particular variable
        !          1175: (the
        !          1176: .UX
        !          1177: command
        !          1178: .CW grep\ -l
        !          1179: lists all files in its arguments that contain matches of
        !          1180: the specified regular expression).
        !          1181: Finally,
        !          1182: .CW D
        !          1183: without arguments deletes the current file.
        !          1184: .PP
        !          1185: There are two ways to change which file is current:
        !          1186: .P1 2n
        !          1187: b filename
        !          1188: .P2
        !          1189: makes the named file current.
        !          1190: The
        !          1191: .CW B
        !          1192: command
        !          1193: does the same, but also adds any new files to
        !          1194: .I sam 's
        !          1195: list.
        !          1196: (In practice, of course, the current file
        !          1197: is usually chosen by mouse actions, not by textual commands.)
        !          1198: The other way is to use a form of address that refers to files:
        !          1199: .P1 2n
        !          1200: "\f2expression\fP" \f2address\fP
        !          1201: .P2
        !          1202: refers to the address evaluated in the file whose menu line
        !          1203: matches the expression (there must be exactly one match).
        !          1204: For example,
        !          1205: .P1 2n
        !          1206: "peter.c" 3
        !          1207: .P2
        !          1208: refers to the third line of the file whose name matches
        !          1209: .CW peter.c .
        !          1210: This is most useful in the move
        !          1211: .CW m ) (
        !          1212: and copy
        !          1213: .CW t ) (
        !          1214: commands:
        !          1215: .P1 2n
        !          1216: 0,$ t "peter.c" 0
        !          1217: .P2
        !          1218: makes a copy of the current file at the beginning of
        !          1219: .CW peter.c .
        !          1220: .PP
        !          1221: The
        !          1222: .CW X
        !          1223: command
        !          1224: is a looping construct, like
        !          1225: .CW x ,
        !          1226: that refers to files instead of strings:
        !          1227: .P1 2n
        !          1228: X/\f2expression\fP/ \f2command\fP
        !          1229: .P2
        !          1230: runs the command in all
        !          1231: files whose menu lines match the expression.  The best example is
        !          1232: .P1 2n
        !          1233: X/'/ w
        !          1234: .P2
        !          1235: which writes to disc all modified files.
        !          1236: .CW Y
        !          1237: is the complement of
        !          1238: .CW X :
        !          1239: it runs the command on all files whose menu lines don't match the expression:
        !          1240: .P1 2n
        !          1241: Y/\e.c/ D
        !          1242: .P2
        !          1243: deletes all files that don't have
        !          1244: .CW \&.c
        !          1245: in their names, that is, it keeps all C source files and deletes the rest.
        !          1246: .PP
        !          1247: Braces allow commands to be grouped, so
        !          1248: .P1 2n
        !          1249: {
        !          1250:        \f2command1\fP
        !          1251:        \f2command2\fP
        !          1252: }
        !          1253: .P2
        !          1254: is syntactically a single command that runs two commands.
        !          1255: Thus,
        !          1256: .P1 2n
        !          1257: X/\e.c/ ,g/variable/ {
        !          1258:        f
        !          1259:        , x/.*\en/ g/variable/ p
        !          1260: }
        !          1261: .P2
        !          1262: finds all occurrences of
        !          1263: .CW variable
        !          1264: in C source files, and prints
        !          1265: out the file names and lines of each match.
        !          1266: The precise semantics of compound operations is discussed in the implementation
        !          1267: sections below.
        !          1268: .PP
        !          1269: Finally,
        !          1270: the undo command,
        !          1271: .CW u ,
        !          1272: undoes the last command,
        !          1273: no matter how many files were affected.
        !          1274: Multiple undo operations move further back in time, so
        !          1275: .P1 2n
        !          1276: u
        !          1277: u
        !          1278: .P2
        !          1279: (which may be abbreviated
        !          1280: .CW u2 )
        !          1281: undoes the last two commands.  An undo may not be undone, however, nor
        !          1282: may any command that adds or deletes files.
        !          1283: Everything else is undoable, though, including for example
        !          1284: .CW e
        !          1285: commands:
        !          1286: .P1 2n
        !          1287: e filename
        !          1288: u
        !          1289: .P2
        !          1290: restores the state of the file completely, including its name, dot,
        !          1291: and modified bit.  Because of the undo, potentially dangerous commands
        !          1292: are not guarded by confirmations.  Only
        !          1293: .CW D ,
        !          1294: which destroys the information necessary to restore itself, is protected.
        !          1295: It will not delete a modified file, but a second
        !          1296: .CW D
        !          1297: of the same file will succeed regardless.
        !          1298: The
        !          1299: .CW q
        !          1300: command, which exits
        !          1301: .I sam ,
        !          1302: is similarly guarded.
        !          1303: .NH 2
        !          1304: Mouse Interface
        !          1305: .LP
        !          1306: .I Sam
        !          1307: is most commonly run
        !          1308: connected to a bitmap display and mouse for interactive editing.
        !          1309: The only difference in the command language
        !          1310: between regular, mouse-driven
        !          1311: .I sam
        !          1312: and
        !          1313: .CW sam\ -d
        !          1314: is that if an address
        !          1315: is provided without a command,
        !          1316: .CW sam\ -d
        !          1317: will print the text referenced by the address, but
        !          1318: regular
        !          1319: .I sam
        !          1320: will highlight it on the screen \(em in fact,
        !          1321: dot is always highlighted (see Figure 2).
        !          1322: .WS 1
        !          1323: .1C
        !          1324: .KF
        !          1325: .IB fig2.ps 1.47i 4.61i c
        !          1326: .Cs
        !          1327: Figure 2. A
        !          1328: .I sam
        !          1329: window.  The scroll bar down the left
        !          1330: represents the file, with the bubble showing the fraction
        !          1331: visible in the window.
        !          1332: The scroll bar may be manipulated by the mouse for convenient browsing.
        !          1333: The current text,
        !          1334: which is highlighted, need not fit on a line.  Here it consists of one partial
        !          1335: line, one complete line, and a final partial line.
        !          1336: .Ce
        !          1337: .KE
        !          1338: .2C
        !          1339: .PP
        !          1340: Each file may have zero or more windows open on the display.
        !          1341: At any time, only one window in all of
        !          1342: .I sam
        !          1343: is the
        !          1344: .I
        !          1345: current window,
        !          1346: .R
        !          1347: that is, the window to which typing and mouse actions refer;
        !          1348: this may be the
        !          1349: .I sam
        !          1350: window (that in which commands may be typed)
        !          1351: or one of the file windows.
        !          1352: When a file has multiple windows, the image of the file in each window
        !          1353: is always kept up to date.
        !          1354: The current file is the last file affected by a command,
        !          1355: so if the
        !          1356: .I sam
        !          1357: window is current,
        !          1358: the current window is not a window on the current file.
        !          1359: However, each window on a file has its own value of dot,
        !          1360: and when switching between windows on a single file,
        !          1361: the file's value of dot is changed to that of the window.
        !          1362: Thus, flipping between windows behaves in the obvious, convenient way.
        !          1363: .PP
        !          1364: The mouse on the Blit has three buttons, numbered left to right.
        !          1365: Button 3 has a list of commands to manipulate windows,
        !          1366: followed by a list of `menu lines' exactly as printed by the
        !          1367: .CW f
        !          1368: command, one per file (not one per window).
        !          1369: These menu lines are sorted by file name.
        !          1370: If the list is long, the Blit menu software will make it more manageable
        !          1371: by generating a scrolling menu instead of an unwieldy long list.
        !          1372: Using the menu to select a file from the list makes that file the current
        !          1373: file, and the most recently current window in that file the current window.
        !          1374: But if that file is already current, selecting it in the menu cycles through
        !          1375: the windows on the file; this simple trick avoids a special menu to
        !          1376: choose windows on a file.
        !          1377: If there is no window open on the file,
        !          1378: .I sam
        !          1379: changes the mouse cursor to prompt the user to create one.
        !          1380: .PP
        !          1381: The commands on the button 3 menu are straightforward (see Figure 3), and
        !          1382: are like the commands to manipulate windows in
        !          1383: .I mux ,
        !          1384: |reference(latest volume1)
        !          1385: the Blit's window system.
        !          1386: .CW new
        !          1387: makes a new file, and gives it one empty window, whose size is determined
        !          1388: by a rectangle swept by the mouse.
        !          1389: .CW xerox
        !          1390: prompts for a window to be selected, and
        !          1391: makes a clone of that window; this is how multiple windows are created on one file.
        !          1392: .CW reshape
        !          1393: changes the size of the indicated window, and
        !          1394: .CW close
        !          1395: deletes it.  If that is the last window open on the file,
        !          1396: .CW close
        !          1397: first does a
        !          1398: .CW D
        !          1399: command on the file.
        !          1400: .CW write
        !          1401: is identical to a
        !          1402: .CW w
        !          1403: command on the file; it is in the menu purely for convenience.
        !          1404: Finally,
        !          1405: .CW ~~sam~~
        !          1406: is a menu item that appears between the commands and the file names.
        !          1407: Selecting it makes the
        !          1408: .I sam
        !          1409: window the current window,
        !          1410: causing subsequent typing to be interpreted as commands.
        !          1411: .KF
        !          1412: .IB fig3.ps 1.98i 1.39i c
        !          1413: .Cs
        !          1414: Figure 3. The menu on button 3.
        !          1415: The black rectangle on the left is a scroll bar; the menu is limited to
        !          1416: the length shown to prevent its becoming unwieldy.
        !          1417: Above the
        !          1418: .CW ~~sam~~
        !          1419: line is a list of commands;
        !          1420: beneath it is a list of files, presented exactly as with the
        !          1421: .CW f
        !          1422: command.
        !          1423: .Ce
        !          1424: .KE
        !          1425: .PP
        !          1426: When
        !          1427: .I sam
        !          1428: requests that a window be swept, in response to
        !          1429: .CW new ,
        !          1430: .CW xerox
        !          1431: or
        !          1432: .CW reshape ,
        !          1433: it changes the mouse cursor from the usual arrow to a box with
        !          1434: a small arrow.
        !          1435: In this state, the mouse may be used to indicate an arbitrary rectangle by
        !          1436: pressing button 3 at one corner and releasing it at the opposite corner.
        !          1437: More conveniently,
        !          1438: button 3 may simply be clicked,
        !          1439: whereupon
        !          1440: .I sam
        !          1441: creates the maximal rectangle that contains the cursor
        !          1442: and abuts the
        !          1443: .I sam
        !          1444: window.
        !          1445: By placing the
        !          1446: .I sam
        !          1447: window in the middle of the screen, the user can define two regions (one above,
        !          1448: one below) in which stacked fully-overlapping
        !          1449: windows can be created with minimal fuss (see Figure 1).
        !          1450: This simple user interface trick makes window creation noticeably easier.
        !          1451: .PP
        !          1452: The cut-and-paste editor is essentially the same as that in Smalltalk-80.
        !          1453: |reference(smalltalk80 goldberg)
        !          1454: The text in dot is always highlighted on the screen.
        !          1455: When a character is typed it replaces dot, and sets dot to the null
        !          1456: string after the character.  Thus, ordinary typing inserts text.
        !          1457: Button 1 is used for selection:
        !          1458: pressing the button, moving the mouse, and lifting the button
        !          1459: selects (sets dot to) the text between the points where the
        !          1460: button was pressed and released.
        !          1461: Pressing and releasing at the same point selects a null string; this
        !          1462: is called clicking.  Clicking twice quickly, or
        !          1463: .I
        !          1464: double clicking,
        !          1465: .R
        !          1466: selects larger objects;
        !          1467: for example, double clicking in a word selects the word,
        !          1468: double clicking just inside an opening bracket selects the text
        !          1469: contained in the brackets (handling nested brackets correctly),
        !          1470: and similarly for
        !          1471: parentheses, quotes, and so on.
        !          1472: The double-clicking rules reflect a bias toward
        !          1473: programmers.
        !          1474: If
        !          1475: .I sam
        !          1476: were intended more for word processing, double-clicks would probably
        !          1477: select linguistic structures such as sentences.
        !          1478: .PP
        !          1479: If button 1 is pressed outside the current window, it makes the indicated
        !          1480: window current.
        !          1481: This is the easiest way to switch between windows and files.
        !          1482: .PP
        !          1483: Pressing button 2 brings up a menu of editing functions (see Figure 4).
        !          1484: These mostly apply to the selected text:
        !          1485: .CW cut
        !          1486: deletes the selected text, and remembers it in a hidden buffer called the
        !          1487: .I
        !          1488: snarf buffer,
        !          1489: .R
        !          1490: .CW paste
        !          1491: replaces the selected text by the contents of the snarf buffer,
        !          1492: .CW snarf
        !          1493: just copies the selected text to the snarf buffer,
        !          1494: .CW look
        !          1495: searches forward for the next literal occurrence of the selected text, and
        !          1496: .CW <mux>
        !          1497: exchanges snarf buffers with the window system in which
        !          1498: .I sam
        !          1499: is running.
        !          1500: Finally, the last regular expression used appears as a menu entry
        !          1501: to search
        !          1502: forward for the next occurrence of a match for the expression.
        !          1503: .WS 1
        !          1504: .KF
        !          1505: .IB fig4.ps 0.87i 0.81i c
        !          1506: .Cs
        !          1507: Figure 4. The menu on button 2.
        !          1508: The bottom entry tracks the most recently used regular expression, which may
        !          1509: be literal text.
        !          1510: .Ce
        !          1511: .KE
        !          1512: .PP
        !          1513: The relationship between the command language and the mouse language is
        !          1514: entirely due to the equality of dot and the selected text chosen
        !          1515: with button 1 on the mouse.
        !          1516: For example, to make a set of changes in a C subroutine, dot can be
        !          1517: set by double clicking on the left brace that begins the subroutine,
        !          1518: which sets dot for the command language.
        !          1519: An address-free command then typed in the
        !          1520: .I sam
        !          1521: window will apply only to the text between the opening and closing
        !          1522: braces of the function.
        !          1523: The idea is to select what you want, and then say what you want
        !          1524: to do with it, whether invoked by a menu selection or by a typed command.
        !          1525: And of course, the value of dot is highlighted on
        !          1526: the display after the command completes.
        !          1527: This relationship between mouse interface and command language
        !          1528: is clumsy to explain, but comfortable, even natural, in practice.
        !          1529: .NH
        !          1530: The Implementation
        !          1531: .LP
        !          1532: The next few sections describe how
        !          1533: .I sam
        !          1534: is put together, first the host part,
        !          1535: then the inter-component communication,
        !          1536: then the terminal part.
        !          1537: After explaining how the command language is implemented,
        !          1538: the discussion follows (roughly) the path of a character
        !          1539: from the temporary file on disc to the screen.
        !          1540: The presentation centers on the data structures,
        !          1541: because that is how the program was designed and because
        !          1542: the algorithms are easy to provide, given the right data
        !          1543: structures.
        !          1544: ........
        !          1545: .NH 2
        !          1546: Parsing and execution
        !          1547: .LP
        !          1548: The command language is interpreted by parsing each command with a
        !          1549: table-driven recursive
        !          1550: descent parser, and when a complete command is assembled, invoking a top-down
        !          1551: executor.
        !          1552: Most editors instead employ a simple character-at-a-time
        !          1553: lexical scanner.
        !          1554: Use of a parser makes it
        !          1555: easy and unambiguous to detect when a command is complete,
        !          1556: which has two advantages.
        !          1557: First, escape conventions such as backslashes to quote
        !          1558: multiple-line commands are unnecessary;  if the command isn't finished,
        !          1559: the parser keeps reading.  For example, a multiple-line append driven by an
        !          1560: .CW x
        !          1561: command is straightforward:
        !          1562: .P1 2n
        !          1563: x/.*\en/ g/Peter/ a
        !          1564: one line about Peter
        !          1565: another line about Peter
        !          1566: \&.
        !          1567: .P2
        !          1568: Other
        !          1569: .UX
        !          1570: editors would require a backslash after all but the last line.
        !          1571: .PP
        !          1572: The other advantage is specific to the two-process structure of
        !          1573: .I sam .
        !          1574: The host process must decide when a command is completed so the
        !          1575: command interpreter can be called.  This problem is easily resolved
        !          1576: by having the lexical analyzer read the single stream of events from the
        !          1577: terminal, directly executing all typing and mouse commands,
        !          1578: but passing to the parser characters typed to the
        !          1579: .I sam
        !          1580: command window.
        !          1581: This scheme is slightly complicated by the availability of cut-and-paste
        !          1582: editing in the
        !          1583: .I sam
        !          1584: window, but that difficulty is resolved by applying the rules
        !          1585: used in
        !          1586: .I mux :
        !          1587: when a newline is typed to the
        !          1588: .I sam
        !          1589: window, all text between the newline and the previously typed newline
        !          1590: is made available to the parser.
        !          1591: This permits arbitrary editing to be done to a command before
        !          1592: typing newline and thereby requesting execution.
        !          1593: .PP
        !          1594: The parser is driven by a table because the syntax of addresses
        !          1595: and commands is regular enough
        !          1596: to be encoded compactly.  There are few special cases, such as the
        !          1597: replacement text in a substitution, so the syntax of almost all commands
        !          1598: can be encoded with a few flags.
        !          1599: These include whether the command allows an address (for example,
        !          1600: .CW e
        !          1601: does not), whether it takes a regular expression (as in
        !          1602: .CW x
        !          1603: and
        !          1604: .CW s ),
        !          1605: whether it takes replacement text (as in
        !          1606: .CW c
        !          1607: or
        !          1608: .CW i ),
        !          1609: which may be multi-line, and so on.
        !          1610: The internal syntax of regular expressions is handled by a separate
        !          1611: parser; a regular expression is a leaf of the command parse tree.
        !          1612: Regular expressions are discussed fully in the next section.
        !          1613: .PP
        !          1614: The parser table also has information about defaults, so the interpreter
        !          1615: is always called with a complete tree.  For example, the parser fills in
        !          1616: the implicit
        !          1617: .CW 0
        !          1618: and
        !          1619: .CW $
        !          1620: in the abbreviated address
        !          1621: .CW ,
        !          1622: (comma),
        !          1623: inserts a
        !          1624: .CW +
        !          1625: to the left of an unadorned regular expression in an address,
        !          1626: and provides the usual default address
        !          1627: .CW .
        !          1628: (dot) for commands that expect an address but are not given one.
        !          1629: .PP
        !          1630: Once a complete command is parsed, the evaluation is easy.
        !          1631: The address is evaluated left-to-right starting from the value of dot,
        !          1632: with a mostly ordinary expression evaluator.
        !          1633: Addresses, like many of the data structures in
        !          1634: .I sam ,
        !          1635: are held in a C structure and passed around by value:
        !          1636: .P1 2n
        !          1637: 
        !          1638: typedef long Posn;     /* file position */
        !          1639: typedef struct Range{
        !          1640:        Posn p1, p2;
        !          1641: }Range;
        !          1642: typedef struct Address{
        !          1643:        Range r;
        !          1644:        File *f;
        !          1645: }Address;
        !          1646: .P2
        !          1647: .1C
        !          1648: .KF top
        !          1649: .SP 1
        !          1650: .P1 10n
        !          1651: Address
        !          1652: address(ap, a, sign)
        !          1653:        Addrtree *ap;
        !          1654:        Address a;
        !          1655:        int sign;
        !          1656: {
        !          1657:        Address a2;
        !          1658:        do
        !          1659:                switch(ap->type){
        !          1660:                case '.':
        !          1661:                        a=a.f->dot;
        !          1662:                        break;
        !          1663:                case '$':
        !          1664:                        a.r.p1=a.r.p2=a.f->nbytes;
        !          1665:                        break;
        !          1666:                case '"':       
        !          1667:                        a=matchfile(a, ap->aregexp)->dot; 
        !          1668:                        break;
        !          1669:                case ',':
        !          1670:                        a2=address(ap->right, a, 0);
        !          1671:                        a=address(ap->left, a, 0);
        !          1672:                        if(a.f!=a2.f || a2.r.p2<a.r.p1)
        !          1673:                                error(Eorder);
        !          1674:                        a.r.p2=a2.r.p2;
        !          1675:                        return a;
        !          1676:                /* and so on */
        !          1677:                }
        !          1678:        while((ap=ap->right)!=0);
        !          1679:        return a;
        !          1680: }
        !          1681: .P2
        !          1682: .ce
        !          1683: \fBDisplay 1.\fP The address interpreter
        !          1684: .KE
        !          1685: .2C
        !          1686: An address is encoded as a substring (character positions
        !          1687: .CW p1
        !          1688: to
        !          1689: .CW p2 )
        !          1690: in a file
        !          1691: .CW f .
        !          1692: (The data type
        !          1693: .CW File
        !          1694: is described in detail below.)
        !          1695: .PP
        !          1696: The address interpreter (shown in Display 1) is an
        !          1697: .CW Address -valued
        !          1698: function that traverses the parse tree describing an address (the
        !          1699: parse tree for the address has type
        !          1700: .CW Addrtree ).
        !          1701: .PP
        !          1702: Throughout, errors are handled by a non-local
        !          1703: .CW goto
        !          1704: (a
        !          1705: .CW setjmp/longjmp
        !          1706: in C terminology)
        !          1707: hidden in a routine called
        !          1708: .CW error
        !          1709: that immediately aborts the execution, retracts any
        !          1710: partially made changes (see the section below on `undoing'), and
        !          1711: returns to the top level of the parser.
        !          1712: The argument to
        !          1713: .CW error
        !          1714: is an enumeration type that
        !          1715: is translated to a terse but possibly helpful
        !          1716: message such as
        !          1717: .CW "?addresses out of order" .
        !          1718: Very common messages are kept short; for example the message for
        !          1719: a failed regular expression search is
        !          1720: .CW ?search .
        !          1721: .PP
        !          1722: Character addresses such as
        !          1723: .CW #3
        !          1724: are trivial to implement, as the
        !          1725: .CW File
        !          1726: data structure is accessible by character number.
        !          1727: However,
        !          1728: .I sam
        !          1729: keeps no information about the position of newlines \(em it is too
        !          1730: expensive to track dynamically \(em so line addresses are computed by reading
        !          1731: the file, counting newlines.  Except in very large files, this has proven
        !          1732: acceptable: file access is fast enough to make the technique practical,
        !          1733: and lines are not central to the structure of the command language.
        !          1734: .PP
        !          1735: The command interpreter, called
        !          1736: .CW cmdexec ,
        !          1737: is also straightforward.  The parse table includes a
        !          1738: function to call to interpret a particular command.  That function
        !          1739: receives as arguments
        !          1740: the calculated address
        !          1741: for the command
        !          1742: and the command tree (of type
        !          1743: .CW Cmdtree ),
        !          1744: which may contain information such as the subtree for compound commands.
        !          1745: For example, the function for the
        !          1746: .CW g
        !          1747: and
        !          1748: .CW v
        !          1749: commands is shown in Display 2.
        !          1750: .CW Compile "" (
        !          1751: and
        !          1752: .CW execute
        !          1753: are part of the regular expression code, described in the next section.)
        !          1754: Because the parser and the
        !          1755: .CW File
        !          1756: data structure do most of the work, most commands
        !          1757: are similarly brief.
        !          1758: .1C
        !          1759: .KF bottom
        !          1760: .P1 10n
        !          1761: int
        !          1762: g_cmd(a, cp)
        !          1763:        Address a;
        !          1764:        Cmdtree *cp;
        !          1765: {
        !          1766:        compile(cp->regexp);
        !          1767:        if(execute(a.f, a.r.p1, a.r.p2) != (cp->cmdchar=='v')){
        !          1768:                a.f->dot=a;
        !          1769:                return cmdexec(a, cp->subcmd);
        !          1770:        }
        !          1771:        return TRUE;    /* indicate that execution is to continue */
        !          1772: }
        !          1773: .P2
        !          1774: .ce
        !          1775: \fBDisplay 2. \f(CWg\fR and \f(CWv\fP commands
        !          1776: .SP
        !          1777: .KE
        !          1778: .2C
        !          1779: .NH 2
        !          1780: Regular expressions
        !          1781: .LP
        !          1782: The regular expression code in
        !          1783: .I sam
        !          1784: is an interpreted, rather than compiled on-the-fly, implementation of Thompson's
        !          1785: non-deterministic finite automaton algorithm.|reference(thompson regular expression)
        !          1786: The syntax and semantics of the expressions are as in the
        !          1787: .UX
        !          1788: program
        !          1789: .I egrep ,
        !          1790: including alternation, closures, character classes, and so on.
        !          1791: The only changes in the notation are two additions:
        !          1792: .CW \en
        !          1793: is translated to, and matches, a newline character, and
        !          1794: .CW @
        !          1795: matches any character.  In
        !          1796: .I egrep ,
        !          1797: the character
        !          1798: .CW \&.
        !          1799: matches any character except newline, and in
        !          1800: .I sam
        !          1801: the same rule seemed safest, to prevent idioms like
        !          1802: .CW \&.*
        !          1803: from spanning newlines.
        !          1804: .I Egrep
        !          1805: expressions are arguably too complicated for an interactive editor \(em
        !          1806: certainly it would make sense if all the special characters were two-character
        !          1807: sequences, so that most of the punctuation characters wouldn't have
        !          1808: peculiar meanings \(em but for an interesting command language, full
        !          1809: regular expressions are necessary, and
        !          1810: .I egrep
        !          1811: defines the full regular expression syntax for
        !          1812: .UX
        !          1813: programs.
        !          1814: Also, it seemed superfluous to define a new syntax, since various
        !          1815: .UX
        !          1816: programs
        !          1817: .I ed , (
        !          1818: .I egrep
        !          1819: and
        !          1820: .I vi )
        !          1821: define too many already.
        !          1822: .PP
        !          1823: The expressions are compiled by a routine,
        !          1824: .CW compile ,
        !          1825: that generates the description of the non-deterministic finite state machine.
        !          1826: A second routine,
        !          1827: .CW execute ,
        !          1828: interprets the machine to generate the leftmost-longest match of the
        !          1829: expression in a substring of the file.
        !          1830: The algorithm is described elsewhere.|reference(thompson regular expression)|reference(aho hopcroft ullman)
        !          1831: .CW Execute
        !          1832: reports
        !          1833: whether a match was found, and sets a global variable,
        !          1834: of type
        !          1835: .CW Range ,
        !          1836: to the substring matched.
        !          1837: .PP
        !          1838: A trick is required to evaluate the expression in reverse, such as when
        !          1839: searching backwards for an expression.
        !          1840: For example,
        !          1841: .P1 2n
        !          1842: -/P.*r/
        !          1843: .P2
        !          1844: looks backwards through the file for a match of the expression.
        !          1845: The expression, however, is defined for a forward search.
        !          1846: The solution is to construct a machine identical to the machine
        !          1847: for a forward search except for a reversal of all the concatenation
        !          1848: operators (the other operators are symmetric under direction reversal),
        !          1849: to exchange the meaning of the operators
        !          1850: .CW ^
        !          1851: and
        !          1852: .CW $ ,
        !          1853: and then to read the file backwards, looking for the
        !          1854: usual earliest longest match.
        !          1855: .PP
        !          1856: .CW Execute
        !          1857: generates only one match each time it is called.
        !          1858: To interpret looping constructs such as the
        !          1859: .CW x
        !          1860: command,
        !          1861: .I sam
        !          1862: must therefore synchronize between
        !          1863: calls of
        !          1864: .CW execute
        !          1865: to avoid
        !          1866: problems with null matches.
        !          1867: For example, even given the leftmost-longest rule,
        !          1868: the expression
        !          1869: .CW a*
        !          1870: matches three times in the string
        !          1871: .CW ab
        !          1872: (the character
        !          1873: .CW a ,
        !          1874: the null string between the
        !          1875: .CW a
        !          1876: and
        !          1877: .CW b ,
        !          1878: and the final null string).
        !          1879: After returning a match for the
        !          1880: .CW a ,
        !          1881: .I sam
        !          1882: must not match the null string before the
        !          1883: .CW b .
        !          1884: The algorithm starts
        !          1885: .CW execute
        !          1886: at the end of its previous match, and
        !          1887: if the match it returns
        !          1888: is null and abuts the previous match, rejects the match and advances
        !          1889: the initial position one character.
        !          1890: .NH 2
        !          1891: Memory allocation
        !          1892: .LP
        !          1893: The C language has no memory allocation primitives, although a standard
        !          1894: library routine,
        !          1895: .I malloc ,
        !          1896: provides adequate service for simple programs.
        !          1897: For specific uses, however,
        !          1898: it can be better to write a custom allocator.
        !          1899: The allocator (or rather, pair of allocators) described here
        !          1900: work in both the terminal and host parts of
        !          1901: .I sam .
        !          1902: They are designed for efficient manipulation of strings,
        !          1903: which are allocated and freed frequently and vary in length from essentially
        !          1904: zero to 32 Kbytes (very large strings are written to disc).
        !          1905: More important, strings may be large and change size often,
        !          1906: so to minimize memory usage it is helpful to reclaim and to coalesce the
        !          1907: unused portions of strings when they are truncated.
        !          1908: .PP
        !          1909: Objects to be allocated in
        !          1910: .I sam
        !          1911: are of two flavors:
        !          1912: the first is C
        !          1913: .CW structs ,
        !          1914: which are small and often addressed by pointer variables;
        !          1915: the second is variable-sized arrays of characters
        !          1916: or integers whose
        !          1917: base pointer is always used to access them.
        !          1918: The memory allocator in
        !          1919: .I sam
        !          1920: is therefore in two parts:
        !          1921: first, a traditional first-fit allocator that provides fixed storage for
        !          1922: .CW structs ;
        !          1923: and second, a garbage-compacting allocator that reduces storage
        !          1924: overhead for variable-sized objects, at the cost of some bookkeeping.
        !          1925: The two types of objects are allocated from adjoining arenas, with
        !          1926: the garbage-compacting allocator controlling the arena with higher addresses.
        !          1927: Separating into two arenas simplifies compaction and prevents fragmentation due
        !          1928: to immovable objects.
        !          1929: The access rules for garbage-compactable objects
        !          1930: (discussed in the next paragraph) allow them to be relocated, so when
        !          1931: the first-fit arena needs space, it moves the garbage-compacted arena
        !          1932: to higher addresses to make room.  Storage is therefore created only
        !          1933: at successively higher addresses, either when more garbage-compacted
        !          1934: space is needed or when the first-fit arena pushes up the other arena.
        !          1935: .PP
        !          1936: Objects that may be compacted declare to the
        !          1937: allocator a cell that is guaranteed to be the sole repository of the
        !          1938: address of the object whenever a compaction can occur.
        !          1939: The compactor can then update the address when the object is moved.
        !          1940: For example, the implementation of type
        !          1941: .CW List
        !          1942: (really a variable-length array)
        !          1943: is:
        !          1944: .P1 2n
        !          1945: typedef struct List{
        !          1946:        int     nused;
        !          1947:        long    *ptr;
        !          1948: }List;
        !          1949: .P2
        !          1950: The
        !          1951: .CW ptr
        !          1952: cell must always be used directly, and never copied.  When a
        !          1953: .CW List
        !          1954: is to be created the
        !          1955: .CW List
        !          1956: structure is allocated in the ordinary first-fit arena
        !          1957: and its
        !          1958: .CW ptr
        !          1959: is allocated in the garbage-compacted arena.
        !          1960: A similar data type for strings, called
        !          1961: .CW String ,
        !          1962: stores variable-length character arrays of up to 32767 elements.
        !          1963: .PP
        !          1964: A related matter of programming style:
        !          1965: .I sam
        !          1966: frequently passes structures by value, which
        !          1967: simplifies the code.
        !          1968: Traditionally, C programs have
        !          1969: passed structures by reference, but implicit allocation on
        !          1970: the stack is easier to use.
        !          1971: Structure passing is a relatively new feature of C
        !          1972: (it is not in the 
        !          1973: standard reference manual for C|reference(cbook)),
        !          1974: and is poorly supported in most
        !          1975: commercial C compilers.
        !          1976: It's convenient and expressive, though,
        !          1977: and simplifies memory management by
        !          1978: avoiding the allocator altogether
        !          1979: and eliminating pointer aliases.
        !          1980: .....
        !          1981: .NH 2
        !          1982: Data structures for manipulating files
        !          1983: .LP
        !          1984: Experience with
        !          1985: .I jim
        !          1986: showed that the requirements
        !          1987: of the file data structure were few, but strict.
        !          1988: First, files need to be read and written quickly;
        !          1989: adding a fresh file must be painless.
        !          1990: Second, the implementation must place no arbitrary upper limit on
        !          1991: the number or sizes of files.  (It should be practical to edit many files,
        !          1992: and files up to megabytes in length should be handled gracefully.)
        !          1993: This implies that files be stored on disc, not in main memory.
        !          1994: (Aficionados of virtual memory may argue otherwise, but the
        !          1995: implementation of virtual
        !          1996: memory in our system is not something to depend on
        !          1997: for good performance.)
        !          1998: Third, changes to files need be made by only two primitives:
        !          1999: deletion and insertion.
        !          2000: These are inverses of each other,
        !          2001: which simplifies the implementation of the undo operation.
        !          2002: Finally,
        !          2003: it must be easy and efficient to access the file, either
        !          2004: forwards or backwards, a byte at a time.
        !          2005: .PP
        !          2006: The
        !          2007: .CW File
        !          2008: data type is constructed from three simpler data structures that hold arrays
        !          2009: of characters.
        !          2010: Each of these types has an insertion and deletion operator, and the
        !          2011: insertion and deletion operators of the
        !          2012: .CW File
        !          2013: type itself are constructed from them.
        !          2014: .PP
        !          2015: The simplest type is the
        !          2016: .CW String ,
        !          2017: which is used to hold strings in main memory.
        !          2018: The code that manages
        !          2019: .CW Strings
        !          2020: guarantees that they will never be longer
        !          2021: than some moderate size, and in practice they are rarely larger than 8 Kbytes.
        !          2022: .CW Strings
        !          2023: have two purposes: they hold short strings like file names with little overhead,
        !          2024: and because they are deliberately small, they are efficient to modify.
        !          2025: They are therefore used as the data structure for in-memory caches.
        !          2026: .PP
        !          2027: The disc copy of the file is managed by a data structure called a
        !          2028: .CW Disc ,
        !          2029: which corresponds to a temporary file.  A
        !          2030: .CW Disc
        !          2031: has no storage in main memory other than bookkeeping information;
        !          2032: the actual data being held is all on the disc.
        !          2033: To reduce the number of open files needed,
        !          2034: .I sam
        !          2035: opens a dozen temporary
        !          2036: .UX
        !          2037: files and multiplexes the
        !          2038: .CW Discs
        !          2039: upon them.
        !          2040: This permits many files to
        !          2041: be edited; the entire
        !          2042: .I sam
        !          2043: source (48 files) may be edited comfortably with a single
        !          2044: instance of
        !          2045: .I sam .
        !          2046: Allocating one temporary file per
        !          2047: .CW Disc
        !          2048: would strain the operating system's limit on the number of open files.
        !          2049: Also, spreading the traffic among temporary files keeps the files shorter,
        !          2050: and shorter files are more efficiently implemented by the
        !          2051: .UX
        !          2052: I/O subsystem.
        !          2053: .PP
        !          2054: A
        !          2055: .CW Disc
        !          2056: is an array of fixed-length blocks, each of which contains
        !          2057: between 1 and 4096 characters of active data.
        !          2058: (The block size of our
        !          2059: .UX
        !          2060: file system is 4096 bytes.)
        !          2061: The block addresses within the temporary file and the length of each
        !          2062: block are stored in a
        !          2063: .CW List .
        !          2064: When changes are made the live part of blocks may change size.
        !          2065: Blocks are created and coalesced when necessary to try to keep the sizes
        !          2066: between 2048 and 4096 bytes.
        !          2067: An actively changing part of the
        !          2068: .CW Disc
        !          2069: therefore typically has about a kilobyte of slop that can be
        !          2070: inserted or deleted
        !          2071: without changing more than one block or affecting the block order.
        !          2072: When an insertion would overflow a block, the block is split, a new one
        !          2073: is allocated to receive the overflow, and the memory-resident list of blocks
        !          2074: is rearranged to reflect the insertion of the new block.
        !          2075: .PP
        !          2076: Obviously, going to the disc for every modification to the file is
        !          2077: prohibitively expensive.
        !          2078: The data type
        !          2079: .CW Buffer
        !          2080: consists of a
        !          2081: .CW Disc
        !          2082: to hold the data and a
        !          2083: .CW String
        !          2084: that acts as a cache.
        !          2085: This is the first of a series of caches throughout the data structures in
        !          2086: .I sam.
        !          2087: The caches not only improve performance, they provide a way to organize
        !          2088: the flow of data, particularly in the communication between the host
        !          2089: and terminal.
        !          2090: This idea is developed below, in the section on communications.
        !          2091: .PP
        !          2092: To reduce disc traffic, changes to a
        !          2093: .CW Buffer
        !          2094: are mediated by a variable-length string, in memory, that acts as a cache.
        !          2095: When an insertion or deletion is made to a
        !          2096: .CW Buffer ,
        !          2097: if the change can be accommodated by the cache, it is done there.
        !          2098: If the cache becomes bigger than a block because of an insertion,
        !          2099: some of it is written to the
        !          2100: .CW Disc
        !          2101: and deleted from the cache.
        !          2102: If the change does not intersect the cache, the cache is flushed.
        !          2103: The cache is only loaded at the new position if the change is smaller than a block;
        !          2104: otherwise, it is sent directly to the
        !          2105: .CW Disc .
        !          2106: This is because
        !          2107: large changes are typically sequential,
        !          2108: whereupon the next change is unlikely to overlap the current one.
        !          2109: .PP
        !          2110: A
        !          2111: .CW File
        !          2112: comprises a
        !          2113: .CW String
        !          2114: to hold the file name and some ancillary data such as dot and the modified bit.
        !          2115: The most important components, though, are a pair of
        !          2116: .CW Buffers ,
        !          2117: one called the transcript and the other the contents.
        !          2118: Their use is described in the next section.
        !          2119: .PP
        !          2120: The overall structure is shown in Figure 5.
        !          2121: Although it may seem that the data is touched many times on its
        !          2122: way from the
        !          2123: .CW Disc ,
        !          2124: it is read (by one
        !          2125: .UX
        !          2126: system call) directly into the cache of the
        !          2127: associated
        !          2128: .CW Buffer ;
        !          2129: no extra copy is done.
        !          2130: Similarly, when flushing the cache, the text is written
        !          2131: directly from the cache to disc.
        !          2132: Most operations act directly on the text in the cache.
        !          2133: A principle applied throughout
        !          2134: .I sam
        !          2135: is that the fewer times the data is copied, the faster the program will run
        !          2136: (see also the paper by Waite|reference(waite lexical analysis)).
        !          2137: .1C
        !          2138: .KF
        !          2139: .PS
        !          2140: copy "fig5.pic"
        !          2141: .PE
        !          2142: .Cs
        !          2143: Figure 5. File data structures.
        !          2144: The temporary files are stored in the standard repository for such files
        !          2145: on the host system.
        !          2146: .Ce
        !          2147: .KE
        !          2148: .2C
        !          2149: .PP
        !          2150: The contents of a
        !          2151: .CW File
        !          2152: are accessed by a routine that
        !          2153: copies to a buffer a substring of a file starting at a specified offset.
        !          2154: To read a byte at a time, a
        !          2155: .CW File "" per-
        !          2156: array is loaded starting from a specified initial position,
        !          2157: and bytes may then be read from the array.
        !          2158: The implementation is done by a macro similar to the C standard I/O
        !          2159: .I getc
        !          2160: macro.|reference(cbook)
        !          2161: Because the reading may be done at any address, a minor change to the
        !          2162: macro allows the file to be read backwards.
        !          2163: This array is read-only; there is no
        !          2164: .I putc .
        !          2165: .NH 2
        !          2166: Doing and undoing
        !          2167: .LP
        !          2168: .I Sam
        !          2169: has an unusual method for managing changes to files.
        !          2170: The command language makes it easy to specify multiple variable-length changes
        !          2171: to a file millions of bytes long, and such changes
        !          2172: must be made efficiently if the editor is to be practical.
        !          2173: The usual techniques for inserting and deleting strings
        !          2174: are inadequate under these conditions.
        !          2175: The
        !          2176: .CW Buffer
        !          2177: and
        !          2178: .CW Disc
        !          2179: data structures are designed for efficient random access to long strings,
        !          2180: but care must be taken to avoid super-linear behavior when making
        !          2181: many changes simultaneously.
        !          2182: .PP
        !          2183: .I Sam
        !          2184: uses a two-pass algorithm for making changes, and treats each file as a database
        !          2185: against which transactions are registered.
        !          2186: Changes are not made directly to the contents.
        !          2187: Instead, when a command is started, a `mark' containing
        !          2188: a sequence number is placed in the transcript
        !          2189: .CW Buffer ,
        !          2190: and each change made to the file, either an insertion or deletion
        !          2191: or a change to the file name,
        !          2192: is appended to the end of the transcript.
        !          2193: When the command is complete, the transcript is rewound to the
        !          2194: mark and applied to the contents.
        !          2195: .PP
        !          2196: One reason for separating evaluation from
        !          2197: application in this way is to simplify tracking the addresses of changes
        !          2198: made in the middle of a long sequence.
        !          2199: The two-pass algorithm also allows all changes to apply to the
        !          2200: .I original
        !          2201: data: no change can affect another change made in the same command.
        !          2202: This is particularly important when evaluating an
        !          2203: .CW x
        !          2204: command because it prevents regular expression matches
        !          2205: from stumbling over changes made earlier in the execution.
        !          2206: Also, the two-pass
        !          2207: algorithm is cleaner than the way other
        !          2208: .UX
        !          2209: editors allow changes to
        !          2210: affect each other;
        !          2211: for example,
        !          2212: .I ed 's
        !          2213: idioms to do things like delete every other line
        !          2214: depend critically on the implementation.
        !          2215: Instead,
        !          2216: .I sam 's
        !          2217: simple model, in which all changes in a command occur effectively
        !          2218: simultaneously, is easy to explain and to understand.
        !          2219: .PP
        !          2220: The records in the transcript are of the form ``delete substring from
        !          2221: locations
        !          2222: 123 to 456'' and ``insert 11 characters `hello there' at location 789.''
        !          2223: (It is an error if the changes are not at monotonically greater
        !          2224: positions through the file.)
        !          2225: While the update is occurring, these numbers must be
        !          2226: offset by earlier changes, but that is straightforward and
        !          2227: local to the update routine;
        !          2228: moreover, all the numbers have been computed
        !          2229: before the first is examined.
        !          2230: .PP
        !          2231: Treating the file as a transaction system has another advantage:
        !          2232: undo is trivial.
        !          2233: All it takes is to invert the transcript after it has been
        !          2234: implemented, converting insertions
        !          2235: into deletions and vice versa, and saving them in a holding
        !          2236: .CW Buffer .
        !          2237: The `do' transcript can then be deleted from
        !          2238: the transcript
        !          2239: .CW Buffer
        !          2240: and replaced by the `undo' transcript.
        !          2241: If an undo is requested, the transcript is rewound and the undo transcript
        !          2242: executed.
        !          2243: Because the transcript
        !          2244: .CW Buffer
        !          2245: is not truncated after each command, it accumulates
        !          2246: successive changes.
        !          2247: A sequence of undo commands
        !          2248: can therefore back up the file arbitrarily,
        !          2249: which is more helpful than the more commonly implemented self-inverse form of undo.
        !          2250: .I Sam "" (
        !          2251: provides no way to undo an undo, but if it were desired,
        !          2252: it would be easy to provide by re-interpreting the `do' transcript.)
        !          2253: Each mark in the transcript contains a sequence number and the offset into
        !          2254: the transcript of the previous mark, to aid in unwinding the transcript.
        !          2255: Marks also contain the value of dot and the modified bit so these can be
        !          2256: restored easily.
        !          2257: Undoing multiple files is easy; it merely demands undoing all files whose
        !          2258: latest change has the same sequence number as the current file.
        !          2259: .PP
        !          2260: Another benefit of having a transcript is that errors encountered in the middle
        !          2261: of a complicated command need not leave the files in an intermediate state.
        !          2262: By rewinding the transcript to the mark beginning the command,
        !          2263: the partial command can be trivially undone.
        !          2264: .PP
        !          2265: When the update algorithm was first implemented, it was unacceptably slow,
        !          2266: so a cache was added to coalesce nearby changes,
        !          2267: replacing multiple small changes by a single larger one.
        !          2268: This reduced the number
        !          2269: of insertions into the transaction
        !          2270: .CW Buffer ,
        !          2271: and made a dramatic improvement in performance,
        !          2272: but made it impossible
        !          2273: to handle changes in non-monotonic order in the file; the caching method
        !          2274: only works if changes don't overlap.
        !          2275: Before the cache was added, the transaction could in principle be sorted
        !          2276: if the changes were out of order, although
        !          2277: this was never done.
        !          2278: The current status is therefore acceptable performance with a minor
        !          2279: restriction on global changes, which is sometimes, but rarely, an annoyance.
        !          2280: .PP
        !          2281: The update algorithm obviously paws the data more than simpler
        !          2282: algorithms, but it is not prohibitively expensive;
        !          2283: the caches help.
        !          2284: (The principle of avoiding copying the data is still honored here,
        !          2285: although not as piously:
        !          2286: the data is moved from contents' cache to
        !          2287: the transcript's all at once and through only one internal buffer.)
        !          2288: Performance figures confirm the efficiency.
        !          2289: To read from a dead start a hundred kilobyte file on a VAX-11/750
        !          2290: takes 1.4 seconds of user time, 2.5 seconds of system time,
        !          2291: and 5 seconds of real time.
        !          2292: Reading the same file in
        !          2293: .I ed
        !          2294: takes 6.0 seconds of user time, 1.7 seconds of system time,
        !          2295: and 8 seconds of real time.
        !          2296: .I Sam
        !          2297: uses about half the CPU time.
        !          2298: A more interesting example is the one stated above:
        !          2299: inserting a character between every pair of characters in the file.
        !          2300: The
        !          2301: .I sam
        !          2302: command is
        !          2303: .P1 2n
        !          2304: ,y/@/ a/x/
        !          2305: .P2
        !          2306: and takes 3 CPU seconds per kilobyte of input file, of which
        !          2307: about a third is spent in the regular expression code.
        !          2308: This translates to about 500 changes per second.
        !          2309: .I Ed
        !          2310: takes 1.5 seconds per kilobyte to make a similar change (ignoring newlines),
        !          2311: but cannot undo it.
        !          2312: The same example in
        !          2313: .I ex ,|reference(bsdmanual)
        !          2314: a variant of
        !          2315: .I ed
        !          2316: done at the University of California at Berkeley,
        !          2317: which allows one level of undoing, again takes 3 seconds.
        !          2318: In summary,
        !          2319: .I sam 's
        !          2320: performance is comparable to that of other
        !          2321: .UX
        !          2322: editors, although it solves
        !          2323: a harder problem.
        !          2324: .NH 2
        !          2325: Communications
        !          2326: .LP
        !          2327: The discussion so far has described the implementation of the host part of
        !          2328: .I sam ;
        !          2329: the next few sections explain how a machine with mouse and bitmap display
        !          2330: can be engaged to improve interaction.
        !          2331: .I Sam
        !          2332: is not the first editor to be written as two processes,|reference(fraser text editor)
        !          2333: but its implementation
        !          2334: has some unusual aspects.
        !          2335: .PP
        !          2336: There are several ways
        !          2337: .I sam 's
        !          2338: host and terminal parts may be connected.
        !          2339: The first and simplest is to forgo the terminal part and use the host
        !          2340: part's command language to edit text on an ordinary terminal.
        !          2341: This mode is invoked by starting
        !          2342: .I sam
        !          2343: with the
        !          2344: .CW -d
        !          2345: option.
        !          2346: With no options,
        !          2347: .I sam
        !          2348: runs separate host and terminal programs,
        !          2349: communicating with a message protocol over the physical
        !          2350: connection that joins them.
        !          2351: Typically, the connection is an RS-232 link between a Blit
        !          2352: (the prototypical display for
        !          2353: .I sam )
        !          2354: and a host running
        !          2355: the Research Edition of the
        !          2356: .UX
        !          2357: operating system.|reference(latest volume1)
        !          2358: (This is the version of the system used in the Computing Sciences Research
        !          2359: Center at AT&T Bell Laboratories, where I work.  Its relevant
        !          2360: aspects are discussed in the Blit paper.|reference(bstj blit))
        !          2361: The implementation of
        !          2362: .I sam
        !          2363: for the Sun computer runs both processes on the same machine and
        !          2364: connects them by a pipe.
        !          2365: .PP
        !          2366: The low bandwidth of an RS-232 link
        !          2367: necessitated the split between
        !          2368: the two programs.
        !          2369: The division is a mixed blessing:
        !          2370: a program in two parts is much harder to write and to debug
        !          2371: than a self-contained one,
        !          2372: but the split makes several unusual configurations possible.
        !          2373: The terminal may be physically separated from the host, allowing the conveniences
        !          2374: of a mouse and bitmap display to be taken home while leaving the files at work.
        !          2375: It is also possible to run the host part on a remote machine:
        !          2376: .P1 2n
        !          2377: sam -r host
        !          2378: .P2
        !          2379: connects to the terminal in the usual way, and then makes a call
        !          2380: across the network to establish the host part of
        !          2381: .I sam
        !          2382: on the named machine.
        !          2383: Finally, it cross-connects the I/O to join the two parts.
        !          2384: This allows
        !          2385: .I sam
        !          2386: to be run on machines that do not support bitmap displays;
        !          2387: for example,
        !          2388: .I sam
        !          2389: is the editor of choice on our Cray X-MP/24.
        !          2390: .I Sam
        !          2391: .CW -r
        !          2392: involves
        !          2393: .I three
        !          2394: machines: the remote host, the terminal, and the local host.
        !          2395: The local host's job is simple but vital: it passes the data
        !          2396: between the remote host and terminal.
        !          2397: .PP
        !          2398: The host and terminal exchange messages asynchronously
        !          2399: (rather than, say, as remote procedure calls) but there is no
        !          2400: error detection or correction
        !          2401: because, whatever the configuration, the connection is reliable.
        !          2402: Because the terminal handles mundane interaction tasks such as
        !          2403: popping up menus and interpreting the responses, the messages are about
        !          2404: data, not actions.
        !          2405: For example, the host knows nothing about what is displayed on the screen,
        !          2406: and when the user types a character, the message sent to the host says
        !          2407: ``insert a one-byte string at location 123 in file 7,'' not ``a character
        !          2408: was typed at the current position in the current file.''
        !          2409: In other words, the messages look very much like the transaction records
        !          2410: in the transcripts.
        !          2411: .PP
        !          2412: Either the host or terminal part of
        !          2413: .I sam
        !          2414: may initiate a change to a file.
        !          2415: The command language operates on the host, while typing and some
        !          2416: mouse operations are executed directly in the terminal to optimize response.
        !          2417: Changes initiated by the host program must be transmitted to the terminal,
        !          2418: and
        !          2419: vice versa.
        !          2420: (A token is exchanged to determine which end is in control,
        !          2421: which means that characters typed while a time-consuming command runs
        !          2422: must be buffered and do not appear until the command is complete.)
        !          2423: To maintain consistent information,
        !          2424: the host and terminal track changes through a per-file
        !          2425: data structure that records what portions of the file
        !          2426: the terminal has received.
        !          2427: The data structure, called a
        !          2428: .CW Rasp
        !          2429: (a weak pun: it's a file with holes)
        !          2430: is held and updated by both the host and terminal.
        !          2431: A
        !          2432: .CW Rasp
        !          2433: is a list of
        !          2434: .CW Strings
        !          2435: holding those parts of the file known to the terminal,
        !          2436: separated by counts of the number of bytes in the interstices.
        !          2437: Of course, the host doesn't keep a separate copy of the data (it only needs
        !          2438: the lengths of the various pieces),
        !          2439: but the structure is the same on both ends.
        !          2440: .PP
        !          2441: The
        !          2442: .CW Rasp
        !          2443: in the terminal doubles as a cache.
        !          2444: Since the terminal keeps the text for portions of the file it has displayed,
        !          2445: it need not request data from the host when revisiting old parts of the file
        !          2446: or redrawing obscured windows, which speeds things up considerably
        !          2447: over low-speed links.
        !          2448: .PP
        !          2449: It's trivial for the terminal to maintain its
        !          2450: .CW Rasp ,
        !          2451: because all changes made on the terminal apply to parts of the file
        !          2452: already loaded there.
        !          2453: Changes made by the host are compared against the
        !          2454: .CW Rasp
        !          2455: during the update sequence after each command.
        !          2456: Small changes to pieces of the file loaded in the terminal
        !          2457: are sent in their entirety.
        !          2458: Larger changes, and changes that fall entirely in the holes,
        !          2459: are transmitted as messages without literal data:
        !          2460: only the lengths of the deleted and inserted strings are transmitted.
        !          2461: When a command is completed, the terminal examines its visible
        !          2462: windows to see if any holes in their
        !          2463: .CW Rasps
        !          2464: intersect the visible portion of the file.
        !          2465: It then requests the missing data from the host,
        !          2466: along with up to 512 bytes of surrounding data, to minimize
        !          2467: the number of messages when visiting a new portion of the file.
        !          2468: This technique provides a kind of two-level lazy evaluation for the terminal.
        !          2469: The first level sends a minimum of information about
        !          2470: parts of the file not being edited interactively;
        !          2471: the second level waits until a change is displayed before
        !          2472: transmitting the new data.
        !          2473: Of course,
        !          2474: performance is also helped by having the terminal respond immediately to typing
        !          2475: and simple mouse requests.
        !          2476: Except for small changes to active pieces of the file, which are
        !          2477: transmitted to the terminal without negotiation,
        !          2478: the terminal is wholly responsible for deciding what is displayed;
        !          2479: the host uses the
        !          2480: .CW Rasp
        !          2481: only to tell the terminal what might be relevant.
        !          2482: .PP
        !          2483: When a change is initiated by the host,
        !          2484: the messages to the terminal describing the change
        !          2485: are generated by the routine that applies the transcript of the changes
        !          2486: to the contents of the
        !          2487: .CW File .
        !          2488: Since changes are undone by the same update routine,
        !          2489: undoing requires
        !          2490: no extra code in the communications;
        !          2491: the usual messages describing changes to the file are sufficient
        !          2492: to back up the screen image.
        !          2493: .PP
        !          2494: The
        !          2495: .CW Rasp
        !          2496: is a particularly good example of the way caches are used in
        !          2497: .I sam .
        !          2498: First, it facilitates access to the active portion of the text by placing
        !          2499: the busy text in main memory.
        !          2500: In so doing, it provides efficient access
        !          2501: to a large data structure that does not fit in memory.
        !          2502: Since the form of data is to be imposed by the user, not by the program,
        !          2503: and because characters will frequently be scanned sequentially,
        !          2504: files are stored as flat objects.
        !          2505: Caches help keep performance good and linear when working with such
        !          2506: data.
        !          2507: .PP
        !          2508: Second, the
        !          2509: .CW Rasp
        !          2510: and several of the other caches have some
        !          2511: .I read-ahead;
        !          2512: that is, the cache is loaded with more information than is needed for
        !          2513: the job immediately at hand.
        !          2514: When manipulating linear structures, the accesses are usually sequential,
        !          2515: and read-ahead can significantly reduce the average time to access the
        !          2516: next element of the object.
        !          2517: Sequential access is a common mode for people as well as programs;
        !          2518: consider scrolling through a document while looking for something.
        !          2519: .PP
        !          2520: Finally, like any good data structure,
        !          2521: the cache guides the algorithm, or at least the implementation.
        !          2522: The
        !          2523: .CW Rasp
        !          2524: was actually invented to control the communications between the host and
        !          2525: terminal parts, but I realized very early that it was also a form of
        !          2526: cache.  Other caches were more explicitly intended to serve a double
        !          2527: purpose: for example, the caches in
        !          2528: .CW Files
        !          2529: that coalesce updates not only reduce traffic to the
        !          2530: transcript and contents
        !          2531: .CW Buffers ,
        !          2532: they also clump screen updates so that complicated changes to the
        !          2533: screen are achieved in
        !          2534: just a few messages to the terminal.
        !          2535: This saved me considerable work: I did not need to write special
        !          2536: code to optimize the message traffic to the
        !          2537: terminal.
        !          2538: Caches pay off in surprising ways.
        !          2539: Also, they tend to be independent, so their performance improvements
        !          2540: are multiplicative.
        !          2541: .NH 2
        !          2542: Data structures in the terminal
        !          2543: .LP
        !          2544: The terminal's job is to display and to maintain a consistent image of
        !          2545: pieces of the files being edited.
        !          2546: Because the text is always in memory, the data structures are
        !          2547: considerably simpler than those in the host part.
        !          2548: .PP
        !          2549: .I Sam
        !          2550: typically has far more windows than does
        !          2551: .I mux ,
        !          2552: the window system within which its Blit implementation runs.
        !          2553: .I Mux
        !          2554: has a fairly small number of asynchronously updated windows;
        !          2555: .I sam
        !          2556: needs a large number of synchronously updated windows that are
        !          2557: usually static and often fully obscured.
        !          2558: The different tradeoffs guided
        !          2559: .I sam
        !          2560: away from the memory-intensive implementation of windows, called
        !          2561: .CW Layers ,|reference(pike overlapping)
        !          2562: used in
        !          2563: .I mux.
        !          2564: Rather than depending on a complete bitmap image of the display for each window,
        !          2565: .I sam
        !          2566: regenerates the image from its in-memory text
        !          2567: (stored in the
        !          2568: .CW Rasp )
        !          2569: when necessary, although it will use such an image if it is available.
        !          2570: Like
        !          2571: .CW Layers ,
        !          2572: though,
        !          2573: .I sam
        !          2574: uses the screen bitmap as active storage in which to update the image using
        !          2575: .I bitblt .|reference(guibas stolfi)|reference(pike locanthi blit bitmap)
        !          2576: The resulting organization, pictured in Figure 6,
        !          2577: has a global array of windows, called
        !          2578: .CW Flayers ,
        !          2579: each of which holds an image of a piece of text held in a data structure
        !          2580: called a
        !          2581: .CW Frame ,
        !          2582: which in turn represents
        !          2583: a rectangular window full of text displayed in some
        !          2584: .CW Bitmap .
        !          2585: Each
        !          2586: .CW Flayer
        !          2587: appears in a global list that orders them all front-to-back
        !          2588: on the display, and simultaneously as an element of a per-file array
        !          2589: that holds all the open windows for that file.
        !          2590: The complement in the terminal of the
        !          2591: .CW File
        !          2592: on the host is called a
        !          2593: .CW Text ;
        !          2594: each connects its
        !          2595: .CW Flayers
        !          2596: to the associated
        !          2597: .CW Rasp .
        !          2598: .1C
        !          2599: .KF
        !          2600: .PS
        !          2601: copy "fig6.pic"
        !          2602: .PE
        !          2603: .Cs
        !          2604: Figure 6. Data structures in the terminal.
        !          2605: .CW Flayers
        !          2606: are also linked together into a front-to-back list.
        !          2607: .CW Boxes
        !          2608: are discussed in the next section.
        !          2609: .Ce
        !          2610: .KE
        !          2611: .2C
        !          2612: .PP
        !          2613: The
        !          2614: .CW Bitmap
        !          2615: for a
        !          2616: .CW Frame
        !          2617: contains the image of the text.
        !          2618: For a fully visible window, the
        !          2619: .CW Bitmap
        !          2620: will be the screen (or at least the
        !          2621: .CW Layer
        !          2622: in which
        !          2623: .I sam
        !          2624: is being run),
        !          2625: while for partially obscured windows the
        !          2626: .CW Bitmap
        !          2627: will be off-screen.
        !          2628: If the window is fully obscured, the
        !          2629: .CW Bitmap
        !          2630: will be null.
        !          2631: .PP
        !          2632: The
        !          2633: .CW Bitmap
        !          2634: is a kind of cache.
        !          2635: When making changes to the display, most of the original image will
        !          2636: look the same in the final image, and the update algorithms exploit this.
        !          2637: The
        !          2638: .CW Frame
        !          2639: software updates the image in the
        !          2640: .CW Bitmap
        !          2641: incrementally; the
        !          2642: .CW Bitmap
        !          2643: is not just an image, it is a data structure.|reference(guibas stolfi)|reference(pike locanthi blit bitmap)
        !          2644: The job of the software that updates the display is therefore
        !          2645: to use as much as possible of the existing image (converting the
        !          2646: text from ASCII characters to pixels is expensive) in a sort of two-dimensional
        !          2647: string insertion algorithm.
        !          2648: The details of this process are described in the next section.
        !          2649: .PP
        !          2650: The
        !          2651: .CW Frame
        !          2652: software has no code to support overlapping windows;
        !          2653: its job is to keep a single
        !          2654: .CW Bitmap
        !          2655: up to date.
        !          2656: It falls to the
        !          2657: .CW Flayer
        !          2658: software to multiplex the various
        !          2659: .CW Bitmaps
        !          2660: onto the screen.
        !          2661: The problem of maintaining overlapping
        !          2662: .CW Flayers
        !          2663: is easier than for
        !          2664: .CW Layers |reference(pike overlapping)
        !          2665: because changes are made synchronously and because the contents of the window
        !          2666: can be reconstructed from the data stored in the
        !          2667: .CW Frame ;
        !          2668: the
        !          2669: .CW Layers
        !          2670: software
        !          2671: makes no such assumptions.
        !          2672: In
        !          2673: .I sam ,
        !          2674: the window being changed is almost always fully visible, because the current
        !          2675: window is always fully visible, by construction.
        !          2676: However, when multi-file changes are being made, or when
        !          2677: more than one window is open on a file,
        !          2678: it may be necessary to update partially obscured windows.
        !          2679: .PP
        !          2680: There are three cases: the window is 
        !          2681: fully visible, invisible (fully obscured), or partially visible.
        !          2682: If fully visible, the
        !          2683: .CW Bitmap
        !          2684: is part of the screen, so when the
        !          2685: .CW Flayer
        !          2686: update routine calls the
        !          2687: .CW Frame
        !          2688: update routine, the screen will be updated directly.
        !          2689: If the window is invisible,
        !          2690: there is no associated
        !          2691: .CW Bitmap ,
        !          2692: and all that is necessary is to update the
        !          2693: .CW Frame
        !          2694: data structure, not the image.
        !          2695: If the window is partially visible, the
        !          2696: .CW Frame
        !          2697: routine is called to update the image in the off-screen
        !          2698: .CW Bitmap ,
        !          2699: which may require regenerating it from the text of the window.
        !          2700: The
        !          2701: .CW Flayer
        !          2702: code then clips this
        !          2703: .CW Bitmap
        !          2704: against the
        !          2705: .CW Bitmaps
        !          2706: of all
        !          2707: .CW Frames
        !          2708: in front of the
        !          2709: .CW Frame
        !          2710: being modified, and the remainder is copied to the display.
        !          2711: .PP
        !          2712: This is much faster than recreating the image off-screen
        !          2713: for every change, or clipping all the changes made to the image
        !          2714: during its update.
        !          2715: Unfortunately, these caches can also consume prohibitive amounts of
        !          2716: memory, so they are freed fairly liberally \(em after every change to the
        !          2717: front-to-back order of the
        !          2718: .CW Flayers .
        !          2719: The result is that
        !          2720: the off-screen
        !          2721: .CW Bitmaps
        !          2722: exist only while multi-window changes are occurring,
        !          2723: which is the only time the performance improvement they provide is needed.
        !          2724: Also, the user interface causes fully-obscured windows to be the
        !          2725: easiest to make \(em
        !          2726: creating a canonically sized and placed window requires only a button click
        !          2727: \(em which reduces the need for caching still further.
        !          2728: .NH 2
        !          2729: Screen update
        !          2730: .LP
        !          2731: Only two low-level primitives are needed for incremental update:
        !          2732: .I bitblt ,
        !          2733: which copies rectangles of pixels, and
        !          2734: .I string
        !          2735: (which in turn calls
        !          2736: .I bitblt ),
        !          2737: which draws a null-terminated character string in a
        !          2738: .CW Bitmap .
        !          2739: A
        !          2740: .CW Frame
        !          2741: contains a list of
        !          2742: .CW Boxes ,
        !          2743: each of which defines a horizontal strip of text in the window
        !          2744: (see Figure 7).
        !          2745: A
        !          2746: .CW Box
        !          2747: has a character string
        !          2748: .CW str ,
        !          2749: and a
        !          2750: .CW Rectangle
        !          2751: .CW rect
        !          2752: that defines the location of the strip in the window.
        !          2753: (The text in
        !          2754: .CW str
        !          2755: is stored in the
        !          2756: .CW Box
        !          2757: separately from the
        !          2758: .CW Rasp
        !          2759: associated with the window's file, so
        !          2760: .CW Boxes
        !          2761: are self-contained.)
        !          2762: The invariant is that
        !          2763: the image of the
        !          2764: .CW Box
        !          2765: can be reproduced by calling
        !          2766: .CW string
        !          2767: with argument
        !          2768: .CW str
        !          2769: to draw the string in
        !          2770: .CW rect ,
        !          2771: and the resulting picture fits perfectly within
        !          2772: .CW rect .
        !          2773: In other words, the
        !          2774: .CW Boxes
        !          2775: define the tiling of the window.
        !          2776: The tiling may be complicated by long lines of text, which
        !          2777: are folded onto the next line.
        !          2778: Some editors use horizontal scrolling to avoid this complication,
        !          2779: but to be comfortable this technique requires that lines not be
        !          2780: .I too
        !          2781: long;
        !          2782: .I sam
        !          2783: has no such restriction.
        !          2784: Also, and perhaps more importantly,
        !          2785: .UX
        !          2786: programs and terminals traditionally fold
        !          2787: long lines to make their contents fully visible.
        !          2788: .PP
        !          2789: Two special kinds of
        !          2790: .CW Boxes
        !          2791: contain a single
        !          2792: character: either a newline or a tab.
        !          2793: Newlines and tabs are white space.
        !          2794: A newline
        !          2795: .CW Box
        !          2796: always extends to the right edge of the window,
        !          2797: forcing the following
        !          2798: .CW Box
        !          2799: to the next line.
        !          2800: The width of a tab depends on where it is located:
        !          2801: it forces the next
        !          2802: .CW Box
        !          2803: to begin at a tab location.
        !          2804: Tabs also
        !          2805: have a minimum width equivalent to a blank (blanks are
        !          2806: drawn by
        !          2807: .CW string
        !          2808: and are not treated specially); newlines have a minimum width of zero.
        !          2809: .1C
        !          2810: .KF
        !          2811: .PS
        !          2812: copy "fig7.pic"
        !          2813: .PE
        !          2814: .sp .5
        !          2815: .Cs
        !          2816: Figure 7. A line of text showing its
        !          2817: .CW Boxes .
        !          2818: The first two blank
        !          2819: .CW Boxes
        !          2820: contain tabs; the last contains a newline.
        !          2821: Spaces are handled as ordinary characters.
        !          2822: .Ce
        !          2823: .KE
        !          2824: .2C
        !          2825: .PP
        !          2826: The update algorithms always use the
        !          2827: .CW Bitmap
        !          2828: image of the text (either the display or cache
        !          2829: .CW Bitmap );
        !          2830: they never examine the characters within a
        !          2831: .CW Box
        !          2832: except when the
        !          2833: .CW Box
        !          2834: needs to be split in two.
        !          2835: Before a change, the window consists of a tiling of
        !          2836: .CW Boxes ;
        !          2837: after the change the window is tiled differently.
        !          2838: The update algorithms rearrange the tiles in place, without
        !          2839: backup storage.
        !          2840: The algorithms are not strictly optimal \(em for example, they can
        !          2841: clear a pixel that is later going to be written upon \(em
        !          2842: but they never move a tile that doesn't need to be moved,
        !          2843: and they move each tile at most once.
        !          2844: .CW Frinsert
        !          2845: on a Blit can absorb over a thousand characters a second if the strings
        !          2846: being inserted are a few tens of characters long.
        !          2847: .PP
        !          2848: Consider
        !          2849: .CW frdelete .
        !          2850: Its job is to delete a substring from a
        !          2851: .CW Frame
        !          2852: and restore the image of the
        !          2853: .CW Frame .
        !          2854: The image of a substring has a peculiar shape (see Figure 2) comprising
        !          2855: possibly a partial line,
        !          2856: zero or more full lines,
        !          2857: and possibly a final partial line.
        !          2858: For reference, call this the
        !          2859: .I
        !          2860: Z-shape.
        !          2861: .R
        !          2862: .CW Frdelete
        !          2863: begins by splitting, if necessary, the
        !          2864: .CW Boxes
        !          2865: containing the ends of
        !          2866: the substring so the substring begins and ends on
        !          2867: .CW Box
        !          2868: boundaries.
        !          2869: Because the substring is being deleted, its image is not needed,
        !          2870: so the Z-shape is then cleared.
        !          2871: Then, tiles (that is, the images of
        !          2872: .CW Boxes )
        !          2873: are copied, using
        !          2874: .CW bitblt ,
        !          2875: from immediately after the Z-shape to
        !          2876: the beginning of the Z-shape,
        !          2877: resulting in a new Z-shape.
        !          2878: .CW Boxes "" (
        !          2879: whose contents would span two lines in the new position must first be split.)
        !          2880: .PP
        !          2881: Copying the remainder of the
        !          2882: .CW Frame
        !          2883: tile by tile
        !          2884: this way will clearly accomplish the deletion but eventually,
        !          2885: typically when the copying algorithm encounters a tab or newline,
        !          2886: the old and new
        !          2887: .CW x
        !          2888: coordinates of the tile
        !          2889: to be copied are the same.
        !          2890: This correspondence implies
        !          2891: that the Z-shape has its beginning and ending edges aligned
        !          2892: vertically, and a sequence of at most two
        !          2893: .I bitblt s
        !          2894: can be used to copy the remaining tiles.
        !          2895: The last step is to clear out the resulting empty space at the bottom
        !          2896: of the window;
        !          2897: the number of lines to be cleared is the number of complete lines in the
        !          2898: Z-shape closed by the final
        !          2899: .I bitblt s.
        !          2900: The final step is to merge horizontally adjacent
        !          2901: .CW Boxes
        !          2902: of plain text.
        !          2903: The complete source to
        !          2904: .CW frdelete
        !          2905: is less than 100 lines of C.
        !          2906: .PP
        !          2907: .CW frinsert
        !          2908: is more complicated because it must do four passes:
        !          2909: one to construct the
        !          2910: .CW Box
        !          2911: list for the inserted string,
        !          2912: one to reconnoitre,
        !          2913: one to copy (in opposite order to
        !          2914: .CW frdelete )
        !          2915: the
        !          2916: .CW Boxes
        !          2917: to make the hole for the new text,
        !          2918: and finally one to copy the new text into place.
        !          2919: Overall, though,
        !          2920: .CW frinsert
        !          2921: has a similar flavor to
        !          2922: .CW frdelete ,
        !          2923: and needn't be described further.
        !          2924: .CW Frinsert
        !          2925: and its subsidiary routines comprise 211 lines of C.
        !          2926: .PP
        !          2927: The terminal source code is 3024 lines of C,
        !          2928: and the host source is 5797 lines.
        !          2929: .NH
        !          2930: Discussion
        !          2931: .NH 2
        !          2932: History
        !          2933: .LP
        !          2934: The immediate ancestor of
        !          2935: .I sam
        !          2936: was the original text editor for the Blit, called
        !          2937: .I jim .
        !          2938: .I Sam
        !          2939: inherited
        !          2940: .I jim 's
        !          2941: two-process structure and mouse language almost unchanged, but
        !          2942: .I jim
        !          2943: suffered from several drawbacks that were addressed in the design of
        !          2944: .I sam .
        !          2945: The most important of these was the lack of a command language.
        !          2946: Although
        !          2947: .I jim
        !          2948: was easy to use for simple editing, it provided no direct help with
        !          2949: large or repetitive editing tasks.  Instead, it provided a command to pass
        !          2950: selected text through a shell pipeline,
        !          2951: but this was no more satisfactory than could be expected of a stopgap measure.
        !          2952: .PP
        !          2953: .I Jim
        !          2954: was written primarily as a vehicle for experimenting with a mouse-based
        !          2955: interface to text, and the experiment was successful.
        !          2956: .I Jim
        !          2957: had some spin-offs:
        !          2958: .I mux ,
        !          2959: the second window system for the Blit, is essentially a multiplexed
        !          2960: version of the terminal part of
        !          2961: .I jim ;
        !          2962: and the debugger
        !          2963: .I pi 's
        !          2964: user interface|reference(cargill feel pi) was closely modeled on
        !          2965: .I jim 's.
        !          2966: But after a couple of years,
        !          2967: .I jim
        !          2968: had become difficult to maintain and limiting to use,
        !          2969: and its replacement was overdue.
        !          2970: .PP
        !          2971: I began the design of
        !          2972: .I sam
        !          2973: by asking
        !          2974: .I jim
        !          2975: customers what they wanted.
        !          2976: This was probably a mistake; the answers were essentially a list of features
        !          2977: to be found in other editors, which did not provide any of the
        !          2978: guiding principles I was seeking.
        !          2979: For instance, one common request was for a ``global substitute,''
        !          2980: but no one suggested how to provide it within a cut-and-paste editor.
        !          2981: I was looking for a scheme that would
        !          2982: support such specialized features comfortably in the context of some
        !          2983: general command language.
        !          2984: Ideas were not forthcoming, though, particularly given my insistence
        !          2985: on removing all limits on file sizes, line lengths and so on.
        !          2986: Even worse, I recognized that, since the mouse could easily
        !          2987: indicate a region of the screen that was not an integral number of lines,
        !          2988: the command language would best forget about newlines altogether,
        !          2989: and that meant the command language had to treat the file as a single
        !          2990: string, not an array of lines.
        !          2991: .PP
        !          2992: Eventually, I decided that thinking was not getting me very far and it was
        !          2993: time to try building.
        !          2994: I knew that the terminal part could be built easily \(em
        !          2995: that part of
        !          2996: .I jim
        !          2997: behaved acceptably well \(em and that most of the hard work was going
        !          2998: to be in the host part: the file interface, command interpreter and so on.
        !          2999: Moreover, I had some ideas about how the architecture of
        !          3000: .I jim
        !          3001: could be improved without destroying its basic structure, which I liked
        !          3002: in principle but which hadn't worked out as well as I had hoped.
        !          3003: So I began by designing the file data structure,
        !          3004: starting with the way
        !          3005: .I jim
        !          3006: worked \(em comparable to a single structure merging
        !          3007: .CW Disc
        !          3008: and
        !          3009: .CW Buffer ,
        !          3010: which I split to make the cache more general
        !          3011: \(em and thinking about how global substitute could be implemented.
        !          3012: The answer was clearly that it had to be done in two passes,
        !          3013: and the transcript-oriented implementation fell out naturally.
        !          3014: .PP
        !          3015: .I Sam
        !          3016: was written bottom-up,
        !          3017: starting from the data structures and algorithms for manipulating text,
        !          3018: through the command language and up to the code for maintaining
        !          3019: the display.
        !          3020: In retrospect, it turned out well, but this implementation method is
        !          3021: not recommended in general.
        !          3022: There were several times when I had a large body of interesting code
        !          3023: assembled and no clue how to proceed with it.
        !          3024: The command language, in particular, took almost a year to figure out,
        !          3025: but can be implemented (given what was there at the beginning of that year)
        !          3026: in a day or two.  Similarly, inventing the
        !          3027: .CW Rasp
        !          3028: data structure delayed the
        !          3029: connection of the host and terminal pieces by another few months.
        !          3030: .I Sam
        !          3031: took about two years to write, although only about four months were
        !          3032: spent actually working on it.
        !          3033: .PP
        !          3034: Part of the design process was unusual:
        !          3035: the subset of the protocol that maintains the
        !          3036: .CW Rasp
        !          3037: was simulated, debugged
        !          3038: and verified by an automatic protocol analyzer,|reference(holzmann atttj) and was bug-free
        !          3039: from the start.
        !          3040: The rest of the protocol, concerned mostly
        !          3041: with keeping menus up to date,
        !          3042: was unfortunately too unwieldy for such analysis,
        !          3043: and was debugged by more traditional methods, primarily
        !          3044: by logging in a file all messages in and out of the host.
        !          3045: .NH 2
        !          3046: Reflections
        !          3047: .LP
        !          3048: .I Sam
        !          3049: is essentially the only interactive editor used by the sixty or so members of
        !          3050: the computing science research center in which I work.
        !          3051: The same could not be said of
        !          3052: .I jim ;
        !          3053: the lack of a command language kept some people from adopting it.
        !          3054: The union of a user interface as comfortable as
        !          3055: .I jim 's
        !          3056: with a command language as powerful as
        !          3057: .CW ed 's\(dg
        !          3058: .FS
        !          3059: .vs 9
        !          3060: \(dg
        !          3061: The people who criticize
        !          3062: .I ed
        !          3063: as an interactive program often forget that it and its close relative
        !          3064: .I sed |reference(kernighan pike)
        !          3065: still thrive as programmable editors.  The strength of these programs is
        !          3066: independent of their convenience for interactive editing.
        !          3067: .br
        !          3068: .vs
        !          3069: .FE
        !          3070: is essential to
        !          3071: .I sam 's
        !          3072: success.
        !          3073: When
        !          3074: .I sam
        !          3075: was first made available to the
        !          3076: .I jim
        !          3077: community,
        !          3078: almost everyone switched to it within two or three days.
        !          3079: In the months that followed, even people who had never adopted
        !          3080: .I jim
        !          3081: started using
        !          3082: .I sam
        !          3083: exclusively.
        !          3084: .PP
        !          3085: To be honest,
        !          3086: .I ed
        !          3087: still gets occasional use, but usually when
        !          3088: something quick needs to be done and the overhead of
        !          3089: downloading the terminal part of
        !          3090: .I sam
        !          3091: isn't worth the trouble.
        !          3092: Also, as a `line' editor,
        !          3093: .CW "sam -d"
        !          3094: is a bit odd;
        !          3095: when using a good old ASCII terminal, it's comforting to have
        !          3096: a true line editor.
        !          3097: But it is fair to say that
        !          3098: .I sam 's
        !          3099: command language has displaced
        !          3100: .I ed 's
        !          3101: for most of the complicated editing that has kept line editors
        !          3102: (that is, command-driven editors) with us.
        !          3103: .PP
        !          3104: .I Sam 's
        !          3105: command language is even fancier than
        !          3106: .I ed 's,
        !          3107: and most
        !          3108: .I sam
        !          3109: customers don't come near to using all its capabilities.
        !          3110: Does it need to be so sophisticated?
        !          3111: I think the answer is yes, for two reasons.
        !          3112: .PP
        !          3113: First, the
        !          3114: .I model
        !          3115: for
        !          3116: .I sam 's
        !          3117: command language is really relatively simple, and certainly simpler than that of
        !          3118: .I ed .
        !          3119: For instance, there is only one kind of textual loop in
        !          3120: .I sam
        !          3121: \(em the
        !          3122: .CW x
        !          3123: command \(em
        !          3124: while
        !          3125: .I ed
        !          3126: has three (the
        !          3127: .CW g
        !          3128: command, the global flag on substitutions, and the implicit loop over
        !          3129: lines in multi-line substitutions).
        !          3130: Also,
        !          3131: .I ed 's
        !          3132: substitute command is necessary to make changes within lines, but in
        !          3133: .I sam
        !          3134: the
        !          3135: .CW s
        !          3136: command is more of a familiar convenience than a necessity;
        !          3137: .CW c
        !          3138: and
        !          3139: .CW t
        !          3140: can do all the work.
        !          3141: .PP
        !          3142: Second,
        !          3143: given a community that expects an editor to be about as powerful as
        !          3144: .I ed ,
        !          3145: it's hard to see how
        !          3146: .I sam
        !          3147: could really be much simpler and still satisfy that expectation.
        !          3148: People want to do ``global substitutes,'' and most are content
        !          3149: to have the recipe for that and a few other fancy changes.
        !          3150: The sophistication of the command language is really just a veneer
        !          3151: over a design that makes it possible to do global substitutes
        !          3152: in a screen editor.
        !          3153: Some people will always want something more, however, and it's gratifying to
        !          3154: be able to provide it.
        !          3155: The real power of
        !          3156: .I sam 's
        !          3157: command language comes from composability of the operators, which is by
        !          3158: nature orthogonal to the underlying model.
        !          3159: In other words,
        !          3160: .I sam
        !          3161: is not itself complex, but it makes complex things possible.
        !          3162: If you don't want to do anything complex, you can ignore the
        !          3163: complexity altogether, and many people do so.
        !          3164: .PP
        !          3165: Sometimes I am asked the opposite question: why didn't I just make
        !          3166: .I sam
        !          3167: a real programmable editor, with macros and variables and so on?
        !          3168: The main reason is a matter of taste: I like the editor
        !          3169: to be the same every time I use it.
        !          3170: There is one technical reason, though:
        !          3171: programmability in editors is largely a workaround for insufficient
        !          3172: interactivity.
        !          3173: Programmable editors are used to make particular, usually short-term,
        !          3174: things easy to do, such as by providing shorthands for common actions.
        !          3175: If things are generally easy to do in the first place,
        !          3176: shorthands are not as helpful.
        !          3177: .I Sam
        !          3178: makes common editing operations very easy, and the solutions to
        !          3179: complex editing problems seem commensurate with the problems themselves.
        !          3180: Also, the ability to edit the
        !          3181: .I sam
        !          3182: window makes it easy to repeat commands \(em it only takes a mouse button click
        !          3183: to execute a command again.
        !          3184: .NH 2
        !          3185: Pros and cons
        !          3186: .LP
        !          3187: .I Sam
        !          3188: has several other good points,
        !          3189: and its share of problems.
        !          3190: Among the good things is the idea of
        !          3191: structural regular expressions,
        !          3192: whose usefulness has only begun to be explored.
        !          3193: They were arrived at serendipitously when I attempted to distill the essence of
        !          3194: .I ed 's
        !          3195: way of doing global substitution and recognized that the looping command in
        !          3196: .I ed
        !          3197: was implicitly imposing a structure (an array of lines) on the file.
        !          3198: .PP
        !          3199: Another of
        !          3200: .I sam 's
        !          3201: good things is its undo capability.
        !          3202: I had never before used an editor with a true undo,
        !          3203: but I would never go back now.
        !          3204: Undo
        !          3205: .I must
        !          3206: be done well, but if it is, it can be relied on.
        !          3207: For example,
        !          3208: it's safe to experiment if you're not sure how to write some intricate command,
        !          3209: because if you make a mistake, it can be fixed simply and reliably.
        !          3210: I learned two things about undo from writing
        !          3211: .I sam :
        !          3212: first, it's easy to provide if you design it in from the beginning, and
        !          3213: second, it's necessary, particularly if the system has some subtle
        !          3214: properties that may be unfamiliar or error-prone for users.
        !          3215: .PP
        !          3216: .I Sam 's
        !          3217: lack of internal limits and sizes is a virtue.
        !          3218: Because it avoids all fixed-size tables and data structures,
        !          3219: .I sam
        !          3220: is able to make global changes to files that some of our other
        !          3221: tools cannot even read.
        !          3222: Moreover, the design keeps the performance linear when doing such
        !          3223: operations, although I must admit
        !          3224: .I sam
        !          3225: does get slow when editing a huge file.
        !          3226: .PP
        !          3227: Now, the problems.
        !          3228: Externally, the most obvious is that it is poorly integrated into the
        !          3229: surrounding window system.
        !          3230: By design, the user interface in
        !          3231: .I sam
        !          3232: feels almost identical to that of
        !          3233: .I mux ,
        !          3234: but a thick wall separates text in
        !          3235: .I sam
        !          3236: from the programs running in
        !          3237: .I mux .
        !          3238: For instance, the `snarf buffer' in
        !          3239: .I sam
        !          3240: must be maintained separately from that in
        !          3241: .I mux .
        !          3242: This is regrettable, but probably necessary given the unusual configuration
        !          3243: of the system, with a programmable terminal on the far end of an RS-232 link.
        !          3244: .PP
        !          3245: .I Sam
        !          3246: is reliable; otherwise, people wouldn't use it.
        !          3247: But it was written over such a long time, and has so many new (to me)
        !          3248: ideas in it, that I would like to see it done over again to clean
        !          3249: up the code and remove many of the lingering problems in the implementation.
        !          3250: The worst part is in the interconnection of the host and terminal parts,
        !          3251: which might even be able to go away in a redesign for a more
        !          3252: conventional window system.
        !          3253: The program must be split in two to use the terminal effectively,
        !          3254: but the low bandwidth of the connection forces the separation to
        !          3255: occur in an inconvenient part of the design if performance is to be acceptable.
        !          3256: A simple remote procedure call
        !          3257: protocol driven by the host, emitting only graphics
        !          3258: commands, would be easy to write but wouldn't have nearly the
        !          3259: necessary responsiveness.  On the other hand, if the terminal were in control
        !          3260: and requested much simpler file services from the host, regular expression
        !          3261: searches would require that the terminal read the entire file over its RS-232
        !          3262: link, which would be unreasonably slow.
        !          3263: A compromise in which either end can take control is necessary.
        !          3264: In retrospect, the communications protocol should have been
        !          3265: designed and verified formally, although I do not know of any tool
        !          3266: that can adequately relate the protocol to
        !          3267: its implementation.
        !          3268: .PP
        !          3269: Not all of
        !          3270: .I sam 's
        !          3271: users are comfortable with its command language, and few are adept.
        !          3272: Some (venerable) people use a sort of
        !          3273: .I ed \& ``
        !          3274: subset'' of
        !          3275: .I sam 's
        !          3276: command language,
        !          3277: and even ask why
        !          3278: .I sam 's
        !          3279: command language is not exactly
        !          3280: .I ed 's.
        !          3281: (The reason, of course, is that
        !          3282: .I sam 's
        !          3283: model for text does not include newlines, which are central to
        !          3284: .I ed .
        !          3285: Making the text an array of newlines to the command language would
        !          3286: be too much of a break from the seamless model provided by the mouse.
        !          3287: Some editors, such as
        !          3288: .I vi ,
        !          3289: are willing to make this break, though.)
        !          3290: The difficulty is that
        !          3291: .I sam 's
        !          3292: syntax is so close to
        !          3293: .I ed 's
        !          3294: that people believe it
        !          3295: .I should
        !          3296: be the same.
        !          3297: I thought, with some justification in hindsight,
        !          3298: that making
        !          3299: .I sam
        !          3300: similar to
        !          3301: .I ed
        !          3302: would make it easier to learn and to accept.
        !          3303: But I may have overstepped and raised the users'
        !          3304: expectations too much.
        !          3305: It's hard to decide which way to resolve this problem.
        !          3306: .PP
        !          3307: Finally, there is a tradeoff in
        !          3308: .I sam
        !          3309: that was decided by the environment in which it runs:
        !          3310: .I sam
        !          3311: is a multi-file editor, although in a different system there might instead be
        !          3312: multiple single-file editors.
        !          3313: The decision was made primarily because starting a new program in a Blit is
        !          3314: time-consuming.
        !          3315: If the choice could be made freely, however, I would
        !          3316: still choose the multi-file architecture, because it allows
        !          3317: groups of files to be handled as a unit;
        !          3318: the usefulness of the multi-file commands is incontrovertible.
        !          3319: It is delightful to have the source to an entire program
        !          3320: available at your fingertips.
        !          3321: .NH
        !          3322: Acknowledgements
        !          3323: .LP
        !          3324: Tom Cargill suggested the idea behind the
        !          3325: .CW Rasp
        !          3326: data structure.
        !          3327: Norman Wilson and Ken Thompson influenced the command language.
        !          3328: This paper was improved by comments from
        !          3329: Al Aho,
        !          3330: Jon Bentley,
        !          3331: Chris Fraser,
        !          3332: Gerard Holzmann,
        !          3333: Brian Kernighan,
        !          3334: Ted Kowalski,
        !          3335: Doug McIlroy
        !          3336: and
        !          3337: Dennis Ritchie.
        !          3338: .NH
        !          3339: References
        !          3340: .LP
        !          3341: |reference_placement

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.