1: .so ../ADM/mac
2: .XX upas 557 "Upas \(em A Simpler Approach to Network Mail"
3: .TL
4: Upas \(em a Simpler Approach to Network Mail
5: .AU
6: David L. Presotto
7: William R. Cheswick
8: .AI
9: .MH
10: .AB
11: .I Upas*
12: is a mail interface that routes messages between existing
13: network-specific mailers, users, and user mailboxes.
14: It uses a language based on regular expressions to describe
15: how to convert mail
16: addresses into the commands needed to route the mail to the intended
17: destination.
18: Upas is the mail interface for the Tenth Edition
19: .UX
20: system.
21: .AE
22: .2C
23: .FS
24: *
25: .B upas ,
26: .I "u\(aapas, n" .
27: (in full
28: .B u\(aapas-tree\(aa ),
29: a fabulous Javanese tree that poisoned everything for miles
30: around; Javanese tree (\c
31: .I "Antiaris toxicaria" ,
32: of the mulberry family): the poison of its latex.
33: [Malay, poison.]
34: |reference(dictionary chambers)
35: .FE
36: .NH 1
37: Introduction
38: .PP
39: Our entry in the `mail race' sprang from events similar to those
40: motivating the development of many mail systems.
41: For many years a short and simple mailer was used to deliver local mail
42: and to route mail via our home-grown networks.
43: Although its user interface left a little to be desired, its reliability
44: was so high that great trust was put into it.
45: However, as we gained access to more and more networks, particularly ones
46: over which we had no control, the situation quickly deteriorated.
47: Each of these networks had its own mail `standards' and addressing conventions.
48: With some trepidation, we absorbed these standards into our mailer.
49: Its simplicity was quickly lost along with its fabled reliability.
50: Realizing our danger, we decided to step back and see whether there was a
51: way to return to a simple, well-understood, and thereby reliable mail system.
52: .PP
53: The job to be performed by a network mail system is illustrated by Figure 1.
54: A mail system is essentially a large switch for handling the routing and
55: delivery of messages.
56: As a router it must be conversant with the various network protocols,
57: be able to decipher destination addresses, and pass messages along
58: to the next network.
59: Sometimes it actually gets to deliver a piece of mail to a mailbox.
60: Also, since there is no common mail format, the mail system must
61: convert messages from one format to another as it routes them from
62: network to network.
63: Because of the number of networks and mail formats, this can easily lead to
64: thousands of lines of code.
65: Our problem was to decide how to partition the work in order to create a
66: manageable yet efficient mail system.
67: .1C
68: .KF
69: .PS
70: define net |
71: [ellipse "network" "$1";
72: arrow -> from last ellipse.s down boxht/3;
73: box invis ht boxht/3 "protocol";
74: arrow -> from last box.s down boxht/3;
75: box invis ht boxht/3 "convert";
76: arrow -> from last box.s down boxht/3;
77: box invis;
78: arrow -> down boxht/3;
79: box invis ht boxht/3 "queue";
80: arrow -> from last box.s down boxht/3;
81: box invis ht boxht/3 "convert";
82: arrow -> from last box.s down boxht/3;
83: box invis ht boxht/3 "protocol";
84: arrow -> from last box.s down boxht/3;
85: ellipse "network" "$1";
86: ] |
87:
88: # network mail
89: NetA: net(A)
90: move right boxwid/4
91: NetB: net(B)
92: move right boxwid/4
93: NetC: net(C)
94:
95: # local mail
96: move to NetA.w + 0,boxht/3;
97: arrow <- left;
98: ellipse "user";
99: move to NetC.e + 0,boxht/3;
100: arrow -> right;
101: ellipse "mail" "box"
102:
103: # the router
104: box dashed wid 3*boxwid at NetB + 0,boxht/3 "routing"
105: .PE
106: .sp .5
107: .ce
108: \fBFigure 1.\fP The functions to be performed to route network mail.
109: .sp
110: .KE
111: .2C
112: .NH 1
113: Some Observations
114: .PP
115: The task of interfacing to a particular network is often
116: messy and arbitrary.
117: Fortunately, most entities (corporations, governments, committees)
118: that design network protocols also provide code (i.e., mail programs)
119: that understands these protocols.
120: In our experience, it has
121: always been easier to interface one of these mailers to our
122: mail system than to incorporate the new protocols
123: into our existing mailer.
124: Also, code provided by someone else is supported by someone else.
125: As network protocols change, it is easier to pick up the new version of the
126: network mailer than to rewrite our mailer.
127: .PP
128: Although there are many networks, there are far fewer message formats.
129: It is clear that a message needs a destination address and possibly even
130: a reply address.
131: However, the imposition of further structure on the message is at best
132: distasteful, at worst obstructive.
133: Imagine what postal delivery would be like if the Postal Service opened
134: each piece of mail to ensure that it is correctly dated and signed,
135: that the form of address is correct, and that the company letterhead
136: obeys some preconceived format,
137: refusing delivery if any of these conditions are not met.
138: Unfortunately, some networks impose such requirements.
139: For a message to obey one standard is difficult enough.
140: To expect it to survive a number of conversions between
141: restrictive standards constitutes wishful thinking.
142: Because of this, most networks adopt standards established by
143: older or larger networks.
144: Therefore, although there are many networks, there are relatively few
145: message formats.
146: .PP
147: A network address describes a path through a number of machines
148: and networks.
149: This path may be rather simple, consisting of a single machine
150: and user name.
151: Often, however, the path crosses a number of administrative domains.
152: Each such domain imposes some rules for structuring paths within the
153: domain.
154: Unfortunately, there is no universally observed standard
155: for binding the path segments from each domain into a single
156: address.
157: The networks differ on direction of binding (person@machine vs. machine!person),
158: delimiters (`.' vs. `@' vs. `%'), quotation marks, and even case sensitivity.
159: Therefore, there is no fixed way to correctly parse and understand a
160: network address.
161: Instead, there are conventions that tend to last only
162: until someone issues a new RFC or a new network appears.
163: As a relatively simple example, consider a message sent from one \fIuucp\fP
164: |reference(uucp v7man network)
165: network, through ARPAnet, to another \fIuucp\fP network.
166: The address format might be something like:
167: .P1
168: A!B!person%E%D@C
169: .P2
170: The rules for parsing such an address are easily defined.
171: Unfortunately, the conventions underlying the rules change from day to day.
172: Once you've managed to write your code, the administrator
173: at B may decide that he won't accept percent signs in an address
174: and would really like the address to look like:
175: .P1
176: A!B!@C,@D:person@E
177: .P2
178: A new set of parsing rules now has to be defined.
179: In our experience these changes happen with maddening frequency.
180: They are the direct result of there being no single comprehensive
181: standard or administrative authority.
182: Therefore, we have to treat address parsing rules as
183: ephemeral.
184: Any network mailer should be able to change its address parsing rules
185: frequently and with little difficulty.
186: Tying them to one particular standard such as this week's Internet rules is
187: equivalent to planned obsolescence.
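.PP
The variation is easy to make concrete.
The following Python sketch (ours, not part of Upas) reads the ARPAnet portion of the example,
\f(CWperson%E%D@C\fP,
under one common convention, namely that the host after the @ is reached first and that each % is then promoted to @ from right to left, and rewrites it both as a
.I uucp
path and as an RFC822 source route.
The function names are hypothetical, and another administrator could reasonably read the same address differently, which is exactly the difficulty.

```python
def percent_hops(addr):
    """Split 'user%h3%h2@h1' into (['h1', 'h2', 'h3'], 'user').

    Assumes the rightmost-% convention: the host after '@' is reached
    first, then each '%' is promoted to '@' from right to left.
    The address must contain an '@'.
    """
    local, _, first = addr.rpartition("@")
    user, *later = local.split("%")
    return [first] + later[::-1], user


def to_bang(addr):
    """uucp-style path: hosts in the order visited, user last."""
    hops, user = percent_hops(addr)
    return "!".join(hops + [user])


def to_route(addr):
    """RFC822 source route: @relay,...:user@final-host."""
    hops, user = percent_hops(addr)
    if len(hops) == 1:
        return user + "@" + hops[0]
    relays = ",".join("@" + hop for hop in hops[:-1])
    return relays + ":" + user + "@" + hops[-1]


print(to_bang("person%E%D@C"))   # C!D!E!person
print(to_route("person%E%D@C"))  # @C,@D:person@E
```

Under this reading the example is \f(CWC!D!E!person\fP (with \f(CWA!B!\fP in front for the full path), or \f(CW@C,@D:person@E\fP in source-route form.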
188: .PP
189: Finally, we should make a point about reversibility that many other
190: mail designers seem to have missed.
191: In addition to parsing destination addresses, mailers are expected to
192: maintain some form of return address attached to the message.
193: This often involves changing the current return address to one
194: that the mailer will accept as a reply destination.
195: A mailer should parse and modify return addresses using the same rules
196: as it does for destination addresses.
197: Otherwise, as is too often the case, the mailer will reject the very addresses that
198: it has provided for replies.
199: .NH 1
200: A Solution
201: .PP
202: The best solution would have been to throw out all the so-called standards and
203: create a single coherent scheme for formatting and addressing mail.|reference(hideous pike weinberger)
204: However, since we have no power to impose such a scheme,
205: we have tried to use the above-stated requirements and observations
206: to build a mail system that makes the best of a bad situation.
207: .PP
208: The structure of our mail system is depicted in Figure 2.
209: Each network has its own interface program for message reception
210: and transmission.
211: In general these are the network-specific mailers provided
212: with the networks.
213: When a message enters from a network, the network-specific
214: mailer gives it to Upas.
215: Upas then either deposits the mail in a local mailbox or routes the
216: mail to the next network.
217: A format-specific filter may be called to convert the message
218: from network format to one Upas understands or vice versa;
219: the
220: .UX
221: format is built in.
222: .1C
223: .KF
224: .PS 5i
225: copy "over.cip"
226: .PE
227: .sp .5
228: .ce 2
229: \fBFigure 2.\fP The structure of Upas.
230: .sp
231: .KE
232: .2C
233: .NH 1
234: Message Routing
235: .PP
236: The routing of messages is determined by a destination address
237: and by a set of rewriting rules kept in the file
238: .CW /usr/lib/upas/rewrite .
239: Each line of the file is a rule.
240: Blank lines and lines beginning with
241: .CW #
242: are ignored.
243: .nr ss \w'conversion '
244: .PP
245: Each rewriting rule consists of four fields:
246: .IP \fIpattern\fR \n(ssu
247: An
248: .I ed (1)-like
249: regular expression, with simple parentheses playing the role
250: of
251: .CW \e(
252: and
253: .CW \e)
254: and with the
255: .CW +
256: and
257: .CW ?
258: operators of
259: .I egrep (1).
260: This regular expression must match the entire destination
261: address. Case is ignored.
262: .IP \fIcommand\fR \n(ssu
263: One of the following rewrite commands:
264: .I alias ,
265: .I auth ,
266: .I translate ,
267: .I | ,
268: or
269: .I >> .
270: .IP \fIparameter\fR \n(ssu
271: An
272: .I ed (1)-style
273: replacement string to generate a
274: parameter to the
275: .I command .
276: .IP \fIaddress-list\fR \n(ssu
277: A list of addresses that can be handled by a single invocation of the command.
278: .PP
279: The
280: .I pattern ,
281: .I parameter ,
282: and
283: .I address-list
284: fields may use the following:
285: .KS
286: .IP \f(CW\es\fP \n(ssu
287: The address of the sender.
288: .IP \f(CW\el\fP \n(ssu
289: The name of the local machine.
290: .IP \f(CW&\fP \n(ssu
291: The entire destination address.
292: .KE
293: .PP
294: The
295: .I parameter
296: and
297: .I address-list
298: fields may use
299: .CW \e0
300: through
301: .CW \e9
302: to refer to the text matched by parenthesized groups in the
303: .I pattern
304: field; as Figure 3 shows, \f(CW\e1\fP is the text matched by the first group.
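.PP
As an illustration of the field layout, here is a Python sketch that splits one line of the rewrite file into its four fields.
It is our own approximation, not the Upas parser: it assumes that fields are separated by white space, that a double-quoted string is a single field with its quotes removed, that the parameter and address-list fields may be absent, and that backslashes are kept as written.
The function name \f(CWparse_rule\fP is hypothetical.

```python
def parse_rule(line):
    """Split a rewrite-file line into (pattern, command, parameter, address_list).

    Returns None for blank lines and comment lines.  Fields are separated by
    white space; a double-quoted string is one field, quotes removed.
    Missing trailing fields come back as None.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields, current = [], []
    quoted = started = False
    for ch in stripped:
        if ch == '"':
            quoted = not quoted        # toggle quoting; quotes are not kept
            started = True
        elif ch.isspace() and not quoted:
            if started:                # end of a field
                fields.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
    if started:
        fields.append("".join(current))
    fields += [None] * (4 - len(fields))
    return tuple(fields[:4])
```

Applied to the first rule of Figure 3, it yields the pattern \f(CW[^!@%]+\fP, the command \f(CWtranslate\fP, and the parameter \f(CWexec translate '&'\fP, with no address list.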
305: .PP
306: When rewriting a destination address,
307: Upas starts with the first rule and continues
308: down the list until a pattern
309: matches the destination address.
310: The command on that line is executed.
311: If no match is found, the
312: mail is returned to the sender with an error.
313: If the command does not result in mail delivery
314: (i.e., is not
315: .CW |
316: or
317: .CW >> ),
318: Upas scans the rules again with the latest version of
319: the destination address, starting from the first rule.
320: .1C
321: .KF
322: .P1
323: # local mail
324: [^!@%]+ translate "exec translate '&'"
325: local!([^!@%]+) >> /usr/spool/mail/\e1
326: \el!(.+) alias \e1
327:
328: # convert %@ format to ! format
329: (_822_)!((.+)!)?([^!]+)[%@]([^!%@]+) alias \e1!\e2\e5!\e4
330: ([^!]+)[%@]([^!@%]+) alias _822_!\e2!\e1
331: _822_!(.+) alias \e1
332:
333: # special domain names
334: ([^!.]+)\e.(att\e.com|uucp)!(.+) alias \e1!\e3
335:
336: ([^!]+)!(.+) | "/usr/lib/upas/route '\es' '\e1'" "'\e2'"
337: .P2
338: .sp .5
339: .ce 2
340: \fBFigure 3.\fP Sample rewrite file for a machine using \fIuucp\fP only.
341: .sp
342: .KE
343: .2C
344: .PP
345: There are five rewrite commands:
346: .IP \fIalias\fR \n(ssu
347: Replaces the address with the result of the replacement string
348: in the
349: .I parameter
350: field.
351: .IP \fIauth\fR \n(ssu
352: Calls
353: .I parameter
354: to authorize the mail.
355: A zero exit status approves the mail; a non-zero status rejects it.
356: The
357: .I auth
358: command is called only once per message.
359: If it is never called, the mail is approved.
360: .IP \fItranslate\fR \n(ssu
361: Calls
362: .I parameter
363: to rewrite the address. The program must write the new
364: address(es) to standard output. This command is used to implement
365: mailing lists.
366: .IP \f(CW|\fP \n(ssu
367: Pipes the message to the mail delivery agent
368: .I parameter .
369: The
370: .I address-list
371: field is a list of recipients with the same destination
372: machine. If the delivery agent fails, the message is
373: returned to the sender with the error message from the
374: delivery program's standard error file.
375: .IP \f(CW>>\fP \n(ssu
376: Delivers the message to a local mailbox. The file given in
377: .I parameter
378: must either exist and appear to be a valid mailbox,
379: or the last name in the path must be a user name found in
380: .CW /etc/passwd .
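.PP
The scanning procedure described above can be sketched in Python, using a transcription of the rules in Figure 3.
This is an approximation rather than Upas itself: Python's \f(CWre\fP module stands in for the Upas matcher and might group some patterns differently, the local machine name \f(CWresearch\fP and the \f(CWtranslate\fP stand-in are hypothetical, the \f(CWauth\fP command and sender substitution are omitted, and the pass limit is a safety bound of our own.

```python
import re

LOCAL = "research"  # hypothetical local machine name, substituted for \l

# (pattern, command, parameter, address_list), transcribed from Figure 3
# with the sender argument of the route command left out.
RULES = [
    (r"[^!@%]+", "translate", None, None),
    (r"local!([^!@%]+)", ">>", r"/usr/spool/mail/\1", None),
    (r"\l!(.+)", "alias", r"\1", None),
    (r"(_822_)!((.+)!)?([^!]+)[%@]([^!%@]+)", "alias", r"\1!\2\5!\4", None),
    (r"([^!]+)[%@]([^!@%]+)", "alias", r"_822_!\2!\1", None),
    (r"_822_!(.+)", "alias", r"\1", None),
    (r"([^!.]+)\.(att\.com|uucp)!(.+)", "alias", r"\1!\3", None),
    (r"([^!]+)!(.+)", "|", r"/usr/lib/upas/route '\1'", r"'\2'"),
]


def translate(name):
    # Hypothetical stand-in for the translate program: every bare
    # name is treated as a local user with no alias.
    return "local!" + name


def rewrite(address, max_passes=64):
    """Apply RULES until a delivery command (| or >>) matches."""
    for _ in range(max_passes):
        for pattern, command, param, addrs in RULES:
            pattern = pattern.replace(r"\l", re.escape(LOCAL))
            # The pattern must match the whole address; case is ignored.
            m = re.fullmatch(pattern, address, re.IGNORECASE)
            if m is None:
                continue
            if command == "alias":
                address = m.expand(param)
            elif command == "translate":
                address = translate(address)
            else:  # "|" or ">>" delivers the mail and ends the scan
                return command, m.expand(param), (m.expand(addrs) if addrs else None)
            break  # rescan from the first rule with the new address
        else:
            raise LookupError("no rule matches " + address)
    raise RuntimeError("too many rewriting passes")
```

Tracing \f(CWrewrite("person%E%D@C")\fP shows the %@ rules turning the address into \f(CWC!D!E!person\fP before the last rule hands \f(CWD!E!person\fP to the route command with \f(CWC\fP as the next hop.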
381: .PP
382: Rules for most networks can be specified in one or two lines.
383: In addition, the rules are in a language familiar to most
384: experienced
385: .UX
386: programmers: the regular expressions
387: seen in many editors, languages, and utilities.
388: Using such a mini-language makes it easy to build or
389: modify Upas configuration files.
390: The result is that configuration files rarely contain gross
391: mistakes and take very little time to create
392: and to edit when addressing conventions change.
393: Further, the rewrite file is reread for each new mail delivery, so a
394: change to it takes effect immediately.
395: .NH 1
396: SMTP Message Format Conversion
397: .PP
398: Upas uses only
399: .I uucp -style
400: addressing internally.
401: The mail delivery program must convert between this
402: form and its own, if different. For example, the
403: .I smtpd
404: daemon must convert incoming RFC822 addresses to
405: .I uucp
406: form when calling
407: Upas, and the
408: .I smtp
409: program generates header lines on outgoing mail.
410: .PP
411: The outbound conversion to SMTP format is required by RFC822. Specifically,
412: three header lines are required:
413: .CW Date: ,
414: .CW To: ,
415: and one of several variants of
416: .CW From: .
417: If the message appears to have these header lines, and the lines are
418: formatted properly, the message is sent unaltered.
419: For example, if there is an original
420: .CW From:
421: line with an address in the requested domain, it is left alone. Otherwise, we
422: generate a
423: .CW From:
424: line and turn any existing one into
425: .CW Original-From: .
426: Missing information is filled in from the Unix-style
427: .CW From
428: line.
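.PP
The From: handling can be sketched in Python as follows.
The sketch covers only the From: case; the checks on Date: and To:, the exact test for an address "in the requested domain", and the form of the generated address are not spelled out above, so the simple versions used here (a substring test for @domain, and sender@domain) are assumptions, as is the name \f(CWfix_from\fP.

```python
def fix_from(headers, domain, unix_sender):
    """Ensure an outgoing header list has an acceptable From: line.

    headers     -- list of "Name: value" strings
    domain      -- domain the From: address is expected to be in
    unix_sender -- sender taken from the UNIX-style From line
    """
    for line in headers:
        name, _, value = line.partition(":")
        # Assumed acceptance test: the value mentions an address in domain.
        if name.strip().lower() == "from" and ("@" + domain.lower()) in value.lower():
            return list(headers)  # keep the message's own From: line
    rewritten = []
    for line in headers:
        name, _, value = line.partition(":")
        if name.strip().lower() == "from":
            rewritten.append("Original-From:" + value)  # demote the old line
        else:
            rewritten.append(line)
    # Hypothetical address form for the generated line.
    return ["From: " + unix_sender + "@" + domain] + rewritten
```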
429: .PP
430: We do not add other header lines to mail.
431: These provide extra bulk (over ten percent
432: in one of our surveys) with little added utility.
433: In particular,
434: .CW Received:
435: lines are only rarely useful, and the information they provide appears
436: in our log files.
437: .PP
438: Incoming SMTP destination addresses are derived from the
439: envelope addresses and header information.
440: The sender's address is extracted from the first of the following sources that is present:
441: .UX
442: .CW From ,
443: .CW Reply-to: ,
444: .CW Sender: ,
445: .CW From: ,
446: and the sender given in the SMTP
447: .CW "MAIL FROM:"
448: command.
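.PP
A Python sketch of that precedence follows.
Address extraction is deliberately simplified (a field's whole value is taken as the address, and the second word of a UNIX-style From line as its sender), and header names are compared without regard to case; both are assumptions, as is the name \f(CWpick_sender\fP.

```python
def pick_sender(header_lines, mail_from):
    """Choose the sender of an incoming SMTP message.

    Order: UNIX-style 'From ' line, Reply-to:, Sender:, From:,
    then the argument of the SMTP MAIL FROM: command.
    """
    unix_from = None
    fields = {}
    for line in header_lines:
        if line.startswith("From "):
            words = line.split()
            if unix_from is None and len(words) > 1:
                unix_from = words[1]
            continue
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key not in fields:  # first occurrence of each header wins
            fields[key] = value.strip()
    candidates = [unix_from] + [fields.get(k) for k in ("reply-to", "sender", "from")]
    for address in candidates:
        if address:
            return address
    return mail_from
```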
449: .PP
450: Early versions of Upas
451: handled uucp and SMTP addressing internally. Later, SMTP was broken
452: out into two pairs of filters:
453: .I smtpd
454: and
455: .I fromsmtp ,
456: and
457: .I tosmtp
458: and
459: .I smtp .
460: .I Fromsmtp
461: and
462: .I tosmtp
463: were filters that extracted and created RFC822 addressing and headers,
464: respectively. Recently, these filters were folded into
465: .I smtpd
466: and
467: .I smtp
468: for efficiency reasons.
469: .NH 1
470: User Control
471: .PP
472: Users often wish to specify alternate ways to dispose of their mail.
473: Upas offers two choices.
474: The first line of a user's mail
475: file is interpreted as a command to the mail system.
476: If the line is of the format
477: .P1
478: Forward to \fIlist-of-addresses
479: .P2
480: the mail is forwarded to each recipient in
481: .I list-of-addresses .
482: While this can be used to forward a single user's mail, it
483: can also be used to create mailing lists.
484: To do this, one creates a file in the mail directory whose name is
485: that of the mailing list and which consists of
486: .CW "Forward to"
487: followed by the list of recipients.
488: .PP
489: If the first line is of the format
490: .P1
491: Pipe to \fIshell-command
492: .P2
493: .I shell-command
494: is executed when mail is delivered, with the message as standard input.
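.PP
The check on a mailbox's first line might look like the following Python sketch.
The name \f(CWmailbox_action\fP is hypothetical, and the sketch assumes that the addresses after Forward to are separated by white space and that the keywords must match exactly.

```python
def mailbox_action(first_line):
    """Interpret the first line of a mailbox file.

    Returns ("forward", [addresses]), ("pipe", command) or ("deliver", None).
    """
    if first_line.startswith("Forward to "):
        return "forward", first_line[len("Forward to "):].split()
    if first_line.startswith("Pipe to "):
        return "pipe", first_line[len("Pipe to "):].strip()
    return "deliver", None  # ordinary mailbox: append the message
```

A mailing list is then simply a mailbox file whose first line yields a forward action with several addresses.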
495: .NH 1
496: Concealing Machine Names
497: .PP
498: It is often useful to hide several machines behind a single mail machine.
499: For example, our center has over 50 machines, but all mail is directed
500: through the machine named
501: .CW research .
502: The files
503: .CW /usr/lib/upas/names.*
504: contain routing information for each user. A sample entry
505: might be:
506: .P1
507: andrew pipe!andrew
508: .P2
509: Mail sent to
510: .CW research!andrew
511: will be directed to
512: .CW pipe ,
513: .CW andrew 's
514: home machine. But mail from
515: .CW andrew
516: should appear to come from
517: .CW research ,
518: not
519: .CW pipe .
520: .PP
521: To hide names, Upas attempts to translate the last field of the
522: sender's address. If the translation exactly matches the entire
523: sending address, the sending address is truncated to the last field.
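.PP
In Python, the test might be sketched as below; \f(CWNAMES\fP is a hypothetical one-entry stand-in for the names files, holding the sample entry above, and \f(CWhide_sender\fP is likewise our own name.

```python
# Hypothetical stand-in for /usr/lib/upas/names.*, holding the sample entry.
NAMES = {"andrew": "pipe!andrew"}


def hide_sender(sender, names=NAMES):
    """Truncate a sender path to its last field when the names table
    translates that field back to exactly the same path."""
    last = sender.rsplit("!", 1)[-1]
    if names.get(last) == sender:
        return last
    return sender
```

Mail from \f(CWpipe!andrew\fP is thus reduced to \f(CWandrew\fP, which lets it appear to come from \f(CWresearch\fP rather than \f(CWpipe\fP; a sender path that does not match the table entry exactly is left unchanged.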
524: .NH 1
525: Loop Detection
526: .PP
527: Detecting forwarding loops, such as those caused by
528: .CW "Forward to" ,
529: is difficult.
530: It involves combining the forwarding lists of all involved machines
531: into a single directed graph and then performing a search or
532: partitioning to detect cycles.
533: However, if we allow a detection algorithm to reject some legal, though
534: highly unlikely, cases along with real loops, we greatly simplify the problem.
535: .PP
536: In the case of a single machine,
537: an infinite forwarding loop corresponds to
538: infinite recursion of the mailer.
539: If a mailer rejects any message that results in recursion past a
540: certain depth, it will reject all loops and some small number of legal
541: but very long mail redirections.
542: In our case the depth limit is 32, and to date no legal forwarding chain has been
543: more than 3 steps long.
544: .PP
545: In the case of a multi-machine loop, the recursion technique does not apply, since the mailer on each machine starts counting afresh.
546: However, we can still use a similar method.
547: Instead of counting recursion, we scan the
548: .CW From
549: line to see the number
550: of times the local machine name occurs in the path.
551: If this exceeds a limit (in our case 8), the mail is returned to the sender.
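.PP
Both checks are sketched below in Python with the limits given above.
The data structures (a dictionary from names to forwarding lists, and a From path as a string of !-separated hops ending in the user name) and the function names are our own assumptions.

```python
MAX_DEPTH = 32   # recursion limit on one machine, from the text
MAX_VISITS = 8   # limit on appearances of the local name in the From path


def resolve(user, forwards, depth=0):
    """Follow "Forward to" entries recursively on one machine.

    forwards maps a name to the addresses it forwards to; names without
    an entry are final recipients.  Exceeding MAX_DEPTH is treated as a
    loop, which may also reject a legal but very long chain.
    """
    if depth > MAX_DEPTH:
        raise RuntimeError("forwarding loop suspected at " + user)
    targets = forwards.get(user)
    if not targets:
        return [user]
    recipients = []
    for target in targets:
        recipients += resolve(target, forwards, depth + 1)
    return recipients


def looped(from_path, local):
    """Multi-machine check: count hops in a uucp-style From path
    (the last field is the user) that equal the local machine name."""
    hops = from_path.split("!")[:-1]
    return sum(hop == local for hop in hops) > MAX_VISITS
```

A two-name cycle, with a forwarded to b and b back to a, is rejected once the recursion passes depth 32, while an ordinary list expands normally.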
552: .NH 1
553: Installation
554: .PP
555: Upas has been ported to most major versions of the
556: .UX
557: system.
558: The source contains a
559: .CW config
560: directory where the working Upas directories are specified.
561: Each variant of Upas is made from a separate directory with
562: .I make (1).
563: The
564: .CW makefile
565: may require some editing to select the needed programs.
566: The
567: .CW config
568: directory contains a number of sample
569: rewrite and routing files.
570: .NH 1
571: A Comparison With Sendmail
572: .PP
573: Upas is an attempt to solve the same problem previously attacked by Sendmail
574: |reference(sendmail).
575: Upas owes much of its design and success to Sendmail.
576: The idea of designing Upas as a central switcher
577: communicating with network-specific mailers comes directly from Sendmail.
578: The reasons we wrote Upas and didn't just adopt Sendmail are:
579: .IP \(bu
580: We strongly favor messages whose only formatted portions are the
581: destination and reply addresses.
582: Sendmail has an unfortunate predilection for verbose and rigidly-structured
583: messages that we would like to avoid.
584: .IP \(bu
585: Sendmail configuration files are famous for their inscrutability.
586: We wanted a system that had simpler and therefore more easily
587: verifiable rewriting rules.
588: .IP \(bu
589: Sendmail combines the functions of routing, queuing, aliasing,
590: transmission, header
591: processing, delivery, translation, etc., into a single large program.
592: Packing so many functions into one program makes Sendmail more complicated and harder
593: to understand and support. Upas's modular design simplifies these
594: tasks.
595: .IP \(bu
596: The size of Sendmail has left it prone to several security
597: problems, some intentional. It is easier to understand and
598: check a smaller, more modular program.
599: .NH 1
600: Lessons
601: .PP
602: The philosophy behind Upas has not changed much since its original
603: description in |reference(upas presotto),
604: but there have been many implementation changes.
605: The rewrite file now has five commands in place of the original
606: generalized command execution. Removing case sensitivity and
607: anchoring pattern matches by default have made the rules more versatile
608: and easier to read.
609: .PP
610: The early Upas understood
611: .I uucp
612: and SMTP addresses and formatting.
613: The SMTP portions are now broken out into separate programs,
614: simplifying the processing. The
615: .I uucp -style
616: address has
617: proven quite easy to teach and to use. For example,
618: .P1 2n
619: bitnet!templevm!rdk
620: .P2
621: is much easier to teach and use than
622: .P1 2n
623: rdk%[email protected]
624: .P2
625: .PP
626: Authorization was originally implemented as a lookup in a file of trusted machines.
627: Now, a command can implement arbitrary policies.
628: .NH 1
629: Summary
630: .PP
631: We have presented a simple yet flexible network mail system.
632: It gains its simplicity from a number of assumptions that
633: hold for most networked computers.
634: By using existing network-specific mailers as expert systems
635: that deal with network details, Upas itself remains relatively
636: simple and understandable.
637: Finally, by using a mini-language already
638: familiar to most
639: .UX
640: programmers, Upas is easily modified
641: to respond to changes in the name space and topology of the
642: network.
643: .PP
644: Upas has run at Research and on the AT&T Internet gateway for
645: nearly two years now. It has performed well in these demanding
646: environments, adapting readily to changes in network naming and topology. Its flexibility
647: comes at the cost of efficiency. Even so, we have handled nearly
648: four thousand messages per day on a VAX 750 with reasonable, if
649: not spectacular, throughput.
650: .NH 1
651: Acknowledgements
652: .PP
653: Many people have contributed to the success of Upas. MIT supplied the
654: original SMTP code, which was improved by many people.
655: Bill Cheswick, Geoff Collyer, Ian Darwin, Peter Honeyman, Dave Presotto,
656: and Dennis Ritchie have all had a hand in the code. We have received
657: helpful feedback from
658: Steven Bellovin,
659: Jonathan Clark,
660: and
661: Marcel Frank-Simon.
662: .NH 1
663: References
664: .LP
665: |reference_placement