|
|
1.1 root 1: .ds
2: .po 10
3: .de ph
4: .sp 1
5: ..
6: .de se
7: .sp 1
8: ..
9: .de bb
10: .in+8
11: .sp
12: ..
13: .de be
14: .in -8
15: .sp
16: ..
17: .tr &
18: .m3 3
19: .m1 +1
20: .he ''''
21: .bp
22: .ce14
23: THE DESIGN AND IMPLEMENTATION
24: OF INGRES
25:
26: by
27:
28: MICHAEL STONEBRAKER, EUGENE WONG AND PETER KREPS
29: DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
30: UNIVERSITY OF CALIFORNIA, BERKELEY, CA.
31:
32: and
33:
34: GERALD HELD
35: TANDEM COMPUTERS, INC.
36: CUPERTINO, CA.
37: .sp 5
38:
39:
40:
41:
42: ABSTRACT
43: .ph
44: This paper describes the CURRENTLY OPERATIONAL
45: version of the INGRES data base management system.
46: This multi-user system gives a relational view of
47: data, supports two high level non-procedural data
48: sublanguages and runs as a collection of user
49: processes on top of the UNIX operating system for
50: Digital Equipment Corporation PDP 11/40, 11/45 and 11/70
51: computers.
52: Stressed here are the design decisions and
53: tradeoffs related to 1) structuring the
54: system into processes, 2) embedding one
55: command language in a general purpose programming language,
56: 3) the algorithms implemented to process interactions,
57: 4) the access methods implemented 5) the concurrency
58: and recovery control provided 6) support for views, protection
59: and integrity constraints and 7) the data
60: structures used for system catalogs and role
61: of the data base administrator.
62: .bp
63: .fo 'DESIGN OF INGRES (I)'-%-'12/5/75'
64: 1 INTRODUCTION
65:
66: INGRES (Interactive Graphics and Retrieval System) is
67: a relational data base system which is
68: implemented
69: on top of
70: the UNIX operating system developed at Bell Telephone
71: Laboratories [RITC74] for Digital Equipment
72: Corporation PDP 11/40, 11/45 and 11/70 computer systems.
73: The implementation of INGRES is primarily programmed in "C", a high
74: level language in which UNIX itself is written. Parsing is done
75: with the assistance of YACC, a compiler-compiler available
76: on UNIX [JOHN74].
77:
78: The advantages of a relational model for data
79: base management systems have been extensively discussed
80: in the literature, [CODD70,CODD74,DATE74] and hardly require further elaboration.
81: In choosing the relational model, we were particularly
82: motivated by (a) the high degree of data independence that such
83: a model affords, and (b) the possibility
84: of providing a high level and entirely procedure-free
85: facility for data definition, retrieval, update, access
86: control, support of views, and integrity
87: verification.
88:
89: In this paper we will describe the design decisions made in INGRES.
90: In particular, we will stress the design and implementation of:
91:
92: .ss
93: a) the embedding of all INGRES commands in the general purpose
94: programming language "C"
95:
96: b) the access methods implemented
97:
98: c) the catalog structure and the role of the data base
99: administrator
100:
101: d) support for views, protection and integrity constraints
102:
103: e) the decomposition procedure implemented
104:
105: f) implementation of updates and consistency of secondary indices
106:
107: g) recovery and concurrency control
108:
109: .ds
110: Except where noted to the contrary, this paper describes the
111: INGRES system operational in January, 1976.
112:
113: To this end we first briefly describe in Section 1.2
114: the primary query language supported, QUEL,
115: and the utility commands accepted by the current system.
116: The second user interface, CUPID, is a graphics
117: oriented, casual user language which is also
118: operational and described in [MCDO75a, MCDO75b].
119: It will not be discussed further in this paper.
120: Then in Section
121: 1.3 we describe the relevant factors in the UNIX environment which have affected our
122: design decisions.
123:
124: In Section 2 we discuss the structure of the four
125: processes (see Section 1.3 for a discussion of
126: this UNIX notion)
127: into which INGRES is divided and the reasoning behind the
128: choice implemented. The EQUEL (Embedded QUEL) precompiler, which allows the substitution of
129: a user-supplied C program for the "front end" process is also
130: discussed.
131: This program has the effect of embedding all
132: of INGRES in the general purpose
133: programming language "C".
134: Then in Section 3 we indicate the data structures which are implemented in
135: INGRES, the catalog (system) relations which exist and the role
136: of the data base administrator with respect to all relations in a data base.
137: The implemented access methods, their calling conventions, and the
138: actual layout of data pages in secondary storage
139: where appropriate, are also presented.
140:
141: Sections 4, 5 and 6 discuss respectively the various functions of each of the
142: three "core" processes in the system. Also discussed are the design and
143: implementation strategy of each process. Lastly, Section 7 draws
144: conclusions, suggests future extensions and indicates the nature of the
145: current applications run on INGRES.
146:
147: 1.2 QUEL AND THE OTHER INGRES UTILITY COMMANDS
148:
149: QUEL (QUEry Language) has points in common with Data Language/ALPHA
150: [CODD71], SQUARE
151: [BOYC73] and SEQUEL [CHAM74] in that it is a complete [CODD72] query language which frees the
152: programmer from concern for how data structures are implemented and what
153: algorithms are operating on stored data. As such it facilitates a considerable
154: degree of data independence [STON74a].
155:
156: The QUEL examples in this section all concern
157: the following relation.
158:
159: .nf
160: .ne 7
161: NAME DEPT SALARY MANAGER AGE
162: .ss
163:
164: Smith toy 10000 Jones 25
165: EMPLOYEE Jones toy 15000 Johnson 32
166: Adams candy 12000 Baker 36
167: Johnson toy 14000 Harding 29
168: Baker admin 20000 Harding 47
169: Harding admin 40000 none 58
170: .ds
171: .fi
172:
173: Indicated here is an EMPLOYEE relation with domains NAME, DEPT, SALARY,
174: MANAGER and AGE. Each employee has a manager (except for Harding who
175: is presumably the company president), a salary, an age, and is in a department.
176:
177: A QUEL interaction includes at least one RANGE statement of the form:
178:
179: RANGE OF variable-list IS relation-name
180:
181: The symbols declared in the range statement are variables which will be used
182: as arguments for tuples. These are called TUPLE VARIABLES. The purpose
183: of this statement is to specify the relation over which each variable ranges.
184:
185: Moreover, an interaction includes one or more statements of the form:
186:
187: Command [Result-name] ( Target-list )
188: [ WHERE Qualification ]
189:
190: Here, Command is either RETRIEVE, APPEND, REPLACE,
191: or DELETE.
192: For RETRIEVE and APPEND,
193: Result-name is the name of the relation which qualifying tuples
194: will be retrieved into or appended to.
195: For REPLACE and DELETE, Result-name is the name of a tuple variable
196: which, through the qualification, identifies tuples to be modified or
197: deleted.
198: The Target-list is a list of the form
199:
200: Result-domain = Function ...
201:
202: Here, the Result-domain's are domain names in the result relation
203: which are to be assigned the value of the corresponding
204: function.
205:
206: The following suggest valid QUEL interactions. A complete description of the
207: language is presented in [HELD75a].
208:
209: Example 1.1 Find the birth year of employee Jones
210:
211: .ss
212: RANGE OF E IS EMPLOYEE
213: RETRIEVE INTO W (BYEAR = 1975 - E.AGE)
214: WHERE E.NAME = "Jones"
215: .ds
216:
217: Here, E is a tuple variable which ranges over the EMPLOYEE relation and all tuples
218: in that relation are found which satisfy the qualification E.NAME = "Jones".
219: The result of the query is a new relation, W, which has a single domain,
220: BYEAR, that has been calculated for each qualifying tuple.
221: If the result relation is omitted, qualifying tuples are
222: written in display format on the user's
223: terminal or returned to a calling
224: program in a prescribed format as discussed in Section 2.
225: Also, in the Target-list, the "Result-domain =" may be omitted
226: if Function is the right hand side is an existing domain (i.e. NAME = E.NAME may
227: be written as E.NAME -- see example 1.6).
228:
229: Example 1.2 Insert the tuple (Jackson,candy,13000,Baker,30) into EMPLOYEE.
230:
231: .ss
232: APPEND TO EMPLOYEE(NAME = "Jackson", DEPT = "candy",
233: SALARY = 13000, MGR = "Baker", AGE = 30)
234: .ds
235:
236: Here, the result relation EMPLOYEE is modified by adding the indicated tuple
237: to the relation.
238: If not all domains are specified,
239: the remainder default to zero for numeric domains
240: and null for character strings.
241:
242: Example 1.3 If a second relation DEPT(DEPT, FLOOR#) contains
243: the floor# of each department that an
244: employee might work in, then one can fire
245: everybody on the first floor as follows:
246:
247: .ss
248: RANGE OF E IS EMPLOYEE
249: RANGE OF D IS DEPT
250: DELETE E WHERE E.DEPT = D.DEPT
251: AND D.FLOOR# = 1
252: .ds
253:
254: Here E specifies that the EMPLOYEE relation is to be modified.
255: All tuples are to be removed which have a value for DEPT
256: which is the same as some department of the first floor.
257:
258: Example 1.4 Give a 10 percent raise to Jones if he works on the first floor
259:
260: .ss
261:
262: RANGE OF E IS EMPLOYEE
263: RANGE of D is DEPT
264: REPLACE E(SALARY BY 1.1 * E.SALARY)
265: WHERE E.NAME = "Jones" AND
266: E.DEPT = D.DEPT AND D.FLOOR# = 1
267:
268: .ds
269:
270: Here, E.SALARY is to be replaced by 1.1*E.SALARY for those tuples in EMPLOYEE
271: where the qualification is true.
272: (Note that the keywords IS and BY may be used interchangeably with "=" in
273: any QUEL statement.)
274:
275: Also, QUEL contains aggregation operators including COUNT, SUM, MAX, MIN,
276: and AVG.
277: Two examples of the use of aggregation follow.
278:
279: Example 1.5 Replace the salary of all toy department employees by the average
280: toy department salary.
281:
282: .ss
283: RANGE OF E IS EMPLOYEE
284: REPLACE E(SALARY BY AVG(E.SALARY WHERE E.DEPT = "toy") )
285: WHERE E.DEPT = "toy"
286: .ds
287:
288: Here, AVG is to be taken of the salary domain for those tuples satisfying
289: the qualification E.DEPT = "toy". Note that AVG(E.SALARY WHERE E.DEPT= "toy") is
290: scalar valued (in this instance, $13,000)
291: and consequently will be called an AGGREGATE. More general
292: aggregations are possible as suggested by the following example.
293:
294: Example 1.6 Find those departments whose average salary exceeds the company
295: wide average salary, both averages to be taken only for those
296: employees whose salary exceeds $10000.
297:
298: .ss
299: RANGE OF E IS EMPLOYEE
300: RETRIEVE INTO HIGHPAY (E.DEPT)
301: WHERE AVG(E.SALARY BY E.DEPT WHERE E.SALARY > 10000)
302: >
303: AVG(E.SALARY WHERE E.SALARY > 10000)
304: .ds
305:
306: Here, AVG(E.SALARY BY E.DEPT WHERE E.SALARY>10000) is an AGGREGATE FUNCTION and takes
307: a value for each value of E.DEPT. This value is the aggregate AVG(E.SALARY
308: WHERE E.SALARY>10000 AND E.DEPT = value). (For the toy, candy
309: and admin departments this value is respectively 14,500, 12,000 and 30,000.)
310: The qualification expression for the
311: statement is then true for departments for which this aggregate function exceeds
312: the aggregate AVG(E.SALARY WHERE E.SALARY>10000).
313:
314: In addition to the above QUEL commands
315: INGRES also supports a variety of utility commands
316: These utility commands can be classified into
317: seven
318: major categories.
319:
320: a) invocation of INGRES
321:
322: INGRES data-base-name
323:
324: This command executed from UNIX "logs in" a user to a given data base.
325: (A data base is simply a named collection of
326: relations with a given data base administrator who has
327: powers not available to ordinary users.)
328: Thereafter, the user may issue all other commands
329: (except those executed directly from UNIX)
330: within the environment of the invoked data base.
331:
332: b) creation and destruction of data bases
333:
334: CREATEDB data-base-name
335: DESTROYDB data-base-name
336:
337: These two commands are called from UNIX. The invoker of CREATEDB
338: must be authorized to create data bases (in a manner to be described
339: presently) and he automatically becomes the data base administrator.
340: DESTROYDB successfully destroys a data base only if invoked by the data
341: base administrator.
342:
343: c) creation and destruction of relations
344:
345: .nf
346: CREATE relname(domain-name IS format, domain-name IS format,...)
347: DESTROY relname
348: .fi
349:
350: These commands create and destroy relations within the current data base.
351: The invoker of the CREATE command becomes the "owner" of the relation
352: created. A user may only destroy a relation that he owns.
353: The current formats accepted by INGRES
354: are 1, 2 and 4 byte integers, 4 and 8
355: byte floating point numbers and fixed length ASCII character strings between
356: 1 and 255 bytes.
357:
358: d) bulk copy of data
359:
360: .ss
361: COPY relname(domain-name IS format, domain-name IS format,...)
362: direction "filename"
363:
364: PRINT relname
365: .ds
366:
367: The command COPY transfers an entire relation to or from a UNIX file whose
368: name is "filename". Direction is either "TO" or "FROM". The format for
369: each domain is a description of how it appears (or is to appear) in the
370: UNIX file. The relation relname must exist and have domain names identical
371: to the ones appearing in the COPY command. However, the formats need not agree and
372: COPY will automatically convert data types. Also, support is provided
373: for dummy and variable length fields in a UNIX file.
374:
375: PRINT copies a relation onto the user's terminal
376: formatting it
377: as a report. In this sense, it is stylized version of COPY.
378:
379: e) storage structure modification
380:
381: MODIFY relname TO storage-structure ON (key1, key2,...)
382: INDEX ON relname IS indexname(key1, key2,...)
383:
384: The MODIFY command changes the storage structure of a relation from one
385: access method to another. The five access methods currently supported are
386: discussed in Section 2. The indicated keys are domains in relname which are
387: concatenated left to right to form a combined key which is used in the
388: organization of tuples in all but one of the access methods.
389: Only the owner of a relation may modify its storage structure.
390:
391: INDEX creates a secondary index for a relation . It has domains of
392: key1, key2,...,pointer. The domain, pointer, is the address of a tuple
393: in the indexed relation having the given values for key1, key2,....
394: An index named AGEINDEX for EMPLOYEE would be the
395: following binary relation
396:
397: .ss
398: .nf
399: AGE POINTER
400:
401: 25 address of Smith's tuple
402: 32 address of Jones' tuple
403: AGEINDEX 36 address of Adams' tuple
404: 29 address of Johnson's tuple
405: 47 address of Baker's tuple
406: 58 address of Harding's tuple
407:
408:
409: .ds
410: .fi
411: The relation indexname is in turn treated and accessed just like any other
412: relation except that it is automatically updated when the relation it indexes
413: is updated. This is discussed further in Section 6.
414: Naturally, only the owner of a relation may create and destroy secondary
415: indexes for it.
416:
417: f) consistency and integrity control
418:
419: INTEGRITY CONSTRAINT is qualification
420: INTEGRITY CONSTRAINT LIST relname
421: INTEGRITY CONSTRAINT OFF relname
422: INTEGRITY CONSTRAINT OFF (integer, ... ,integer)
423: RESTORE data-base-name
424:
425: The first four commands support the insertion, listing, deletion
426: and selective deletion of integrity constraints which are to be enforced for all
427: interactions with a relation. The mechanism for handling this enforcement
428: is dicussed in Section 4. The last command restores a data base to a
429: consistent state after a system crash. It must be executed from UNIX and its
430: operation is discussed in Section 6.
431: The RESTORE command is only available
432: to the data-base administrator.
433:
434: g) miscellaneous
435:
436: HELP [relname|manual-section]
437: SAVE relname UNTIL expiration-date
438: RELKILLER data-base-name
439:
440: HELP provides information about the system or the data base invoked. When
441: called with an optional argument which is a command name, HELP will
442: return the appropriate page from the INGRES reference manual [ZOOK75].
443: When called with a relation name as an argument, it returns all information
444: about that relation. With no argument at all it returns information about
445: all relations in the current data base.
446:
447: SAVE is the mechanism by which a user can declare his intention to keep a
448: relation until a specified time. RELKILLER is a UNIX command which can be
449: invoked by a data base administrator to delete all relations
450: whose "expiration-dates" have passed. This should be done when
451: space in a data base is exhausted.
452: (The data base administrator can also remove any
453: relations from his data base using the DESTROY command,
454: regardless of who their owners are.)
455:
456: Two comments should be noted at this time.
457:
458: a) The system currently accepts the language specified as QUEL1 in [HELD75a].
459: Extension is in progress to accept QUELn.
460:
461: b) The system currently does not accept views or protection statements. Although
462: the algorithms have been specified [STON74b,STON75], they have not yet been implemented.
463: For this reason, no syntax for these statements is given in this section;
464: however the subject is dicussed further in Section 4.
465:
466: 1.3 THE UNIX ENVIRONMENT
467:
468: Two points concerning UNIX are worthy of mention in this section.
469:
470: a) The UNIX file system
471:
472: UNIX supports a tree structured file system similar to that of MULTICS.
473: Each file is either a directory (containing references to descendant
474: files in the file system) or a data file. Each data file can be viewed
475: as an array 1 byte wide and 2**24 bytes long.
476: (It is expected that this maximum length
477: will be increased by the UNIX implementors.)
478: Addressing in a file is similar
479: to referencing such an array. Physically, each file is divided into 512 byte
480: blocks (pages). In response to a read request, UNIX moves one or more
481: pages from secondary memory to UNIX core buffers then returns to the user the
482: the actual byte string desired. If the same page is referenced again
483: (by the same or another user) while
484: it is still in a core buffer, no disk I/O takes place.
485:
486: It is important to
487: note that UNIX pages data from
488: the file system into and out of system buffers
489: using a "least
490: recently used" replacement algorithm. In this way the entire file system
491: is managed as a large virtual store.
492:
493: In part because the INGRES designers believe that a data base system
494: should appear as a user job to UNIX and in part because they believe
495: that the operating system should deal with all space management issues
496: for the mix of jobs being run, INGRES contains NO facilities to
497: do its own memory management.
498:
499: Each file in UNIX can be granted by its owner any combination of the
500: following protection clauses:
501:
502: .ss
503: a) owner read
504: b) owner write
505: c) non owner read
506: d) non owner write
507: e) execute
508: f) special execute
509: .ds
510:
511: When INGRES is initially generated, a UNIX user named INGRES is
512: created. All data files managed by the INGRES system are
513: owned by this "super-user" and have their protection status
514: set to "owner read, owner write, no other access". Consequently,
515: only the INGRES super-user can directly tamper with INGRES
516: files. (The protection system is currently being altered to
517: optionally require the consent of the data base administrator
518: before unrestricted access by the super-user is allowed.)
519:
520: The INGRES object code is stored in files whose protection status is
521: set to "special execute, no other access". When a user invokes the
522: INGRES system (by executing command a) above), UNIX creates the
523: INGRES processes operating temporarily with a user-id of INGRES. When
524: a user exits from INGRES these processes are destroyed and the user is
525: restored to operating with his own user-id.
526:
527: Using this mechanism, the only way
528: a user may access an INGRES data base is to execute
529: INGRES object code. This "safety latch" effectively isolates users from
530: tampering directly with INGRES data.
531:
532: b) The UNIX process structure
533:
534: A process in UNIX is an address space (64K bytes or less on an 11/40,
535: 128K bytes or less on 11/45's and 11/70's) which is associated with a user-id
536: and is the unit of work scheduled by the UNIX scheduler. Processes
537: may "fork" subprocesses; consequently, a parent process can be the root of
538: a process subtree.
539: Furthermore, a process can request that UNIX
540: execute a file in a descendant process.
541: Such processes may communicate with each other
542: via an inter-process communication facility called
543: "pipes". A pipe
544: may be declared as a one direction
545: communication link which is written into
546: by one process and read by a second one.
547: UNIX maintains synchronization
548: of pipes so no messages are lost.
549: Each process has a "standard input device" and a "standard output device".
550: These are usually the user's terminal but may be redirected by
551: the user to be files, pipes to other
552: processes, or other devices.
553: .ph
554: Lastly UNIX provides a facility for processes executing
555: re-entrant code to share
556: procedure segments if possible.
557: INGRES takes advantage of this facility
558: so the core space overhead of multiple
559: concurrent users is only that
560: required by data segments.
561: .ph
562: We turn in the next section to the process structure in which INGRES runs.
563:
564: .bp
565: .fo 'DESIGN OF INGRES (II)'-%-'12/5/75'
566: 2 THE INGRES PROCESS STRUCTURE
567:
568: INGRES can be invoked in two ways: First, it can be directly invoked
569: from UNIX by executing INGRES data-base-name; second it can be invoked by
570: executing a program written using the EQUEL precompiler. We discuss each
571: in turn and then comment briefly on why two mechanisms exist.
572:
573: 2.1 INVOCATION FROM UNIX
574:
575: Issuing INGRES as a UNIX command
576: causes the process structure shown in Figure 1 to
577: be created.
578:
579: .ne 15
580: .ls 1
581: .nf
582: ________ ________ ________ ________ ________
583: | | | | A | | B | | C | |
584: | user |---->| |---->| |---->| |---->| |
585: | term-| | | | | | | | |
586: | inal |<----| |<----| |<----| |<----| |
587: | | | | F | | E | | D | |
588: ________ ________ ________ ________ ________
589:
590: process process process process
591: 1 2 3 4
592:
593:
594: INGRES Process Structure
595:
596: Figure 1
597: .ls 2
598:
599: .fi
600: Process 1 is an interactive terminal monitor which allows the user to
601: formulate, print, edit and execute collections of INGRES commands. It
602: maintains a workspace with which the user interacts until he is
603: satisfied with his interaction.
604: The contents of this workspace are passed
605: down pipe A as a string of ASCII characters
606: when execution is desired.
607: .ph
608: As noted above,
609: UNIX allows a user to alter the standard input and output devices for
610: his processes when executing a command. As a result the invoker of INGRES
611: may direct the terminal monitor to take input from a user file
612: (in which case he runs a "canned" collection of interactions)
613: and direct
614: output to another device (such as the line printer) or a file.
615:
616: The current terminal monitor accepts the following commands. Anything else
617: is simply appended to the user's workspace.
618:
619: .nf
620: .ss
621: # : Erase the previous character. Successive uses of this
622: instruction will erase back to, but not beyond, the
623: beginning of the current line.
624:
625: @ : Erase the current line. Successive uses of this in-
626: struction are ignored.
627:
628: \\r : Erase the entire interaction (reset the workspace). The
629: former contents of the workspace are irretrieveably lost.
630:
631: \\p : Print the current workspace. Its contents are printed
632: on the user's terminal.
633:
634: \\e : Enter the UNIX text editor and begin accepting editor
635: commands. The editor allows sophisticated editing of the
636: user's workspace. This command is executed by simply
637: "forking" a subprocess and executing the UNIX editor in it.
638:
639: \\g : Process the current query (go). The contents of the
640: workspace are transmitted to process 2.
641:
642: \\q : Exit from INGRES.
643:
644: .fi
645:
646: .ds
647: Process 2 contains a lexical analyzer, a parser, query modification routines
648: for integrity control (and in the future support of views and protection)
649: and concurrency control. When process 2 finishes, it passes a string of tokens
650: to process 3 through pipe B.
651: Process 2 is discussed in Section 4.
652:
653: Process 3 accepts this token string and contains execution routines for
654: the commands
655: RETRIEVE, REPLACE, DELETE and APPEND. Any update is
656: turned into a RETRIEVE command to isolate
657: tuples to be changed. Revised copies of modified tuples
658: are spooled into a special file.
659: This
660: file is then processed by a
661: "deferred update processor" in
662: process 4 which is discussed in Section 6.
663: .ph
664: Basically process 3 performs two functions for RETRIEVE commands.
665: a) A multivariable query is DECOMPOSED into a sequence of
666: interactions involving only a single variable.
667: b) A one variable query is executed by a one variable query processor (OVQP).
668: OVQP in turn performs its function by making calls on the access methods.
669: These two functions are discussed in Section 5;
670: the access method are indicated in Section 3.
671:
672: In process 4 resides all code to support utility commands (CREATE, DESTROY,
673: INDEX, etc.). Process 3 simply passes to process 4 any commands which process 4
674: will execute.
675: Process
676: 4 is organized as a collection of overlays which accomplish the various
677: functions. The structure of this process will be discussed in Section 6.
678:
679: Error messages are passed back through pipes D, E and F
680: to process 1 which returns
681: them to the user. If the command is a RETRIEVE with no result
682: relation specified, process 3 returns qualifying tuples
683: in a stylized format directly to the
684: "standard output device" of process 1.
685: Unless redirected, this is the user's terminal.
686:
687: We now turn to the operation of INGRES when invoked by code from the
688: precompiler.
689:
690: 2.2 EQUEL
691:
692: Although QUEL alone provides the flexibility for most data
693: management requirements, there are many
694: applications which require a customized user
695: interface in place of the QUEL language.
696: For this as well as other reasons,
697: it is often useful to have the flexibility of a general purpose
698: programming language in addition to the
699: data base facilities of QUEL.
700: To this end, a new language, EQUEL (Embedded QUEL), has
701: been implemented which consists of QUEL embedded in the general purpose
702: programming language "C".
703:
704: In this section we describe the EQUEL language and indicate how it
705: operates in the INGRES environment.
706:
707: In the design of EQUEL, the following goals were set:
708: .in+5
709: .ti-3
710: 1)&The new language must have the full capabilities of both
711: "C" and QUEL.
712: .ti-3
713: 2)&The C program should have the capability for processing
714: each tuple individually which satisfies the qualification
715: in a QUEL RETRIEVE statement.
716: (this is the "piped" return facility described in
717: Data Language/ALPHA [CODD71]).
718: .ti-3
719: 3)&The implementation should make as much use as possible
720: of the existing C and QUEL language processors.
721: (The implementation cost of EQUEL should be small).
722: .in-5
723:
724: With these goals in mind, EQUEL was defined as follows:
725: .in+5
726: .ti-3
727: 1)&Any C language statement is a valid EQUEL statement.
728: .ti-3
729: 2)&Any QUEL statement (or INGRES utility command)
730: is a valid EQUEL statement as long as it is
731: prefixed by two number signs ("##").
732: .ti-3
733: 3)&C program variables may be used in QUEL statements in place
734: of relation names, domain names, target list elements,
735: or domain values.
736: The declaration statements of C variables used in this manner
737: must also be prefixed by double number signs.
738: .ti-3
739: 4)&RETRIEVE statements without a result relation have the form
740: RETRIEVE (Target-list)
741: [WHERE Qualification] C-Block
742: .br
743: which results in the C-Block
744: being executed once for each qualifying tuple.
745: .in-5
746:
747: Two short examples illustrate EQUEL syntax.
748:
749: Example 2.1 The following section of code implements a small
750: front end to INGRES which performs only one query.
751: It reads in the name of an employee and prints out the
752: employee's salary in a suitable format.
753: It continues to do this as long as there are more names
754: to be read in.
755: The functions READ and PRINT are assumed to have the obvious meaning.
756:
757: .nf
758: .ss
759: .in +4
760: main()
761: {
762: ## char NAME[20];
763: ## int SAL;
764: while (READ(NAME))
765: {
766: ## RANGE OF X IS EMP
767: ## RETRIEVE (SAL = X.SALARY)
768: ## WHERE X.NAME = NAME
769: {
770: PRINT("The salary of ",NAME," is ",SAL);
771: }
772: }
773: }
774:
775: .fi
776: .ds
777: In this example the C-variable NAME is used in the qualification of
778: the QUEL statement and for each qualifying tuple,
779: the C-variable SAL is set to the appropriate value and then
780: the Print statement is executed.
781: (note: in C "{" and "}" are equivalent to BEGIN and END in ALGOL).
782:
783: .in -4
784:
785:
786: Example 2.2 Read in a relation name and two domain names.
787: Then for each of a collection of values which the second
788: domain is to assume,
789: do some processing on all values
790: which the first domain assumes.
791: (We assume the functions READ and PROCESS exist and have the
792: obvious meanings.)
793: A more elaborate version of this program could serve as a simple
794: report generator.
795:
796: .nf
797: .ss
798: .in +4
799: ## int VALUE;
800: ## char RELNAME[13], DOMNAME[13], DOMVAL[80];
801: ## char DOMNAME_2[13];
802: READ(RELNAME);
803: READ(DOMNAME);
804: READ(DOMNAME_2);
805: ## RANGE OF X IS RELNAME
806: while (READ(DOMVAL))
807: {
808: ## RETRIEVE (VALUE = X.DOMNAME)
809: ## WHERE X.DOMNAME_2 = DOMVAL
810: {
811: PROCESS(VALUE);
812: }
813: }
814: .fi
815: .in -4
816: .ds
817:
818: Any RANGE declaration (in this case the one for X) is assumed by
819: INGRES to hold until redefined.
820: Hence, only one RANGE statement is required regardless of the number
821: of times the RETRIEVE statement is executed.
822:
823: In order to implement EQUEL, a translator (pre-compiler)
824: was written which converts an EQUEL program into a valid C-program
825: with QUEL statements converted to appropriate C-code and calls to
826: INGRES.
827: The resulting C-program is then compiled by the normal
828: C-compiler producing an executable module.
829: Moreover, when an EQUEL program is run, the executable module
830: produced by the C-compiler is used as the
831: front end process in place of the interactive terminal monitor as noted
832: in Figure 2.
833: .ne 15
834: .ls 1
835:
836: ________ ________ ________ ________
837: | | A | | B | | C | |
838: | |---->| |---->| |---->| |
839: | | | | | | | |
840: | |<----| |<----| |<----| |
841: | | F | | E | | D | |
842: ________ ________ ________ ________
843:
844: C process process process
845: program 2 3 4
846:
847:
848: The Forked Process Structure
849:
850: Figure 2
851:
852: .ls 2
853:
854:
855: During execution of the front-end program,
856: data base requests (QUEL statements in the EQUEL program)
857: are passed through pipe A and processed by INGRES.
858: If tuples must be returned for tuple at a time processing,
859: then they are returned through a special data pipe set up between process 3 and the C program.
860: A condition code is also returned
861: through pipe F to indicate success or the type of error encountered.
862:
863: Consequently, the EQUEL translator must perform the following five
864: functions:
865:
866: .in+5
867: 1) insert system calls to "spawn" at run time the process structure
868: shown in Figure 2.
869:
870: 2) note C-variable declarations prefaced by ## as legal for inclusion
871: in INGRES commands.
872:
873: 3) process other lines prefaced by ##. These are parsed to isolate
874: C-variables. In adddition, C statements are inserted to write the
875: line down pipe A in ASCII format, modified so that
876: values are substituted for any C-variables.
877: The rationale for not completely parsing a QUEL statement in EQUEL
878: is given in [ALLM76].
879:
880: 4) insert C statements to read pipe F for completion information
881: and call the procedure IIerror.
882: The user may define IIerror himself or have EQUEL include a standard
883: version which prints the error message (for abnormal terminations)
884: and continues.
885:
886: 5) If data is to be returned through the data pipe (by a RETRIEVE with no result
887: relation specified), EQUEL must also:
888:
889: .in+5
890: a) insert C statements to read the data pipe for a tuple formatted as type/value
891: pairs.
892:
893: b) insert C statements to substitute values into C-variables declared in the
894: target list.
895: If necessary, values are converted to the types of the declared C-variables.
896:
897: c) insert C statements to pass control to the C-block following the RETRIEVE.
898:
899: d) insert C statements following
900: the block to return to step a) if there are more tuples.
901: .in-10
902:
903:
904: 2.3 COMMENTS ON THE PROCESS STRUCTURE
905:
906: The process structure shown in Figures 1 and 2 is the fourth different process
907: structure implemented. The following considerations suggested this final choice:
908:
909: a) Simple control flow. Previous process structures had a more complex
910: interconnection of processes which made debugging harder.
911:
912: b) Commands are passed to the right only. Process 3 must issue commands
913: to various overlays in process 4 to execute interactions as discussed in
914: Section 5. Hence, process 3 must be to the left of process 4.
915:
916: c) The utility commands are expected to be called relatively infrequently
917: compared to the activity in process 2 and 3. Hence, it appears appropriate
918: to overlay little used code in a single process.
919: The alternative is to create additional processes (and pipes)
920: which are quiescent most of the time. This would
921: require added space in UNIX core tables for no particular advantage.
922:
923: d) The first 3 processes are used frequently.
924: Overlaying code in these processes was
925: tried in a previous version and slowed the
926: system considerably.
927:
928: e) To run on an 11/40, the 64K address space limitation must be adhered to. Processes
929: 2 and 3 are nearly their maximum size and hence cannot be combined.
930: (For 11/45 and 11/70 versions we may experiment with such a combination.)
931:
932: f) The C program which replaces the terminal monitor as a front
933: end must run with a user-id different from that of INGRES for protection reasons.
934: (Otherwise it could tamper directly with data
935: managed by INGRES.)
936: Hence, either it must be overlayed into a process or run in its
937: own process. For efficiency and convenience, the latter was chosen.
938:
939: g) The interactive terminal monitor could have been written (albeit
940: clumsily) in EQUEL. Such a strategy would have avoided the
941: existence of two process structures which differ only by the
942: treatment of the data pipe. This was not done because response time
943: would have degraded and because EQUEL does type
944: conversion to predefined types. This feature would unnecessarily
945: complicate the terminal monitor.
946:
947: h) The processes are all synchronized (i.e. each waits for an
948: error return from the next process to the right before continuing to accept
949: input from the process to the left.
950: This is done because it simplifies the flow of
951: control. Moreover in many instances the
952: various processes MUST be synchronized.
953: Future versions of INGRES may attempt to exploit parallelism
954: where possible.
955:
956: .fo 'DESIGN OF INGRES (III)'-%-'12/5/75'
957: .bp
958: 3 DATA STRUCTURES AND ACCESS METHODS
959: .ph
960: We begin this section with a discussion of the files
961: that INGRES manipulates and their contents.
962: Then we sketch the language used to access all
963: non directory files.
964: Finally, the five possible file formats are indicated.
965: .se
966: 3.1 THE INGRES FILE STRUCTURE
967: .ph
968: Figure 3 indicates the subtree of the UNIX
969: file system that INGRES manipulates.
970:
971: .ss
972:
973:
974:
975:
976:
977:
978:
979:
980:
981:
982:
983:
984:
985:
986:
987:
988:
989:
990:
991: .ce 3
992: The INGRES Subtree
993:
994: Figure 3
995:
996: .ds
997:
998: The root of this subtree
999: is a directory made for the UNIX user "INGRES".
1000: It has six descendant directories. The AUX
1001: directory contains descendant files containing tables which
1002: control the spawning of processes shown in
1003: Figures 1 and 2, and an authorization list of users who are allowed to
1004: create data bases. Only the INGRES "super-user"
1005: may modify these files (by using the UNIX editor).
1006: BIN and SOURCE are directories indicating descendant
1007: files of respectively object and source code.
1008: TMP contains temporary files containing
1009: the workspaces used by the interactive
1010: terminal monitor.
1011: DOC is the root of a subtree with system documentation and the reference manual.
1012: Lastly there is a directory entry in DATADIR for each data base that
1013: exists in INGRES.
1014: These directories contain the data base files
1015: in a given data base as descendants.
1016: .ph
1017: These data base files are of four types:
1018: .ph
1019: a) an administration file. This contains the user-id
1020: of the data base administrator (DBA) and
1021: initialization information.
1022: .ph
1023: b) System relations. These relations have
1024: predefined names
1025: and are created for every data base.
1026: They are owned by the DBA and
1027: constitute the system catalogs.
1028: They may be queried by a knowledgeable user
1029: issuing RETRIEVE statements,
1030: however, they may be updated only by the INGRES utility commands
1031: (or directly by the INGRES "super-user" in an emergency).
1032: (When protection statements are implemented the DBA
1033: will be able to selectively restrict RETRIEVE access
1034: to these relations if he wishes.)
1035: The form and content of some of these relations will be presently discussed.
1036: .ph
1037: c) DBA relations. These are relations owned by the DBA and are
1038: shared in that any user may
1039: access them.
1040: When protection
1041: is implemented the DBA can "authorize" other users by inserting protection
1042: predicates (which will be in one of the system relations) and
1043: "deauthorize" them by removing such predicates.
1044: .ph
1045: d) Other relations. These are relations
1046: created by other users (by RETRIEVE into W or CREATE)
1047: and are NOT SHARED.
1048: .ph
1049: Three comments should be made at this time.
1050: .ph
1051: a) The DBA has the following power not available
1052: to ordinary users:
1053: .bb
1054: 1) the ability to create shared relations and to specify access control
1055: for them
1056: .ph
1057: 2) the ability to run RELKILLER
1058: .ph
1059: 3) the ability to destroy any relations in his data base (except the
1060: system catalogs)
1061: .be
1062: This system allows "one level sharing" in that only the DBA has
1063: the powers in a) and he cannot delegate any of these
1064: powers to others (as in the file systems of most
1065: time-sharing systems). This strategy was implemented for
1066: three reasons:
1067: .bb
1068: 1) The need for added generality was not perceived.
1069: Moreover, added generality would have created
1070: tedious problems (such as making revocation
1071: of access privileges
1072: non trivial).
1073: .ph
1074: 2) It seems appropriate to entrust to the DBA the duty (and power) to resolve
1075: the policy decision which must be made when
1076: space is exhausted and some relations must be destroyed (or archived).
1077: This policy decision becomes much harder (or impossible)
1078: if a data base is not in the control of one user.
1079: .ph
1080: 3) Someone must be entrusted with the policy
1081: decision concerning which relations to physically store and which to
1082: define as "views". This "data base design" problem is
1083: best centralized in a single DBA.
1084: .be
1085: b) Except for the single administration file in each data base
1086: every file
1087: is treated as a relation.
1088: Storing system catalogs as relations
1089: has the following advantages:
1090: .bb
1091: 1) Code is economized by
1092: sharing routines for accessing both catalog and data relations.
1093: .ph
1094: 2) Since several storage structures are supported for
1095: accessing
1096: data relations quickly and flexibly under various interaction mixes, these same
1097: storage choices may be utilized to enhance access to catalog information.
1098: .ph
1099: 3) The ability to execute QUEL statements to examine (and patch)
1100: system relations where necessary has greatly
1101: aided system debugging.
1102: .be
1103: c) Each relation is stored in a separate file, i.e., no attempt
1104: is made to "cluster" tuples from different relations
1105: which may be accessed together
1106: on the same
1107: (or a nearby) page.
1108: This decision is based on the following reasoning.
1109: .bb
1110: 1) The access methods would be more complicated if clustering were
1111: supported.
1112: .ph
1113: 2) UNIX has a small (512 byte) page size.
1114: Hence it is expected that the number of tuples which can be grouped
1115: on the same page is small.
1116: Moreover, logically adjacent pages in a UNIX file are NOT
1117: NECESSARILY physically adjacent.
1118: Hence clustering tuples on "nearby" pages has no meaning in
1119: UNIX; the next logical page in a file
1120: may be further away (in terms of disk arm motion) than a page in a
1121: different file. In keeping with the design decision of NOT
1122: modifying UNIX, these considerations were incorporated
1123: in the design decision not to support clustering.
1124: .ph
1125: 3) Clustering of tuples only makes sense if associated
1126: tuples can be linked together using "sets" [CODA71] or
1127: "links" [TSIC75]. Incorporating these access paths into the
1128: decomposition scheme would have greatly increased its complexity.
1129: .be
1130: .se
1131: 3.2 SYSTEM CATALOGS
1132: .ph
1133: We turn now to a discussion of the system catalogs.
1134: We discuss two relations in detail and
1135: indicate briefly the contents of the others.
1136: .ph
1137: The RELATION relation contains one tuple
1138: for every relation in the data base (including all the system
1139: relations.) The domains
1140: of this relation are:
1141: .in +24
1142: .na
1143: .ti8
1144: relid the name of the relation
1145: .ti8
1146: owner the UNIX user-id of the relation owner;
1147: when appended to relid it produces a unique file name for storing
1148: the relation.
1149: .ti8
1150: spec indicates one of 5 possible storage
1151: schemes or else a special code indicating a virtual
1152: relation (or "view").
1153: .ti8
1154: indexd flag set if secondary index exists for this
1155: relation. (This flag and the following two are
1156: present to improve performance by avoiding catalog
1157: lookup's when possible during query modification
1158: and one variable query processing.)
1159: .ti8
1160: protect flag set if this relation has protection predicates.
1161: .ti 8
1162: integ flag set if there are integrity constraints.
1163: .ti 8
1164: save scheduled life time of relation.
1165: .ti 8
1166: tuples number of tuples in relation.
1167: .ti 8
1168: atts number of domains in relation.
1169: .ti 8
1170: width width (in bytes) of a tuple.
1171: .ti 8
1172: prim number of primary file pages for this relation.
1173: .in -24
1174: .ad
1175: .ph
1176: The ATTRIBUTE catalog contains information relating to
1177: individual domains of relations.
1178: Tuples of the ATTRIBUTE catalog contain the following items
1179: for each domain of every relation in the data base:
1180:
1181: .in +24
1182: .na
1183: .ti8
1184: relid name of relation in which attribute appears
1185: .ti8
1186: owner relation owner
1187: .ti8
1188: domain-name domain name
1189: .ti8
1190: domainno domain number (position) in relation. In processing
1191: interactions INGRES uses this number to reference this domain.
1192: .ti8
1193: offset offset in bytes from beginning of tuple to beginning of domain.
1194: .ti8
1195: type data type of domain (integer, floating point or character string).
1196: .ti8
1197: length length (in bytes) of domain;
1198: .ti8
1199: keyno if this domain is part of a key, then "keyno"
1200: indicates the ordering of this domain within the key.
1201: .in -24
1202: .ad
1203: .ph
1204: These two catalogs together provide information about the structure and
1205: content of each relation in the data base.
1206: No doubt items will continue to be added or deleted as the system undergoes
1207: further development.
1208: The first planned extensions are the minimum and maximum values
1209: assumed by the domain.
1210: These will be used by a more sophisticated decomposition scheme being
1211: developed, which is discussed
1212: briefly in the next section and in detail in [WONG76].
1213: The representation of the catalogs as relations
1214: has allowed this restructuring to occur very easily.
1215: .ph
1216: Several other system relations exist which provide auxiliary
1217: information about relations.
1218: The INDEX catalog contains a tuple for every
1219: secondary index in the data base.
1220: Since secondary indices are themselves relations
1221: they are independently cataloged in the RELATION and ATTRIBUTE
1222: relations. However, the INDEX catalog provides the association
1223: between a primary relation and the secondary indices for it
1224: including which domains of the primary relation are in the index.
1225: .ph
1226: The PROTECTION and INTEGRITY catalogs contain respectively
1227: the protection and integrity predicates for each relation in the
1228: data base.
1229: These predicates are stored in a partially processed
1230: form as character strings.
1231: (This mechanism exists for INTEGRITY and will be implemented
1232: in the same way for PROTECTION.)
1233: The VIEW catalog will contain, for each virtual relation, a partially
1234: processed QUEL-like description which can be used to
1235: construct the view from its component physical relations.
1236: The use of these last three catalogs will be described in
1237: Section 4.
1238: The existence of any of this auxiliary information for
1239: a given relation is signalled by the appropriate flag(s) in
1240: the RELATION catalog.
1241: .ph
1242: Yet another set of system relations are those used by the
1243: graphics sub-system to catalog and process maps, which (like
1244: everything else) are stored as relations in the data base.
1245: This topic has been discussed separately in [GO75].
1246: .se
1247: 3.3 ACCESS METHODS INTERFACE (AMI).
1248: .ph
1249: We will now discuss in more detail the AMI which handles
1250: all actual accessing of data from relations.
1251: The AMI language is implemented as a set of functions whose calling
1252: conventions are indicated below.
1253:
1254: .ph
1255: Each access method must do two things to support the
1256: following calls. First it must provide SOME
1257: linear ordering of the tuples in a relation so
1258: that the concept of "next tuple" is well defined.
1259: Second it must assign to each tuple a tuple-id (TID) which
1260: uniquely identifies a tuple.
1261: .ph
1262: The nine implemented calls are as follows:
1263: .ph
1264: a) openr(descriptor, mode, relation_name)
1265: .ph
1266: Before a relation may be accessed it must be "opened".
1267: This function opens the UNIX file for the relation and fills in
1268: a "descriptor"
1269: with information about the relation from the RELATION and ATTRIBUTE
1270: catalogs.
1271: The descriptor, which must be declared in the calling program, is used
1272: in subsequent calls on AMI routines as an input parameter
1273: to indicate what relation is involved.
1274: Consequently, the AMI data accessing routines need not themselves check the
1275: system catalogs for the description of a relation.
1276: "Mode" specifies whether the relation is being opened for update or
1277: for retrieval only.
1278: .ph
1279: b) get(descriptor, tid, limit_tid, tuple, next_flag)
1280: .ph
1281: This function retrieves into 'tuple' a single tuple from the relation indicated by 'descriptor'.
1282: 'tid' and 'limit_tid' are tuple-identifiers.
1283: There are two modes of retrieval, "scan" and "direct".
1284: In "scan" mode "get" is intended to be called successively to
1285: retrieve all tuples within a range of tuple-id's.
1286: An initial value of 'tid' sets the low end of the range desired
1287: and 'limit_tid' sets the high end.
1288: Each time "get" is called with 'next_flag' = TRUE, the
1289: tuple following
1290: 'tid' is retrieved and its tuple-id placed into 'tid' in readiness
1291: for the next call.
1292: Reaching 'limit_tid' is indicated by a special return code.
1293: The initial setting of 'tid' and 'limit_tid' is done
1294: by the "find" function.
1295: In "direct" mode ('next_flag' = FALSE) the function retrieves
1296: the tuple with tuple-id = 'tid'.
1297: .ph
1298: c) find(descriptor, key, tid, match_mode)
1299: .ph
1300: "Find" places in 'tid' the tuple-id at the low or high end
1301: of the range of tuples which match the key value supplied.
1302: The matching condition to be applied depends on 'match-mode'.
1303: .ph
1304: If the relation does not have a keyed storage structure
1305: or if the key supplied does not correspond to the correct key domains,
1306: the 'tid' returned will be as if no key were supplied.
1307: The objective of "find" is to restrict the scan of a relation
1308: by eliminating from consideration those tuples known from
1309: their placement in the relation not to satisfy the
1310: matching condition with the key.
1311: Calls to "find" occur in pairs, one to set the low end of a scan,
1312: the other for the high end,
1313: and the two tuple-id's obtained are used in subsequent calls on "get".
1314: .ph
1315: Two functions are available for determining the access characteristics
1316: of the storage structure
1317: of a primary data relation or secondary index,
1318: respectively.
1319: .ph
1320: d) paramd(descriptor, access_characteristics_structure)
1321: .br
1322: e) parami(descriptor, access_characteristics_structure)
1323: .ph
1324: These functions fill in the 'access_characteristics_structure' with
1325: information regarding the type of key which may be constructed
1326: to optimize access to the given relation.
1327: This includes whether exact key values or ranges of key values can be used,
1328: and whether a partially specified key may be used.
1329: This will determine the 'match-mode' used in a subsequent call to "find".
1330: The ordering of domains in the key is also indicated.
1331: These functions relieve optimization routines executed
1332: during the processing of an interaction
1333: of the need to know directly about specific storage structures.
1334: .ph
1335: Other AMI functions provide a facility for updating relations.
1336: .ph
1337: f) insert(descriptor, tuple)
1338: .ph
1339: The tuple is added to the relation in its "proper" place
1340: according to its key value and the storage mode of the relation.
1341: .ph
1342: g) replace(descriptor, tid, new_tuple)
1343: .br
1344: h) delete(descriptor, tid)
1345: .ph
1346: The tuple indicated by 'tid' is either replaced by new values
1347: or deleted from the relation altogether.
1348: The tuple-id of the affected tuple will have been obtained by a previous
1349: "get".
1350: .ph
1351: Finally, when all access to a relation is complete it must be closed:
1352: .ph
1353: i) closer(descriptor)
1354: .ph
1355: This closes the relation's UNIX file and rewrites the information
1356: in the descriptor back into the system catalogs if there has been
1357: any change.
1358: .se
1359: 3.4 STORAGE STRUCTURES AVAILABLE.
1360: .ph
1361: We will now describe the five storage structures currently available in INGRES.
1362: Four of the schemes are keyed, i.e., the storage location of a tuple within the
1363: file is a function of the value of the tuple's key domains.
1364: These schemes allow rapid access to specific portions of a relation
1365: when key values are supplied. The remaining non-keyed scheme stores tuples
1366: in the file independently of their values
1367: and provides a low-overhead storage structure, especially attractive when
1368: the entire relation must be accessed anyway.
1369: .ph
1370: The non-keyed storage structure in INGRES is a randomly ordered sequential
1371: file.
1372: Fixed-length tuples are
1373: simply placed sequentially in the file in the order supplied.
1374: New tuples added to the relation are merely appended to the end
1375: of the file.
1376: The unique tuple-identifier for each tuple is its byte-offset within
1377: the file.
1378: This mode is intended mainly for
1379: .ph
1380: a) very small relations, for which the overhead of other schemes
1381: is unwarranted
1382: .ph
1383: b) transitional storage of data being moved into or out of the system by COPY;
1384: .ph
1385: c) certain temporary relations created as intermediate results during query
1386: processing.
1387: .be
1388: In the remaining schemes, the key-value of a tuple determines
1389: the page of the file on which the tuple will
1390: be placed.
1391: The schemes share a common "page-structure" for managing
1392: tuples on file pages as shown in Figure 4.
1393: .ne 15
1394:
1395:
1396:
1397:
1398:
1399:
1400:
1401:
1402:
1403:
1404:
1405:
1406:
1407:
1408:
1409: .ce 2
1410: Page Layout for Keyed Storage Structures
1411: Figure 4
1412: .ph
1413: A tuple must fit entirely on a single page.
1414: Its unique identifier (TID) consists of a page number
1415: (the ordering of its page in the UNIX file)
1416: plus a "line
1417: number" indicating its position on the page.
1418: A "line table", which grows upwards from the bottom of the page,
1419: contains as an entry for each tuple, a pointer to the beginning of
1420: the tuple.
1421: In this way a page can be reorganized without affecting TID's.
1422: .ph
1423: Initially the file will contain all its tuples on a number of
1424: "primary" pages.
1425: If the relation grows and these pages fill, "overflow"
1426: pages are allocated and chained by pointers to the primary pages
1427: with which they are associated.
1428: Within a chained group of pages no
1429: special ordering of tuples is maintained.
1430: Thus in a keyed access which locates a particular primary page,
1431: tuples matching the key may be on any page in the chain.
1432: .ph
1433: As discussed in [HELD75b] two modes of key-to-address transformation
1434: are used -- randomizing and order preserving.
1435: In a "hash" file tuples are distributed randomly throughout
1436: the primary pages of the file according to a hashing function
1437: on a key.
1438: This mode is well suited for situations in which access
1439: is to be conditioned on a specific
1440: key value.
1441: .ph
1442: As an order-preserving mode, a scheme similar to IBM's ISAM [IBM66]
1443: is used.
1444: The relation is sorted to produce the ordering on a particular key.
1445: A multi-level directory is created which records the high key on each
1446: primary page.
1447: The directory, which is static, resides on several pages within the file itself,
1448: following the primary pages.
1449: A primary page and its overflow pages are not maintained in sort order. This
1450: decision is discussed in the section on concurrency.
1451: The "ISAM-like" mode is useful in cases where the key value is likely
1452: to be specified as falling within a range of values, since
1453: a near ordering of the keys is preserved.
1454: The index compression scheme discussed in [HELD75b] is
1455: currently under implementation.
1456: .ph
1457: In the above mentioned keyed modes, fixed length tuples are stored.
1458: In addition, both schemes can be used in conjunction with data compression
1459: techniques [GOTT75] in cases
1460: where increased storage utilization outweighs the added cost of
1461: encoding and decoding data during access.
1462: These modes are known as "compressed hash" and "compressed ISAM".
1463:
1464: The current compression scheme suppresses blanks and portions of a
1465: tuple which match the preceding tuple. This compression is applied
1466: to each page independently. Other schemes are being experimented with.
1467: .ph
1468:
1469: 3.5 ADDITION OF NEW ACCESS METHODS
1470:
1471: .ph
1472: One of the goals of the AMI design was to insulate
1473: higher level software from the actual functioning
1474: of the access methods and thereby to make it
1475: easy to add different ones.
1476: .ph
1477: In order to add a new access method one need
1478: only extend the AMI routines to handle the new case.
1479: If the new method uses the same page layout and TID scheme,
1480: only find, parami, and paramd need to be extended.
1481: Otherwise new procedures to perform these functions must be
1482: supplied for use by get, insert, replace and delete.
1483: .bp
1484: .fo 'DESIGN OF INGRES (IV)'-%-'12/5/75'
1485: 4 THE STRUCTURE OF PROCESS 2
1486:
1487: Process 2 contains code to perform four main functions
1488: .br
1489: a) a lexical analyzer
1490: .br
1491: b) a parser (written in YACC [JOHN74])
1492: .br
1493: c) query modification routines to support
1494: protection, views and integrity control
1495: .br
1496: d) concurrency control
1497: .in 0
1498:
1499: These are discussed in turn
1500:
1501: 4.1 LEXICAL ANALYSIS AND PARSING
1502: .ph
1503: The lexical analysis and parsing phases of
1504: INGRES have been organized around the
1505: YACC translator writing system
1506: available in UNIX [JOHN74]. YACC takes
1507: as input a description of a grammar consisting
1508: of BNF-like parsing rules (productions) and
1509: precedence rules,
1510: plus action statements associated with
1511: each production.
1512: It produces a set of tables to be
1513: interpreted by a parse table interpreter which is combined
1514: with locally-supplied lexical analysis
1515: and parsing action routines to produce a complete translator.
1516: .ph
1517: The interpreter uses a bottom-up
1518: LR(1) parsing approach.
1519: The lexical analyzer is called to obtain successive symbols
1520: from the input stream as the interpreter attempts to
1521: match the input with productions in the grammar.
1522: When a production is matched YACC performs a reduction and
1523: executes the action statement associated with
1524: the production. YACC has a mechanism for recovering
1525: from errors to continue parsing input in its
1526: entirety.
1527: .ph
1528: While the YACC parse table interpreter checks
1529: the syntactic correctness of the input
1530: commands, the action statements check for sementic
1531: consistency and correctness and prepare the commands for
1532: further processing. The system
1533: catalogs are used to check that relation and
1534: domain names, formats, and so on, are
1535: specified appropriately.
1536: .ph
1537: For utility commands a command indicator
1538: and the parameters for the
1539: command are sent directly to process 3
1540: for transmission to process 4. Section 6 discusses these
1541: commands and their implementation.
1542: .ph
1543: For QUEL commands, the input is translated to a tree-structured
1544: internal form which will be used in the
1545: remaining analysis and processing.
1546: Moreover, the qualification part
1547: is converted to conjunctive normal form.
1548: The parse tree is now ready to undergo
1549: what has been termed "query-modification," to be described in
1550: Section 4.2 and 4.3.
1551: .br
1552:
1553: 4.2 INTEGRITY
1554: .ph
1555: Query modification includes adding integrity and
1556: protection predicates to the original query
1557: and changing references to virtual relations into references
1558: to the appropriate physical relations.
1559: At the present time only a simple integrity
1560: scheme has been implemented.
1561: .ph
1562: In [STON75] algorithms of
1563: several levels of complexity are presented for
1564: performing integrity control on updates.
1565: In the present system only the simplest case,
1566: involving single-variable, aggregate-free
1567: integrity assertions, has been implemented and is
1568: described in detail in [SCHO75].
1569: .ph
1570: Briefly, integrity assertions are entered
1571: in the form of QUEL qualification
1572: clauses to be applied to interactions updating
1573: the relation over which the variable in the
1574: assertion ranges. A parse
1575: tree is created for the qualification and a representation of this
1576: tree stored in the INTEGRITY catalog together with
1577: an indication of the relation and specific domains
1578: involved. At query
1579: modification time, updates are checked
1580: for any possible integrity assertions
1581: on the affected domains. Relevant assertions are
1582: retrieved, re-built into tree
1583: form and grafted onto the update tree
1584: so as to AND the assertions with the existing
1585: qualification of the interaction.
1586: .ph
1587: 4.3 PROTECTION AND VIEWS
1588: .ph
1589: Algorithms for the support of views are also given
1590: in [STON75]. Basically a view is any relation
1591: which could be created from existing relations by
1592: the use of a RETRIEVE command. Such view definitions
1593: will be treated in a manner
1594: somewhat analogous to that used for integrity
1595: control. They will be allowed in INGRES
1596: to support EQUEL programs written for obsolete
1597: versions of the data base and for user
1598: convenience.
1599: .ph
1600: Protection will be handled according to the
1601: algorithm described in [STON74b]. Like integrity control
1602: this algorithm involves adding
1603: qualifications to the user's interaction.
1604: In the remainder of this section we distinguish this protection
1605: scheme from the one in [CHAM75] and indicate the
1606: rationale behind its use.
1607: .ph
1608: Consider the following two views:
1609:
1610: .in 10
1611: .ss
1612: RANGE OF E IS EMPLOYEE
1613: .br
1614: DEFINE RESTRICTION-1 (E.NAME, E. SALARY, E.AGE)
1615: .br
1616: WHERE E.DEPT = "toy"
1617:
1618: DEFINE RESTRICTION-2 (E.NAME, E.DEPT, E.SALARY)
1619: .br
1620: WHERE E.AGE < 50
1621: .ds
1622:
1623: .in 0
1624: and the following two access control statements:
1625:
1626: .in 10
1627: .ss
1628: RANGE OF E IS EMPLOYEE
1629: .br
1630: PROTECT (E.NAME, E.SALARY, E.AGE)
1631: .br
1632: WHERE E.DEPT = "toy"
1633:
1634: PROTECT (E.NAME, E.SALARY, E.DEPT)
1635: .br
1636: WHERE E.AGE < 50
1637: .ds
1638:
1639: .in 0
1640: .ph
1641: Acess control could be based on
1642: views as suggested in [CHAM75]
1643: and a given user could be authorized to use
1644: views RESTRICTION-1 and RESTRICTION-2.
1645: To find
1646: the salary of Harding he could interrogate RESTRICTION-1 as
1647: follows:
1648:
1649: .in 10
1650: .ss
1651: RANGE OF R IS RESTRICTION-1
1652: .br
1653: RETRIEVE (R.SALARY) WHERE
1654: .br
1655: R.NAME = "Harding"
1656: .ds
1657: .in 0
1658:
1659: .ph
1660: Failing to find Harding in RESTRICTION-1 he would
1661: have to then interrogate RESTRICTION-2. After two queries
1662: be would be returned the appropriate salary if
1663: Harding was under 50 or in the toy department.
1664: .ph
1665: Under the INGRES scheme the user can issue
1666: .in 10
1667:
1668: .ss
1669: RANGE OF E IS EMPLOYEE
1670: .br
1671: RETRIEVE (E.SALARY) WHERE
1672: .br
1673: E.NAME = "Harding"
1674: .ds
1675: .in 0
1676: .ph
1677: which will be modified by the access control
1678: algorithm to
1679: .in 10
1680:
1681: .ss
1682: RANGE OF E IS EMPLOYEE
1683: .br
1684: RETRIEVE (E.SALARY) WHERE
1685: .br
1686: E.NAME = "Brown"
1687: AND
1688: .br
1689: (E.AGE < 50 OR E.DEPT = "toy")
1690: .ds
1691: .in 0
1692: .ph
1693: In this way the user need not manually sequence through
1694: his views to obtain desired data but automatically
1695: obtains such data if permitted. Note
1696: clearly that the portion of EMPLOYEE to which the
1697: user has access (the union of RESTRICTION-1 and RESTRICTION-2) is
1698: not a relation and hence
1699: cannot be defined as a single view.
1700:
1701: To summarize, access control restrictions are handled automatically by the
1702: INGRES algorithms but must be dealt with by a
1703: user sequencing through his views in a "view
1704: oriented" access control scheme.
1705:
1706: 4.4 CONCURRENCY CONTROL
1707:
1708: In any multiuser system provisions must be included
1709: to ensure that multiple concurrent updates are
1710: executed in a manner such that some level
1711: of data integrity can be guaranteed. The
1712: following two (somewhat facetious) updates illustrate the problem.
1713:
1714: .nf
1715: .ss
1716:
1717: RANGE OF E is EMPLOYEE
1718: U1 REPLACE E(DEPT = "toy") WHERE
1719: E.DEPT = "candy"
1720:
1721: RANGE OF F is EMPLOYEE
1722: U2 REPLACE F(DEPT = "candy") WHERE
1723: F.DEPT = "toy"
1724:
1725: .ds
1726: .fi
1727: If U1 and U2 are executed concurrently with
1728: no controls, some employees may end up in
1729: each department and the particular result
1730: may not be repeatable if the data base is
1731: backed up and the interactions reexecuted.
1732:
1733: The control which must be provided is to guarantee
1734: that some data base operation is "atomic" (i.e. occurs
1735: in such a fashion that it appears instantaneous and before
1736: or after any other data base operation).
1737: This atomic unit will be called a transaction.
1738: .ph
1739: In INGRES there are three basic choices available for
1740: defining a transaction.
1741:
1742: .br
1743: a) something smaller than one INGRES command
1744: .br
1745: b) one INGRES command
1746: .br
1747: c) a collection of INGRES commands.
1748: .in 0
1749: .br
1750:
1751: If a) is chosen INGRES could not guarantee that two concurrently executing update
1752: commands gave the same result as if they were executed sequentially (in either order)
1753: in one collection of INGRES processes.
1754: In fact the outcome could fail to be repeatable,
1755: as noted in the example above.
1756: This situation is clearly undesirable.
1757:
1758: Option c) is in the opinion of the INGRES designers
1759: impossible to support.
1760: The following transaction could be declared in an EQUEL program.
1761:
1762: .in 5
1763: .br
1764: .ss
1765: BEGIN TRANSACTION
1766: .in 10
1767: FIRST QUEL UPDATE
1768: .br
1769: SYSTEM CALLS TO CREATE AND DESTROY FILES
1770: .br
1771: SYSTEM CALLS TO FORK A SECOND COLLECTION OF INGRES PROCESSES TO WHICH
1772: COMMANDS ARE PASSED
1773: .br
1774: SYSTEM CALLS TO READ FROM A TERMINAL
1775: .br
1776: SYSTEM CALLS TO READ FROM A TAPE
1777: .br
1778: SECOND QUEL UPDATE (whose form depends on previous two
1779: system calls)
1780: .br
1781: .in 5
1782: END TRANSACTION
1783: .ds
1784:
1785: .in 0
1786: Suppose T1 is the above transaction and runs concurrently with
1787: a transaction T2 involving commands of the same
1788: form. The second update of each transaction may well "conflict" with the first
1789: update of the other.
1790: Note that there is no way to tell apriori that T1 and T2 conflict
1791: because the form of the second update is
1792: not known in advance. Hence a deadlock situation
1793: can arise which can
1794: only be resolved by aborting one
1795: transaction (an undesirable policy in the eyes of the INGRES designers)
1796: or attempting to back out one transaction.
1797: The overhead of backing out through the intermediate system
1798: calls appears prohibitive (if it is possible at all).
1799: Restricting a transaction to have no system calls (and hence
1800: no I/0) cripples the power of a transaction in order to make deadlock resolution
1801: possible and was judged undesirable.
1802: Thus option b) was chosen.
1803:
1804: The implementation of b) can be achieved by physical
1805: locks on data items, pages, tuples, domains, relations, etc.
1806: [GRAY75] or by predicate locks [STON74c].
1807: The current implementation is by relatively crude physical
1808: locks (on domains of a relation) and avoids deadlock by not
1809: allowing an interaction to proceed to process 3 until it can lock all
1810: required resources.
1811: Because of a problem with the current design of certain access
1812: method calls, all domains
1813: of a relation must currently be locked (i.e. a whole relation is locked)
1814: to perform an update. This situation will soon be rectified.
1815:
1816: The choice of avoiding deadlock rather than
1817: detecting and resolving it is made primarily for implementation
1818: simplicity. The choice of a crude locking unit
1819: reflects a minicomputer environment where core storage for
1820: a large lock lable is not available. In the future we
1821: plan to experimentally implement a crude and (thereby low CPU overhead)
1822: version of a predicate locking scheme previously described in [STON74c].
1823: Such an approach may provide considerable concurrency at an
1824: acceptable overhead in lock table space and CPU time,
1825: although such a statement is highly speculative.
1826:
1827: Once the concurrency processor has
1828: assembled locks on all domains needed by an interaction, it may
1829: proceed to process 3 for unemcumbered execution.
1830:
1831: To conclude this section we briefly indicate the reasoning behind
1832: not sorting a page and its overflow pages in the "ISAM-like" access method. This
1833: topic is also discussed in [HELD75c].
1834:
1835: Basically, maintenance of the sort order of these pages may require the access
1836: method to lock more than one page when it inserts a tuple.
1837: Clearly deadlock might be possible given concurrent updates and
1838: locks for physical pages would be required
1839: (at least once
1840: a more sophisticated predicate locking scheme is tried such as [STON74c]).
1841: To avoid both problems these pages remain unsorted and
1842: the access method need only be able to read-modify-write a single
1843: page as an atomic operation.
1844: Although such an atomic operation is not currently in UNIX
1845: (and not needed by the current primitive scheme) it is a minor
1846: addition.
1847:
1848:
1849: .bp
1850: .fo 'DESIGN OF INGRES (V)'-%-'12/12/75'
1851: 5 PROCESS 3
1852:
1853: .ph
1854: As noted in Section 2 this process performs the following two
1855: functions which will be discussed in turn:
1856: .ph
1857: a) the DECOMPOSITION of queries involving more than one variable
1858: into sequences of one-variable queries. Partial results are
1859: accumulated until the entire query is evaluated. This program is called DECOMP.
1860: It also turns any updates into the appropriate queries to isolate
1861: qualifying tuples and spools new values into a special file
1862: for deferred update.
1863: .ph
1864: b) the processing of a single variable queries.
1865: The program is called
1866: the one variable query processor (OVQP).
1867:
1868: .ph
1869: 5.1 DECOMP
1870:
1871: Because INGRES allows interactions which are defined on the
1872: cross product of perhaps several relations, efficient execution of
1873: this step is of crucial importance in order to search as small a
1874: portion of the appropriate cross product space as
1875: possible.
1876: DECOMP uses three techniques in processing
1877: interactions. We describe each technique then
1878: give the actual algorithm implemented. Finally, we
1879: indicate the role of a more sophisticated decomposition scheme under
1880: design.
1881: .ph
1882: a) Tuple substitution
1883:
1884: The basic technique used by DECOMP to reduce a query
1885: to fewer variables is tuple substitution.
1886: One variable (out of possibly many) in the
1887: query is selected for substitution. The AMI
1888: language is used to scan the relation
1889: associated with the variable one tuple at a time.
1890: For each tuple, the values of domains in that relation
1891: are substituted into the query.
1892: In the resulting modified query, all
1893: previous references to the substituted variable have now
1894: been replaced by values (constants), and the query has
1895: thus been reduced to one less variable. Precomposition
1896: is repeated (recursively) on the modified query until
1897: only one variable remains, at which point the OVQP is called to continue processing.
1898: .ph
1899: b) One-Variable Detachment
1900:
1901: If the qualification Q of the query is of the form
1902: .br
1903: .ce
1904: Q1(V1) AND Q2(V1,...,Vn)
1905: .br
1906: for some
1907: tuple variable V1, the following two steps can be
1908: executed:
1909:
1910: 1) Issue the query
1911: .in 5
1912: .ss
1913:
1914: RETRIEVE INTO W (TL[V1])
1915: .br
1916: WHERE Q1[V1]
1917: .ds
1918: .in 0
1919: .ph
1920: Here TL[V1] are those domains required in the remainder
1921: of the query. Note that this is a one variable query and may
1922: be passed directly to OVQP.
1923:
1924: 2) Replace R1, the relation over which V1 ranges,
1925: by W in the range declaration and delete
1926: Q1[V1] from Q.
1927:
1928: .ph
1929: The query formed in 1) is called a "one-variable,
1930: detachable sub-query" (OVDSQ) and the technique
1931: for forming and executing it "one-variable detachment" (OVD).
1932: This step has the effect of reducing the size
1933: of the relation over which V1 ranges
1934: by restriction and projection.
1935: Hence, it may reduce the complexity of the
1936: processing to follow.
1937: .ph
1938: Moreover, the opportunity exists in the process
1939: of creating new relations through OVD, to choose storage structures
1940: (and particularly keys) which will prove
1941: helpful in further processing.
1942: .ph
1943:
1944: c) Reformatting
1945:
1946: When a tuple variable is selected for substitution, a large
1947: number of queries each with one less variable will
1948: be executed. If b) is a possible operation after
1949: the substitution for some
1950: remaining variable, V1, then the relation over which
1951: V1 ranges, R1, can be reformatted to have domains
1952: used in Q1(V1) as a key. This will expedite
1953: b) each time it is executed during tuple substitution.
1954:
1955: We can now state the complete decomposition algorithm.
1956: .ph
1957: a) If number of variables
1958: in query is 0 or 1 call OVQP and stop; else go on.
1959:
1960: b) Find all variables,
1961: {V1,...Vn},
1962: for which the query contains a
1963: one-variable clause. Perform
1964: OVD to create new ranges for each of these
1965: variables.
1966: The new relation for each variable, Vi, is stored
1967: as a hash file with key, Ki, chosen as follows.
1968: .bb
1969: 1) For each j select from the remaining multi-variable clauses in the query
1970: the collection, C(ij), which have the form
1971: .ti +8
1972: Vi.di = Vj.dj
1973: .br
1974: where di,dj are domains of Vi and Vj.
1975: .br
1976: 2) Form the key Ki to be the concatenation
1977: of domains di1,di2,...of Vi appearing in clauses
1978: in C(ij).
1979: .br
1980: 3) If more than one j exists, for which C(ij) is non empty, one C(ij)
1981: is chosen arbitrarily for forming the key.
1982: If C(ij) is empty for all j,
1983: the relation is stored as an unsorted table.
1984: .ph
1985: .be
1986: c) Choose the variable, Vs, with the smallest number of tuples
1987: as the next one for which to perform
1988: tuple substitution.
1989: .ph
1990: d) For each tuple variable Vj for which C(js)
1991: is non null, reformat the storage structure
1992: of the relation Rj which it ranges over, if necessary,
1993: so that the key of Rj is the concatenation of
1994: domains dj1,... appearing
1995: in C(js).
1996: This ensures that when the clauses in C(js) become one-variable
1997: after substituting for Vs, subsequent calls to OVQP to restrict
1998: further the range of Vj will be done as efficiently as possible.
1999: .ph
2000: e) Perform the following two steps for all
2001: tuples in the range of the variable selected in (c):
2002: .bb
2003: 1) substitute values from tuple into query.
2004: .br
2005: 2) call decomposition algorithm recursively on a
2006: copy of resulting query which now has been reduced by one variable.
2007: .ph
2008: .be
2009: The following comments on the algorithm
2010: are appropriate:
2011: .ph
2012: a) OVD is almost always assured of speeding
2013: processing. Not only is it possible to
2014: wisely choose the storage structure
2015: of a temporary relation but also the
2016: cardinality of this relation
2017: may be much less than the one it
2018: replaces as the range for a tuple variable.
2019: It only fails if little or no
2020: reduction takes place and
2021: reformating is unproductive.
2022:
2023: It should be noted that a temporary relation is created rather
2024: than a list of qualifying tuple-id's. The basic tradeoff
2025: is that OVD must copy qualifying tuples but can remove duplicates
2026: created during the projection. Storing tuple-id's avoids the copy
2027: operation at the expense of reaccessing qualifying tuples and
2028: retaining duplicates. It is clear that cases exist where each
2029: strategy is superior. The INGRES designers have chosen OVD because it
2030: does not appear to offer worse performance than the alternative,
2031: allows a more accurate choice of the variable with the smallest
2032: range in step c) above and results in cleaner code.
2033:
2034: b) Tuple substitution is done when necessary on the
2035: variable associated with the smallest
2036: number of tuples. This has the effect of
2037: reducing the number of eventual calls on OVQP.
2038:
2039: c) Reformatting is done (if necessary)
2040: with the knowledge that it will replace a collection
2041: of complete sequential scans of a relation by a
2042: collection of limited scans. This
2043: will almost always reduce processing time.
2044:
2045: d) It is believed that this algorithm
2046: efficiently handles a large class of
2047: interactions. Moreover, the algorithm does not
2048: require excessive CPU overhead to perform. There are,
2049: however, cases where a more elaborate algorithm is
2050: needed. The following comment applies to these cases.
2051:
2052: e) Suppose that we have two or more strategies ST0, ST1, ... ,STn,
2053: each one being better than the previous one but also requiring
2054: a greater overhead.
2055: Suppose we begin an interaction on ST0 and run it for an amount
2056: of time equal to a fraction of the
2057: estimated overhead of ST1.
2058: At the end of that time, by simply counting
2059: the number of tuples of the first
2060: substitution variable which
2061: have already been processed, we can get an
2062: estimate for the total
2063: processing time using ST0. If this is significantly
2064: greater than the overhead of ST1, then we switch to ST1.
2065: Otherwise we stay and complete processing the interaction
2066: using ST0. Obviously, the procedure can be
2067: repeated on ST1 to call ST2 if necessary,
2068: and so forth.
2069:
2070: The algorithm detailed in this section is ST0. A more sophisticated algorithm
2071: ST1 is currently under development and is discussed in [WONG76].
2072: .se
2073: 5.2 ONE VARIABLE QUERY PROCESSOR (OVQP)
2074: .ph
2075: This program is concerned solely with the
2076: efficient accessing of tuples from a single
2077: relation given a particular one-variable query.
2078: The initial portion of this program, known
2079: as STRATEGY, determines what key (if any) may
2080: be profitably used
2081: to access the relation, what the value(s) of that
2082: key will be used in calls to the AMI routine "find", and
2083: whether the access
2084: may be accomplished
2085: directly through the AMI to the storage structure of the
2086: relation itself or if a secondary index on the relation should be used.
2087: If access is to be through a secondary index then STRATEGY must
2088: choose which ONE of possibly many indices to use.
2089: .ph
2090: Then, the tuples retrieved
2091: according to the access strategy selected are processed
2092: by the SCAN
2093: portion of OVQP. This program
2094: evaluates each tuple against the
2095: qualification part of the query, creates
2096: target list values for qualifying tuples, and
2097: disposes of the target list appropriately.
2098: .ph
2099: Since SCAN is relatively straightforward, we
2100: discuss only the policy decisions made in
2101: STRATEGY.
2102: .ph
2103: First STRATEGY examines the qualification
2104: for clauses which specify the value of a domain,
2105: i.e. clauses of the form:
2106:
2107: V.domain op constant
2108: .br
2109: where "op" is one of {=, <, >, <=, >=}.
2110: Such clauses are termed "simple" clauses and are
2111: organized into a list
2112: .ph
2113: Obviously a non-simple clause may be equivalent to a simple one.
2114: .br
2115: For example
2116: E.SALARY/2 = 10000 is equivalent to E.SALARY = 20000.
2117: .br
2118: However, recognizing and converting such clauses
2119: requires a general algebraic symbol
2120: manipulator. This issue has been
2121: avoided by ignoring all non-simple
2122: clauses. STRATEGY must now select one of
2123: two accessing strategies:
2124: .br
2125: a) issuing two AMI find commands on the primary
2126: relation followed by a sequential scan of the
2127: relation between the limits specified
2128: .br
2129: b) issuing two AMI find commands on some index relation
2130: followed by a sequential scan of the index
2131: between the limits specified. For each
2132: tuple retrieved the "pointer" domain
2133: is obtained and is a tuple-id of a tuple in
2134: the primary relation. This tuple
2135: is fetched and examined.
2136: .ph
2137: Key information about the primary
2138: relation is obtained using
2139: the AMI function "paramd".
2140: Names of indices are obtained from the
2141: index catalog and keying information about indices in obtained with
2142: the function "parami".
2143: .ph
2144: STRATEGY now checks if a simple
2145: clause is available to limit the scan of
2146: the primary relation or an index relation.
2147: If a relation is hashed the simple
2148: clause must specify equality as the
2149: operator in order to be
2150: useful. ISAM structures on the other hand
2151: allow ranges of values and less than
2152: all keys may be specified as long as
2153: the first one is present for structures
2154: with combined keys.
2155: .ph
2156: STRATEGY checks for such a simple clause for a relation in the following
2157: order:
2158: .bb
2159: a. hashed primary relation
2160: .br
2161: b. hashed index
2162: .br
2163: c. ISAM primary relation
2164: .br
2165: d. ISAM index
2166: .be
2167: .in 0
2168: The rationale for this ordering is related to the
2169: expected number of page accesses required to retrieve
2170: a tuple from the source relation in each case.
2171: .ph
2172: In case a) the key value provided locates a desired
2173: source tuple in one access,
2174: (ignoring overflows).
2175: In case b) the
2176: key value locates an appropriate index relation
2177: tuple in one access but an additional access in required to retrieve the proper
2178: source tuple. For the ISAM scheme, the
2179: directory must be examined. The number of accesses
2180: incurred in this look-up is at least 2.
2181: Again, with an index,
2182: an additional
2183: access is required, making the total at least 3
2184: in case d).
2185: .bp
2186: .fo 'DESIGN OF INGRES (VI)'-%-'12/5/75'
2187: 6 UTILITIES IN PROCESS 4
2188: .se
2189: 6.1 IMPLEMENTATION OF UTILITY COMMANDS
2190: .ph
2191: We have indicated in Section 1 several data base utilities
2192: available to users. We will now briefly describe their
2193: implementation and indicate their configuration within the
2194: INGRES system.
2195: .ph
2196: The commands are organized into several overlay programs.
2197: Since an overlay may have more than one
2198: entry point, it can contain
2199: more than one utility command.
2200: In fact, the utilities
2201: are grouped where possible to minimize overlaying.
2202: The overlays all contain a common main program known as
2203: the "controller", which reads pipe C and writes
2204: completion messages into pipe D.
2205: The processing of a utility command occurs as
2206: follows.
2207: .ph
2208: First, the parser recognizes a utility command in a user interaction.
2209: This name is looked up in the INGRES "process-table",
2210: which has an entry for each command name in the language.
2211: Each entry has an "overlay-id" and a "function-id".
2212: The first indicates the overlay program containing the
2213: command, and, the second indicates the proper entry point
2214: within that overlay.
2215: .ph
2216: These id's are passed down pipe B to process 3
2217: followed by the parameters of the command specified.
2218: Process 3 determines from the "overlay-id"
2219: if the command is a utility command intended for process 4.
2220: If so, the information is simply written on pipe C.
2221: .ph
2222: At this point, some overlay is occupying process 4, having remained
2223: from a previous command. Its copy of the controller
2224: reads pipe C to obtain the overlay-id.
2225: If a different overlay from the present one is
2226: indicated, the controller overlays process 4 and
2227: control passes to the controller of the new overlay.
2228: .ph
2229: Most of the utilities update or read the system
2230: relations using AMI calls. MODIFY contains a sort
2231: routine which puts tuples in collating sequence
2232: according to the concatenation of the
2233: desired keys (which need not
2234: be of the same data type).
2235: Pages are initially loaded to approximately 80%
2236: of capacity. The sort routine
2237: is a recursive N-way merge-sort
2238: where N is maximum number of files process 4
2239: can have open at once (currently 8).
2240: The index building occurs in an obvious way. To convert
2241: to hash structures MODIFY
2242: must specify the number of primary pages to be
2243: allocated. This parameter is used by the AMI in its
2244: hash scheme (which is a standard modulo division method).
2245: This is done by a rule of thumb.
2246: .ph
2247: It should be noted that a user who
2248: creates an empty hash relation using the CREATE command and then
2249: copies a large UNIX file into it using COPY
2250: will create a very inefficient structure. This is because
2251: a relatively small default number of primary pages will have
2252: been specified by CREATE and overflow chains
2253: will be long. A better strategy is
2254: to COPY into an unsorted table so that
2255: MODIFY can subsequently make a good guess
2256: at the number of primary pages to allocate.
2257: .se
2258: 6.2 DEFERRED UPDATE AND RECOVERY
2259:
2260: Any updates (APPEND, DELETE, REPLACE) are processed
2261: by writing the tuples to be added changed or modified into
2262: a temporary file. When process 3 finishes,
2263: it calls process 4 to actually perform the modifications
2264: requested as a final step in processing. Deferred update
2265: is done for four reasons.
2266:
2267: a) Secondary index considerations. Suppose the following
2268: QUEL statement is executed;
2269: .in 10
2270: .ss
2271:
2272: RANGE OF E IS EMPLOYEE
2273: .br
2274: REPLACE E(SALARY = 1.1*E.SALARY) WHERE
2275: .br
2276: E.SALARY > 20000
2277: .ds
2278: .in 0
2279:
2280: Suppose further that there is a secondary index on the
2281: salary domain and the primary relation is keyed on another domain.
2282:
2283: OVQP in finding the employees who qualify for the raise
2284: will use the secondary index. If one employee (say Smith qualifies and his
2285: tuple is modified and the secondary index updated) then the scan of the secondary
2286: index will find his tuple a second time (in fact an arbitrary number of times).
2287: Either secondary indexes cannot be used to identify qualifying
2288: tuples when range qualifications are present (a rather
2289: unnatural restriction) or secondary
2290: indices must be updated in deferred mode.
2291:
2292: .ph
2293: b) Primary relation considerations.
2294: Suppose the following QUEL statement is executed
2295:
2296: .in 10
2297: .ss
2298: RANGE OF E, M IS EMPLOYEE
2299: .br
2300: REPLACE E(SALARY = .9*E.SALARY) WHERE
2301: .br
2302: .in 15
2303: E. MGR = M.NAME
2304: .br
2305: .in 17
2306: AND
2307: .br
2308: .in 15
2309: E.SALARY > M.SALARY
2310: .ds
2311: .in 0
2312:
2313: for the following EMPLOYEE relation
2314:
2315: .nf
2316: NAME SAL MANAGE
2317: .ss
2318: Smith 10k Jones
2319: Jones 8k
2320: Brown 9.5k Smith
2321: .fi
2322: .ds
2323: Logically Smith should get the pay cut but Brown should
2324: not. However, if Smith's tuple is updated before Brown
2325: is checked for the pay cut, Brown will qualify. This
2326: undesirable situation must be avoided by deferred update.
2327:
2328: c) Functionality of updates.
2329: Suppose the following QUEL statement is executed.
2330:
2331: .in 10
2332: .ss
2333: RANGE of E,M is EMPLOYEE
2334: .br
2335: REPLACE E(SALARY = M.SALARY)
2336: .ds
2337: .in 0
2338:
2339: This update attempts to assign to each employee
2340: the salary of every other employee, i.e.
2341: a single data item is to be replaced by multiple values.
2342: Stated differently, the REPLACE statement does not specify a function.
2343: This non-functionality
2344: can only be checked if deferred update is performed.
2345: .ph
2346: d) Recovery is easier.
2347: The deferred update file provides a log of updates
2348: to be made. Recovery is provided upon system
2349: crash by the RESTORE command.
2350: In this case the deferred update routine is requested
2351: to destroy the temporary file if it has
2352: not yet started processing it.
2353: If it has begun processing, it reprocesses the
2354: entire update file which is done in such a way that the
2355: effect is the same as if it were processed exactly once from
2356: start to finish.
2357:
2358: Hence the update is "backed out" if deferred updating has not yet
2359: begun; otherwise it is processed to conclusion.
2360: The software is designed so the update file can be optionally spooled onto tape and
2361: recovered from tape. This added feature should soon be operational.
2362:
2363: If a user from the terminal monitor (or a C-program)
2364: wishes to stop a command he can issue a "break" character.
2365: In this case all processes reset execept the deferred update
2366: program which recovers in the same manner as above.
2367:
2368: All update commands do deferred update; however the INGRES
2369: utilities have not yet been modified to do likewise.
2370: When this is completed INGRES will
2371: recover from all crashes which leave the disk intact.
2372: In the meantime there can be disk-intact crashes which
2373: cannot be recovered in this manner (if they happen in such a
2374: way that the system catalogs are left inconsistent).
2375:
2376: The INGRES "super-user" can checkpoint a data base(s)
2377: onto tape using the UNIX backup scheme. Since INGRES
2378: logs all interactions, a consistent system can always be
2379: obtained (albeit slowly) by restoring the last checkpoint
2380: and running the log of interactions (or the tape(s)
2381: of deferred updates if it exists).
2382:
2383:
2384: .bp
2385: .fo 'DESIGN OF INGRES (VII)'-%-'12/5/75'
2386: 7 CONCLUSION AND FUTURE EXTENSIONS
2387:
2388: The system described herein is in use at 5
2389: installations and is being brought up at 8 others.
2390: It forms the basis of an
2391: accounting system, a system for managing student records,
2392: a geo-data system, a system for maintaining
2393: a wiring diagram for a large telelphone company and assorted
2394: other smaller applications. These
2395: applications have been running for periods of
2396: up to nine months.
2397:
2398: 7.1 PERFORMANCE
2399:
2400: At this time no detailed performance measurements have
2401: been made. However, on our system (an 11/40 mainframe with 80k words of
2402: core) about 4-6 simultaneous INGRES users can be supported with
2403: reasonable response time (assuming they are doing interactions which
2404: affect a small number of tuples and for which a fast access path
2405: exists. Of course, a user has the ability to execute interactions
2406: in INGRES which require hours to process). This hardware configuration
2407: costs about
2408: $60,000. Larger 11/70 installations should be
2409: able to run substantially more INGRES users.
2410:
2411: The sizes of the processes in INGRES are indicated below.
2412: Since the access methods are loaded
2413: with processes 2 and 3 and with many of the utilities
2414: their contribution to the respective process sizes has been noted separately.
2415: .bb
2416: access methods (AM) 11 K (bytes)
2417: .br
2418: terminal monitor 10 K
2419: .br
2420: EQUEL 30 K + AM
2421: .br
2422: process 2 45 K + AM
2423: .br
2424: process 3 (query processor) 45 K + AM
2425: .br
2426: utilities (8 overlays) 160 K + AM
2427:
2428: .in 0
2429: 7.2 USER FEEDBACK
2430:
2431: The feedback from internal and external users
2432: has been overwhelmingly positive.
2433: In this section we indicate features that have been
2434: suggested for future systems.
2435:
2436: a) higher performance
2437:
2438: Earlier versions of INGRES were very slow.
2439: The current version should alleviate this problem.
2440:
2441: b) recursion
2442:
2443: QUEL does not support recursion. Hence, recursion must be
2444: tediously programmed in C using the precompiler.
2445: This has been suggested as a desired extension.
2446:
2447:
2448: c) other language extensions
2449:
2450: These include user defined functions (especially counters),
2451: multiple target lists for a single qualification statement and if-then-else
2452: control structures in QUEL.
2453: These extensions are all so a user can avoid using the precompiler.
2454:
2455: d) report generator
2456:
2457: PRINT is only a very primitive report generator.
2458: The need for augmented facilities in this area is clear.
2459: It should be written in EQUEL.
2460:
2461: e) bulk copy
2462:
2463: The COPY routine fails to handle easily all situations that have arisen.
2464:
2465: 7.3 FUTURE EXTENSIONS
2466:
2467: Noted throughout the paper are areas where
2468: system improvement is in progress, planned or desired by users.
2469: Other areas of extension include the
2470: following
2471:
2472: a) A multi-computer system version of INGRES to operate
2473: on distributed data bases
2474:
2475: b) Further performance enhancements
2476:
2477: c) A higher level user language including recursion and user defined
2478: functions
2479:
2480: d) Better data definition and integrity features
2481:
2482: e) A data base administrator advisor. This program would run at idle
2483: priority and issue queries against a statistics relation to be kept by
2484: INGRES. It could then offer advice to a DBA concerning the choice of
2485: access methods and the selection of indices. This topic is discussed
2486: further in [HELD75b].
2487:
2488: ACKNOWLEDGEMENT
2489: .ph
2490: Besides the authors, the following
2491: persons have played an active role
2492: in the design and implementation of INGRES:
2493: Eric Allman, Rick Berman, Jim Ford, Angela Go,
2494: Nancy McDonald, Peter Rubinstein, Iris Schoenberg, Nick Whyte,
2495: Carol Williams, Karel Youssefi, Bill Zook.
2496: .ph
2497: Support for the project has come from
2498: ARO23007, DAHC04-74-G0087,
2499: NESCN0039-76-C-0022,
2500: NSFF44620-71-C-0087,
2501: JSEPDCR75-03839, NSFENG74-06651-AO1,
2502: and a grant from the Sloan Foundation.
2503: .wh 0 tm
2504: .de tm
2505: .sp 6
2506: ..
2507: .pl 60
2508: .po 5
2509: REFERENCES
2510:
2511:
2512: .ls 2
2513: .in+9
2514: .ti-9
2515: ALLM76
2516: Allman, E., Stonebraker, M., and Held, G.
2517: "Embedding a Relational Data Sub-language in a General Purpose Programming Language.",
2518: Proc. ACM SIGPLAN-SIGMOD Workshop on Data,
2519: Salt Lake City, Utah, March, 1976.
2520: .ti-9
2521: ASTR76
2522: Astrahan, M. et. al., "SYSTEM R: A Relational Approach to
2523: Data Base Management," TODS, Vol 1, No. 2.
2524: .ti-9
2525: BOYC73
2526: Boyce, R. et. al.,
2527: "Specifying Queries as Relational Expressions: SQUARE",
2528: IBM Research, San Jose, Ca., RJ 1291, Oct. 1973.
2529: .ti-9
2530: CHAM74
2531: Chamberlin, D. and Boyce, R.,
2532: "SEQUEL: A Structured English Query Language",
2533: Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, Ann Arbor, Mich., May 1974.
2534: .ti-9
2535: CHAM75
2536: Chamberlin, D., Gray, J.N., and Traiger, I.L.,
2537: "Views, Authorization and Locking in a Relational Data Base System",
2538: Proc. 1975 National Computer Conference, AFIPS Press, May 1975.
2539: .ti-9
2540: CODA71
2541: Committee on Data Systems Languages,
2542: "CODASYL Data Base Task Group Report",
2543: ACM, New York, 1971.
2544: .ti-9
2545: CODD70
2546: Codd, E.F.,
2547: "A Relational Model of Data for Large Shared Data Banks",
2548: CACM, Vol. 13 No. 6, pp. 377-387, June, 1970.
2549: .ti-9
2550: CODD71
2551: Codd, E.F.,
2552: "A Data Base Sublanguage Founded on the Relational Calculus",
2553: Proc. 1971 ACM-SIGFIDET Workshop on Data Description, Access and Control, San Diego, Ca., Nov. 1971.
2554: .ti-9
2555: CODD72
2556: Codd, E.F.,
2557: "Relational Completeness of Data Base Sublanguages",
2558: Courant Computer Science Symposium 6, May 1972.
2559: .ti-9
2560: CODD74
2561: Codd, E.F. and Date, C.J.,
2562: "Interactive Support for Non-Programmers, The Relational and Network Approaches",
2563: Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, Ann Arbor, Mich., May 1974.
2564: .ti-9
2565: DATE74
2566: Date, C.J. and Codd, E.F.,
2567: "The Relational and Network Approaches: Comparison of the Aplication Programming Interfaces",
2568: Proc. 1974 ACM-SIGFIDET Workshop on Data Description, Access and Control, Ann Arbor, Mich., May 1974.
2569: .ti-9
2570: GRAY75 Gray, J.N., Lorie, R.A., and Putzolu, G.R.
2571: "Granularity of Locks in a Shared Data Base"
2572: Proc. International Conference on Very Large
2573: Data Bases, Framingham, Mass., September, 1975.
2574: .ti-9
2575: GO75
2576: Go, A., Stonebraker, M., and Williams, C.,
2577: "An Approach to Implementing a Geo-data System",
2578: Proc. ACM SIGGRAPH/SIGMOD Conference on Data Bases in Interactive Design,
2579: Waterloo, Ontario, Sept. 1975.
2580: .ti-9
2581: GOTT75 Gottlieb, D., et. al. "A Classification of
2582: Compression Methods and their
2583: Usefulness in a Large Data Processing Center"
2584: Proc. 1975 National Computer Conference,
2585: AFIPS Press, May, 1975.
2586: .ti-9
2587: HELD75a
2588: Held, G.D., Stonebraker, M. and Wong, E.,
2589: "INGRES - A Relational Data Base Management System",
2590: Proc. 1975 National Computer Conference, AFIPS Press, 1975.
2591: .ti-9
2592: HELD75b
2593: Held, G.D.,
2594: "Storage Structures for Relational Data Base Management Systems",
2595: Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, Univ. of California, Berkeley, 1975.
2596: .ti-9
2597: HELD75c
2598: Held, G., and Stonebraker M.,
2599: "B-Trees Re-examined",
2600: to appear in CACM.
2601: .ti-9
2602: IBM66
2603: IBM Corp.,
2604: "OS ISAM Logic",
2605: IBM, White Plains, N.Y., GY28-6618.
2606: .ti-9
2607: JOHN74
2608: Johnson, S.C.,
2609: "YACC, Yet Another Compiler-Compiler",
2610: UNIX Programmer's Manual, Bell Telephone Labs, Murray Hill, N.J., July 1974.
2611: .ti-9
2612: MCDO75a
2613: McDonald, N. and Stonebraker, M.,
2614: "Cupid -- The Friendly Query Language",
2615: Proc. ACM-Pacific-75, San Francisco, California, April 1975.
2616: .ti-9
2617: MCDO75b
2618: McDonald, Nancy.,
2619: "CUPID: A Graphics Oriented Facility for Support of Non-programmer Interactions with a Data Base",
2620: Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science,
2621: University of California, Berkeley, 1975.
2622: .ti-9
2623: RITC74
2624: Ritchie, D.M. and Thompson, K.
2625: "The UNIX Tine-Sharing System,"
2626: CACM,
2627: Vol. 17, No. 3., March, 1974.
2628: .ti-9
2629: SCHO75
2630: Schoenberg, Iris,
2631: "Implementation of Integrity Constraints in the Relational Data Base Management System, INGRES",
2632: M.S. Thesis, Dept. of Electrical Engineering and Computer Science,
2633: University of California, Berkeley, 1975.
2634: .ti-9
2635: STON74a
2636: Stonebraker, M., "A Functional View of Data Independence,"
2637: Proc. 1974 SIGFIDET Workshop on Data
2638: Description, Access and Control, Ann Arbor, Mich., May 1974.
2639: .ti-9
2640: STON74b
2641: Stonebraker, M. and Wong, E.,
2642: "Access Control in a Relational Data Base Management System by Query Modification",
2643: Proc. 1974 ACM National Conference, San Diego, Ca., Nov. 1974
2644: .ti-9
2645: STON74c Stonebraker, M., "High Level Integrity
2646: Assurance in Relational Data Base Systems",
2647: University of California, Electronics Research
2648: Laboratory, Memo ERL-M473, August, 1974.
2649: .ti-9
2650: STON75
2651: Stonebraker, M., "Implementation of Integrity Constraints and
2652: Views by Query Modification", Proc 1975 SIGMOD Workshop
2653: on Management of Data, San Jose, Ca., May 1975.
2654: .ti-9
2655: STON76
2656: Stonebraker, M. and Rubinstein, P., "The INGRES Protection System,"
2657: Proc. 1976 ACM National Conference, Houston, Texas, October, 1976.
2658: .ti-9
2659: TSIC75
2660: Tsichritzis, D.,
2661: "A Network Framework for Relational Implementation",
2662: University of Toronto, Computer Systems Research Group Report CSRG-51, Feb. 1975
2663: .ti-9
2664: WONG76
2665: Wong, E. and Youssefi, K.,
2666: "Decomposition-A Strategy for Query Processing,"
2667: TODS, (this issue).
2668: .ti-9
2669: ZOOK76
2670: Zook, W., et. al.,
2671: "INGRES - Reference Manual (Version 5)",
2672: University of California, Electronics Research
2673: Laboratory, Memo. No. ERL-M585, April, 1976.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.