42BSD/ingres/doc/other/maintain.nr - annotate

Return to maintain.nr CVS log
Up to [CSRG BSD Unix] / 42BSD / ingres / doc / other
Annotation of 42BSD/ingres/doc/other/maintain.nr, revision 1.1.1.1

1.1       root        1: .ds HE 'DATA BASE'INGRES'Page %'
                      2: .so nmacs
                      3: .hx
                      4: .rs
                      5: .sp 8
                      6: .ce
                      7: CREATING AND MAINTAINING A DATABASE USING INGRES
                      8: .sp 15
                      9: .ce
                     10: by
                     11: .ce
                     12: Robert Epstein
                     13: .sp 18
                     14: .ce 2
                     15: Memorandum No. ERL - M77-71
                     16: December 16, 1977
                     17: .sp 3
                     18: .ce 4
                     19: Electronics Research Laboratory
                     20: College of Engineering
                     21: University of California, Berkeley
                     22: 94720
                     23: .bp 1
                     24: .ce
                     25: .bl + 5
                     26: CREATING AND MAINTAINING A DATABASE USING INGRES
                     27: .sp 3
                     28: 1.   INTRODUCTION
                     29: .sp
                     30: In this paper we describe how to create,
                     31: structure and maintain relations
                     32: in INGRES.
                     33: It is assumed that the reader
                     34: is familiar with INGRES
                     35: and understands QUEL, the INGRES
                     36: query language.
                     37: It is strongly suggested that the
                     38: document "A Tutorial on INGRES"
                     39: (ERL M77/25) be read first.
                     40: .sp 1
                     41: This paper is divided into six sections
                     42: .in +5
                     43: .sp 1
                     44: 1.  Introduction
                     45: .sp 1
                     46: 2.  Creating a Relation
                     47: .sp 1
                     48: 3.  Using Copy
                     49: .sp 1
                     50: 4.  Storage Structures
                     51: .sp 1
                     52: 5.  Secondary Indices
                     53: .sp 1
                     54: 6.  Recovery and Data Update
                     55: .in -5
                     56: .sp 1
                     57: To create a new data base you must be a valid
                     58: INGRES user and have "create data base"
                     59: permission.
                     60: These permissions are granted by the
                     61: "ingres" superuser.
                     62: If you pass those two requirements you can create
                     63: a data base using the command to the
                     64: Unix shell:
                     65: .sp 1
                     66: % creatdb mydata
                     67: .sp 1
                     68: where "mydata" is the name of the data base.
                     69: You become the "data base administrator" (DBA) for
                     70: mydata.
                     71: As the DBA you have certain special
                     72: powers.
                     73: 
                     74: .in +4
                     75: .ti -5
                     76: 1.  Any relation created by you can be accessed
                     77: by anyone else using "mydata".
                     78: If any other user creates a relation it is
                     79: strictly private and cannot
                     80: be accessed by the DBA or any other user.
                     81: 
                     82: .ti -5
                     83: 2.  You can use the "-u" flag in ingres
                     84: and printr. 
                     85: This enables you to use ingres on "mydata"
                     86: with someone else's id.
                     87: Refer to the INGRES reference
                     88: manual under sections
                     89: ingres(unix) and users(files)
                     90: for details.
                     91: 
                     92: .ti -5
                     93: 3.  You can run sysmod, restore and purge
                     94: on "mydata".
                     95: 
                     96: .ti -5
                     97: 4.  The data base by default is created to
                     98: allow for multiple concurrent users.
                     99: If only one user will ever use the
                    100: data base at a time,
                    101: the data base administrator can
                    102: turn off the concurrency control.
                    103: Refer to creatdb(unix) in the INGRES
                    104: reference manual.
                    105: .in -4
                    106: .sp 1
                    107: Once a data base has been created you should
                    108: immediately run
                    109: .sp 1
                    110: % sysmod mydata
                    111: .sp 1
                    112: This program will convert
                    113: the system relations to their
                    114: "best" structure for use in INGRES.
                    115: Sysmod will be explained further in
                    116: section 4.
                    117: .sp 1
                    118: As a DBA or as a user you can create and
                    119: structure new relations in any data
                    120: base to which you have access.
                    121: The remainder of this paper describes how this is done.
                    122: .bp
                    123: 2.   CREATING NEW RELATIONS IN INGRES
                    124: .sp
                    125: There are two ways to create new relations in INGRES.
                    126: .sp 1
                    127: .ti +5
                    128: create
                    129: .br
                    130: .ti +5
                    131: retrieve into
                    132: .sp 1
                    133: "Retrieve into" is used to form a new relation from one or
                    134: more existing relations.
                    135: "Create" is used to create
                    136: a new relation with no tuples in it.
                    137: .sp 1
                    138: example 1:
                    139: .sp 1
                    140: .ti +5
                    141: range of p is parts
                    142: .ti +5
                    143: range of s is supply
                    144: .ti +5
                    145: retrieve into newsupply(
                    146: .ti +19
                    147: number = s.snum,
                    148: .ti +19
                    149: p.pname,
                    150: .ti +19
                    151: s.shipdate)
                    152: .ti +5
                    153: where s.pnum = p.pnum
                    154: .sp 1
                    155: example 2:
                    156: .sp 1
                    157: .ti +5
                    158: create newsupply(
                    159: .ti +12
                    160: number = i2,
                    161: .ti +12
                    162: pname = c20,
                    163: .ti +12
                    164: shipdate = c8)
                    165: .sp 1
                    166: In example 1 INGRES creates a new relation called
                    167: "newsupply", computing what the format
                    168: of each domain should be.
                    169: The query is then run and newsupply is
                    170: modified to "cheapsort".
                    171: (This will be covered in more detail in section 4.)
                    172: .sp 1
                    173: In example 2 "newsupply"
                    174: is created and the name and
                    175: format for each domain is given.
                    176: The format types which are
                    177: allowed are:
                    178: .sp 1
                    179: .in +5
                    180: .nf
                    181: i1             1  byte integer
                    182: i2             2   "      "
                    183: i4             4   "      "
                    184: f4             4  byte floating point number
                    185: f8             8   "      "       "     "
                    186: c1,c2,..,c255  1,2,..,255 byte character
                    187: .in -5
                    188: .fi
                    189: .sp 1
                    190: In example 2, the width of an individual
                    191: tuple is 30 bytes
                    192: (2 + 20 + 8), and the
                    193: relation has three domains.
                    194: Beware that INGRES
                    195: has limits.
                    196: A relation cannot have more
                    197: than 49 domains and the tuple width
                    198: cannot exceed 498 bytes.
                    199: .sp 1
                    200: UNIX allocates space on a disk
                    201: in units of 512 byte pages.
                    202: INGRES gets a performance advantage
                    203: by doing I/O in one block units.
                    204: Therefore relations are divided into 512 byte pages.
                    205: INGRES never splits a tuple
                    206: between two pages.
                    207: Thus some space can be wasted.
                    208: There is an overhead of 12 bytes per page plus
                    209: 2 bytes for every tuple on the page.
                    210: The formulas are:
                    211: .sp 1
                    212: .ti +5
                    213: number tuples per page = 500/(tuple width + 2)
                    214: .sp 1
                    215: .ti +5
                    216: wasted space = 500 - number of tuples per page
                    217: .ti +5
                    218: *(tuple width +2)
                    219: .sp 1
                    220: For our example there are
                    221: .sp 1
                    222: .ti +5
                    223: 22 = 500/(20 + 2)
                    224: .sp 1
                    225: .ti +5
                    226: 16 = 500 - 22 * (20 + 2)
                    227: .sp 1
                    228: 22 tuples per page and 16 bytes
                    229: wasted per page.
                    230: These computations are valid
                    231: only for uncompressed relations.
                    232: We will return to this subject
                    233: in section 4 when we discuss compression.
                    234: .sp 1
                    235: If you forget a domain name or
                    236: format, use the "help" command.
                    237: For example if you gave the INGRES
                    238: command:
                    239: .sp 1
                    240: .ti +5
                    241: help newsupply
                    242: .sp 1
                    243: the following would be printed:
                    244: .nf
                    245: 
                    246: Relation:              newsupply
                    247: Owner:                 bob
                    248: Tuple width:           30
                    249: Saved until:           Thu Nov 10 16:17:06 1977
                    250: Number of tuples:      0
                    251: Storage structure:     paged heap
                    252: Relation type:         user relation
                    253: 
                    254:  attribute name    type  length  keyno.
                    255: 
                    256:  number            i       2
                    257:  pname             c      20
                    258:  shipdate          c       8
                    259: 
                    260: .fi
                    261: Notice that every relation has an expiration
                    262: date.
                    263: This is set to be one week
                    264: from the time when it was
                    265: created.
                    266: The "save" command
                    267: can be used to save the relation longer.
                    268: See "save(quel)" and "purge(unix)" in the 
                    269: INGRES reference manual.
                    270: 
                    271: .bp
                    272: 3.   COPYING DATA TO AND FROM INGRES
                    273: .sp
                    274: Once a relation is created, there are two mechanisms for
                    275: inserting new data:
                    276: .sp
                    277: .in +5
                    278: append command
                    279: .br
                    280: copy command
                    281: .in -5
                    282: .sp
                    283: Append is used to insert tuples one at a time,
                    284: or for filling one relation from other relations.
                    285: .sp
                    286: Copy is used for copying data from a UNIX file into
                    287: a relation. 
                    288: It is used for copying data from another program, or for copying
                    289: data from another system.  
                    290: It is also the most convenient way to copy any data
                    291: larger than a few tuples.
                    292: .sp
                    293: Let's begin by creating a simple relation
                    294: and loading data into it.
                    295: .sp
                    296: Example:
                    297: .sp
                    298: .ti +5
                    299: .nf
                    300: create donation (name = c10, amount = f4, ext = i2)
                    301: .sp
                    302: .fi
                    303: Now suppose we have two people to enter.  
                    304: The simplest procedure is probably
                    305: to run the two queries in INGRES using
                    306: the append command.
                    307: .sp
                    308: .ti 5
                    309: .nf
                    310: append to donation (name="frank",amount = 5,ext = 204)
                    311: .sp 1
                    312: .ti 5
                    313: append to donation (name="harry",ext = 209,amount = 4.50)
                    314: .fi
                    315: .sp
                    316: Note that the order in which the domains are given
                    317: does not matter.
                    318: INGRES matches by recognizing attribute names and
                    319: does not care in what order attributes
                    320: are listed.
                    321: Here is what the relation "donation" looks like now:
                    322: .nf
                    323: 
                    324: donation relation
                    325: 
                    326: |name      |amount    |ext   |
                    327: |----------------------------|
                    328: |frank     |5.000     |204   |
                    329: |harry     |4.500     |209   |
                    330: |----------------------------|
                    331: .fi
                    332: .sp
                    333: We now have two people entered into
                    334: the donation relation.
                    335: Suppose we had fifty more to enter.
                    336: Using the append command is far too tedious
                    337: since so much typing is involved for each tuple.
                    338: The copy command will better suit our purposes.
                    339: .sp
                    340: Copy can take data from a regular
                    341: Unix file in a variety of formats and
                    342: append it to a relation.
                    343: To use the copy command first create
                    344: a Unix file (typically using "ed") containing
                    345: the data.
                    346: .sp
                    347: For example, let's put five new names in a file
                    348: using the editor.
                    349: .sp
                    350: .nf
                    351: .tr Z.
                    352: % ed
                    353: a
                    354: bill,3.50,302
                    355: sam,10.00,410
                    356: susan,,100
                    357: sally,.5,305
                    358: george,4.00,302
                    359: Z
                    360: w newdom
                    361: 68
                    362: q
                    363: %
                    364: .tr ZZ
                    365: .sp 
                    366: .fi
                    367: The format of the above file is a
                    368: name followed by a comma, followed 
                    369: by the amount, then a comma, then the extension,
                    370: and finally a newline.
                    371: Null entries, for example the amount
                    372: for susan, are perfectly
                    373: legal and default to zero
                    374: for numerical domains and
                    375: blanks for character domains.
                    376: .sp
                    377: To use copy we enter INGRES and give the copy command.
                    378: .sp
                    379: .in +5
                    380: .nf
                    381: copy donation (name = c0, amount = c0, ext = c0)
                    382:        from "/mnt/bob/newdom"
                    383: .fi
                    384: .sp
                    385: .in -5
                    386: Here is how the copy command works:
                    387: .sp
                    388: .ti +5
                    389: copy relname (list of what to copy) from "full pathname"
                    390: .sp
                    391: In the case above we wrote:
                    392: .sp
                    393: .ti +5
                    394: copy donation (. . .) from "/mnt/bob/newdom"
                    395: .sp
                    396: Although amount and ext are stored in the relation
                    397: as f4 (floating point) and i2 (integer), in the
                    398: Unix file they were entered as characters.
                    399: In specifying the format of the domain,
                    400: copy accepts:
                    401: .sp
                    402: .ti +5
                    403: domain = format
                    404: .sp
                    405: where domain is the domain name and
                    406: the format in the UNIX file is one of
                    407: .sp
                    408: .in +5
                    409: .nf
                    410: i1, i2, i4         (true binary integer of size 1, 2, or 4)
                    411: .br
                    412: f4, f8             (true binary float point of size 4 or 8)
                    413: .br
                    414: c1, c2, c3,...c255 (a fixed length character string)
                    415: .br
                    416: c0                 (a variable length character string de-
                    417:                    limited by a comma, tab or new line)
                    418: .sp
                    419: .in -5
                    420: .fi
                    421: In the example we use
                    422: .sp
                    423: .ti +5
                    424: name = c0, amount = c0, extension = c0
                    425: .sp
                    426: This means that each of the domains
                    427: was stored in the Unix file as
                    428: variable length character
                    429: strings.
                    430: Copy takes the first comma,
                    431: tab, or new line character
                    432: as the end of the string.
                    433: This by far is the most
                    434: common use of copy
                    435: when the data is being entered
                    436: into a relation
                    437: for the first time.
                    438: .sp
                    439: Copy can also be used to copy data from a relation
                    440: into a Unix file.
                    441: For example:
                    442: .sp
                    443: .in +5
                    444: .nf
                    445: copy donation (name = c10, amount = c10, ext = c5)
                    446:         into "/mnt/bob/data"
                    447: .fi
                    448: .in -5
                    449: .sp
                    450: This will cause the following to happen:
                    451: .sp 1
                    452: .in +4
                    453: .ti -5
                    454: 1.  If the file /mnt/bob/data already exists it will
                    455: be destroyed.
                    456: .ti -5
                    457: .sp 1
                    458: 2.  The file is created in mode 600 (read/write by you only)
                    459: .sp 1
                    460: .ti -5
                    461: 3.  Name will be copied as a 10 character field,
                    462: immediately followed by amount,
                    463: immediately followed by ext.
                    464: Amount will be converted to a character
                    465: field 10 characters wide.
                    466: Ext will be converted to a character
                    467: field 5 characters wide.
                    468: .in -4
                    469: .sp 1
                    470: The file "/mnt/bob/data" would be a stream of characters
                    471: looking like this:
                    472: 
                    473: .tr Z 
                    474: .nf
                    475: frankZZZZZZZZZZ5.000ZZ204harryZZZZZZZZZZ4.500ZZ209bill
                    476: ZZZZZZZZZZZ3.500ZZ302samZZZZZZZZZZZ10.000ZZ410susanZZZ
                    477: ZZZZZZZ0.000ZZ100sallyZZZZZZZZZZ0.500ZZ305georgeZZZZZZ
                    478: ZZZ4.000ZZ302
                    479: .fi
                    480: .tr ZZ
                    481: 
                    482: .sp
                    483: The output was broken into four lines to
                    484: make it fit on this page.
                    485: In actuality the file
                    486: is a single line.
                    487: Another example:
                    488: .sp
                    489: .in +5
                    490: .nf
                    491: copy (name = c0, colon = d1, ext = c0, comma = d1
                    492:        amt = c0, nl = d1) into "/mnt/bob/data"
                    493: .fi
                    494: .in -5
                    495: .sp
                    496: In this example "c0" is interpreted to mean "use
                    497: the appropriate character format".
                    498: For character domains it is the
                    499: width of the domain.
                    500: Numeric domains are converted to characters
                    501: according to the INGRES defaults
                    502: (see ingres(unix)).
                    503: .sp
                    504: The statements:
                    505: .sp
                    506: .in +5
                    507: colon = d1
                    508: .br
                    509: comma = d1
                    510: .br
                    511: nl = d1
                    512: .in -5
                    513: .sp
                    514: are used to insert one colon,
                    515: comma, and newline into the file.
                    516: The format "d1" is interpreted to mean
                    517: one dummy character.
                    518: When copying into a Unix file,
                    519: a selected set of characters can be inserted into
                    520: the file using this
                    521: "dummy domain" specification.
                    522: Here is what the file "/mnt/bob/data" would look like:
                    523: 
                    524: .nf
                    525: frank     :   204,     5.000
                    526: harry     :   209,     4.500
                    527: bill      :   302,     3.500
                    528: sam       :   410,    10.000
                    529: susan     :   100,     0.000
                    530: sally     :   305,     0.500
                    531: george    :   302,     4.000
                    532: 
                    533: .fi
                    534: .sp
                    535: If you wanted a file with the true binary representation
                    536: of the numbers you would use:
                    537: .sp
                    538: .ti +5
                    539: copy (name = c10, amount = f4, ext = i2)
                    540: .sp
                    541: This would create a file with the exact
                    542: copy of each tuple,
                    543: one after the other.
                    544: This is frequently desireable for
                    545: temporary backup purposes
                    546: and it guarantees that floating
                    547: point domains will be exact.
                    548: .sp 2
                    549: TYPICAL ERRORS
                    550: .sp 1
                    551: There are 17 different errors
                    552: that can occur in copy.
                    553: We will go through the most
                    554: common ones.
                    555: .sp 1
                    556: Suppose you have a file with
                    557: .sp 1
                    558: bill,3.5,302
                    559: .br
                    560: sam,10,410,
                    561: .br
                    562: susan,3,100
                    563: .sp 1
                    564: and run the copy command
                    565: .sp 1
                    566: .in +5
                    567: .nf
                    568: copy donation (name = c0, amount = c0, ext = c0)
                    569:        from "/mnt/bob/data"
                    570: .fi
                    571: .in -5
                    572: .sp 1
                    573: You would get the error message
                    574: .sp 1
                    575: .nf
                    576: 5809: COPY: bad input string for domain amount. Input was "susan".
                    577: There were 2 tuples sucessfully copied from /mnt/bob/data into
                    578: donation.
                    579: .fi
                    580: .sp 1
                    581: What happened is that line 2 had an extra
                    582: comma.
                    583: The first two tuples were copied correctly.
                    584: For the next tuple, name = "" (blank), amount =
                    585: "susan", and ext = "3".  
                    586: Since "susan" is not a proper floating point
                    587: number, an error was generated and
                    588: processing was stopped after two tuples.
                    589: .sp 1
                    590: If you tried to copy the file with a file
                    591: such as
                    592: .sp 1
                    593: nancy,5.0,35000
                    594: .sp 1
                    595: you would get the error message
                    596: .sp 1
                    597: .nf
                    598: 5809: COPY: bad input string for domain ext. Input was "35000".
                    599: There were 0 tuples successfully copied from /mnt/bob/data into
                    600: donation.
                    601: .fi
                    602: .sp 1
                    603: Here, since ext is an i2 (integer) domain,
                    604: it cannot exceed the value 32767.
                    605: .sp 1
                    606: There are numerous other error messages,
                    607: most of which are self-explanatory.
                    608: .sp 1
                    609: In addition there are three, non-fatal warnings
                    610: which may appear on a copy "from".
                    611: .sp 1
                    612: If you are copying from
                    613: a file into a relation which
                    614: is ISAM or hash, a count
                    615: of the number of duplicate
                    616: tuples will appear, (if there were
                    617: any).
                    618: This will never appear on a "heap"
                    619: because no duplicate checking
                    620: is performed.
                    621: .sp 1
                    622: INGRES does not allow
                    623: control characters
                    624: (such as "bell" etc.)
                    625: to be stored.
                    626: If copy reads any control characters, it converts them
                    627: to blanks and reports the number
                    628: of domains that had control characters in them.
                    629: .sp 1
                    630: If you are copying using the c0
                    631: option, copy will
                    632: report if any character strings were
                    633: longer than their domains
                    634: and had to be truncated.
                    635: .sp 2
                    636: SPECIAL FEATURES
                    637: .sp 1
                    638: .ti +3
                    639: There are a few special functions that
                    640: make copy a little
                    641: easier to use
                    642: .sp 1
                    643: .nr in 6n
                    644: .ti -4
                    645: 1.  Bulk copy
                    646: .sp 1
                    647: If you ask for:
                    648: .sp 1
                    649: .ti +4
                    650: copy relname () from "file"
                    651: .ti +8
                    652: or
                    653: .ti +4
                    654: copy relname () into "file"
                    655: .sp 1
                    656: copy expands the statement to mean:
                    657: .sp 1
                    658: .in +5
                    659: copy each domain in its proper
                    660: order according to its proper
                    661: format.  
                    662: .sp 1
                    663: .in -5
                    664: So, if you said
                    665: .sp 1
                    666: .ti +4
                    667: copy donation () into "/mnt/bob/donation"
                    668: .sp 1
                    669: it would be the same as asking for:
                    670: .sp 1
                    671: .ti +4
                    672: .nf
                    673: copy donation (name = c10, amount = f4, ext = i2)
                    674:        into "/mnt/bob/donation"
                    675: .fi
                    676: .sp 1
                    677: This provides a convenient way to copy
                    678: whole relations to and from INGRES.
                    679: .sp 1
                    680: .ti -4
                    681: 2.  Dummy Domains
                    682: .sp 1
                    683: If you are copying data
                    684: from another computer or program,
                    685: frequently there will be
                    686: a portion of data that you will want to
                    687: ignore.
                    688: This can be done using the
                    689: dummy domain specifications
                    690: d0, d1, d2 ... d511.
                    691: For example
                    692: .sp 1
                    693: .ti +4
                    694: .nf
                    695: copy rel (dom1 = c5, dummy = d2, dom2 = i4,
                    696:        dumb = d0) from "/mnt/me/data"
                    697: .fi
                    698: .sp 1
                    699: The first five characters
                    700: are put in dom1,
                    701: the next two characters are ignored.
                    702: The next four bytes are
                    703: an i4 (integer) and go in dom2,
                    704: and the remaining delimited string
                    705: is ignored.
                    706: The name given to a dummy specifier is
                    707: ignored.
                    708: .sp 1
                    709: As mentioned previously,
                    710: dummy domains can be used on a copy
                    711: "into" a Unix file for inserting
                    712: special characters.
                    713: The list of recognizable names includes:
                    714: .sp 1
                    715: .in +5
                    716: .nf
                    717: nl        newline
                    718: tab       tab character
                    719: sp        space
                    720: nul       a zero byte
                    721: null      a zero byte
                    722: comma     ,
                    723: dash      -
                    724: colon     :
                    725: lparen    (
                    726: rparen    )
                    727: .fi
                    728: .in -5
                    729: .sp 1
                    730: .ti -4
                    731: 3.  Truncation
                    732: .sp 1
                    733: It is not uncommon to have a mistake occur
                    734: and need to start over.
                    735: The simplest way to do that
                    736: is to "truncate" the relation.
                    737: This is done by the command:
                    738: .sp 1
                    739: .ti +4
                    740: modify relname to truncated
                    741: .sp 1
                    742: This has the effect of removing
                    743: all tuples in relname,
                    744: releasing all disk space,
                    745: and making relname a heap again.
                    746: It is the logical equivalent of
                    747: a destroy followed by a create
                    748: (but with a lot less typing).
                    749: .sp 1
                    750: Since formatting mistakes are possible
                    751: with copy,
                    752: it is not generally a good idea to
                    753: copy data into a relation that already
                    754: has valid data in it.
                    755: The best procedure is to create a
                    756: temporary relation with the same domains
                    757: as the existing relation.
                    758: Copy data into the temporary relation
                    759: and then append it to the real relation.
                    760: For example:
                    761: 
                    762: .in +8
                    763: .nf
                    764: create tempdom(name=c10,amount=f4,ext=i2)
                    765: 
                    766: copy tempdom(name=c0,amount=c0,ext=c0)
                    767: from "/mnt/bob/data"
                    768: 
                    769: range of td is tempdom
                    770: append to donation(td.all)
                    771: .fi
                    772: .in -8
                    773: .sp 1
                    774: 4.  Specifing Delimitors.
                    775: .sp 1
                    776: Sometimes it is desirable to specify
                    777: what the delimiting character should be
                    778: on a copy "from" a file.
                    779: This can be done by specifing:
                    780: 
                    781: .ti +8
                    782: domain = c0delim
                    783: 
                    784: where "delim" is a valid delimitor
                    785: taken from the list of recognizable names.
                    786: This list was summarized on the
                    787: previous page under "dummy domains".
                    788: For example:
                    789: 
                    790: .ti +8
                    791: copy donation (name = c0nl) from "/mnt/me/data"
                    792: 
                    793: will copy names from the file to the relation.
                    794: Only a new line will delimit the names so
                    795: any commas or tabs will be passed along as
                    796: part of the name.
                    797: 
                    798: When copying "into" a Unix file,
                    799: the "delim" is actually written into the
                    800: file,
                    801: so on a copy "into" the specification:
                    802: 
                    803: .ti +8
                    804: copy donation (name = c0nl) into "/mnt/me/file"
                    805: 
                    806: will cause "name" to be written followed by a new line
                    807: character.
                    808: .nr in 0
                    809: .bp
                    810: 4.   CHOOSING THE BEST STORAGE STRUCTURES
                    811: .sp 1
                    812: .sp
                    813: We now turn to the issue of efficiency.
                    814: Once you have created a relation
                    815: and inserted your
                    816: data using either copy or append,
                    817: INGRES can process any query
                    818: on the relation.
                    819: There are several things you can do
                    820: to improve the speed at which INGRES
                    821: can process a query.
                    822: .sp
                    823: INGRES can store a relation in three different
                    824: internal
                    825: structures.
                    826: These are called "heap",
                    827: "isam", and "hash".
                    828: First we will briefly describe each
                    829: structure and then later expand our
                    830: discussion.
                    831: .sp
                    832: HEAP
                    833: .sp 1
                    834: When a relation is first created, it is
                    835: created as a "heap".
                    836: There are two important properties about a heap:
                    837: duplicate tuples are not removed,
                    838: and nothing is known about the location of the tuples.
                    839: If you ran the query:
                    840: .sp 1
                    841: .ti +5
                    842: range of d is donation
                    843: .br
                    844: .ti +5
                    845: retrieve (d.amount) where d.name = "bill"
                    846: .sp 1
                    847: INGRES would have to read every tuple in the
                    848: relation looking for those with name "bill".
                    849: If the relation is small this isn't a 
                    850: serious matter.
                    851: But if the relation is very large, this can take
                    852: minutes (or even hours!).
                    853: .sp 1
                    854: HASH
                    855: .sp 1
                    856: A relation whose structure is "hash" can give fast
                    857: access to searches on certain domains.
                    858: (Those domains are usually referred to as
                    859: "keyed domains".)
                    860: In addition, a "hashed" relation contains
                    861: no duplicate tuples.
                    862: For example, suppose the donation relation is stored hashed on
                    863: domain "name".  
                    864: Then the query:
                    865: .sp 1
                    866: .ti +5
                    867: retrieve (d.amount) where d.name = "bill"
                    868: .sp 1
                    869: will run quickly
                    870: since INGRES knows approximately where on
                    871: disk the tuple is stored.
                    872: If the relation contains only a few tuples you
                    873: won't notice the difference between a "heap"
                    874: and a "hash" structure.
                    875: But as the relation becomes larger, the
                    876: difference in speed becomes
                    877: much more noticeable.
                    878: .sp 1
                    879: ISAM
                    880: .sp 1
                    881: An isam structure is one where the relation is
                    882: sorted on one or more domains,
                    883: (also called keyed domains).
                    884: Duplicates are also removed on "isam relations".
                    885: When new tuples are appended they are
                    886: placed "approximately" in their sorted position in the
                    887: relation.
                    888: (The "approximately" will be explained a bit
                    889: later.)
                    890: .sp 1
                    891: Suppose donation is isam on name.
                    892: To process the query
                    893: .sp 1
                    894: .ti +5
                    895: retrieve (d.amount) where d.name = "bill"
                    896: .sp 1
                    897: INGRES will determine where in the sorted order
                    898: the name "bill" would be and read only
                    899: those portions of the relation.
                    900: .sp 1
                    901: Since the relation is approximately sorted,
                    902: an isam structure is also efficient for
                    903: processing the query:
                    904: .ti +5
                    905: .sp 1
                    906: retrieve (d.amount) where d.name >= "b" and d.name < "g"
                    907: .sp 1
                    908: This query would retrieve all names beginning
                    909: with "b" through "f".
                    910: The entire relation would not have to be
                    911: searched since it is isam on name.
                    912: .sp 2
                    913: SPECIFYING THE STORAGE STRUCTURE
                    914: .sp
                    915: Any user created relation can be converted
                    916: to any storage structure using the
                    917: "modify" command.
                    918: For example
                    919: .sp
                    920: .ti 5
                    921: modify donation to hash on name
                    922: .br
                    923: or
                    924: .br
                    925: .ti 5
                    926: modify donation to isam on name
                    927: .sp
                    928: or even
                    929: .sp
                    930: .ti 5
                    931: modify donation to heap
                    932: .sp 2
                    933: PRIMARY AND OVERFLOW PAGES
                    934: .sp
                    935: At this point it is necessary to introduce the
                    936: concepts of primary and overflow pages on
                    937: hash and isam structures.
                    938: Both hash and isam are techniques for assigning
                    939: specific tuples to specific pages of a relation
                    940: based on the tuple's keyed domains.
                    941: Thus each page will contain only a certain
                    942: specified subset of the relation.
                    943: 
                    944: When a new tuple is appended to a hash or isam
                    945: relation, INGRES
                    946: first determines what page it belongs to,
                    947: and then looks for room on that page.
                    948: If there is space then the tuple
                    949: is placed on that page.
                    950: If not,
                    951: then an "overflow" page is created and
                    952: the tuple is placed there.
                    953: 
                    954: The overflow page is linked to the
                    955: original page.
                    956: The original page is called the "primary"
                    957: page.
                    958: If the overflow page became full,
                    959: then INGRES
                    960: would connect an overflow page to it.
                    961: We would then have one primary page
                    962: linked to an overflow page,
                    963: linked to another overflow page.
                    964: Overflow pages are dynamically added as
                    965: needed.
                    966: .sp 2
                    967: SPECIFYING FREE SPACE
                    968: .sp
                    969: The modify command also lets you specify how much
                    970: room to leave for the relation to grow.
                    971: As was mentiond in "create",
                    972: relations are divided into pages.
                    973: A "fillfactor" can be used to specify how
                    974: full to make each primary page.
                    975: This decision should be based
                    976: only on whether more tuples 
                    977: will be appended to the relation.
                    978: For example:
                    979: .sp
                    980: .ti 5
                    981: .nf
                    982: modify donation to isam on name where fillfactor = 100
                    983: .fi
                    984: .sp
                    985: This tells modify to make each page 100% full
                    986: if at all possible.
                    987: .sp
                    988: .ti 5
                    989: .nf
                    990: modify donation to isam on name where fillfactor = 25
                    991: .fi
                    992: .sp
                    993: This will leave each page 25% full or, in other words,
                    994: 75% empty.
                    995: We would do this if we had roughly 1/4 of the
                    996: data already loaded and it was fairly well distributed
                    997: about the alphabet.
                    998: .sp
                    999: Keep in mind that if you don't specify the fillfactor,
                   1000: INGRES will typically default to a reasonable choice.
                   1001: Also when a page becomes full, INGRES
                   1002: automatically creates an "overflow"
                   1003: page so it is never the case that a relation
                   1004: will be unable to expand.  
                   1005: .sp
                   1006: When modifying a relation
                   1007: to hash, an additional
                   1008: parameter "minpages" can
                   1009: be specified.
                   1010: Modify will guarantee
                   1011: that at least "minpage" primary pages will be allocated
                   1012: for the relation.
                   1013: .sp
                   1014: Modify computes how may primary pages will be
                   1015: needed to store the existing tuples at 
                   1016: the specified fillfactor
                   1017: assuming that no overflow pages will be necessary originally.
                   1018: If that number is less than
                   1019: minpages, then minpages is used instead.
                   1020: .sp
                   1021: For example:
                   1022: .sp
                   1023: .ti 5
                   1024: .nf
                   1025: modify donation to hash on name where fillfactor = 50,
                   1026: .ti 10
                   1027: minpages = 1
                   1028: .sp 1
                   1029: .ti 5
                   1030: modify donation to hash on name where minpages = 150
                   1031: .fi
                   1032: .sp
                   1033: In the first case we guarantee that no more
                   1034: pages than are necessary will be
                   1035: used for 50% occupancy.
                   1036: The second case is typically
                   1037: used for modifying an empty or near
                   1038: empty relation.
                   1039: If the approximate maximum
                   1040: size of the relation is known in advance,
                   1041: minpages
                   1042: can be used to guarantee that the relation will
                   1043: have its expected maximum size.
                   1044: .sp
                   1045: There is one other option available for hash called
                   1046: "maxpages".
                   1047: Its syntax is the same as minpages.
                   1048: It can be used to specify the maximum
                   1049: number of primary pages to use.
                   1050: .sp
                   1051: COMPRESSION
                   1052: .sp 1
                   1053: The three storage structures
                   1054: (heap, hash, isam) can optionally
                   1055: have "compression" applied
                   1056: to them.
                   1057: To do this, refer to the
                   1058: storage structures as cheap, chash, and cisam.
                   1059: Compression reduces
                   1060: the amount of space needed to store each tuple
                   1061: internally.
                   1062: The current compression technique is to
                   1063: suppress trailing blanks in 
                   1064: character domains.
                   1065: Using compression will never
                   1066: require more space and typically
                   1067: it can save disk space and improve
                   1068: performance.
                   1069: Here is an example:
                   1070: .sp 1
                   1071: .nf
                   1072: .ti +5
                   1073: modify donation to cisam on name where fillfactor = 100
                   1074: .fi
                   1075: .sp 1
                   1076: This will make donation a compressed isam 
                   1077: structure and fill every page as
                   1078: full as possible.
                   1079: With compression, each tuple
                   1080: can have a different compressed
                   1081: length.
                   1082: Thus the number of tuples
                   1083: that can fit on one page will
                   1084: depend on how successfully
                   1085: they can be compressed.
                   1086: 
                   1087: Compressed relations can be more expensive to update.
                   1088: In particular if a replace is done on one or
                   1089: more domains and the compressed tuple is no
                   1090: longer the same length,
                   1091: then INGRES must look for a new place to put the tuple.
                   1092: .sp 2
                   1093: TWO VARIATIONS ON A THEME
                   1094: .sp
                   1095: As mentioned, duplicates are not removed
                   1096: from a relation stored
                   1097: as a heap.
                   1098: Frequently it is desirable
                   1099: to remove duplicates and sort
                   1100: a heap relation.
                   1101: One way of doing this is to modify the
                   1102: relation to isam specifying
                   1103: the order in which to sort
                   1104: the relation.
                   1105: An alternative to this is to use either
                   1106: "heapsort" or "cheapsort".
                   1107: For example
                   1108: .sp
                   1109: .ti 5
                   1110: .nf
                   1111: modify donation to heapsort on name, ext
                   1112: .fi
                   1113: .sp
                   1114: This will sort the relation by
                   1115: name then ext.
                   1116: The tuples are further sorted on the
                   1117: remaining domains,
                   1118: in the order they were listed in the
                   1119: original create statement.
                   1120: So in this case the relation will be
                   1121: sorted on name then ext and then amount.
                   1122: Duplicate tuples are always removed.
                   1123: The relation will be left
                   1124: as a heap.
                   1125: Heapsort and cheapsort are intended
                   1126: for sorting a temporary relation before printing and
                   1127: destroying it.
                   1128: It is more efficient than modifying
                   1129: to isam because
                   1130: with isam INGRES creates a
                   1131: "directory" containing
                   1132: key information about each page.
                   1133: The relation will NOT be kept sorted
                   1134: when further updates occur.
                   1135: .sp
                   1136: Examples:
                   1137: .sp
                   1138: .nr in 2n
                   1139: Here are a collection of examples
                   1140: and comments as to the efficiency of 
                   1141: each query.
                   1142: The queries are based on the
                   1143: relations:
                   1144: .(l
                   1145: parts(pnum, pname, color, weight, qoh)
                   1146: .br
                   1147: supply(snum, pnum, jnum, shipdate, quan)
                   1148: .sp 1
                   1149: range of p is parts
                   1150: .br
                   1151: range of s is supply
                   1152: .sp 1
                   1153: modify parts to hash on pnum
                   1154: .br
                   1155: modify supply to hash on snum,jnum
                   1156: .)l
                   1157: .ti +5
                   1158: .sp 1
                   1159: retrieve (p.all) where p.pnum = 10
                   1160: .sp 1
                   1161: INGRES will recognize that parts is
                   1162: hashed on pnum and go directly to the
                   1163: page where parts with number 10 would be stored.
                   1164: .sp 1
                   1165: .ti +5
                   1166: retrieve (p.all) where p.pname = "tape drive"
                   1167: .sp 1
                   1168: INGRES will read the entire relation
                   1169: looking for matching pnames.
                   1170: .sp 1
                   1171: .ti +5
                   1172: retrieve (p.all) where p.pnum < 10  and p.pnum > 5
                   1173: .sp 1
                   1174: INGRES will read the entire relation
                   1175: because no exact value for pnum
                   1176: was given.
                   1177: .sp 1
                   1178: .ti +5
                   1179: retrieve (s.shipdate) where s.snum = 471 and s.jnum = 1008
                   1180: .sp 1
                   1181: INGRES will recognize that supply is hashed on the
                   1182: combination of snum and jnum and will go directly
                   1183: to the correct page.
                   1184: .ti +5
                   1185: .sp 1
                   1186: retrieve (s.shipdate) where s.snum = 471
                   1187: .sp 1
                   1188: INGRES will read the entire
                   1189: relation.
                   1190: Supply is hashed on the
                   1191: combination of snum and jnum.
                   1192: Unless INGRES is given a unique
                   1193: value for both, it cannot
                   1194: take advantage of the storage
                   1195: structure.
                   1196: .sp 1
                   1197: .ti +5
                   1198: retrieve (p.pname, s.shipdate) where
                   1199: .ti +5
                   1200: .br
                   1201: p.pnum = s.pnum and s.snum = 471 and s.jnum = 1008
                   1202: .sp 1
                   1203: INGRES will take advantage of both
                   1204: storage structures.
                   1205: It will first find all
                   1206: s.pnum and s.shipdate
                   1207: where s.snum = 471 and
                   1208: s.jnum = 1008.
                   1209: After that it will look for all 
                   1210: p.pname where p.pnum is equal to
                   1211: the correct value.
                   1212: .sp 1
                   1213: This example illustrates the idea that it is 
                   1214: frequently a good idea to hash a
                   1215: relation on the domains where it is
                   1216: "joined" with another relation.
                   1217: For example, in this
                   1218: case it is very common to ask
                   1219: for p.pnum = s.pnum
                   1220: .sp 1
                   1221: To summarize:
                   1222: .sp 1
                   1223: To take advantage of a hash
                   1224: structure,
                   1225: INGRES needs an exact value
                   1226: for each key domain.
                   1227: An exact value is anything
                   1228: such as:
                   1229: .ti +5
                   1230: .sp 1
                   1231: s.snum = 471
                   1232: .br
                   1233: .ti +5
                   1234: s.pnum = p.pnum
                   1235: .sp 1
                   1236: An exact value is not
                   1237: .sp 1
                   1238: .ti +5
                   1239: s.snum >= 471
                   1240: .br
                   1241: .ti +5
                   1242: (s.snum = 10 or s.snum = 20)
                   1243: .sp 1
                   1244: Now let's consider some
                   1245: cases using isam
                   1246: .sp 1
                   1247: .in +5
                   1248: modify supply to isam on snum,shipdate
                   1249: .br
                   1250: retrieve (s.all) where s.snum = 471
                   1251: .br
                   1252: and s.shipdate > "75-12-31"
                   1253: .br
                   1254: and s.shipdate < "77-01-01"
                   1255: .sp 1
                   1256: .in -5
                   1257: Since supply is sorted first on snum and then
                   1258: on shipdate, INGRES
                   1259: can take full advantage of the
                   1260: isam structure to locate the
                   1261: portions of supply which satisfy
                   1262: the query.
                   1263: .sp 1
                   1264: .ti +5
                   1265: retrieve (s.all) where s.snum = 47l
                   1266: .sp 1
                   1267: Unlike hash, an isam structure
                   1268: can still be used if only the first key is
                   1269: provided.
                   1270: .sp 1
                   1271: .ti +5
                   1272: retrieve (s.all) where s.snum > 400 and s.snum < 500
                   1273: .sp 1
                   1274: Again INGRES will take advantage of the structure.
                   1275: .sp 1
                   1276: .ti +5
                   1277: retrieve (s.all) where s.shipdate >= "75-12-31" and
                   1278: .ti +5
                   1279: s.shipdate <= "77-01-01"
                   1280: .sp 1
                   1281: Here INGRES will read the entire relation.
                   1282: This is because the first key (snum) is not
                   1283: provided in the query.
                   1284: .sp 1
                   1285: To summarize:
                   1286: .sp 1
                   1287: Isam can provide improved access
                   1288: on either exact values or ranges of
                   1289: values.
                   1290: It is useful as long as at least
                   1291: the first key is provided.
                   1292: .sp 1
                   1293: To locate where the tuples are
                   1294: in an isam relation,
                   1295: INGRES searches the isam directory for that
                   1296: relation.
                   1297: When a relation is modified to isam,
                   1298: the tuples are first sorted and duplicates
                   1299: are removed.
                   1300: Next, the relation is
                   1301: filled (according to the fillfactor) starting
                   1302: at page 0, 1, 2... for as many
                   1303: pages as are needed.
                   1304: .sp 1
                   1305: Now the directory is built.
                   1306: The key domains from the first
                   1307: tuple on each page are collected and
                   1308: organized into a directory (stored in the relation
                   1309: on disk).
                   1310: The directory is never changed
                   1311: until the next time a modify is done.
                   1312: .sp 1
                   1313: Whenever a tuple is added to the relation,
                   1314: the directory is searched to find
                   1315: which page the new tuple belongs on.
                   1316: Within that page, the individual
                   1317: tuples are NOT kept sorted.
                   1318: This is what is meant by "approximately" sorted.
                   1319: .sp 2
                   1320: .nr in 0
                   1321: HEAP v. HASH v. ISAM
                   1322: .sp 1
                   1323: Let's now compare the relative advantages and disadvantages
                   1324: of each option.
                   1325: A relation is always created as a heap.
                   1326: A heap is the most efficient
                   1327: structure to use to initially
                   1328: fill a relation using copy or append.
                   1329: .sp 1
                   1330: Space from deleted tuples of a heap
                   1331: is only reused on the last page.
                   1332: No duplicate checking is done on
                   1333: a heap relation.
                   1334: .sp 1
                   1335: Hash is advantageous for locating tuples
                   1336: referenced in a qualification by an exact
                   1337: value.
                   1338: The primary page for tuples with a specific
                   1339: value can be easily computed.
                   1340: .sp 1
                   1341: Isam is useful for both exact values and ranges of values.
                   1342: Since the isam directory must be searched to
                   1343: locate tuples, it is never as efficient as hash.
                   1344: .sp 2
                   1345: OVERFLOW PAGES
                   1346: .sp 1
                   1347: When a tuple is to be inserted
                   1348: and there is no more room on the
                   1349: primary page of a relation, then an
                   1350: overflow page is created.
                   1351: As more tuples are inserted, additional overflow
                   1352: pages are added as needed.
                   1353: Overflow pages, while necessary, decrease
                   1354: the system performance for
                   1355: retrieves and updates.
                   1356: .sp 1
                   1357: For example, let's suppose that supply
                   1358: is hashed on snum and has 10 primary pages.
                   1359: Suppose the value snum = 3 falls on page 7.
                   1360: To find all snum = 3 requires INGRES to search
                   1361: primary page 7 and all overflow pages of page 7
                   1362: (if any).
                   1363: As more overflow pages are added the time
                   1364: needed to search for
                   1365: snum = 3 will increase.
                   1366: Since duplicates are removed on isam and hash,
                   1367: this search must be performed on appends and
                   1368: replaces also.
                   1369: .sp 1
                   1370: When a hash or isam relation has too many overflow pages
                   1371: it should be remodified to hash
                   1372: or isam again.
                   1373: This will clear up the relation
                   1374: and eliminate as many overflow pages as possible.
                   1375: .sp 2
                   1376: UNIQUE KEYS
                   1377: .sp 1
                   1378: When choosing key domains for a relation
                   1379: it is desirable to have each set of
                   1380: key domains
                   1381: as unique as possible.
                   1382: For example, employee id numbers  
                   1383: typically have no
                   1384: duplicate values, while
                   1385: something like color
                   1386: is likely to have only a few distinct
                   1387: values, and something like
                   1388: sex, to the best of our knowledge, has only two
                   1389: values.
                   1390: .sp 1
                   1391: If a relation is hashed on domain sex then you can expect to have all
                   1392: males on one primary page and all its
                   1393: overflow pages and a corresponding
                   1394: situation with females.
                   1395: With a hash relation there is no solution to this
                   1396: problem.
                   1397: A trade-off must be made between the
                   1398: most desirable key domains to use in a
                   1399: qualification versus the uniqueness of the
                   1400: key values.
                   1401: .sp 1
                   1402: Since isam structure can be used if at least
                   1403: the first key is provided, extra
                   1404: key domains can sometimes be added to increase uniqueness.
                   1405: For example, suppose the supply
                   1406: relation has only 10 unique supplier numbers
                   1407: but thousands of tuples.
                   1408: Choosing an isam structure with the keys snum and jnum
                   1409: will probably give many more unique keys.
                   1410: However, the directory size will
                   1411: be larger and consequently it will
                   1412: take longer to search.
                   1413: When providing additional keys
                   1414: just for the sake of increasing
                   1415: uniqueness,
                   1416: try to use the smallest possible domains.
                   1417: .sp 2
                   1418: SYSTEM RELATIONS
                   1419: .sp 1
                   1420: INGRES uses three relations
                   1421: ("relation", "attribute", and "indexes") to maintain
                   1422: and organize a data base.
                   1423: The "relation" relation has one tuple for
                   1424: each relation in the data base.
                   1425: The "attribute" relation has one tuple
                   1426: for each attribute in each
                   1427: relation.
                   1428: The "indexes" relation
                   1429: has one tuple for each secondary
                   1430: index.
                   1431: .sp 1
                   1432: INGRES accesses these relations
                   1433: in a very well defined manner.
                   1434: A program called "sysmod" should be used
                   1435: to modify these relations to hash on the
                   1436: appropriate domains.
                   1437: To use sysmod the data base
                   1438: administrator types
                   1439: .sp 1
                   1440: % sysmod data-base-name
                   1441: .sp 1
                   1442: Sysmod should be run
                   1443: initially after the data base is created and subsequently
                   1444: as relations are created and the data
                   1445: base grows.
                   1446: It is insufficient to run
                   1447: sysmod only once and forget about it.
                   1448: Rerunning sysmod will cause the
                   1449: system relations to be remodified.
                   1450: This will typically remove
                   1451: most overflow pages and improve
                   1452: system response time
                   1453: for everything.
                   1454: .bp
                   1455: 5.  SECONDARY INDICES
                   1456: .sp 1
                   1457: Using an isam or hash structure
                   1458: provides a fast way to find
                   1459: tuples in a relation given values for the key
                   1460: domains.
                   1461: Sometimes this is not enough.
                   1462: For example, suppose we have
                   1463: the donation relation
                   1464: .sp 1
                   1465: .ti +5
                   1466: donation(name, amount, ext)
                   1467: .sp 1
                   1468: hashed on name.
                   1469: This will provide fast access
                   1470: to queries where the qualification has
                   1471: an exact value for name.
                   1472: What if we also will be doing
                   1473: queries giving exact values for ext?
                   1474: .sp 1
                   1475: Donation can be hashed either on name
                   1476: or ext, so we would have to choose which is more common
                   1477: and hash donation on that domain.
                   1478: The other domain (say ext) can have
                   1479: a secondary index.
                   1480: A secondary index is a relation which contains
                   1481: each "ext" together with the exact
                   1482: location of where the tuple is in the relation
                   1483: donation.
                   1484: .sp 1
                   1485: The command to create a secondary
                   1486: index is:
                   1487: .sp 1
                   1488: .ti +5
                   1489: index on donation is donext (ext)
                   1490: .sp 1
                   1491: The general format is:
                   1492: .sp 1
                   1493: .ti +5
                   1494: index on relation_name is secondary_index_name (domains)
                   1495: .sp 1
                   1496: Here we are asking INGRES
                   1497: to create a secondary index on the relation
                   1498: donation.
                   1499: The domain being indexed is "ext".
                   1500: Indices are formed in three steps:
                   1501: .sp 1
                   1502: .in +4
                   1503: .ti -5
                   1504: 1.  "Donext" is created as a heap.
                   1505: .br
                   1506: .ti -5
                   1507: 2.  For each
                   1508: tuple in donation, a tuple is inserted
                   1509: in "donext" with the value for ext and the
                   1510: exact location of the corresponding tuple in
                   1511: donation.
                   1512: .br
                   1513: .ti -5
                   1514: 3.  By default "donext" is modified to isam.
                   1515: .in -4
                   1516: .sp 1
                   1517: Now if you run the query
                   1518: .sp 1
                   1519: .ti +5
                   1520: range of d is donation
                   1521: .ti +5
                   1522: retrieve(d.amount) where d.ext = 207
                   1523: .sp 1
                   1524: INGRES will automatically look first in
                   1525: "donext" to find ext = 207.
                   1526: When it finds one it then goes directly
                   1527: to the tuple in the donation relation.
                   1528: Since "donext" is isam on ext, search for
                   1529: ext = 207 can typically be
                   1530: done rapidly.
                   1531: .sp 1
                   1532: If you run the query
                   1533: .sp 1
                   1534: .ti 5
                   1535: retrieve(d.amount) where d.name = "frank"
                   1536: .sp 1
                   1537: then INGRES will continue to use the hash
                   1538: structure of the relation "donation"
                   1539: to locate the qualifying tuples.
                   1540: .sp 1
                   1541: Since secondary indices are themselves relations,
                   1542: they also can be either hash, isam, chash or cisam.
                   1543: It never makes sense to a secondary index a heap.
                   1544: .sp 1
                   1545: The decision as to what structure to make
                   1546: them on involves the same issues
                   1547: as were discussed before:
                   1548: .sp 1
                   1549: Will the domains be referenced by exact value?
                   1550: .br
                   1551: Will they be referenced by ranges of value?
                   1552: .br
                   1553: etc.
                   1554: .sp 1
                   1555: In this case the "ext" domain
                   1556: will be referenced by exact values, and
                   1557: since the relation is nearly full we will do:
                   1558: .sp 1
                   1559: .ti +5
                   1560: modify donext to hash on ext where fillfactor = 100
                   1561: .ti +5
                   1562: and minpages = 1
                   1563: .sp 1
                   1564: Secondary indices provide a way for INGRES
                   1565: to access tuples based on domains
                   1566: that are not key domains.
                   1567: A relation can have any number of secondary
                   1568: indices and in addition
                   1569: each secondary index can be an index
                   1570: on up to six domains of the primary relation.
                   1571: .sp 1
                   1572: Whenever a tuple is replaced, deleted
                   1573: or appended to a primary relation,
                   1574: all secondary indices must
                   1575: also be updated.  
                   1576: Thus secondary indices
                   1577: are "not free". 
                   1578: They increase
                   1579: the cost of updating the
                   1580: primary relation, but
                   1581: can decrease the cost of finding tuples
                   1582: in the primary relation.
                   1583: .sp 1
                   1584: Whether a secondary index will improve
                   1585: performance or not strongly
                   1586: depends on the uniqueness of the
                   1587: values of the domains being
                   1588: indexed.
                   1589: The primary concern is whether searching
                   1590: through the secondary index is
                   1591: more efficient than simply
                   1592: reading the entire primary relation.
                   1593: In general it is if the
                   1594: number of tuples which satisfy the
                   1595: qualification is less than the number of total pages
                   1596: (both primary and overflow) in the primary
                   1597: relation.
                   1598: .sp 1
                   1599: For example if we frequently want to find
                   1600: all people who donated less than
                   1601: five dollars, consider creating
                   1602: .sp 1
                   1603: .ti +5
                   1604: index on donation is donamount (amount)
                   1605: .sp 1
                   1606: By default donamount will be isam
                   1607: on amount.
                   1608: IF INGRES processes the query:
                   1609: .sp 1
                   1610: .ti +5
                   1611: retrieve(d.name) where d.amount < 5.0
                   1612: .sp 1
                   1613: it will locate d.amount < 5.0 in the secondary
                   1614: index and for each tuple it
                   1615: finds will fetch the corresponding
                   1616: tuple in donation.
                   1617: The tuples in donamount are sorted by
                   1618: amount but the tuples
                   1619: in donation are not.
                   1620: Thus in general each tuple fetch from
                   1621: donation via donamount will be on a
                   1622: different page.
                   1623: Retrieval using the secondary index can then cause more page
                   1624: reads than simply reading all of donation sequentially!
                   1625: So in this example it would
                   1626: be a bad idea to create the secondary
                   1627: index.
                   1628: .bp
                   1629: 6.  RECOVERY AND DATA UPDATE
                   1630: .sp 1
                   1631: INGRES has been carefully designed
                   1632: to protect the integrity of a data base
                   1633: against certain classes
                   1634: of system failures.
                   1635: To do this INGRES
                   1636: processes changes to a relation
                   1637: using what we call "deferred
                   1638: update" or "batch file update".
                   1639: In addition there are two INGRES
                   1640: programs "restore" and "purge" that can be used to check
                   1641: out a data base after a system failure.
                   1642: We will first discuss how deferred updates are created
                   1643: and processed, and second we will discuss
                   1644: the use of purge and restore.
                   1645: .sp 1
                   1646: DEFERRED UPDATE (Batch update)
                   1647: .in +4
                   1648: .sp 1
                   1649: .ti -5
                   1650: An append, replace or delete command is run in four steps:
                   1651: .sp 1
                   1652: .ti -5
                   1653: 1.  An empty batch file is created.
                   1654: .ti -5
                   1655: 2.  The command is run to completion
                   1656: and each change to the result relation is written into
                   1657: the batch file.
                   1658: .ti -5
                   1659: 3.  The batch file is read and the
                   1660: relation and its secondary indices (if any)
                   1661: are actually updated.
                   1662: .ti -5
                   1663: 4.  The batch file is destroyed and INGRES
                   1664: returns back to the user.
                   1665: .sp 1
                   1666: .in -4
                   1667: Deferred update defers all actual
                   1668: updating until the very end of
                   1669: the query.
                   1670: There are three advantages to doing this.
                   1671: .sp 1
                   1672: l.  Provides recovery from system failures
                   1673: .sp 1
                   1674: If the system "crashes" during an update,
                   1675: the INGRES recovery program will decide to either
                   1676: run the update to completion or else
                   1677: "back out" the update, leaving the
                   1678: relation as it looked before the update
                   1679: was started.
                   1680: .sp 1
                   1681: 2.  Prevents infinite queries
                   1682: .sp 1
                   1683: If "donation" were a heap and the query
                   1684: .sp 1
                   1685: .ti +4
                   1686: range of d is donation
                   1687: .ti +4
                   1688: append to donation(d.all)
                   1689: .sp 1
                   1690: were run without deferred update,
                   1691: it would terminate only when it ran
                   1692: out of space on disk!
                   1693: This is because INGRES would start reading the
                   1694: relation from the beginning and
                   1695: appending each tuple at the end.
                   1696: It would soon start reading the tuples it
                   1697: had just previously appended and
                   1698: continue indefinitely to
                   1699: "chase its tail".
                   1700: .sp 1
                   1701: While this query is certainly not
                   1702: typical, it illustrates the point.
                   1703: There are certain classes of queries
                   1704: where problems occur if WHEN
                   1705: an update actually occurs
                   1706: is not precisely defined.
                   1707: With deferred update we can
                   1708: guarantee consistent and logical
                   1709: results.
                   1710: .sp 1
                   1711: 3.  Speeds up processing of secondary indices
                   1712: .sp 1
                   1713: Secondary indices can be updated
                   1714: faster if they are done one at a time
                   1715: instead of all at once.
                   1716: It also insures protection against
                   1717: the secondary index becoming inconsistent
                   1718: with its primary relation.
                   1719: .sp 1
                   1720: TURNING DEFERRED UPDATE OFF
                   1721: .sp 1
                   1722: If you are not persuaded by any of
                   1723: these arguments, INGRES
                   1724: allows you to turn deferred update off!
                   1725: Indeed there are certain cases when
                   1726: it is appropriate (although
                   1727: certainly not essential) to perform
                   1728: updates directly, that is, the relation is updated
                   1729: while the query is being processed.
                   1730: .sp 1
                   1731: To use direct update, you must be given
                   1732: permission by the INGRES
                   1733: super user.
                   1734: Then when invoking INGRES
                   1735: specify the "-b" flag which turns
                   1736: off batch update.
                   1737: .sp 1
                   1738: .ti +4
                   1739: % ingres mydate -b
                   1740: .sp 1
                   1741: INGRES will use direct update on any relation without
                   1742: secondary indices.  
                   1743: It will still silently use
                   1744: deferred update if a relation
                   1745: has any secondary indices.
                   1746: By using the "-b" flag you are
                   1747: sacrificing points 1 and 2 above.
                   1748: In most cases you SHOULD NOT
                   1749: use the -b flag.
                   1750: .sp 1
                   1751: If you are using INGRES
                   1752: to interactively enter
                   1753: or change one tuple at
                   1754: a time, it is slightly
                   1755: more efficient to have deferred
                   1756: update turned off.
                   1757: If the system crashes during an
                   1758: update the person entering the data
                   1759: will be aware of the situation
                   1760: and can check whether the tuple
                   1761: was updated or not.
                   1762: .sp 1
                   1763: RESTORE
                   1764: .sp 1
                   1765: INGRES is designed to recover
                   1766: from the common types of system
                   1767: crashes which leave the Unix file
                   1768: system intact.
                   1769: It can recover from updates, creates,
                   1770: destroys, modifies and index commands.
                   1771: .sp 1
                   1772: INGRES is designed to "fail safe".
                   1773: If any inconsistancies are
                   1774: discovered or any failures
                   1775: are returned from Unix,
                   1776: INGRES will generate a system error
                   1777: message (SYSERR) and exit.
                   1778: .sp 1
                   1779: Whenever Unix crashes while INGRES
                   1780: is running or whenever an INGRES
                   1781: syserr occurs, it is
                   1782: generally a good idea to have the date
                   1783: base administrator run the command
                   1784: .sp 1
                   1785: .ti +5
                   1786: % restore data_base_name
                   1787: .sp 1
                   1788: The restore program performs the
                   1789: following functions:
                   1790: .in +4
                   1791: .sp 1
                   1792: .ti -5
                   1793: 1.  Looks for batch update files.
                   1794: If any are found, it examines each
                   1795: one to see if it is complete.
                   1796: If the system crash occured while
                   1797: the batch file was being read
                   1798: and the data base being updated,
                   1799: then restore will complete
                   1800: the update.
                   1801: Otherwise the batch file was not
                   1802: completed and it is simply destroyed;
                   1803: the effect is as though the query had never been run.
                   1804: .sp 1
                   1805: .ti -5
                   1806: 2.  Checks for uncompleted modify commands.
                   1807: This step is crucial.
                   1808: It guarantees that you will either have the
                   1809: relation as it existed before
                   1810: the modify, or restore will complete
                   1811: the modify command.
                   1812: Modify works by creating a new copy
                   1813: of the relation in the new structure.
                   1814: Then when it is ready to replace the old
                   1815: relation, it stores the new information in a
                   1816: "modify batch file".  
                   1817: This enables restore to determine the state of
                   1818: uncompleted modifies.
                   1819: .sp 1
                   1820: .ti -5
                   1821: 3.  Checks consistency of system
                   1822: relations.
                   1823: This check is used to complete "destory"
                   1824: commands, back out "create" commands,
                   1825: and back out or complete "index"
                   1826: commands that were interrupted by a
                   1827: system crash.
                   1828: .sp 1
                   1829: .ti -5
                   1830: 4.  Purges temporary relations and files.
                   1831: Restore executes the "purge" program to
                   1832: remove temporary relations and temporary
                   1833: files created by the system.
                   1834: Purge will be discussed in more detail a bit later.
                   1835: .in -4
                   1836: .sp 1
                   1837: Restore cannot tell the user which queries have run and
                   1838: which have not.
                   1839: It can only identify those queries which were in the
                   1840: process of being run when the crash occured.
                   1841: When batching queries together,
                   1842: it is a good idea to save the output in a file.
                   1843: By having the monitor print out each query or set of
                   1844: queries,
                   1845: the user can later identify which queries were run.
                   1846: .sp 1
                   1847: Restore has several options to increase its
                   1848: usability.
                   1849: They are specified by "flags".
                   1850: The options include:
                   1851: .sp 1
                   1852: .in +4
                   1853: .nf
                   1854: -a            ask before doing anything
                   1855: -f            passed to purge. used to remove temporary files.
                   1856: -p            passed to purge.  used to destory expired rela-
                   1857:               tions.
                   1858: no database   restores all data bases for which you are the
                   1859:               dba.
                   1860: .fi
                   1861: .in -4
                   1862: .sp 1
                   1863: Of these options the "-a" is the most
                   1864: important.
                   1865: It can happen that a Unix crash can cause a page of
                   1866: the system catalogues to be
                   1867: incorrect.
                   1868: This might cause restore to destory
                   1869: a relation.
                   1870: In fact, you might want
                   1871: to "patch" the system relations to correct
                   1872: the problem.
                   1873: No restore program can account
                   1874: for all possibilities.
                   1875: It is therefore no replacement
                   1876: (fortunately) for a human.
                   1877: .sp 1
                   1878: If "-a" is specified, restore
                   1879: will state what it wants to do and then ask
                   1880: for permission.
                   1881: It reads standard input and
                   1882: accepts "y" to mean go ahead and anything
                   1883: else to mean no.
                   1884: For example, to have restore ask you before
                   1885: doing anything
                   1886: .sp 1
                   1887: .ti +5
                   1888: restore -a mydatabase
                   1889: .sp 1
                   1890: To have it take "no" for all its questions
                   1891: .sp 1
                   1892: .ti +5
                   1893: restore -a mydatabase </dev/null
                   1894: .sp 1
                   1895: Using the -a flag,
                   1896: restore might ask for permission
                   1897: to perform some cleanup;
                   1898: for example,
                   1899: if it finds an attribute for which there
                   1900: is no corresponding relation,
                   1901: or if it finds a secondary index for which
                   1902: there is no primary relation,
                   1903: etc.
                   1904: .sp 1
                   1905: To date, we have never had a system
                   1906: crash which INGRES
                   1907: could not recover from.
                   1908: This does not mean that it will never happen, but
                   1909: rather that it shouldn't
                   1910: be too great 
                   1911: a concern for you.
                   1912: It should be mentioned that restore is not
                   1913: a substitution for doing periodic
                   1914: backing up, nor does it
                   1915: ever perform such a function.
                   1916: .sp 1
                   1917: PURGE
                   1918: .sp 1
                   1919: Purge can be used to report expired relations,
                   1920: destroy temporary system relations,
                   1921: remove extraneous files,
                   1922: and destory expired relations.
                   1923: To use purge you must be the DBA
                   1924: for the data base.
                   1925: .sp 1
                   1926: .ti +5
                   1927: % purge mydatabase
                   1928: .sp 1
                   1929: Purge has several options which are
                   1930: specified by flags which are
                   1931: worth noting:
                   1932: .nr in 4n
                   1933: .sp 1
                   1934: .nf
                   1935: -f   (default is off) remove all extraneous files.
                   1936:      Each file is reported and then removed.  If "-f"
                   1937:      is not specified then the file is only reported.
                   1938: .sp 1
                   1939: -p   (default is off) destroy all expired relations.
                   1940:      Each expired relation is reported and if "-p"
                   1941:      was specified the relation is destroyed.
                   1942: .fi
                   1943: .nr in 0
                   1944: .sp 1
                   1945: Purge always destroys relations and files
                   1946: which are known to be INGRES
                   1947: system temporaries.
                   1948: When processing multi-variable
                   1949: queries and queries with aggregate functions, 
                   1950: INGRES will usually create temporary relations
                   1951: with intermediate results.
                   1952: These relations always begin with the
                   1953: characters "_SYS".  
                   1954: Other INGRES commands create temporary files which also
                   1955: begin with "_SYS".
                   1956: Under normal processing they are
                   1957: always destroyed. 
                   1958: If a system crash occurs, they might be left.
                   1959: Purge will always clean up the temporary
                   1960: system files.
                   1961: It cleans up the user relations only
                   1962: when specifically asked to.
unix.superglobalmegacorp.com
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.