|
|
1.1 ! root 1: .ds HE 'DATA BASE'INGRES'Page %' ! 2: .so nmacs ! 3: .hx ! 4: .rs ! 5: .sp 8 ! 6: .ce ! 7: CREATING AND MAINTAINING A DATABASE USING INGRES ! 8: .sp 15 ! 9: .ce ! 10: by ! 11: .ce ! 12: Robert Epstein ! 13: .sp 18 ! 14: .ce 2 ! 15: Memorandum No. ERL - M77-71 ! 16: December 16, 1977 ! 17: .sp 3 ! 18: .ce 4 ! 19: Electronics Research Laboratory ! 20: College of Engineering ! 21: University of California, Berkeley ! 22: 94720 ! 23: .bp 1 ! 24: .ce ! 25: .bl + 5 ! 26: CREATING AND MAINTAINING A DATABASE USING INGRES ! 27: .sp 3 ! 28: 1. INTRODUCTION ! 29: .sp ! 30: In this paper we describe how to create, ! 31: structure and maintain relations ! 32: in INGRES. ! 33: It is assumed that the reader ! 34: is familiar with INGRES ! 35: and understands QUEL, the INGRES ! 36: query language. ! 37: It is strongly suggested that the ! 38: document "A Tutorial on INGRES" ! 39: (ERL M77/25) be read first. ! 40: .sp 1 ! 41: This paper is divided into six sections ! 42: .in +5 ! 43: .sp 1 ! 44: 1. Introduction ! 45: .sp 1 ! 46: 2. Creating a Relation ! 47: .sp 1 ! 48: 3. Using Copy ! 49: .sp 1 ! 50: 4. Storage Structures ! 51: .sp 1 ! 52: 5. Secondary Indices ! 53: .sp 1 ! 54: 6. Recovery and Data Update ! 55: .in -5 ! 56: .sp 1 ! 57: To create a new data base you must be a valid ! 58: INGRES user and have "create data base" ! 59: permission. ! 60: These permissions are granted by the ! 61: "ingres" superuser. ! 62: If you pass those two requirements you can create ! 63: a data base using the command to the ! 64: Unix shell: ! 65: .sp 1 ! 66: % creatdb mydata ! 67: .sp 1 ! 68: where "mydata" is the name of the data base. ! 69: You become the "data base administrator" (DBA) for ! 70: mydata. ! 71: As the DBA you have certain special ! 72: powers. ! 73: ! 74: .in +4 ! 75: .ti -5 ! 76: 1. Any relation created by you can be accessed ! 77: by anyone else using "mydata". ! 78: If any other user creates a relation it is ! 79: strictly private and cannot ! 80: be accessed by the DBA or any other user. ! 81: ! 82: .ti -5 ! 83: 2. You can use the "-u" flag in ingres ! 84: and printr. ! 85: This enables you to use ingres on "mydata" ! 86: with someone else's id. ! 87: Refer to the INGRES reference ! 88: manual under sections ! 89: ingres(unix) and users(files) ! 90: for details. ! 91: ! 92: .ti -5 ! 93: 3. You can run sysmod, restore and purge ! 94: on "mydata". ! 95: ! 96: .ti -5 ! 97: 4. The data base by default is created to ! 98: allow for multiple concurrent users. ! 99: If only one user will ever use the ! 100: data base at a time, ! 101: the data base administrator can ! 102: turn off the concurrency control. ! 103: Refer to creatdb(unix) in the INGRES ! 104: reference manual. ! 105: .in -4 ! 106: .sp 1 ! 107: Once a data base has been created you should ! 108: immediately run ! 109: .sp 1 ! 110: % sysmod mydata ! 111: .sp 1 ! 112: This program will convert ! 113: the system relations to their ! 114: "best" structure for use in INGRES. ! 115: Sysmod will be explained further in ! 116: section 4. ! 117: .sp 1 ! 118: As a DBA or as a user you can create and ! 119: structure new relations in any data ! 120: base to which you have access. ! 121: The remainder of this paper describes how this is done. ! 122: .bp ! 123: 2. CREATING NEW RELATIONS IN INGRES ! 124: .sp ! 125: There are two ways to create new relations in INGRES. ! 126: .sp 1 ! 127: .ti +5 ! 128: create ! 129: .br ! 130: .ti +5 ! 131: retrieve into ! 132: .sp 1 ! 133: "Retrieve into" is used to form a new relation from one or ! 134: more existing relations. ! 135: "Create" is used to create ! 136: a new relation with no tuples in it. ! 137: .sp 1 ! 138: example 1: ! 139: .sp 1 ! 140: .ti +5 ! 141: range of p is parts ! 142: .ti +5 ! 143: range of s is supply ! 144: .ti +5 ! 145: retrieve into newsupply( ! 146: .ti +19 ! 147: number = s.snum, ! 148: .ti +19 ! 149: p.pname, ! 150: .ti +19 ! 151: s.shipdate) ! 152: .ti +5 ! 153: where s.pnum = p.pnum ! 154: .sp 1 ! 155: example 2: ! 156: .sp 1 ! 157: .ti +5 ! 158: create newsupply( ! 159: .ti +12 ! 160: number = i2, ! 161: .ti +12 ! 162: pname = c20, ! 163: .ti +12 ! 164: shipdate = c8) ! 165: .sp 1 ! 166: In example 1 INGRES creates a new relation called ! 167: "newsupply", computing what the format ! 168: of each domain should be. ! 169: The query is then run and newsupply is ! 170: modified to "cheapsort". ! 171: (This will be covered in more detail in section 4.) ! 172: .sp 1 ! 173: In example 2 "newsupply" ! 174: is created and the name and ! 175: format for each domain is given. ! 176: The format types which are ! 177: allowed are: ! 178: .sp 1 ! 179: .in +5 ! 180: .nf ! 181: i1 1 byte integer ! 182: i2 2 " " ! 183: i4 4 " " ! 184: f4 4 byte floating point number ! 185: f8 8 " " " " ! 186: c1,c2,..,c255 1,2,..,255 byte character ! 187: .in -5 ! 188: .fi ! 189: .sp 1 ! 190: In example 2, the width of an individual ! 191: tuple is 30 bytes ! 192: (2 + 20 + 8), and the ! 193: relation has three domains. ! 194: Beware that INGRES ! 195: has limits. ! 196: A relation cannot have more ! 197: than 49 domains and the tuple width ! 198: cannot exceed 498 bytes. ! 199: .sp 1 ! 200: UNIX allocates space on a disk ! 201: in units of 512 byte pages. ! 202: INGRES gets a performance advantage ! 203: by doing I/O in one block units. ! 204: Therefore relations are divided into 512 byte pages. ! 205: INGRES never splits a tuple ! 206: between two pages. ! 207: Thus some space can be wasted. ! 208: There is an overhead of 12 bytes per page plus ! 209: 2 bytes for every tuple on the page. ! 210: The formulas are: ! 211: .sp 1 ! 212: .ti +5 ! 213: number tuples per page = 500/(tuple width + 2) ! 214: .sp 1 ! 215: .ti +5 ! 216: wasted space = 500 - number of tuples per page ! 217: .ti +5 ! 218: *(tuple width +2) ! 219: .sp 1 ! 220: For our example there are ! 221: .sp 1 ! 222: .ti +5 ! 223: 22 = 500/(20 + 2) ! 224: .sp 1 ! 225: .ti +5 ! 226: 16 = 500 - 22 * (20 + 2) ! 227: .sp 1 ! 228: 22 tuples per page and 16 bytes ! 229: wasted per page. ! 230: These computations are valid ! 231: only for uncompressed relations. ! 232: We will return to this subject ! 233: in section 4 when we discuss compression. ! 234: .sp 1 ! 235: If you forget a domain name or ! 236: format, use the "help" command. ! 237: For example if you gave the INGRES ! 238: command: ! 239: .sp 1 ! 240: .ti +5 ! 241: help newsupply ! 242: .sp 1 ! 243: the following would be printed: ! 244: .nf ! 245: ! 246: Relation: newsupply ! 247: Owner: bob ! 248: Tuple width: 30 ! 249: Saved until: Thu Nov 10 16:17:06 1977 ! 250: Number of tuples: 0 ! 251: Storage structure: paged heap ! 252: Relation type: user relation ! 253: ! 254: attribute name type length keyno. ! 255: ! 256: number i 2 ! 257: pname c 20 ! 258: shipdate c 8 ! 259: ! 260: .fi ! 261: Notice that every relation has an expiration ! 262: date. ! 263: This is set to be one week ! 264: from the time when it was ! 265: created. ! 266: The "save" command ! 267: can be used to save the relation longer. ! 268: See "save(quel)" and "purge(unix)" in the ! 269: INGRES reference manual. ! 270: ! 271: .bp ! 272: 3. COPYING DATA TO AND FROM INGRES ! 273: .sp ! 274: Once a relation is created, there are two mechanisms for ! 275: inserting new data: ! 276: .sp ! 277: .in +5 ! 278: append command ! 279: .br ! 280: copy command ! 281: .in -5 ! 282: .sp ! 283: Append is used to insert tuples one at a time, ! 284: or for filling one relation from other relations. ! 285: .sp ! 286: Copy is used for copying data from a UNIX file into ! 287: a relation. ! 288: It is used for copying data from another program, or for copying ! 289: data from another system. ! 290: It is also the most convenient way to copy any data ! 291: larger than a few tuples. ! 292: .sp ! 293: Let's begin by creating a simple relation ! 294: and loading data into it. ! 295: .sp ! 296: Example: ! 297: .sp ! 298: .ti +5 ! 299: .nf ! 300: create donation (name = c10, amount = f4, ext = i2) ! 301: .sp ! 302: .fi ! 303: Now suppose we have two people to enter. ! 304: The simplest procedure is probably ! 305: to run the two queries in INGRES using ! 306: the append command. ! 307: .sp ! 308: .ti 5 ! 309: .nf ! 310: append to donation (name="frank",amount = 5,ext = 204) ! 311: .sp 1 ! 312: .ti 5 ! 313: append to donation (name="harry",ext = 209,amount = 4.50) ! 314: .fi ! 315: .sp ! 316: Note that the order in which the domains are given ! 317: does not matter. ! 318: INGRES matches by recognizing attribute names and ! 319: does not care in what order attributes ! 320: are listed. ! 321: Here is what the relation "donation" looks like now: ! 322: .nf ! 323: ! 324: donation relation ! 325: ! 326: |name |amount |ext | ! 327: |----------------------------| ! 328: |frank |5.000 |204 | ! 329: |harry |4.500 |209 | ! 330: |----------------------------| ! 331: .fi ! 332: .sp ! 333: We now have two people entered into ! 334: the donation relation. ! 335: Suppose we had fifty more to enter. ! 336: Using the append command is far too tedious ! 337: since so much typing is involved for each tuple. ! 338: The copy command will better suit our purposes. ! 339: .sp ! 340: Copy can take data from a regular ! 341: Unix file in a variety of formats and ! 342: append it to a relation. ! 343: To use the copy command first create ! 344: a Unix file (typically using "ed") containing ! 345: the data. ! 346: .sp ! 347: For example, let's put five new names in a file ! 348: using the editor. ! 349: .sp ! 350: .nf ! 351: .tr Z. ! 352: % ed ! 353: a ! 354: bill,3.50,302 ! 355: sam,10.00,410 ! 356: susan,,100 ! 357: sally,.5,305 ! 358: george,4.00,302 ! 359: Z ! 360: w newdom ! 361: 68 ! 362: q ! 363: % ! 364: .tr ZZ ! 365: .sp ! 366: .fi ! 367: The format of the above file is a ! 368: name followed by a comma, followed ! 369: by the amount, then a comma, then the extension, ! 370: and finally a newline. ! 371: Null entries, for example the amount ! 372: for susan, are perfectly ! 373: legal and default to zero ! 374: for numerical domains and ! 375: blanks for character domains. ! 376: .sp ! 377: To use copy we enter INGRES and give the copy command. ! 378: .sp ! 379: .in +5 ! 380: .nf ! 381: copy donation (name = c0, amount = c0, ext = c0) ! 382: from "/mnt/bob/newdom" ! 383: .fi ! 384: .sp ! 385: .in -5 ! 386: Here is how the copy command works: ! 387: .sp ! 388: .ti +5 ! 389: copy relname (list of what to copy) from "full pathname" ! 390: .sp ! 391: In the case above we wrote: ! 392: .sp ! 393: .ti +5 ! 394: copy donation (. . .) from "/mnt/bob/newdom" ! 395: .sp ! 396: Although amount and ext are stored in the relation ! 397: as f4 (floating point) and i2 (integer), in the ! 398: Unix file they were entered as characters. ! 399: In specifying the format of the domain, ! 400: copy accepts: ! 401: .sp ! 402: .ti +5 ! 403: domain = format ! 404: .sp ! 405: where domain is the domain name and ! 406: the format in the UNIX file is one of ! 407: .sp ! 408: .in +5 ! 409: .nf ! 410: i1, i2, i4 (true binary integer of size 1, 2, or 4) ! 411: .br ! 412: f4, f8 (true binary float point of size 4 or 8) ! 413: .br ! 414: c1, c2, c3,...c255 (a fixed length character string) ! 415: .br ! 416: c0 (a variable length character string de- ! 417: limited by a comma, tab or new line) ! 418: .sp ! 419: .in -5 ! 420: .fi ! 421: In the example we use ! 422: .sp ! 423: .ti +5 ! 424: name = c0, amount = c0, extension = c0 ! 425: .sp ! 426: This means that each of the domains ! 427: was stored in the Unix file as ! 428: variable length character ! 429: strings. ! 430: Copy takes the first comma, ! 431: tab, or new line character ! 432: as the end of the string. ! 433: This by far is the most ! 434: common use of copy ! 435: when the data is being entered ! 436: into a relation ! 437: for the first time. ! 438: .sp ! 439: Copy can also be used to copy data from a relation ! 440: into a Unix file. ! 441: For example: ! 442: .sp ! 443: .in +5 ! 444: .nf ! 445: copy donation (name = c10, amount = c10, ext = c5) ! 446: into "/mnt/bob/data" ! 447: .fi ! 448: .in -5 ! 449: .sp ! 450: This will cause the following to happen: ! 451: .sp 1 ! 452: .in +4 ! 453: .ti -5 ! 454: 1. If the file /mnt/bob/data already exists it will ! 455: be destroyed. ! 456: .ti -5 ! 457: .sp 1 ! 458: 2. The file is created in mode 600 (read/write by you only) ! 459: .sp 1 ! 460: .ti -5 ! 461: 3. Name will be copied as a 10 character field, ! 462: immediately followed by amount, ! 463: immediately followed by ext. ! 464: Amount will be converted to a character ! 465: field 10 characters wide. ! 466: Ext will be converted to a character ! 467: field 5 characters wide. ! 468: .in -4 ! 469: .sp 1 ! 470: The file "/mnt/bob/data" would be a stream of characters ! 471: looking like this: ! 472: ! 473: .tr Z ! 474: .nf ! 475: frankZZZZZZZZZZ5.000ZZ204harryZZZZZZZZZZ4.500ZZ209bill ! 476: ZZZZZZZZZZZ3.500ZZ302samZZZZZZZZZZZ10.000ZZ410susanZZZ ! 477: ZZZZZZZ0.000ZZ100sallyZZZZZZZZZZ0.500ZZ305georgeZZZZZZ ! 478: ZZZ4.000ZZ302 ! 479: .fi ! 480: .tr ZZ ! 481: ! 482: .sp ! 483: The output was broken into four lines to ! 484: make it fit on this page. ! 485: In actuality the file ! 486: is a single line. ! 487: Another example: ! 488: .sp ! 489: .in +5 ! 490: .nf ! 491: copy (name = c0, colon = d1, ext = c0, comma = d1 ! 492: amt = c0, nl = d1) into "/mnt/bob/data" ! 493: .fi ! 494: .in -5 ! 495: .sp ! 496: In this example "c0" is interpreted to mean "use ! 497: the appropriate character format". ! 498: For character domains it is the ! 499: width of the domain. ! 500: Numeric domains are converted to characters ! 501: according to the INGRES defaults ! 502: (see ingres(unix)). ! 503: .sp ! 504: The statements: ! 505: .sp ! 506: .in +5 ! 507: colon = d1 ! 508: .br ! 509: comma = d1 ! 510: .br ! 511: nl = d1 ! 512: .in -5 ! 513: .sp ! 514: are used to insert one colon, ! 515: comma, and newline into the file. ! 516: The format "d1" is interpreted to mean ! 517: one dummy character. ! 518: When copying into a Unix file, ! 519: a selected set of characters can be inserted into ! 520: the file using this ! 521: "dummy domain" specification. ! 522: Here is what the file "/mnt/bob/data" would look like: ! 523: ! 524: .nf ! 525: frank : 204, 5.000 ! 526: harry : 209, 4.500 ! 527: bill : 302, 3.500 ! 528: sam : 410, 10.000 ! 529: susan : 100, 0.000 ! 530: sally : 305, 0.500 ! 531: george : 302, 4.000 ! 532: ! 533: .fi ! 534: .sp ! 535: If you wanted a file with the true binary representation ! 536: of the numbers you would use: ! 537: .sp ! 538: .ti +5 ! 539: copy (name = c10, amount = f4, ext = i2) ! 540: .sp ! 541: This would create a file with the exact ! 542: copy of each tuple, ! 543: one after the other. ! 544: This is frequently desireable for ! 545: temporary backup purposes ! 546: and it guarantees that floating ! 547: point domains will be exact. ! 548: .sp 2 ! 549: TYPICAL ERRORS ! 550: .sp 1 ! 551: There are 17 different errors ! 552: that can occur in copy. ! 553: We will go through the most ! 554: common ones. ! 555: .sp 1 ! 556: Suppose you have a file with ! 557: .sp 1 ! 558: bill,3.5,302 ! 559: .br ! 560: sam,10,410, ! 561: .br ! 562: susan,3,100 ! 563: .sp 1 ! 564: and run the copy command ! 565: .sp 1 ! 566: .in +5 ! 567: .nf ! 568: copy donation (name = c0, amount = c0, ext = c0) ! 569: from "/mnt/bob/data" ! 570: .fi ! 571: .in -5 ! 572: .sp 1 ! 573: You would get the error message ! 574: .sp 1 ! 575: .nf ! 576: 5809: COPY: bad input string for domain amount. Input was "susan". ! 577: There were 2 tuples sucessfully copied from /mnt/bob/data into ! 578: donation. ! 579: .fi ! 580: .sp 1 ! 581: What happened is that line 2 had an extra ! 582: comma. ! 583: The first two tuples were copied correctly. ! 584: For the next tuple, name = "" (blank), amount = ! 585: "susan", and ext = "3". ! 586: Since "susan" is not a proper floating point ! 587: number, an error was generated and ! 588: processing was stopped after two tuples. ! 589: .sp 1 ! 590: If you tried to copy the file with a file ! 591: such as ! 592: .sp 1 ! 593: nancy,5.0,35000 ! 594: .sp 1 ! 595: you would get the error message ! 596: .sp 1 ! 597: .nf ! 598: 5809: COPY: bad input string for domain ext. Input was "35000". ! 599: There were 0 tuples successfully copied from /mnt/bob/data into ! 600: donation. ! 601: .fi ! 602: .sp 1 ! 603: Here, since ext is an i2 (integer) domain, ! 604: it cannot exceed the value 32767. ! 605: .sp 1 ! 606: There are numerous other error messages, ! 607: most of which are self-explanatory. ! 608: .sp 1 ! 609: In addition there are three, non-fatal warnings ! 610: which may appear on a copy "from". ! 611: .sp 1 ! 612: If you are copying from ! 613: a file into a relation which ! 614: is ISAM or hash, a count ! 615: of the number of duplicate ! 616: tuples will appear, (if there were ! 617: any). ! 618: This will never appear on a "heap" ! 619: because no duplicate checking ! 620: is performed. ! 621: .sp 1 ! 622: INGRES does not allow ! 623: control characters ! 624: (such as "bell" etc.) ! 625: to be stored. ! 626: If copy reads any control characters, it converts them ! 627: to blanks and reports the number ! 628: of domains that had control characters in them. ! 629: .sp 1 ! 630: If you are copying using the c0 ! 631: option, copy will ! 632: report if any character strings were ! 633: longer than their domains ! 634: and had to be truncated. ! 635: .sp 2 ! 636: SPECIAL FEATURES ! 637: .sp 1 ! 638: .ti +3 ! 639: There are a few special functions that ! 640: make copy a little ! 641: easier to use ! 642: .sp 1 ! 643: .nr in 6n ! 644: .ti -4 ! 645: 1. Bulk copy ! 646: .sp 1 ! 647: If you ask for: ! 648: .sp 1 ! 649: .ti +4 ! 650: copy relname () from "file" ! 651: .ti +8 ! 652: or ! 653: .ti +4 ! 654: copy relname () into "file" ! 655: .sp 1 ! 656: copy expands the statement to mean: ! 657: .sp 1 ! 658: .in +5 ! 659: copy each domain in its proper ! 660: order according to its proper ! 661: format. ! 662: .sp 1 ! 663: .in -5 ! 664: So, if you said ! 665: .sp 1 ! 666: .ti +4 ! 667: copy donation () into "/mnt/bob/donation" ! 668: .sp 1 ! 669: it would be the same as asking for: ! 670: .sp 1 ! 671: .ti +4 ! 672: .nf ! 673: copy donation (name = c10, amount = f4, ext = i2) ! 674: into "/mnt/bob/donation" ! 675: .fi ! 676: .sp 1 ! 677: This provides a convenient way to copy ! 678: whole relations to and from INGRES. ! 679: .sp 1 ! 680: .ti -4 ! 681: 2. Dummy Domains ! 682: .sp 1 ! 683: If you are copying data ! 684: from another computer or program, ! 685: frequently there will be ! 686: a portion of data that you will want to ! 687: ignore. ! 688: This can be done using the ! 689: dummy domain specifications ! 690: d0, d1, d2 ... d511. ! 691: For example ! 692: .sp 1 ! 693: .ti +4 ! 694: .nf ! 695: copy rel (dom1 = c5, dummy = d2, dom2 = i4, ! 696: dumb = d0) from "/mnt/me/data" ! 697: .fi ! 698: .sp 1 ! 699: The first five characters ! 700: are put in dom1, ! 701: the next two characters are ignored. ! 702: The next four bytes are ! 703: an i4 (integer) and go in dom2, ! 704: and the remaining delimited string ! 705: is ignored. ! 706: The name given to a dummy specifier is ! 707: ignored. ! 708: .sp 1 ! 709: As mentioned previously, ! 710: dummy domains can be used on a copy ! 711: "into" a Unix file for inserting ! 712: special characters. ! 713: The list of recognizable names includes: ! 714: .sp 1 ! 715: .in +5 ! 716: .nf ! 717: nl newline ! 718: tab tab character ! 719: sp space ! 720: nul a zero byte ! 721: null a zero byte ! 722: comma , ! 723: dash - ! 724: colon : ! 725: lparen ( ! 726: rparen ) ! 727: .fi ! 728: .in -5 ! 729: .sp 1 ! 730: .ti -4 ! 731: 3. Truncation ! 732: .sp 1 ! 733: It is not uncommon to have a mistake occur ! 734: and need to start over. ! 735: The simplest way to do that ! 736: is to "truncate" the relation. ! 737: This is done by the command: ! 738: .sp 1 ! 739: .ti +4 ! 740: modify relname to truncated ! 741: .sp 1 ! 742: This has the effect of removing ! 743: all tuples in relname, ! 744: releasing all disk space, ! 745: and making relname a heap again. ! 746: It is the logical equivalent of ! 747: a destroy followed by a create ! 748: (but with a lot less typing). ! 749: .sp 1 ! 750: Since formatting mistakes are possible ! 751: with copy, ! 752: it is not generally a good idea to ! 753: copy data into a relation that already ! 754: has valid data in it. ! 755: The best procedure is to create a ! 756: temporary relation with the same domains ! 757: as the existing relation. ! 758: Copy data into the temporary relation ! 759: and then append it to the real relation. ! 760: For example: ! 761: ! 762: .in +8 ! 763: .nf ! 764: create tempdom(name=c10,amount=f4,ext=i2) ! 765: ! 766: copy tempdom(name=c0,amount=c0,ext=c0) ! 767: from "/mnt/bob/data" ! 768: ! 769: range of td is tempdom ! 770: append to donation(td.all) ! 771: .fi ! 772: .in -8 ! 773: .sp 1 ! 774: 4. Specifing Delimitors. ! 775: .sp 1 ! 776: Sometimes it is desirable to specify ! 777: what the delimiting character should be ! 778: on a copy "from" a file. ! 779: This can be done by specifing: ! 780: ! 781: .ti +8 ! 782: domain = c0delim ! 783: ! 784: where "delim" is a valid delimitor ! 785: taken from the list of recognizable names. ! 786: This list was summarized on the ! 787: previous page under "dummy domains". ! 788: For example: ! 789: ! 790: .ti +8 ! 791: copy donation (name = c0nl) from "/mnt/me/data" ! 792: ! 793: will copy names from the file to the relation. ! 794: Only a new line will delimit the names so ! 795: any commas or tabs will be passed along as ! 796: part of the name. ! 797: ! 798: When copying "into" a Unix file, ! 799: the "delim" is actually written into the ! 800: file, ! 801: so on a copy "into" the specification: ! 802: ! 803: .ti +8 ! 804: copy donation (name = c0nl) into "/mnt/me/file" ! 805: ! 806: will cause "name" to be written followed by a new line ! 807: character. ! 808: .nr in 0 ! 809: .bp ! 810: 4. CHOOSING THE BEST STORAGE STRUCTURES ! 811: .sp 1 ! 812: .sp ! 813: We now turn to the issue of efficiency. ! 814: Once you have created a relation ! 815: and inserted your ! 816: data using either copy or append, ! 817: INGRES can process any query ! 818: on the relation. ! 819: There are several things you can do ! 820: to improve the speed at which INGRES ! 821: can process a query. ! 822: .sp ! 823: INGRES can store a relation in three different ! 824: internal ! 825: structures. ! 826: These are called "heap", ! 827: "isam", and "hash". ! 828: First we will briefly describe each ! 829: structure and then later expand our ! 830: discussion. ! 831: .sp ! 832: HEAP ! 833: .sp 1 ! 834: When a relation is first created, it is ! 835: created as a "heap". ! 836: There are two important properties about a heap: ! 837: duplicate tuples are not removed, ! 838: and nothing is known about the location of the tuples. ! 839: If you ran the query: ! 840: .sp 1 ! 841: .ti +5 ! 842: range of d is donation ! 843: .br ! 844: .ti +5 ! 845: retrieve (d.amount) where d.name = "bill" ! 846: .sp 1 ! 847: INGRES would have to read every tuple in the ! 848: relation looking for those with name "bill". ! 849: If the relation is small this isn't a ! 850: serious matter. ! 851: But if the relation is very large, this can take ! 852: minutes (or even hours!). ! 853: .sp 1 ! 854: HASH ! 855: .sp 1 ! 856: A relation whose structure is "hash" can give fast ! 857: access to searches on certain domains. ! 858: (Those domains are usually referred to as ! 859: "keyed domains".) ! 860: In addition, a "hashed" relation contains ! 861: no duplicate tuples. ! 862: For example, suppose the donation relation is stored hashed on ! 863: domain "name". ! 864: Then the query: ! 865: .sp 1 ! 866: .ti +5 ! 867: retrieve (d.amount) where d.name = "bill" ! 868: .sp 1 ! 869: will run quickly ! 870: since INGRES knows approximately where on ! 871: disk the tuple is stored. ! 872: If the relation contains only a few tuples you ! 873: won't notice the difference between a "heap" ! 874: and a "hash" structure. ! 875: But as the relation becomes larger, the ! 876: difference in speed becomes ! 877: much more noticeable. ! 878: .sp 1 ! 879: ISAM ! 880: .sp 1 ! 881: An isam structure is one where the relation is ! 882: sorted on one or more domains, ! 883: (also called keyed domains). ! 884: Duplicates are also removed on "isam relations". ! 885: When new tuples are appended they are ! 886: placed "approximately" in their sorted position in the ! 887: relation. ! 888: (The "approximately" will be explained a bit ! 889: later.) ! 890: .sp 1 ! 891: Suppose donation is isam on name. ! 892: To process the query ! 893: .sp 1 ! 894: .ti +5 ! 895: retrieve (d.amount) where d.name = "bill" ! 896: .sp 1 ! 897: INGRES will determine where in the sorted order ! 898: the name "bill" would be and read only ! 899: those portions of the relation. ! 900: .sp 1 ! 901: Since the relation is approximately sorted, ! 902: an isam structure is also efficient for ! 903: processing the query: ! 904: .ti +5 ! 905: .sp 1 ! 906: retrieve (d.amount) where d.name >= "b" and d.name < "g" ! 907: .sp 1 ! 908: This query would retrieve all names beginning ! 909: with "b" through "f". ! 910: The entire relation would not have to be ! 911: searched since it is isam on name. ! 912: .sp 2 ! 913: SPECIFYING THE STORAGE STRUCTURE ! 914: .sp ! 915: Any user created relation can be converted ! 916: to any storage structure using the ! 917: "modify" command. ! 918: For example ! 919: .sp ! 920: .ti 5 ! 921: modify donation to hash on name ! 922: .br ! 923: or ! 924: .br ! 925: .ti 5 ! 926: modify donation to isam on name ! 927: .sp ! 928: or even ! 929: .sp ! 930: .ti 5 ! 931: modify donation to heap ! 932: .sp 2 ! 933: PRIMARY AND OVERFLOW PAGES ! 934: .sp ! 935: At this point it is necessary to introduce the ! 936: concepts of primary and overflow pages on ! 937: hash and isam structures. ! 938: Both hash and isam are techniques for assigning ! 939: specific tuples to specific pages of a relation ! 940: based on the tuple's keyed domains. ! 941: Thus each page will contain only a certain ! 942: specified subset of the relation. ! 943: ! 944: When a new tuple is appended to a hash or isam ! 945: relation, INGRES ! 946: first determines what page it belongs to, ! 947: and then looks for room on that page. ! 948: If there is space then the tuple ! 949: is placed on that page. ! 950: If not, ! 951: then an "overflow" page is created and ! 952: the tuple is placed there. ! 953: ! 954: The overflow page is linked to the ! 955: original page. ! 956: The original page is called the "primary" ! 957: page. ! 958: If the overflow page became full, ! 959: then INGRES ! 960: would connect an overflow page to it. ! 961: We would then have one primary page ! 962: linked to an overflow page, ! 963: linked to another overflow page. ! 964: Overflow pages are dynamically added as ! 965: needed. ! 966: .sp 2 ! 967: SPECIFYING FREE SPACE ! 968: .sp ! 969: The modify command also lets you specify how much ! 970: room to leave for the relation to grow. ! 971: As was mentiond in "create", ! 972: relations are divided into pages. ! 973: A "fillfactor" can be used to specify how ! 974: full to make each primary page. ! 975: This decision should be based ! 976: only on whether more tuples ! 977: will be appended to the relation. ! 978: For example: ! 979: .sp ! 980: .ti 5 ! 981: .nf ! 982: modify donation to isam on name where fillfactor = 100 ! 983: .fi ! 984: .sp ! 985: This tells modify to make each page 100% full ! 986: if at all possible. ! 987: .sp ! 988: .ti 5 ! 989: .nf ! 990: modify donation to isam on name where fillfactor = 25 ! 991: .fi ! 992: .sp ! 993: This will leave each page 25% full or, in other words, ! 994: 75% empty. ! 995: We would do this if we had roughly 1/4 of the ! 996: data already loaded and it was fairly well distributed ! 997: about the alphabet. ! 998: .sp ! 999: Keep in mind that if you don't specify the fillfactor, ! 1000: INGRES will typically default to a reasonable choice. ! 1001: Also when a page becomes full, INGRES ! 1002: automatically creates an "overflow" ! 1003: page so it is never the case that a relation ! 1004: will be unable to expand. ! 1005: .sp ! 1006: When modifying a relation ! 1007: to hash, an additional ! 1008: parameter "minpages" can ! 1009: be specified. ! 1010: Modify will guarantee ! 1011: that at least "minpage" primary pages will be allocated ! 1012: for the relation. ! 1013: .sp ! 1014: Modify computes how may primary pages will be ! 1015: needed to store the existing tuples at ! 1016: the specified fillfactor ! 1017: assuming that no overflow pages will be necessary originally. ! 1018: If that number is less than ! 1019: minpages, then minpages is used instead. ! 1020: .sp ! 1021: For example: ! 1022: .sp ! 1023: .ti 5 ! 1024: .nf ! 1025: modify donation to hash on name where fillfactor = 50, ! 1026: .ti 10 ! 1027: minpages = 1 ! 1028: .sp 1 ! 1029: .ti 5 ! 1030: modify donation to hash on name where minpages = 150 ! 1031: .fi ! 1032: .sp ! 1033: In the first case we guarantee that no more ! 1034: pages than are necessary will be ! 1035: used for 50% occupancy. ! 1036: The second case is typically ! 1037: used for modifying an empty or near ! 1038: empty relation. ! 1039: If the approximate maximum ! 1040: size of the relation is known in advance, ! 1041: minpages ! 1042: can be used to guarantee that the relation will ! 1043: have its expected maximum size. ! 1044: .sp ! 1045: There is one other option available for hash called ! 1046: "maxpages". ! 1047: Its syntax is the same as minpages. ! 1048: It can be used to specify the maximum ! 1049: number of primary pages to use. ! 1050: .sp ! 1051: COMPRESSION ! 1052: .sp 1 ! 1053: The three storage structures ! 1054: (heap, hash, isam) can optionally ! 1055: have "compression" applied ! 1056: to them. ! 1057: To do this, refer to the ! 1058: storage structures as cheap, chash, and cisam. ! 1059: Compression reduces ! 1060: the amount of space needed to store each tuple ! 1061: internally. ! 1062: The current compression technique is to ! 1063: suppress trailing blanks in ! 1064: character domains. ! 1065: Using compression will never ! 1066: require more space and typically ! 1067: it can save disk space and improve ! 1068: performance. ! 1069: Here is an example: ! 1070: .sp 1 ! 1071: .nf ! 1072: .ti +5 ! 1073: modify donation to cisam on name where fillfactor = 100 ! 1074: .fi ! 1075: .sp 1 ! 1076: This will make donation a compressed isam ! 1077: structure and fill every page as ! 1078: full as possible. ! 1079: With compression, each tuple ! 1080: can have a different compressed ! 1081: length. ! 1082: Thus the number of tuples ! 1083: that can fit on one page will ! 1084: depend on how successfully ! 1085: they can be compressed. ! 1086: ! 1087: Compressed relations can be more expensive to update. ! 1088: In particular if a replace is done on one or ! 1089: more domains and the compressed tuple is no ! 1090: longer the same length, ! 1091: then INGRES must look for a new place to put the tuple. ! 1092: .sp 2 ! 1093: TWO VARIATIONS ON A THEME ! 1094: .sp ! 1095: As mentioned, duplicates are not removed ! 1096: from a relation stored ! 1097: as a heap. ! 1098: Frequently it is desirable ! 1099: to remove duplicates and sort ! 1100: a heap relation. ! 1101: One way of doing this is to modify the ! 1102: relation to isam specifying ! 1103: the order in which to sort ! 1104: the relation. ! 1105: An alternative to this is to use either ! 1106: "heapsort" or "cheapsort". ! 1107: For example ! 1108: .sp ! 1109: .ti 5 ! 1110: .nf ! 1111: modify donation to heapsort on name, ext ! 1112: .fi ! 1113: .sp ! 1114: This will sort the relation by ! 1115: name then ext. ! 1116: The tuples are further sorted on the ! 1117: remaining domains, ! 1118: in the order they were listed in the ! 1119: original create statement. ! 1120: So in this case the relation will be ! 1121: sorted on name then ext and then amount. ! 1122: Duplicate tuples are always removed. ! 1123: The relation will be left ! 1124: as a heap. ! 1125: Heapsort and cheapsort are intended ! 1126: for sorting a temporary relation before printing and ! 1127: destroying it. ! 1128: It is more efficient than modifying ! 1129: to isam because ! 1130: with isam INGRES creates a ! 1131: "directory" containing ! 1132: key information about each page. ! 1133: The relation will NOT be kept sorted ! 1134: when further updates occur. ! 1135: .sp ! 1136: Examples: ! 1137: .sp ! 1138: .nr in 2n ! 1139: Here are a collection of examples ! 1140: and comments as to the efficiency of ! 1141: each query. ! 1142: The queries are based on the ! 1143: relations: ! 1144: .(l ! 1145: parts(pnum, pname, color, weight, qoh) ! 1146: .br ! 1147: supply(snum, pnum, jnum, shipdate, quan) ! 1148: .sp 1 ! 1149: range of p is parts ! 1150: .br ! 1151: range of s is supply ! 1152: .sp 1 ! 1153: modify parts to hash on pnum ! 1154: .br ! 1155: modify supply to hash on snum,jnum ! 1156: .)l ! 1157: .ti +5 ! 1158: .sp 1 ! 1159: retrieve (p.all) where p.pnum = 10 ! 1160: .sp 1 ! 1161: INGRES will recognize that parts is ! 1162: hashed on pnum and go directly to the ! 1163: page where parts with number 10 would be stored. ! 1164: .sp 1 ! 1165: .ti +5 ! 1166: retrieve (p.all) where p.pname = "tape drive" ! 1167: .sp 1 ! 1168: INGRES will read the entire relation ! 1169: looking for matching pnames. ! 1170: .sp 1 ! 1171: .ti +5 ! 1172: retrieve (p.all) where p.pnum < 10 and p.pnum > 5 ! 1173: .sp 1 ! 1174: INGRES will read the entire relation ! 1175: because no exact value for pnum ! 1176: was given. ! 1177: .sp 1 ! 1178: .ti +5 ! 1179: retrieve (s.shipdate) where s.snum = 471 and s.jnum = 1008 ! 1180: .sp 1 ! 1181: INGRES will recognize that supply is hashed on the ! 1182: combination of snum and jnum and will go directly ! 1183: to the correct page. ! 1184: .ti +5 ! 1185: .sp 1 ! 1186: retrieve (s.shipdate) where s.snum = 471 ! 1187: .sp 1 ! 1188: INGRES will read the entire ! 1189: relation. ! 1190: Supply is hashed on the ! 1191: combination of snum and jnum. ! 1192: Unless INGRES is given a unique ! 1193: value for both, it cannot ! 1194: take advantage of the storage ! 1195: structure. ! 1196: .sp 1 ! 1197: .ti +5 ! 1198: retrieve (p.pname, s.shipdate) where ! 1199: .ti +5 ! 1200: .br ! 1201: p.pnum = s.pnum and s.snum = 471 and s.jnum = 1008 ! 1202: .sp 1 ! 1203: INGRES will take advantage of both ! 1204: storage structures. ! 1205: It will first find all ! 1206: s.pnum and s.shipdate ! 1207: where s.snum = 471 and ! 1208: s.jnum = 1008. ! 1209: After that it will look for all ! 1210: p.pname where p.pnum is equal to ! 1211: the correct value. ! 1212: .sp 1 ! 1213: This example illustrates the idea that it is ! 1214: frequently a good idea to hash a ! 1215: relation on the domains where it is ! 1216: "joined" with another relation. ! 1217: For example, in this ! 1218: case it is very common to ask ! 1219: for p.pnum = s.pnum ! 1220: .sp 1 ! 1221: To summarize: ! 1222: .sp 1 ! 1223: To take advantage of a hash ! 1224: structure, ! 1225: INGRES needs an exact value ! 1226: for each key domain. ! 1227: An exact value is anything ! 1228: such as: ! 1229: .ti +5 ! 1230: .sp 1 ! 1231: s.snum = 471 ! 1232: .br ! 1233: .ti +5 ! 1234: s.pnum = p.pnum ! 1235: .sp 1 ! 1236: An exact value is not ! 1237: .sp 1 ! 1238: .ti +5 ! 1239: s.snum >= 471 ! 1240: .br ! 1241: .ti +5 ! 1242: (s.snum = 10 or s.snum = 20) ! 1243: .sp 1 ! 1244: Now let's consider some ! 1245: cases using isam ! 1246: .sp 1 ! 1247: .in +5 ! 1248: modify supply to isam on snum,shipdate ! 1249: .br ! 1250: retrieve (s.all) where s.snum = 471 ! 1251: .br ! 1252: and s.shipdate > "75-12-31" ! 1253: .br ! 1254: and s.shipdate < "77-01-01" ! 1255: .sp 1 ! 1256: .in -5 ! 1257: Since supply is sorted first on snum and then ! 1258: on shipdate, INGRES ! 1259: can take full advantage of the ! 1260: isam structure to locate the ! 1261: portions of supply which satisfy ! 1262: the query. ! 1263: .sp 1 ! 1264: .ti +5 ! 1265: retrieve (s.all) where s.snum = 47l ! 1266: .sp 1 ! 1267: Unlike hash, an isam structure ! 1268: can still be used if only the first key is ! 1269: provided. ! 1270: .sp 1 ! 1271: .ti +5 ! 1272: retrieve (s.all) where s.snum > 400 and s.snum < 500 ! 1273: .sp 1 ! 1274: Again INGRES will take advantage of the structure. ! 1275: .sp 1 ! 1276: .ti +5 ! 1277: retrieve (s.all) where s.shipdate >= "75-12-31" and ! 1278: .ti +5 ! 1279: s.shipdate <= "77-01-01" ! 1280: .sp 1 ! 1281: Here INGRES will read the entire relation. ! 1282: This is because the first key (snum) is not ! 1283: provided in the query. ! 1284: .sp 1 ! 1285: To summarize: ! 1286: .sp 1 ! 1287: Isam can provide improved access ! 1288: on either exact values or ranges of ! 1289: values. ! 1290: It is useful as long as at least ! 1291: the first key is provided. ! 1292: .sp 1 ! 1293: To locate where the tuples are ! 1294: in an isam relation, ! 1295: INGRES searches the isam directory for that ! 1296: relation. ! 1297: When a relation is modified to isam, ! 1298: the tuples are first sorted and duplicates ! 1299: are removed. ! 1300: Next, the relation is ! 1301: filled (according to the fillfactor) starting ! 1302: at page 0, 1, 2... for as many ! 1303: pages as are needed. ! 1304: .sp 1 ! 1305: Now the directory is built. ! 1306: The key domains from the first ! 1307: tuple on each page are collected and ! 1308: organized into a directory (stored in the relation ! 1309: on disk). ! 1310: The directory is never changed ! 1311: until the next time a modify is done. ! 1312: .sp 1 ! 1313: Whenever a tuple is added to the relation, ! 1314: the directory is searched to find ! 1315: which page the new tuple belongs on. ! 1316: Within that page, the individual ! 1317: tuples are NOT kept sorted. ! 1318: This is what is meant by "approximately" sorted. ! 1319: .sp 2 ! 1320: .nr in 0 ! 1321: HEAP v. HASH v. ISAM ! 1322: .sp 1 ! 1323: Let's now compare the relative advantages and disadvantages ! 1324: of each option. ! 1325: A relation is always created as a heap. ! 1326: A heap is the most efficient ! 1327: structure to use to initially ! 1328: fill a relation using copy or append. ! 1329: .sp 1 ! 1330: Space from deleted tuples of a heap ! 1331: is only reused on the last page. ! 1332: No duplicate checking is done on ! 1333: a heap relation. ! 1334: .sp 1 ! 1335: Hash is advantageous for locating tuples ! 1336: referenced in a qualification by an exact ! 1337: value. ! 1338: The primary page for tuples with a specific ! 1339: value can be easily computed. ! 1340: .sp 1 ! 1341: Isam is useful for both exact values and ranges of values. ! 1342: Since the isam directory must be searched to ! 1343: locate tuples, it is never as efficient as hash. ! 1344: .sp 2 ! 1345: OVERFLOW PAGES ! 1346: .sp 1 ! 1347: When a tuple is to be inserted ! 1348: and there is no more room on the ! 1349: primary page of a relation, then an ! 1350: overflow page is created. ! 1351: As more tuples are inserted, additional overflow ! 1352: pages are added as needed. ! 1353: Overflow pages, while necessary, decrease ! 1354: the system performance for ! 1355: retrieves and updates. ! 1356: .sp 1 ! 1357: For example, let's suppose that supply ! 1358: is hashed on snum and has 10 primary pages. ! 1359: Suppose the value snum = 3 falls on page 7. ! 1360: To find all snum = 3 requires INGRES to search ! 1361: primary page 7 and all overflow pages of page 7 ! 1362: (if any). ! 1363: As more overflow pages are added the time ! 1364: needed to search for ! 1365: snum = 3 will increase. ! 1366: Since duplicates are removed on isam and hash, ! 1367: this search must be performed on appends and ! 1368: replaces also. ! 1369: .sp 1 ! 1370: When a hash or isam relation has too many overflow pages ! 1371: it should be remodified to hash ! 1372: or isam again. ! 1373: This will clear up the relation ! 1374: and eliminate as many overflow pages as possible. ! 1375: .sp 2 ! 1376: UNIQUE KEYS ! 1377: .sp 1 ! 1378: When choosing key domains for a relation ! 1379: it is desirable to have each set of ! 1380: key domains ! 1381: as unique as possible. ! 1382: For example, employee id numbers ! 1383: typically have no ! 1384: duplicate values, while ! 1385: something like color ! 1386: is likely to have only a few distinct ! 1387: values, and something like ! 1388: sex, to the best of our knowledge, has only two ! 1389: values. ! 1390: .sp 1 ! 1391: If a relation is hashed on domain sex then you can expect to have all ! 1392: males on one primary page and all its ! 1393: overflow pages and a corresponding ! 1394: situation with females. ! 1395: With a hash relation there is no solution to this ! 1396: problem. ! 1397: A trade-off must be made between the ! 1398: most desirable key domains to use in a ! 1399: qualification versus the uniqueness of the ! 1400: key values. ! 1401: .sp 1 ! 1402: Since isam structure can be used if at least ! 1403: the first key is provided, extra ! 1404: key domains can sometimes be added to increase uniqueness. ! 1405: For example, suppose the supply ! 1406: relation has only 10 unique supplier numbers ! 1407: but thousands of tuples. ! 1408: Choosing an isam structure with the keys snum and jnum ! 1409: will probably give many more unique keys. ! 1410: However, the directory size will ! 1411: be larger and consequently it will ! 1412: take longer to search. ! 1413: When providing additional keys ! 1414: just for the sake of increasing ! 1415: uniqueness, ! 1416: try to use the smallest possible domains. ! 1417: .sp 2 ! 1418: SYSTEM RELATIONS ! 1419: .sp 1 ! 1420: INGRES uses three relations ! 1421: ("relation", "attribute", and "indexes") to maintain ! 1422: and organize a data base. ! 1423: The "relation" relation has one tuple for ! 1424: each relation in the data base. ! 1425: The "attribute" relation has one tuple ! 1426: for each attribute in each ! 1427: relation. ! 1428: The "indexes" relation ! 1429: has one tuple for each secondary ! 1430: index. ! 1431: .sp 1 ! 1432: INGRES accesses these relations ! 1433: in a very well defined manner. ! 1434: A program called "sysmod" should be used ! 1435: to modify these relations to hash on the ! 1436: appropriate domains. ! 1437: To use sysmod the data base ! 1438: administrator types ! 1439: .sp 1 ! 1440: % sysmod data-base-name ! 1441: .sp 1 ! 1442: Sysmod should be run ! 1443: initially after the data base is created and subsequently ! 1444: as relations are created and the data ! 1445: base grows. ! 1446: It is insufficient to run ! 1447: sysmod only once and forget about it. ! 1448: Rerunning sysmod will cause the ! 1449: system relations to be remodified. ! 1450: This will typically remove ! 1451: most overflow pages and improve ! 1452: system response time ! 1453: for everything. ! 1454: .bp ! 1455: 5. SECONDARY INDICES ! 1456: .sp 1 ! 1457: Using an isam or hash structure ! 1458: provides a fast way to find ! 1459: tuples in a relation given values for the key ! 1460: domains. ! 1461: Sometimes this is not enough. ! 1462: For example, suppose we have ! 1463: the donation relation ! 1464: .sp 1 ! 1465: .ti +5 ! 1466: donation(name, amount, ext) ! 1467: .sp 1 ! 1468: hashed on name. ! 1469: This will provide fast access ! 1470: to queries where the qualification has ! 1471: an exact value for name. ! 1472: What if we also will be doing ! 1473: queries giving exact values for ext? ! 1474: .sp 1 ! 1475: Donation can be hashed either on name ! 1476: or ext, so we would have to choose which is more common ! 1477: and hash donation on that domain. ! 1478: The other domain (say ext) can have ! 1479: a secondary index. ! 1480: A secondary index is a relation which contains ! 1481: each "ext" together with the exact ! 1482: location of where the tuple is in the relation ! 1483: donation. ! 1484: .sp 1 ! 1485: The command to create a secondary ! 1486: index is: ! 1487: .sp 1 ! 1488: .ti +5 ! 1489: index on donation is donext (ext) ! 1490: .sp 1 ! 1491: The general format is: ! 1492: .sp 1 ! 1493: .ti +5 ! 1494: index on relation_name is secondary_index_name (domains) ! 1495: .sp 1 ! 1496: Here we are asking INGRES ! 1497: to create a secondary index on the relation ! 1498: donation. ! 1499: The domain being indexed is "ext". ! 1500: Indices are formed in three steps: ! 1501: .sp 1 ! 1502: .in +4 ! 1503: .ti -5 ! 1504: 1. "Donext" is created as a heap. ! 1505: .br ! 1506: .ti -5 ! 1507: 2. For each ! 1508: tuple in donation, a tuple is inserted ! 1509: in "donext" with the value for ext and the ! 1510: exact location of the corresponding tuple in ! 1511: donation. ! 1512: .br ! 1513: .ti -5 ! 1514: 3. By default "donext" is modified to isam. ! 1515: .in -4 ! 1516: .sp 1 ! 1517: Now if you run the query ! 1518: .sp 1 ! 1519: .ti +5 ! 1520: range of d is donation ! 1521: .ti +5 ! 1522: retrieve(d.amount) where d.ext = 207 ! 1523: .sp 1 ! 1524: INGRES will automatically look first in ! 1525: "donext" to find ext = 207. ! 1526: When it finds one it then goes directly ! 1527: to the tuple in the donation relation. ! 1528: Since "donext" is isam on ext, search for ! 1529: ext = 207 can typically be ! 1530: done rapidly. ! 1531: .sp 1 ! 1532: If you run the query ! 1533: .sp 1 ! 1534: .ti 5 ! 1535: retrieve(d.amount) where d.name = "frank" ! 1536: .sp 1 ! 1537: then INGRES will continue to use the hash ! 1538: structure of the relation "donation" ! 1539: to locate the qualifying tuples. ! 1540: .sp 1 ! 1541: Since secondary indices are themselves relations, ! 1542: they also can be either hash, isam, chash or cisam. ! 1543: It never makes sense to a secondary index a heap. ! 1544: .sp 1 ! 1545: The decision as to what structure to make ! 1546: them on involves the same issues ! 1547: as were discussed before: ! 1548: .sp 1 ! 1549: Will the domains be referenced by exact value? ! 1550: .br ! 1551: Will they be referenced by ranges of value? ! 1552: .br ! 1553: etc. ! 1554: .sp 1 ! 1555: In this case the "ext" domain ! 1556: will be referenced by exact values, and ! 1557: since the relation is nearly full we will do: ! 1558: .sp 1 ! 1559: .ti +5 ! 1560: modify donext to hash on ext where fillfactor = 100 ! 1561: .ti +5 ! 1562: and minpages = 1 ! 1563: .sp 1 ! 1564: Secondary indices provide a way for INGRES ! 1565: to access tuples based on domains ! 1566: that are not key domains. ! 1567: A relation can have any number of secondary ! 1568: indices and in addition ! 1569: each secondary index can be an index ! 1570: on up to six domains of the primary relation. ! 1571: .sp 1 ! 1572: Whenever a tuple is replaced, deleted ! 1573: or appended to a primary relation, ! 1574: all secondary indices must ! 1575: also be updated. ! 1576: Thus secondary indices ! 1577: are "not free". ! 1578: They increase ! 1579: the cost of updating the ! 1580: primary relation, but ! 1581: can decrease the cost of finding tuples ! 1582: in the primary relation. ! 1583: .sp 1 ! 1584: Whether a secondary index will improve ! 1585: performance or not strongly ! 1586: depends on the uniqueness of the ! 1587: values of the domains being ! 1588: indexed. ! 1589: The primary concern is whether searching ! 1590: through the secondary index is ! 1591: more efficient than simply ! 1592: reading the entire primary relation. ! 1593: In general it is if the ! 1594: number of tuples which satisfy the ! 1595: qualification is less than the number of total pages ! 1596: (both primary and overflow) in the primary ! 1597: relation. ! 1598: .sp 1 ! 1599: For example if we frequently want to find ! 1600: all people who donated less than ! 1601: five dollars, consider creating ! 1602: .sp 1 ! 1603: .ti +5 ! 1604: index on donation is donamount (amount) ! 1605: .sp 1 ! 1606: By default donamount will be isam ! 1607: on amount. ! 1608: IF INGRES processes the query: ! 1609: .sp 1 ! 1610: .ti +5 ! 1611: retrieve(d.name) where d.amount < 5.0 ! 1612: .sp 1 ! 1613: it will locate d.amount < 5.0 in the secondary ! 1614: index and for each tuple it ! 1615: finds will fetch the corresponding ! 1616: tuple in donation. ! 1617: The tuples in donamount are sorted by ! 1618: amount but the tuples ! 1619: in donation are not. ! 1620: Thus in general each tuple fetch from ! 1621: donation via donamount will be on a ! 1622: different page. ! 1623: Retrieval using the secondary index can then cause more page ! 1624: reads than simply reading all of donation sequentially! ! 1625: So in this example it would ! 1626: be a bad idea to create the secondary ! 1627: index. ! 1628: .bp ! 1629: 6. RECOVERY AND DATA UPDATE ! 1630: .sp 1 ! 1631: INGRES has been carefully designed ! 1632: to protect the integrity of a data base ! 1633: against certain classes ! 1634: of system failures. ! 1635: To do this INGRES ! 1636: processes changes to a relation ! 1637: using what we call "deferred ! 1638: update" or "batch file update". ! 1639: In addition there are two INGRES ! 1640: programs "restore" and "purge" that can be used to check ! 1641: out a data base after a system failure. ! 1642: We will first discuss how deferred updates are created ! 1643: and processed, and second we will discuss ! 1644: the use of purge and restore. ! 1645: .sp 1 ! 1646: DEFERRED UPDATE (Batch update) ! 1647: .in +4 ! 1648: .sp 1 ! 1649: .ti -5 ! 1650: An append, replace or delete command is run in four steps: ! 1651: .sp 1 ! 1652: .ti -5 ! 1653: 1. An empty batch file is created. ! 1654: .ti -5 ! 1655: 2. The command is run to completion ! 1656: and each change to the result relation is written into ! 1657: the batch file. ! 1658: .ti -5 ! 1659: 3. The batch file is read and the ! 1660: relation and its secondary indices (if any) ! 1661: are actually updated. ! 1662: .ti -5 ! 1663: 4. The batch file is destroyed and INGRES ! 1664: returns back to the user. ! 1665: .sp 1 ! 1666: .in -4 ! 1667: Deferred update defers all actual ! 1668: updating until the very end of ! 1669: the query. ! 1670: There are three advantages to doing this. ! 1671: .sp 1 ! 1672: l. Provides recovery from system failures ! 1673: .sp 1 ! 1674: If the system "crashes" during an update, ! 1675: the INGRES recovery program will decide to either ! 1676: run the update to completion or else ! 1677: "back out" the update, leaving the ! 1678: relation as it looked before the update ! 1679: was started. ! 1680: .sp 1 ! 1681: 2. Prevents infinite queries ! 1682: .sp 1 ! 1683: If "donation" were a heap and the query ! 1684: .sp 1 ! 1685: .ti +4 ! 1686: range of d is donation ! 1687: .ti +4 ! 1688: append to donation(d.all) ! 1689: .sp 1 ! 1690: were run without deferred update, ! 1691: it would terminate only when it ran ! 1692: out of space on disk! ! 1693: This is because INGRES would start reading the ! 1694: relation from the beginning and ! 1695: appending each tuple at the end. ! 1696: It would soon start reading the tuples it ! 1697: had just previously appended and ! 1698: continue indefinitely to ! 1699: "chase its tail". ! 1700: .sp 1 ! 1701: While this query is certainly not ! 1702: typical, it illustrates the point. ! 1703: There are certain classes of queries ! 1704: where problems occur if WHEN ! 1705: an update actually occurs ! 1706: is not precisely defined. ! 1707: With deferred update we can ! 1708: guarantee consistent and logical ! 1709: results. ! 1710: .sp 1 ! 1711: 3. Speeds up processing of secondary indices ! 1712: .sp 1 ! 1713: Secondary indices can be updated ! 1714: faster if they are done one at a time ! 1715: instead of all at once. ! 1716: It also insures protection against ! 1717: the secondary index becoming inconsistent ! 1718: with its primary relation. ! 1719: .sp 1 ! 1720: TURNING DEFERRED UPDATE OFF ! 1721: .sp 1 ! 1722: If you are not persuaded by any of ! 1723: these arguments, INGRES ! 1724: allows you to turn deferred update off! ! 1725: Indeed there are certain cases when ! 1726: it is appropriate (although ! 1727: certainly not essential) to perform ! 1728: updates directly, that is, the relation is updated ! 1729: while the query is being processed. ! 1730: .sp 1 ! 1731: To use direct update, you must be given ! 1732: permission by the INGRES ! 1733: super user. ! 1734: Then when invoking INGRES ! 1735: specify the "-b" flag which turns ! 1736: off batch update. ! 1737: .sp 1 ! 1738: .ti +4 ! 1739: % ingres mydate -b ! 1740: .sp 1 ! 1741: INGRES will use direct update on any relation without ! 1742: secondary indices. ! 1743: It will still silently use ! 1744: deferred update if a relation ! 1745: has any secondary indices. ! 1746: By using the "-b" flag you are ! 1747: sacrificing points 1 and 2 above. ! 1748: In most cases you SHOULD NOT ! 1749: use the -b flag. ! 1750: .sp 1 ! 1751: If you are using INGRES ! 1752: to interactively enter ! 1753: or change one tuple at ! 1754: a time, it is slightly ! 1755: more efficient to have deferred ! 1756: update turned off. ! 1757: If the system crashes during an ! 1758: update the person entering the data ! 1759: will be aware of the situation ! 1760: and can check whether the tuple ! 1761: was updated or not. ! 1762: .sp 1 ! 1763: RESTORE ! 1764: .sp 1 ! 1765: INGRES is designed to recover ! 1766: from the common types of system ! 1767: crashes which leave the Unix file ! 1768: system intact. ! 1769: It can recover from updates, creates, ! 1770: destroys, modifies and index commands. ! 1771: .sp 1 ! 1772: INGRES is designed to "fail safe". ! 1773: If any inconsistancies are ! 1774: discovered or any failures ! 1775: are returned from Unix, ! 1776: INGRES will generate a system error ! 1777: message (SYSERR) and exit. ! 1778: .sp 1 ! 1779: Whenever Unix crashes while INGRES ! 1780: is running or whenever an INGRES ! 1781: syserr occurs, it is ! 1782: generally a good idea to have the date ! 1783: base administrator run the command ! 1784: .sp 1 ! 1785: .ti +5 ! 1786: % restore data_base_name ! 1787: .sp 1 ! 1788: The restore program performs the ! 1789: following functions: ! 1790: .in +4 ! 1791: .sp 1 ! 1792: .ti -5 ! 1793: 1. Looks for batch update files. ! 1794: If any are found, it examines each ! 1795: one to see if it is complete. ! 1796: If the system crash occured while ! 1797: the batch file was being read ! 1798: and the data base being updated, ! 1799: then restore will complete ! 1800: the update. ! 1801: Otherwise the batch file was not ! 1802: completed and it is simply destroyed; ! 1803: the effect is as though the query had never been run. ! 1804: .sp 1 ! 1805: .ti -5 ! 1806: 2. Checks for uncompleted modify commands. ! 1807: This step is crucial. ! 1808: It guarantees that you will either have the ! 1809: relation as it existed before ! 1810: the modify, or restore will complete ! 1811: the modify command. ! 1812: Modify works by creating a new copy ! 1813: of the relation in the new structure. ! 1814: Then when it is ready to replace the old ! 1815: relation, it stores the new information in a ! 1816: "modify batch file". ! 1817: This enables restore to determine the state of ! 1818: uncompleted modifies. ! 1819: .sp 1 ! 1820: .ti -5 ! 1821: 3. Checks consistency of system ! 1822: relations. ! 1823: This check is used to complete "destory" ! 1824: commands, back out "create" commands, ! 1825: and back out or complete "index" ! 1826: commands that were interrupted by a ! 1827: system crash. ! 1828: .sp 1 ! 1829: .ti -5 ! 1830: 4. Purges temporary relations and files. ! 1831: Restore executes the "purge" program to ! 1832: remove temporary relations and temporary ! 1833: files created by the system. ! 1834: Purge will be discussed in more detail a bit later. ! 1835: .in -4 ! 1836: .sp 1 ! 1837: Restore cannot tell the user which queries have run and ! 1838: which have not. ! 1839: It can only identify those queries which were in the ! 1840: process of being run when the crash occured. ! 1841: When batching queries together, ! 1842: it is a good idea to save the output in a file. ! 1843: By having the monitor print out each query or set of ! 1844: queries, ! 1845: the user can later identify which queries were run. ! 1846: .sp 1 ! 1847: Restore has several options to increase its ! 1848: usability. ! 1849: They are specified by "flags". ! 1850: The options include: ! 1851: .sp 1 ! 1852: .in +4 ! 1853: .nf ! 1854: -a ask before doing anything ! 1855: -f passed to purge. used to remove temporary files. ! 1856: -p passed to purge. used to destory expired rela- ! 1857: tions. ! 1858: no database restores all data bases for which you are the ! 1859: dba. ! 1860: .fi ! 1861: .in -4 ! 1862: .sp 1 ! 1863: Of these options the "-a" is the most ! 1864: important. ! 1865: It can happen that a Unix crash can cause a page of ! 1866: the system catalogues to be ! 1867: incorrect. ! 1868: This might cause restore to destory ! 1869: a relation. ! 1870: In fact, you might want ! 1871: to "patch" the system relations to correct ! 1872: the problem. ! 1873: No restore program can account ! 1874: for all possibilities. ! 1875: It is therefore no replacement ! 1876: (fortunately) for a human. ! 1877: .sp 1 ! 1878: If "-a" is specified, restore ! 1879: will state what it wants to do and then ask ! 1880: for permission. ! 1881: It reads standard input and ! 1882: accepts "y" to mean go ahead and anything ! 1883: else to mean no. ! 1884: For example, to have restore ask you before ! 1885: doing anything ! 1886: .sp 1 ! 1887: .ti +5 ! 1888: restore -a mydatabase ! 1889: .sp 1 ! 1890: To have it take "no" for all its questions ! 1891: .sp 1 ! 1892: .ti +5 ! 1893: restore -a mydatabase </dev/null ! 1894: .sp 1 ! 1895: Using the -a flag, ! 1896: restore might ask for permission ! 1897: to perform some cleanup; ! 1898: for example, ! 1899: if it finds an attribute for which there ! 1900: is no corresponding relation, ! 1901: or if it finds a secondary index for which ! 1902: there is no primary relation, ! 1903: etc. ! 1904: .sp 1 ! 1905: To date, we have never had a system ! 1906: crash which INGRES ! 1907: could not recover from. ! 1908: This does not mean that it will never happen, but ! 1909: rather that it shouldn't ! 1910: be too great ! 1911: a concern for you. ! 1912: It should be mentioned that restore is not ! 1913: a substitution for doing periodic ! 1914: backing up, nor does it ! 1915: ever perform such a function. ! 1916: .sp 1 ! 1917: PURGE ! 1918: .sp 1 ! 1919: Purge can be used to report expired relations, ! 1920: destroy temporary system relations, ! 1921: remove extraneous files, ! 1922: and destory expired relations. ! 1923: To use purge you must be the DBA ! 1924: for the data base. ! 1925: .sp 1 ! 1926: .ti +5 ! 1927: % purge mydatabase ! 1928: .sp 1 ! 1929: Purge has several options which are ! 1930: specified by flags which are ! 1931: worth noting: ! 1932: .nr in 4n ! 1933: .sp 1 ! 1934: .nf ! 1935: -f (default is off) remove all extraneous files. ! 1936: Each file is reported and then removed. If "-f" ! 1937: is not specified then the file is only reported. ! 1938: .sp 1 ! 1939: -p (default is off) destroy all expired relations. ! 1940: Each expired relation is reported and if "-p" ! 1941: was specified the relation is destroyed. ! 1942: .fi ! 1943: .nr in 0 ! 1944: .sp 1 ! 1945: Purge always destroys relations and files ! 1946: which are known to be INGRES ! 1947: system temporaries. ! 1948: When processing multi-variable ! 1949: queries and queries with aggregate functions, ! 1950: INGRES will usually create temporary relations ! 1951: with intermediate results. ! 1952: These relations always begin with the ! 1953: characters "_SYS". ! 1954: Other INGRES commands create temporary files which also ! 1955: begin with "_SYS". ! 1956: Under normal processing they are ! 1957: always destroyed. ! 1958: If a system crash occurs, they might be left. ! 1959: Purge will always clean up the temporary ! 1960: system files. ! 1961: It cleans up the user relations only ! 1962: when specifically asked to.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.