Annotation of 43BSD/ucb/compress/README, revision 1.1.1.1

       @(#)README      5.3 (Berkeley) 9/17/85

Compress version 4.0 improvements over 3.0:
       o compress() speedup (10-50%) by changing division hash to xor
       o decompress() speedup (5-10%)
       o Memory requirements reduced (3-30%)
       o Stack requirements reduced to less than 4kb
       o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
       o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
       o Default to 'quiet' mode
       o Unification of 'force' flags
       o Manual page overhaul
       o Portability enhancement for M_XENIX
       o Removed text on #else and #endif
       o Added "-V" switch to print version and options
       o Added #defines for SIGNED_COMPARE_SLOW
       o Added Makefile and "usermem" program
       o Removed all floating point computations
       o New programs: [deleted]
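The first 4.0 item replaces a division (modulo) hash with an xor. The sketch below shows the idea. It is a hypothetical stand-alone table, not the code in compress.c. A (prefix code, next character) pair is hashed as (c << 8) ^ prefix, which needs no division. Collisions are resolved by open addressing with a secondary step derived from the first probe. The table size 69001 (a prime above 65536), the shift of 8, and all sk_ names are assumptions of this sketch.

```c
/* Hypothetical parameters for this sketch, not copied from compress.c. */
#define SK_HSIZE  69001L   /* prime table size, larger than 65536 */
#define SK_HSHIFT 8        /* (c << 8) ^ prefix < 65536 for 8-bit c, 16-bit prefix */
#define SK_EMPTY  (-1L)

static long sk_htab[SK_HSIZE];              /* key = (c << 16) + prefix */
static unsigned short sk_codetab[SK_HSIZE]; /* code assigned to that key */

/* Mark every slot empty (before first use and after a table clear). */
void sk_clear(void)
{
    long i;
    for (i = 0; i < SK_HSIZE; i++)
        sk_htab[i] = SK_EMPTY;
}

/* Return the slot holding (prefix, c), or the empty slot where it would go. */
static long sk_find(int prefix, int c)
{
    long key = ((long)c << 16) + prefix;
    long i = ((long)c << SK_HSHIFT) ^ prefix;  /* xor hash: no division */
    long disp = (i == 0) ? 1 : SK_HSIZE - i;   /* secondary probe step */

    while (sk_htab[i] != key && sk_htab[i] != SK_EMPTY) {
        i -= disp;                             /* probe backwards, wrapping */
        if (i < 0)
            i += SK_HSIZE;
    }
    return i;
}

/* Record that the string prefix+c has been given the given code. */
void sk_insert(int prefix, int c, unsigned short code)
{
    long i = sk_find(prefix, c);
    sk_htab[i] = ((long)c << 16) + prefix;
    sk_codetab[i] = code;
}

/* Return the code assigned to (prefix, c), or -1 if the pair is absent. */
long sk_lookup(int prefix, int c)
{
    long i = sk_find(prefix, c);
    return sk_htab[i] == SK_EMPTY ? -1L : (long)sk_codetab[i];
}
```

The size is prime and the step is nonzero, so the probe sequence can reach every slot. Lookups therefore terminate as long as the table never fills completely. With at most 65536 codes in 69001 slots, that cannot happen.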

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.] If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

       o USERMEM               Maximum process memory on the system
       o SACREDMEM             Amount to reserve for other processes
       o SIGNED_COMPARE_SLOW   Unsigned compare instructions are faster
       o NO_UCHAR              Don't use "unsigned char" types
       o BITS                  Overrules default set by USERMEM-SACREDMEM
       o vax                   Generate inline assembler
       o interdata             Defines SIGNED_COMPARE_SLOW
       o M_XENIX               Makes arrays < 65536 bytes each
       o pdp11                 BITS=12, NO_UCHAR
       o z8000                 BITS=12
       o pcxt                  BITS=12
       o BSD4_2                Allow long filenames (> 14 characters) and
                               call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least               BITS
------  -- -----                ----
     433,484                    16
     229,600                    15
     127,536                    14
      73,464                    13
           0                    12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
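The table above maps a memory budget to a maximum BITS. As a sketch of that mapping, here is a runtime lookup with the 4.0 thresholds copied in. compress.c makes this choice with the preprocessor at compile time. max_bits() and effective_bits() are hypothetical names. Treating a -DBITS definition as an unconditional override is an assumption based on the note about overruling.

```c
#include <stddef.h>

/* Rows of the 4.0 table: minimum USERMEM-SACREDMEM and the resulting BITS. */
struct bits_row {
    long min_mem;
    int  bits;
};

static const struct bits_row bits_table[] = {
    { 433484L, 16 },
    { 229600L, 15 },
    { 127536L, 14 },
    {  73464L, 13 },
    {      0L, 12 },
};

/* Largest BITS allowed by the memory left after the reservation. */
int max_bits(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;
    size_t i;

    for (i = 0; i < sizeof bits_table / sizeof bits_table[0]; i++)
        if (avail >= bits_table[i].min_mem)
            return bits_table[i].bits;
    return 12;   /* even a negative budget gets the 12-bit floor */
}

/* A compile-time -DBITS=n overrules the memory-based choice. */
int effective_bits(long usermem, long sacredmem)
{
#ifdef BITS
    (void)usermem;
    (void)sacredmem;
    return BITS;
#else
    return max_bits(usermem, sacredmem);
#endif
}
```

For example, max_bits(750000L, 400000L) leaves 350,000 bytes, which clears the 229,600 row but not the 433,484 row, so it returns 15.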

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
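The warning follows from the file header. The magic number \037\235 and the stored bits setting are mentioned in the 2.0 notes further down. The exact layout assumed here is not stated in this README: bits in the low 5 bits of the third byte, and 0x80 set for block compression. It matches the BIT_MASK and BLOCK_MASK constants in compress.c as commonly distributed. check_header() is a hypothetical helper. It shows how a decoder built with BITS=12 would reject a 16-bit file but accept one made with "-b12".

```c
#include <stddef.h>

#define MAGIC_1    0x1f  /* '\037' */
#define MAGIC_2    0x9d  /* '\235' */
#define BIT_MASK   0x1f  /* assumed: maxbits in the low 5 bits of byte 3 */
#define BLOCK_MASK 0x80  /* assumed: set when block compression is on */

/* Returns the file's BITS if this decoder can handle it, -1 if the magic
   number is wrong, or -2 if the file uses more bits than local_max_bits. */
int check_header(const unsigned char *hdr, size_t len,
                 int local_max_bits, int *block_mode)
{
    int file_bits;

    if (len < 3 || hdr[0] != MAGIC_1 || hdr[1] != MAGIC_2)
        return -1;
    file_bits = hdr[2] & BIT_MASK;
    *block_mode = (hdr[2] & BLOCK_MASK) != 0;
    if (file_bits > local_max_bits)
        return -2;   /* e.g. a 16-bit file given to a BITS=12 build */
    return file_bits;
}
```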

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

       -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.     "Block" compression is performed.  After all 2^BITS codes have been
       used, the compression ratio is checked every so often.  If it is
       decreasing, the table is cleared and a new set of substrings is
       generated.

       This makes the output of compress 3.0 not compatible with that of
       compress 2.0.  However, compress 3.0 still accepts the output of
       compress 2.0.  To generate output that is compatible with compress
       2.0, use the undocumented "-C" flag.

2.     A quiet "-q" flag has been added for use by the news system.

3.     The character chaining has been deleted and the program now uses
       hashing.  This improves the speed of the program, especially
       during decompression.  Other speed improvements have been made,
       such as using putc() instead of fwrite().

4.     A large table is used on large machines when a relatively small
       number of bits is specified.  This saves much time when compressing
       for a 16-bit machine on a 32-bit virtual machine.  Note that the
       speed improvement only occurs when the input file is > 30000
       characters, and the -b BITS value is less than or equal to the
       cutoff described below.
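Item 1 describes block compression only in outline. The sketch below assumes several details the README does not give. The ratio is measured as input bytes per output byte, in fixed point with 8 fractional bits. Once the table is full, it is rechecked every CHECK_GAP = 10000 input bytes. "Decreasing" is taken to mean "did not improve on the best ratio seen since the last clear". block_check() and struct block_state are hypothetical names. When block_check() returns 1, the caller would clear its table and signal the clear to the decoder in the output stream.

```c
#define CHECK_GAP 10000L   /* hypothetical check interval, in input bytes */

struct block_state {
    long checkpoint;  /* next in_count at which to check */
    long ratio;       /* best ratio so far, fixed point with 8 fraction bits */
};

/* Call once the code table is full. Returns 1 if the table should be
   cleared, 0 if compression is still improving or no check is due. */
int block_check(struct block_state *st, long in_count, long bytes_out)
{
    long rat;

    if (in_count < st->checkpoint)
        return 0;
    st->checkpoint = in_count + CHECK_GAP;
    if (bytes_out <= 0)
        rat = 0x7fffffffL;
    else if (in_count > 0x007fffffL)       /* in_count << 8 could overflow */
        rat = (bytes_out >> 8) ? in_count / (bytes_out >> 8) : 0x7fffffffL;
    else
        rat = (in_count << 8) / bytes_out;

    if (rat > st->ratio) {
        st->ratio = rat;   /* still improving: keep the table */
        return 0;
    }
    st->ratio = 0;         /* ratio stalled: start over with a fresh table */
    return 1;
}
```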

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

       cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff: the largest -b BITS value for which the
large+fast table is used.

memory: at least               BITS            cutoff
------  -- -----                ----            ------
   4,718,592                    16               13
   2,621,440                    16               12
   1,572,864                    16               11
   1,048,576                    16               10
     631,808                    16               --
     329,728                    15               --
     178,176                    14               --
      99,328                    13               --
           0                    12               --

The default memory size is 750,000, which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.
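Plain char is signed on some machines. A byte such as 0xE9 read through a char can then come out negative and index a table out of range. One way to express the NO_UCHAR choice is to pick the storage type and the masking in one place. This is sketched below with hypothetical names (char_type, BYTE_VALUE, byte_index). Only the NO_UCHAR symbol comes from the README.

```c
/* Hypothetical names; only the NO_UCHAR symbol comes from the README. */
#ifdef NO_UCHAR
typedef char char_type;                      /* plain char, possibly signed */
#define BYTE_VALUE(ch) ((int)(ch) & 0xff)    /* undo sign extension */
#else
typedef unsigned char char_type;             /* already 0..255 */
#define BYTE_VALUE(ch) ((int)(ch))
#endif

/* Index into a 256-entry table for a stored byte, under either setting. */
int byte_index(char_type ch)
{
    return BYTE_VALUE(ch);
}
```

Without NO_UCHAR the mask disappears. This corresponds to the "extra bit masking" that the 2.0 notes say was removed for machines with unsigned characters.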

If "int" is 16 bits on your machine, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
       cd /usr/local
       ln compress uncompress
       ln compress zcat
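The README only says that the links work. It does not say how one binary behaves as three programs. The usual way to get this, and the assumption of this sketch, is to look at the name in argv[0] after stripping any directory. mode_from_argv0() and the mode names are hypothetical.

```c
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick the behavior from the name the program was invoked under, so one
   binary hard-linked as compress, uncompress and zcat does three jobs. */
enum mode mode_from_argv0(const char *argv0)
{
    const char *name = strrchr(argv0, '/');

    name = name ? name + 1 : argv0;     /* strip any directory prefix */
    if (strcmp(name, "uncompress") == 0)
        return MODE_UNCOMPRESS;
    if (strcmp(name, "zcat") == 0)
        return MODE_ZCAT;               /* decompress to standard output */
    return MODE_COMPRESS;
}
```

A real main() would call this with argv[0] before parsing flags. Explicit options could then still override the mode the name selects.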

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
       cp compress.l /usr/man/manl
       cd /usr/man/manl
       ln compress.l uncompress.l
       ln compress.l zcat.l

               - or -

       cp compress.l /usr/man/man1/compress.1
       cd /usr/man/man1
       ln compress.1 uncompress.1
       ln compress.1 zcat.1

                                       regards,
                                       petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: [email protected] (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.    The packed files produced by compress are different on different
>      machines and dependent on the vax sysgen option.
>              The bug was in the different byte/bit ordering on the
>              various machines.  This has been fixed.
>
>              This version is NOT compatible with the original vax posting
>              unless the '-DCOMPATIBLE' option is specified to the C
>              compiler.  The original posting has a bug which I fixed;
>              the fix makes the files incompatible.  I recommend that you
>              NOT use this option unless you already have a lot of packed
>              files from the original posting by thomas.
>2.    The exit status is not well defined (on some machines), causing the
>      scripts to fail.
>              The exit status is now 0, 1, or 2 and is documented in
>              compress.l.
>3.    The function getopt() is not available in all C libraries.
>              The function getopt() is no longer referenced by the
>              program.
>4.    Error status is not being checked on the fwrite() and fflush() calls.
>              Fixed.
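Bug 1 comes from packing codes with word-sized machine operations. The original author's README below mentions extv and insv asm's. The sketch below avoids that dependence by packing variable-width codes into bytes one piece at a time, least-significant bit first. The file layout then depends only on the code sequence. LSB-first order is an assumption of this sketch; it is the order usually attributed to compress's output. put_code() and get_code() are hypothetical names, and buf must start zeroed.

```c
#include <stddef.h>

/* Append an n_bits-wide code at bit offset *bitpos, low bits first.
   Works a byte at a time, so host word size and byte order don't matter. */
void put_code(unsigned char *buf, size_t *bitpos, unsigned code, int n_bits)
{
    while (n_bits > 0) {
        size_t byte = *bitpos >> 3;
        int shift = (int)(*bitpos & 7);    /* first free bit in this byte */
        int take = 8 - shift;              /* room left in this byte */
        if (take > n_bits)
            take = n_bits;
        buf[byte] |= (unsigned char)((code & ((1u << take) - 1)) << shift);
        code >>= take;
        n_bits -= take;
        *bitpos += take;
    }
}

/* Read back an n_bits-wide code written by put_code(). */
unsigned get_code(const unsigned char *buf, size_t *bitpos, int n_bits)
{
    unsigned code = 0;
    int got = 0;

    while (got < n_bits) {
        size_t byte = *bitpos >> 3;
        int shift = (int)(*bitpos & 7);
        int take = 8 - shift;
        if (take > n_bits - got)
            take = n_bits - got;
        code |= ((unsigned)(buf[byte] >> shift) & ((1u << take) - 1)) << got;
        got += take;
        *bitpos += take;
    }
    return code;
}
```

For example, writing the 9-bit codes 0x1AB and 0x0FF produces the bytes AB FF 01 on any machine.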
>
>The following enhancements have been made:
>
>1.    Added facilities of "compact" into the compress program.  "Pack",
>      "Unpack", and "Pcat" are no longer required (no longer supplied).
>2.    Installed a workaround for a C compiler bug with "-O".
>3.    Added a magic number header (\037\235).  The bits specified are
>      stored in the file.
>4.    Added "-f" flag to force overwrite of output file.
>5.    Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>      compile.
>6.    The 'uncompress' script has been deleted; simply
>      'ln compress uncompress' after you compile and it will work.
>7.    Removed extra bit masking for machines that support unsigned
>      characters.  If your machine doesn't support unsigned characters,
>      define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>      cd /usr/local
>      ln compress uncompress
>      ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>      cp compress.l /usr/man/manl             - or -
>      cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>early.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code; you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                                     regards,
>>                                     joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
