Annotation of researchv9/cmd/compress/README3.0, revision 1.1.1.1

1.1       root        1: Enclosed is compress version 3.0 with the following changes:
                      2: 
                      3: 1.     "Block" compression is performed.  Once the BITS run out (the code
                      4:        table is full), the compression ratio is checked periodically.  If
                      5:        it is decreasing, the table is cleared and a new set of substrings is generated.
                      6: 
                      7:        This makes the output of compress 3.0 incompatible with that of
                      8:        compress 2.0.  However, compress 3.0 still accepts the output of
                      9:        compress 2.0.  To generate output that is compatible with compress
                     10:        2.0, use the undocumented "-C" flag.
                     11: 
                     12: 2.     A quiet "-q" flag has been added for use by the news system.
                     13: 
                     14: 3.     The character chaining has been deleted and the program now uses
                     15:        hashing.  This improves the speed of the program, especially
                     16:        during decompression.  Other speed improvements have been made,
                     17:        such as using putc() instead of fwrite().
                     18: 
                     19: 4.     A large table is used on large machines when a relatively small
                     20:        number of bits is specified.  This saves much time when compressing
                     21:        for a 16-bit machine on a 32-bit virtual machine.  Note that the
                     22:        speed improvement occurs only when the input file exceeds 30000
                     23:        characters, and the -b BITS is less than or equal to the cutoff
                     24:        described below.
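
The ratio check in item 1 can be pictured roughly as follows.  This is only a
sketch under stated assumptions, not the code in compress.c: the names
ratio_state, ratio_init and should_clear_table are hypothetical, and the
interval CHECK_GAP is an assumed value (the text above only says the check
happens every so often).  The caller is assumed to start calling it once the
table is full.

```c
/* Hypothetical sketch; names and CHECK_GAP are assumptions, not from compress.c. */
#define CHECK_GAP 10000L    /* assumed number of input bytes between checks */

struct ratio_state {
    double last_ratio;      /* ratio measured at the previous checkpoint */
    long   checkpoint;      /* input byte count at which to check next */
};

void ratio_init(struct ratio_state *st)
{
    st->last_ratio = 0.0;
    st->checkpoint = CHECK_GAP;
}

/* Returns 1 when the caller should clear the string table, 0 otherwise. */
int should_clear_table(struct ratio_state *st, long bytes_in, long bytes_out)
{
    double ratio;

    if (bytes_in < st->checkpoint || bytes_out <= 0)
        return 0;                       /* not yet time to check */
    st->checkpoint = bytes_in + CHECK_GAP;

    ratio = (double)bytes_in / (double)bytes_out;
    if (ratio >= st->last_ratio) {      /* still holding or improving */
        st->last_ratio = ratio;
        return 0;
    }
    st->last_ratio = 0.0;               /* ratio fell: start a new block */
    return 1;
}
```

A compressor loop would call should_clear_table() with its running byte
counts and, on a return of 1, emit whatever clear marker the format uses and
reset its table.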
                     25: 
                     26: Most of these changes were made by James A. Woods (ames!jaw).  Thank you
                     27: James!
                     28: 
                     29: Version 3.0 has been beta tested on many machines.
                     30: 
                     31: To compile compress:
                     32: 
                     33:        cc -O -DUSERMEM=usermem -o compress compress.c
                     34: 
                     35: Where "usermem" is the amount of physical user memory available (in bytes).  
                     36: If any physical memory is to be reserved for other processes, put in 
                     37: "-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.
                     38: 
                     39: The difference "usermem-sacredmem" determines the maximum BITS that can be
                     40: specified, and the cutoff bits where the large+fast table is used.
                     41: 
                     42: memory (bytes), at least       max BITS        cutoff
                     43: ------------------------       --------        ------
                     44:    4,718,592                    16               13
                     45:    2,621,440                    16               12
                     46:    1,572,864                    16               11
                     47:    1,048,576                    16               10
                     48:      631,808                    16               --
                     49:      329,728                    15               --
                     50:      178,176                    14               --
                     51:       99,328                    13               --
                     52:            0                    12               --
                     53: 
                     54: The default memory size is 750,000 which gives a maximum BITS=16 and no
                     55: large+fast table.
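
The table above can be read as a simple lookup on "usermem-sacredmem".  The
sketch below restates it in C for illustration; the names mem_limits and
lookup_limits are hypothetical, and the real program presumably settles this
at compile time from the -D values rather than at run time.  A cutoff of 0
here stands for the "--" entries (no large+fast table).

```c
#include <stddef.h>

/* Hypothetical restatement of the README table; not taken from compress.c. */
struct mem_limits {
    long min_mem;   /* "memory: at least", in bytes */
    int  max_bits;  /* maximum BITS that can be specified */
    int  cutoff;    /* cutoff bits for the large+fast table, 0 = none */
};

static const struct mem_limits limits[] = {
    { 4718592L, 16, 13 },
    { 2621440L, 16, 12 },
    { 1572864L, 16, 11 },
    { 1048576L, 16, 10 },
    {  631808L, 16,  0 },
    {  329728L, 15,  0 },
    {  178176L, 14,  0 },
    {   99328L, 13,  0 },
    {       0L, 12,  0 },
};

/* Returns the first row whose memory threshold is met. */
struct mem_limits lookup_limits(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;
    size_t n = sizeof limits / sizeof limits[0];
    size_t i;

    if (avail < 0)
        avail = 0;
    for (i = 0; i < n; i++)
        if (avail >= limits[i].min_mem)
            return limits[i];
    return limits[n - 1];   /* not reached: the last row matches 0 */
}
```

With the default of 750,000 bytes and nothing reserved, lookup_limits(750000, 0)
lands on the 631,808 row: maximum BITS of 16 and no large+fast table, as
stated above.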
                     56: 
                     57: The maximum bits can be overridden by specifying "-DBITS=bits" at
                     58: compilation time.
                     59: 
                     60: If your machine doesn't support unsigned characters, define "NO_UCHAR" 
                     61: when compiling.
                     62: 
                     63: If "int" is 16 bits on your machine, define "SHORT_INT" when compiling.
                     64: 
                     65: After compilation, move "compress" to a standard executable location, such 
                     66: as /usr/local.  Then:
                     67:        cd /usr/local
                     68:        ln compress uncompress
                     69:        ln compress zcat
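
The links work because one binary can choose its behavior from the name it
was invoked under.  A minimal sketch of that idea, assuming the program looks
at the last path component of argv[0]; the enum and the function name
mode_from_name are hypothetical and are not taken from compress.c.

```c
#include <string.h>

/* Hypothetical names, for illustration only. */
enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick the operating mode from the name the program was run as. */
enum mode mode_from_name(const char *argv0)
{
    const char *base = strrchr(argv0, '/');

    base = base ? base + 1 : argv0;     /* strip any leading directories */
    if (strcmp(base, "uncompress") == 0)
        return MODE_UNCOMPRESS;
    if (strcmp(base, "zcat") == 0)
        return MODE_ZCAT;               /* decompress to standard output */
    return MODE_COMPRESS;               /* any other name compresses */
}
```

A main() would call mode_from_name(argv[0]) before parsing its flags, so
/usr/local/zcat and /usr/local/compress share the same file on disk.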
                     70: 
                     71: On machines that have a fixed stack size (such as Perkin-Elmer), set the
                     72: stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
                     73: 
                     74: Next, install the manual (compress.l).
                     75:        cp compress.l /usr/man/manl
                     76:        cd /usr/man/manl
                     77:        ln compress.l uncompress.l
                     78:        ln compress.l zcat.l
                     79: 
                     80:                - or -
                     81: 
                     82:        cp compress.l /usr/man/man1/compress.1
                     83:        cd /usr/man/man1
                     84:        ln compress.1 uncompress.1
                     85:        ln compress.1 zcat.1
                     86: 
                     87: The zmore shell script and manual page are for use on systems that have a
                     88: "more(1)" program.  Install the shell script and the manual page in a "bin"
                     89: and "man" directory, respectively.  If your system doesn't have the
                     90: "more(1)" program, just skip "zmore".
                     91: 
                     92:                                        regards,
                     93:                                        petsd!joe
                     94: 
                     95: Here is the README file from the previous version of compress (2.0):
                     96: 
                     97: >Enclosed is compress.c version 2.0 with the following bugs fixed:
                     98: >
                     99: >1.    The packed files produced by compress are different on different
                    100: >      machines and dependent on the vax sysgen option.
                    101: >              The bug was in the different byte/bit ordering on the
                    102: >              various machines.  This has been fixed.
                    103: >
                    104: >              This version is NOT compatible with the original vax posting
                    105: >              unless the '-DCOMPATIBLE' option is specified to the C
                    106: >              compiler.  The original posting has a bug which I fixed, 
                    107: >              causing incompatible files.  I recommend that you NOT use this
                    108: >              option unless you already have a lot of packed files from
                    109: >              the original posting by thomas.
                    110: >2.    The exit status is not well defined (on some machines) causing the
                    111: >      scripts to fail.
                    112: >              The exit status is now 0,1 or 2 and is documented in
                    113: >              compress.l.
                    114: >3.    The function getopt() is not available in all C libraries.
                    115: >              The function getopt() is no longer referenced by the
                    116: >              program.
                    117: >4.    Error status is not being checked on the fwrite() and fflush() calls.
                    118: >              Fixed.
                    119: >
                    120: >The following enhancements have been made:
                    121: >
                    122: >1.    Added facilities of "compact" into the compress program.  "Pack",
                    123: >      "Unpack", and "Pcat" are no longer required (no longer supplied).
                    124: >2.    Installed work around for C compiler bug with "-O".
                    125: >3.    Added a magic number header (\037\235).  Put the bits specified
                    126: >      in the file.
                    127: >4.    Added "-f" flag to force overwrite of output file.
                    128: >5.    Added "-c" flag and "zcat" program.  'ln compress zcat' after you
                    129: >      compile.
                    130: >6.    The 'uncompress' script has been deleted; simply 
                    131: >      'ln compress uncompress' after you compile and it will work.
                    132: >7.    Removed extra bit masking for machines that support unsigned
                    133: >      characters.  If your machine doesn't support unsigned characters,
                    134: >      define "NO_UCHAR" when compiling.
                    135: >
                    136: >Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
                    137: >standard executable location, such as /usr/local.  Then:
                    138: >      cd /usr/local
                    139: >      ln compress uncompress
                    140: >      ln compress zcat
                    141: >
                    142: >On machines that have a fixed stack size (such as Perkin-Elmer), set the
                    143: >stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
                    144: >
                    145: >Next, install the manual (compress.l).
                    146: >      cp compress.l /usr/man/manl             - or -
                    147: >      cp compress.l /usr/man/man1/compress.1
                    148: >
                    149: >Here is the README that I sent with my first posting:
                    150: >
                    151: >>Enclosed is a modified version of compress.c, along with scripts to make it
                    152: >>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
                    153: >>(petsd!joe) and a colleague (petsd!peora!srd) did:
                    154: >>
                    155: >>1. Removed VAX dependencies.
                    156: >>2. Changed the struct to separate arrays; saves mucho memory.
                    157: >>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
                    158: >>4. Sorted the character next chain and changed the search to stop
                    159: >>prematurely.  This saves a lot on the execution time when compressing.
                    160: >>
                    161: >>This version is totally compatible with the original version.  Even though
                    162: >>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
                    163: >>machine, due to the size of the arrays.
                    164: >>
                    165: >>Here is the README file from the original author:
                    166: >> 
                    167: >>>Well, with all this discussion about file compression (for news batching
                    168: >>>in particular) going around, I decided to implement the text compression
                    169: >>>algorithm described in the June Computer magazine.  The author claimed
                    170: >>>blinding speed and good compression ratios.  It's certainly faster than
                    171: >>>compact (but, then, what wouldn't be), but it's also the same speed as
                    172: >>>pack, and gets better compression than both of them.  On 350K bytes of
                    173: >>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
                    174: >>>seconds, and compress (herein) also took 80 seconds.  But, compact and
                    175: >>>pack got about 30% compression, whereas compress got over 50%.  So, I
                    176: >>>decided I had something, and that others might be interested, too.
                    177: >>>
                    178: >>>As is probably true of compact and pack (although I haven't checked),
                    179: >>>the byte order within a word is probably relevant here, but as long as
                    180: >>>you stay on a single machine type, you should be ok.  (Can anybody
                    181: >>>elucidate on this?)  There are a couple of asm's in the code (extv and
                    182: >>>insv instructions), so anyone porting it to another machine will have to
                    183: >>>deal with this anyway (and could probably make it compatible with Vax
                    184: >>>byte order at the same time).  Anyway, I've linted the code (both with
                    185: >>>and without -p), so it should run elsewhere.  Note the longs in the
                    186: >>>code, you can take these out if you reduce BITS to <= 15.
                    187: >>>
                    188: >>>Have fun, and as always, if you make good enhancements, or bug fixes,
                    189: >>>I'd like to see them.
                    190: >>>
                    191: >>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
                    192: >>
                    193: >>                                     regards,
                    194: >>                                     joe
                    195: >>
                    196: >>--
                    197: >>Full-Name:  Joseph M. Orost
                    198: >>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
                    199: >>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
                    200: >>Phone:      (201) 870-5844
