Annotation of 43BSD/ucb/compress/README, revision 1.1

1.1     ! root        1: 
        !             2:        @(#)README      5.3 (Berkeley) 9/17/85
        !             3: 
        !             4: Compress version 4.0 improvements over 3.0:
        !             5:        o compress() speedup (10-50%) by changing division hash to xor
        !             6:        o decompress() speedup (5-10%)
        !             7:        o Memory requirements reduced (3-30%)
        !             8:        o Stack requirements reduced to less than 4kb
        !             9:        o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
        !            10:        o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
        !            11:        o Default to 'quiet' mode
        !            12:        o Unification of 'force' flags
        !            13:        o Manual page overhaul
        !            14:        o Portability enhancement for M_XENIX
        !            15:        o Removed text on #else and #endif
        !            16:        o Added "-V" switch to print version and options
        !            17:        o Added #defines for SIGNED_COMPARE_SLOW
        !            18:        o Added Makefile and "usermem" program
        !            19:        o Removed all floating point computations
        !            20:        o New programs: [deleted]
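
The first item in the list above replaces a division-based hash with an xor-based one. As a rough sketch of the difference, assuming a table size and shift chosen only for illustration (the names hash_div, hash_xor, HSIZE and HSHIFT are hypothetical and may not match compress.c):

```c
#include <assert.h>

/* Hypothetical table size for this sketch: a prime above 2^16. */
#define HSIZE  69001L
/* Hypothetical shift chosen so (c << HSHIFT) ^ ent stays below HSIZE. */
#define HSHIFT 8

/* Division-style hash: combine character and prefix code, then reduce
 * modulo the table size.  Costs one divide per probe. */
long hash_div(int c, long ent)
{
    assert(c >= 0 && c < 256 && ent >= 0);
    return (((long)c << 16) + ent) % HSIZE;
}

/* Xor-style hash: a shift and an xor, no divide. */
long hash_xor(int c, long ent)
{
    assert(c >= 0 && c < 256 && ent >= 0 && ent < 65536L);
    return ((long)c << HSHIFT) ^ ent;
}
```

On machines where integer division is slow, avoiding the divide in the inner loop is a plausible source of the quoted speedup.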
        !            21: 
        !            22: The "usermem" script attempts to determine the maximum process size.  Some
        !            23: editing of the script may be necessary (see the comments).  [It should work
        !            24: fine on 4.3 bsd.] If you can't get it to work at all, just create file
        !            25: "USERMEM" containing the maximum process size in decimal.
        !            26: 
        !            27: The following preprocessor symbols control the compilation of "compress.c":
        !            28: 
        !            29:        o USERMEM               Maximum process memory on the system
        !            30:        o SACREDMEM             Amount to reserve for other processes
        !            31:        o SIGNED_COMPARE_SLOW   Unsigned compare instructions are faster
        !            32:        o NO_UCHAR              Don't use "unsigned char" types
        !            33:        o BITS                  Overrules default set by USERMEM-SACREDMEM
        !            34:        o vax                   Generate inline assembler
        !            35:        o interdata             Defines SIGNED_COMPARE_SLOW
        !            36:        o M_XENIX               Makes arrays < 65536 bytes each
        !            37:        o pdp11                 BITS=12, NO_UCHAR
        !            38:        o z8000                 BITS=12
        !            39:        o pcxt                  BITS=12
        !            40:        o BSD4_2                Allow long filenames ( > 14 characters) &
        !            41:                                Call setlinebuf(stderr)
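
To illustrate how symbols such as NO_UCHAR and SIGNED_COMPARE_SLOW might steer the build, here is a minimal sketch. The typedef names, the BYTE_VALUE macro and the byte_value helper are assumptions made for this example, not taken from compress.c:

```c
#include <assert.h>

/* Sketch only: pick counter types depending on which compare is cheap. */
#ifdef SIGNED_COMPARE_SLOW
typedef unsigned long count_int;   /* prefer unsigned compares */
#else
typedef long count_int;
#endif

#ifdef NO_UCHAR
typedef char char_type;            /* compiler lacks "unsigned char" */
#define BYTE_VALUE(c) ((c) & 0xff) /* mask away any sign extension */
#else
typedef unsigned char char_type;
#define BYTE_VALUE(c) (c)
#endif

/* Return a stored byte as an int in 0..255 under either setting. */
int byte_value(char_type c)
{
    int v = BYTE_VALUE(c);
    assert(v >= 0 && v <= 255);
    return v;
}
```

Compiling the same file with "-DNO_UCHAR" switches char_type to plain char and enables the mask.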
        !            42: 
        !            43: The difference "usermem-sacredmem" determines the maximum BITS that can be
        !            44: specified with the "-b" flag.
        !            45: 
        !            46: memory: at least               BITS
        !            47: ------  -- -----                ----
        !            48:      433,484                    16
        !            49:      229,600                    15
        !            50:      127,536                    14
        !            51:       73,464                    13
        !            52:            0                    12
        !            53: 
        !            54: The default is BITS=16.
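
The table above can be read as a simple threshold lookup. The sketch below does that lookup at run time, purely for illustration; USERMEM and SACREDMEM are compile-time symbols, so the real selection presumably happens in the preprocessor. The function name max_bits_for_memory is hypothetical:

```c
#include <assert.h>

/* Map available memory (usermem - sacredmem, in bytes) to the largest
 * BITS value, using the thresholds from the table above. */
int max_bits_for_memory(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;
    assert(usermem >= 0 && sacredmem >= 0);
    if (avail >= 433484L) return 16;
    if (avail >= 229600L) return 15;
    if (avail >= 127536L) return 14;
    if (avail >= 73464L)  return 13;
    return 12;
}
```

For example, 450,000 bytes of user memory with 100,000 reserved leaves 350,000, which falls in the BITS=15 row.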
        !            55: 
        !            56: The maximum bits can be overruled by specifying "-DBITS=bits" at
        !            57: compilation time.
        !            58: 
        !            59: WARNING: files compressed on a large machine with more bits than allowed by 
        !            60: a version of compress on a smaller machine cannot be decompressed!  Use the
        !            61: "-b12" flag to generate a file on a large machine that can be uncompressed 
        !            62: on a 16-bit machine.
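
The warning above can be checked before decompression starts, because the file records the BITS it was written with after the magic number \037\235 (see the 2.0 notes further down). A minimal sketch, assuming the BITS value sits in the low five bits of the third header byte; header_fits is a hypothetical helper, not part of compress:

```c
#include <assert.h>

#define MAGIC_0   0x1f   /* \037 */
#define MAGIC_1   0x9d   /* \235 */
#define BITS_MASK 0x1f   /* assumption: low 5 bits of byte 3 hold BITS */

/* Returns 1 if a file with this 3-byte header can be decompressed by a
 * build limited to local_bits, 0 if the file needs more bits, and -1 if
 * the magic number does not match. */
int header_fits(const unsigned char hdr[3], int local_bits)
{
    int file_bits;
    assert(hdr != 0);
    if (hdr[0] != MAGIC_0 || hdr[1] != MAGIC_1)
        return -1;
    file_bits = hdr[2] & BITS_MASK;
    return file_bits <= local_bits ? 1 : 0;
}
```

A file written with the default BITS=16 would be rejected by a BITS=12 build under this check, while one written with "-b12" would pass.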
        !            63: 
        !            64: The output of compress 4.0 is fully compatible with that of compress 3.0.
        !            65: In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
        !            66: the output of compress 3.0 may be fed into uncompress 4.0.
        !            67: 
        !            68: The output of compress 4.0 is not compatible with that of
        !            69: compress 2.0.  However, compress 4.0 still accepts the output of
        !            70: compress 2.0.  To generate output that is compatible with compress
        !            71: 2.0, use the undocumented "-C" flag.
        !            72: 
        !            73:        -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
        !            74: --------------------------------
        !            75: 
        !            76: Enclosed is compress version 3.0 with the following changes:
        !            77: 
        !            78: 1.     "Block" compression is performed.  After the BITS run out, the
        !            79:        compression ratio is checked every so often.  If it is decreasing,
        !            80:        the table is cleared and a new set of substrings are generated.
        !            81: 
        !            82:        This makes the output of compress 3.0 not compatible with that of
        !            83:        compress 2.0.  However, compress 3.0 still accepts the output of
        !            84:        compress 2.0.  To generate output that is compatible with compress
        !            85:        2.0, use the undocumented "-C" flag.
        !            86: 
        !            87: 2.     A quiet "-q" flag has been added for use by the news system.
        !            88: 
        !            89: 3.     The character chaining has been deleted and the program now uses
        !            90:        hashing.  This improves the speed of the program, especially
        !            91:        during decompression.  Other speed improvements have been made,
        !            92:        such as using putc() instead of fwrite().
        !            93: 
        !            94: 4.     A large table is used on large machines when a relatively small
        !            95:        number of bits is specified.  This saves much time when compressing
        !            96:        for a 16-bit machine on a 32-bit virtual machine.  Note that the
        !            97:        speed improvement only occurs when the input file is > 30000
        !            98:        characters, and the -b BITS is less than or equal to the cutoff
        !            99:        described below.
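
The periodic ratio check described in item 1 can be sketched as follows. This is an illustration under assumptions, not the code from compress.c: the struct, the function name should_clear_table and the scaling by 256 are all chosen for this example. It uses integer arithmetic only.

```c
#include <assert.h>
#include <limits.h>

/* State for the periodic check; names are assumptions for this sketch. */
struct ratio_state {
    long best_ratio;   /* best input/output ratio seen, scaled by 256 */
};

/* Called "every so often" once the code table is full.  Returns 1 when
 * the ratio has stopped improving and the table should be cleared. */
int should_clear_table(struct ratio_state *st, long in_count, long bytes_out)
{
    long ratio;
    assert(st != 0 && in_count >= 0 && bytes_out > 0);
    if (in_count > LONG_MAX / 256L) {
        /* Scaling the numerator would overflow; scale the divisor. */
        long scaled = bytes_out / 256L;
        ratio = (scaled == 0) ? LONG_MAX : in_count / scaled;
    } else {
        ratio = (in_count * 256L) / bytes_out;
    }
    if (ratio > st->best_ratio) {
        st->best_ratio = ratio;   /* still improving: keep the table */
        return 0;
    }
    st->best_ratio = 0;           /* start fresh after the clear */
    return 1;
}
```

When the function returns 1, the caller would clear the string table and signal the clear to the decompressor, so both sides rebuild their substrings in step.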
        !           100: 
        !           101: Most of these changes were made by James A. Woods (ames!jaw).  Thank you
        !           102: James!
        !           103: 
        !           104: To compile compress:
        !           105: 
        !           106:        cc -O -DUSERMEM=usermem -o compress compress.c
        !           107: 
        !           108: Where "usermem" is the amount of physical user memory available (in bytes).  
        !           109: If any physical memory is to be reserved for other processes, put in 
        !           110: "-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.
        !           111: 
        !           112: The difference "usermem-sacredmem" determines the maximum BITS that can be
        !           113: specified, and the cutoff bits where the large+fast table is used.
        !           114: 
        !           115: memory: at least               BITS            cutoff
        !           116: ------  -- -----                ----            ------
        !           117:    4,718,592                    16               13
        !           118:    2,621,440                    16               12
        !           119:    1,572,864                    16               11
        !           120:    1,048,576                    16               10
        !           121:      631,808                    16               --
        !           122:      329,728                    15               --
        !           123:      178,176                    14               --
        !           124:       99,328                    13               --
        !           125:            0                    12               --
        !           126: 
        !           127: The default memory size is 750,000 which gives a maximum BITS=16 and no
        !           128: large+fast table.
        !           129: 
        !           130: The maximum bits can be overruled by specifying "-DBITS=bits" at
        !           131: compilation time.
        !           132: 
        !           133: If your machine doesn't support unsigned characters, define "NO_UCHAR" 
        !           134: when compiling.
        !           135: 
        !           136: If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.
        !           137: 
        !           138: After compilation, move "compress" to a standard executable location, such 
        !           139: as /usr/local.  Then:
        !           140:        cd /usr/local
        !           141:        ln compress uncompress
        !           142:        ln compress zcat
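
The links work because one binary can behave differently depending on the name it was invoked under. A minimal sketch of that dispatch, assuming the three names used above; the enum and the mode_from_name helper are hypothetical:

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick a mode from the name the program was invoked under. */
enum mode mode_from_name(const char *argv0)
{
    const char *base;
    assert(argv0 != 0);
    base = strrchr(argv0, '/');          /* strip any leading directory */
    base = base ? base + 1 : argv0;
    if (strcmp(base, "uncompress") == 0) return MODE_UNCOMPRESS;
    if (strcmp(base, "zcat") == 0)       return MODE_ZCAT;
    return MODE_COMPRESS;                /* default for any other name */
}
```

A main() would call mode_from_name(argv[0]) before parsing flags, so "uncompress" and "zcat" need no separate executables.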
        !           143: 
        !           144: On machines that have a fixed stack size (such as Perkin-Elmer), set the
        !           145: stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
        !           146: 
        !           147: Next, install the manual (compress.l).
        !           148:        cp compress.l /usr/man/manl
        !           149:        cd /usr/man/manl
        !           150:        ln compress.l uncompress.l
        !           151:        ln compress.l zcat.l
        !           152: 
        !           153:                - or -
        !           154: 
        !           155:        cp compress.l /usr/man/man1/compress.1
        !           156:        cd /usr/man/man1
        !           157:        ln compress.1 uncompress.1
        !           158:        ln compress.1 zcat.1
        !           159: 
        !           160:                                        regards,
        !           161:                                        petsd!joe
        !           162: 
        !           163: Here is a note from the net:
        !           164: 
        !           165: >From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
        !           166: Path: ames!hplabs!pesnta!amd!turtlevax!ken
        !           167: From: [email protected] (Ken Turkowski)
        !           168: Newsgroups: net.sources
        !           169: Subject: Re: Compress release 3.0 : sample Makefile
        !           170: Organization: CADLINC, Inc. @ Menlo Park, CA
        !           171: 
        !           172: In the compress 3.0 source recently posted to mod.sources, there is a
        !           173: #define variable which can be set for optimum performance on a machine
        !           174: with a large amount of memory.  A program (usermem) to calculate the
        !           175: useable amount of physical user memory is enclosed, as well as a sample
        !           176: 4.2bsd Vax Makefile for compress.
        !           177: 
        !           178: Here is the README file from the previous version of compress (2.0):
        !           179: 
        !           180: >Enclosed is compress.c version 2.0 with the following bugs fixed:
        !           181: >
        !           182: >1.    The packed files produced by compress are different on different
        !           183: >      machines and dependent on the vax sysgen option.
        !           184: >              The bug was in the different byte/bit ordering on the
        !           185: >              various machines.  This has been fixed.
        !           186: >
        !           187: >              This version is NOT compatible with the original vax posting
        !           188: >              unless the '-DCOMPATIBLE' option is specified to the C
        !           189: >              compiler.  The original posting has a bug which I fixed, 
        !           190: >              causing incompatible files.  I recommend you NOT to use this
        !           191: >              option unless you already have a lot of packed files from
        !           192: >              the original posting by thomas.
        !           193: >2.    The exit status is not well defined (on some machines) causing the
        !           194: >      scripts to fail.
        !           195: >              The exit status is now 0,1 or 2 and is documented in
        !           196: >              compress.l.
        !           197: >3.    The function getopt() is not available in all C libraries.
        !           198: >              The function getopt() is no longer referenced by the
        !           199: >              program.
        !           200: >4.    Error status is not being checked on the fwrite() and fflush() calls.
        !           201: >              Fixed.
        !           202: >
        !           203: >The following enhancements have been made:
        !           204: >
        !           205: >1.    Added facilities of "compact" into the compress program.  "Pack",
        !           206: >      "Unpack", and "Pcat" are no longer required (no longer supplied).
        !           207: >2.    Installed work around for C compiler bug with "-O".
        !           208: >3.    Added a magic number header (\037\235).  Put the bits specified
        !           209: >      in the file.
        !           210: >4.    Added "-f" flag to force overwrite of output file.
        !           211: >5.    Added "-c" flag and "zcat" program.  'ln compress zcat' after you
        !           212: >      compile.
        !           213: >6.    The 'uncompress' script has been deleted; simply 
        !           214: >      'ln compress uncompress' after you compile and it will work.
        !           215: >7.    Removed extra bit masking for machines that support unsigned
        !           216: >      characters.  If your machine doesn't support unsigned characters,
        !           217: >      define "NO_UCHAR" when compiling.
        !           218: >
        !           219: >Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
        !           220: >standard executable location, such as /usr/local.  Then:
        !           221: >      cd /usr/local
        !           222: >      ln compress uncompress
        !           223: >      ln compress zcat
        !           224: >
        !           225: >On machines that have a fixed stack size (such as Perkin-Elmer), set the
        !           226: >stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
        !           227: >
        !           228: >Next, install the manual (compress.l).
        !           229: >      cp compress.l /usr/man/manl             - or -
        !           230: >      cp compress.l /usr/man/man1/compress.1
        !           231: >
        !           232: >Here is the README that I sent with my first posting:
        !           233: >
        !           234: >>Enclosed is a modified version of compress.c, along with scripts to make it
        !           235: >>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
        !           236: >>(petsd!joe) and a colleague (petsd!peora!srd) did:
        !           237: >>
        !           238: >>1. Removed VAX dependencies.
        !           239: >>2. Changed the struct to separate arrays; saves mucho memory.
        !           240: >>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
        !           241: >>4. Sorted the character next chain and changed the search to stop
        !           242: >>prematurely.  This saves a lot on the execution time when compressing.
        !           243: >>
        !           244: >>This version is totally compatible with the original version.  Even though
        !           245: >>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
        !           246: >>machine, due to the size of the arrays.
        !           247: >>
        !           248: >>Here is the README file from the original author:
        !           249: >> 
        !           250: >>>Well, with all this discussion about file compression (for news batching
        !           251: >>>in particular) going around, I decided to implement the text compression
        !           252: >>>algorithm described in the June Computer magazine.  The author claimed
        !           253: >>>blinding speed and good compression ratios.  It's certainly faster than
        !           254: >>>compact (but, then, what wouldn't be), but it's also the same speed as
        !           255: >>>pack, and gets better compression than both of them.  On 350K bytes of
        !           256: >>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
        !           257: >>>seconds, and compress (herein) also took 80 seconds.  But, compact and
        !           258: >>>pack got about 30% compression, whereas compress got over 50%.  So, I
        !           259: >>>decided I had something, and that others might be interested, too.
        !           260: >>>
        !           261: >>>As is probably true of compact and pack (although I haven't checked),
        !           262: >>>the byte order within a word is probably relevant here, but as long as
        !           263: >>>you stay on a single machine type, you should be ok.  (Can anybody
        !           264: >>>elucidate on this?)  There are a couple of asm's in the code (extv and
        !           265: >>>insv instructions), so anyone porting it to another machine will have to
        !           266: >>>deal with this anyway (and could probably make it compatible with Vax
        !           267: >>>byte order at the same time).  Anyway, I've linted the code (both with
        !           268: >>>and without -p), so it should run elsewhere.  Note the longs in the
        !           269: >>>code, you can take these out if you reduce BITS to <= 15.
        !           270: >>>
        !           271: >>>Have fun, and as always, if you make good enhancements, or bug fixes,
        !           272: >>>I'd like to see them.
        !           273: >>>
        !           274: >>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
        !           275: >>
        !           276: >>                                     regards,
        !           277: >>                                     joe
        !           278: >>
        !           279: >>--
        !           280: >>Full-Name:  Joseph M. Orost
        !           281: >>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
        !           282: >>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
        !           283: >>Phone:      (201) 870-5844
