David Scott's BIJECTIVE LZW COMPRESSION METHODS

THE STORY

files updated on December 26,2002
The first Basic Binary Two Root Bijective LZW file and/or string compressor called LZW1



files updated on December 2003
The next Basic Binary Two Root Bijective LZW file and/or string compressori called LZW3S



THIS PAGE UPDATED DEC 2003 This Page will discuss several bijective LZW style compressors. However it will take time to put then all up so sarting with just the basic one this chirstmas and add other in the next few days or weeks.



LZW1

This is a bijective LZW with only 2 Roots 0 and 1 so its slow since each occurence of a binary string causes two new strings to occur in the table. The old string + "1" and the old string + "0". There is more than one way to make a LZW bijective this was the way I chose to do it However I will have others that use a different method. The reason is that while building the tables you can really only use a node once. Example if "010" with a value "12" and a "1" follows you create "0101" with next availible value say "50" in the old method you leave "010" as "12" but you really never need it again so I would create a string "0100" as the new "12". This changes allows it to build two new strings each time a string occurs until table filled. I allowed 2**17 values for the strings (actaully I assign the new string that occured the old value of previous string and other one gets new value) this compresses the "Calagary Corpus" to 1,997,265 bytes. If you change to larger table size it compresses smaller. Since on some of the files the table gets filled up. I experimented with the large file Patrick used in big compression challenge. The biggest table where my code run was 2**22 Its too big for my machine after that. Each time the so called random file tested the bigger the table the smaller the expansion. Note I usually test the "Calagary Corpus" files by first compressing then then decompress result to see if if matchers the original. And then I do a pass to check bijectivity by decompressing files and then see if they come back to original. In many cases the files expand rather dramaticlly and I could not run this test with all files since it crashes machine on large files so be careful especailly decompressing with long strings of ones or zeros or any small repeated pattern.

**NEW** 2002 dec 26 changed 3 lines to cause a "*" to appear when compressing for each 8000 bits out.

You can compress any file directly by
LZW1 -c file.in file.out
to decompress change "-c" to "-d"
you can also use this to compress strings if you wish.
Example a file X of 6 ascii 0's :000000
ozb X Y
LZW1 -c Y X
bzo X Z
would create a file X of 4 ascii 0's :0000
if you used 5 zeros instead :00000
it would compress to :001
**NOTE**

ozb only runs til a non "0" or "1" is read.

USE AT YOUR OWN RISK IN A DIECTORY BY ITSELF

ENJOY PURVEYOR OF FINE BIJECTIVE COMPRESSION CODE.

Merry Christmas 2002 and a Happy New Year 2003 DAVID A. SCOTT



LZW3S

What follows it the readme file of LZW3s more to follow later



   *** LZW3S created by David A. Scott Dec 2003 ***
This is updated version of LZW1.exe calling it
LZW3S.exe
It is much slower then LZW1 it is also a bijective
2 root LZW type of compressor. However it compresses
each of the 18 file "Calagary Corpus" files set to a
smaller size than LZW1 even though it using smaller
table space. The compressed size is 1,983,894 bytes.
LZW1 compresses the "Calagary Corpus" to 1,997,265
bytes.

It also does not blow up on uncompressing common
repetive files which caused LZW1 to crash. In fact
all the Calgary Corpus files decompressed to less
space than the previous version. This was achieved
by changing various leaf nodes on the fly during
compression and decompression along with a psuedo
random dithering. And the two added processes occur
even when table space filled.
 The best point is that it could be used as a last
pass in an ecncryption process before encryption
takes place.
Example: to compress and encrypt with full AES bijectively

LZW3s -c filein temp1
BICOM -d temp1 temp2
BICOM -p password fileout

filein input file to be compressed with LZW3s and encrypted
  with AES
BICOM is by Matt Timmermans and can be found on the internet
password is just what it says
fileout is the compressed encrypted output file.

The reverse would be:

BICOM -d -p password fileout tempa
BICOM tempa tempb
LZW3s -c tempb fileoutt

note file:
tempa same as temp2
tempb same as temp1
fileoutt same as filein

  There are many many ways to modify
the above and still get bijective file
encryption using full size AES

 Another useful feature is whole file Whittening
before encryption. This means data is spread through
out the whole file you could replace the file
with whittened file it may be shorter or longer than
original.
 

To whitten a file:
LZW3s -d filein temp1
reverse temp1 temp2
LZW3s -c temp2 fileout

where reverse is my reverse at 
http://biejctive.dogma.net/
There is a link to Matts page there.

To unwhitten the file just do
same procedure again using previous
output file as input.

 The following are the LZW3s compressed lengths
BIB      LZ3        79,717 
BOOK1    LZ3       495,821 
BOOK2    LZ3       411,866
GEO      LZ3        77,163 
NEWS     LZ3       279,981  
OBJ1     LZ3        17,069  
OBJ2     LZ3       181,076 
PAPER1   LZ3        42,058 
PAPER2   LZ3        60,013  
PAPER3   LZ3        35,784  
PAPER4   LZ3        11,080  
PAPER5   LZ3        10,226  
PAPER6   LZ3        30,884  
PIC      LZ3        60,978  
PROGC    LZ3        32,493  
PROGL    LZ3        51,287  
PROGP    LZ3        36,720  
TRANS    LZ3        69,678  
        18 file(s)      1,983,894 bytes
**Note all the above uncompress back to the originals

 The following are the LZW3s uncompressed lengths
BIB      LZB       144,185  
BOOK1    LZB     1,014,293  
BOOK2    LZB       817,057  
GEO      LZB       140,224  
NEWS     LZB       486,726  
OBJ1     LZB        28,411  
OBJ2     LZB       336,047  
PAPER1   LZB        67,720  
PAPER2   LZB       106,478  
PAPER3   LZB        59,997  
PAPER4   LZB        16,912  
PAPER5   LZB        15,306  
PAPER6   LZB        48,360  
PIC      LZB       659,262  
PROGC    LZB        54,159  
PROGL    LZB       101,121  
PROGP    LZB        63,736  
TRANS    LZB       124,268  
        18 file(s)      4,284,262 bytes
**Note all the above compress back to the originals

THE ORIGINALS
BIB                111,261  
BOOK1              768,771  
BOOK2              610,856  
GEO                102,400  
NEWS               377,109  
OBJ1                21,504  
OBJ2               246,814  
PAPER1              53,161  
PAPER2              82,199  
PAPER3              46,526  
PAPER4              13,286  
PAPER5              11,954  
PAPER6              38,105  
PIC                513,216  
PROGC               39,611  
PROGL               71,646  
PROGP               49,379  
TRANS               93,695  
        18 file(s)      3,251,493 bytes

**USE AT YOUR OWN RISK**
place in a seperate directory and test
if crital stuff after you compress make
sure the uncompress goes back to same file
use only as

** to compreess
LZW3s.exe -c filein fileout
** to uncompress
LZW3s.exe -d filein fileout
-c could be -cd compress and dump verbose
-d could be -dd uncompress and dump verbose

**Note all suff on from a PC using DJGPP port of
GNU C

HAVE A MERRY CHRISMAS
DAVID A. SCOTT




ENTER here for MY Home Page