"ONE to ONE" COMPRESS

This is an attempt to tell any one how to end a huffman compressed file on byte boundaries such that any file can be thought of as a compressed file so that the compression/decompression routines will work for all finite files. First I will define what I think is needed and then I will tell you what I did.
All files are in eight bit bytes. The Huffman table can be any vaild table including dynamic huffman tables. There should between 128 and 513 symbols represented by the table. This is to force a tree path that has at least an 8 bit stream of bits to represent the least likely occuring symbol and to force the shortest path to a leaf from any internal node to a leaf of eight bits or less. I don't count the top node as an internal node. In my code there are exactly 256 leaves all the time. Note the numbers between 128 and 513 were to guarantee the "one to one" nature of the compression decompress routines. A tree as small as 9 leaves could work if there was a path of 8 in it. Also a tree as large as any finite N number of leaves could be such that the shortest path to any leaf from an internal node is 8 or less.

The rules for the current method are as follows ( looking at decompression since like describing inverse of a nonsingular marix it might be better to look at matrix multiply)

Fact one: My huffman tree such that the all zero token is at least 8 bits long.
Fact two: My huffman tree such that the shortest path form any internal node to a leaf never contains a string of more than 8 ones next to each other.
Fact three: A one byte huffman file has no hidden bits and decompresses to one byte since the start tree has 256 leaves all at a lenght of 8 bits.

Rule one: To decompress if the tokens end such that file being decompressed end on a full byte your are done no problem.
Example ...1 11110000 positions labeled 0 to 7 and in position 3 the token S = 10000 is last decoded you have ended decompression last symbol out is a "S".
Rule two: If the last huffman token does not start in the last 7 bits of last byte of compressed file and yet when at end of byte your at an internal node of the huffman tree you supply the missing ones since there are at most 8 ones that where chopped off during the compression of that compressed file.
Example ...1 11110000 the token E = 111100001111 was last token used in making the compressed file and started on bit 0 of the last byte of file so the last symbol out of the decompression is "E" the trailing ones where chopped off.
Rule Three: If the last huffman token does start in the last 7 bits of the last byte and ends at an internal node of the tree but the portion of the token on the last byte of compressed file has at least one or more bits that are the one bit. Then as in rule two you supply the hidden ones that where chopped off the missing file.
Example ...1 11110000 postion 3 the start of last token and in this case X = 100001 so in this case the symbol used in the rule 1 example could not have existed in the current huffman tree so token did not end at file end and the hidden ones would be supplied and last symbol out is "X"
Rule Four: ( hardest rule of all) If the last byte has the last token start in the last 7 bits of the last byte and only contains zeros on that byte . You assume that there are hidden ones that where chopped off during compression. Only if the the file that is one token shorter than this one would have had hidden bits cutoff. This rule means that the all zero start of a last token in the last 7 bits could depend on what would have happened to a the next shorter file and that file status could be a function even farther back in the file. It was the lack of focus on my part that made me miss this recursive rule for a while. And I kept getting more and more complicated specail cases. I hope this solves this.
Example ...1 11110000 M =000011 starts in position 4 but previous token was F = 11111 so since if file shorter by one token cut off would have occurred there fore the M the last appearant token is used and is the last charater out..
Rule Five: The only case left is the case where a token starts some where in the last 7 postions of the last byte and contains all zero bits in its positions of that last byte. And that the previous file had it ended on the previous token would not have ones cut off during compression. In this case and only this case the 1 to 7 trailing zeros which are not a complete symbol mark the end of file and are not used to form a new character for the output file.
Example ...1 11110000 M=0011and starts in position 6 but the existance of this M not important since F = 1100 is previous token and no cutoff would have occured so the 0000 at end tells us that the F was the actual last character in the uncompressed file.

here they are ASCII ARMOR 10 NEW NO PAGE YET

here they are ASCII ARMOR 26 NEW NO PAGE YET

David's View of BWTS (Burrows Wheeler Transform Scottifed)
updated jun 11,2008

ENTER here LINK TO MY ENCRYPTION PAGES

ENTER here MY NEW STYLE BIJECTIVE ARITHMETIC CODING
updated july 28,2004

ENTER here MY NEW STYLE BIJECTIVE REVERSE ARITHMETIC CODING
updated August 10,2004

ENTER here SCOTT STYLE BIEJCTIVE LZW COMPRESSION

ENTER here VITTER STYLE ADAPTIVE HUFFMAN COMPRESSOR

ENTER here FIRST ORDER BIJECTIVE STATIC ENGLISH COMPRESSOR

ENTER here SECOND ORDER BIJECTIVE STATIC ENGLISH COMPRESSOR

OUTDATED:MODED FAST RUSSIAN HUFFMAN CODE
this works on my machine but not on everyones I plan to update this by moding his current version some day

ENTER here THE BIJECTIVE COMPRESSION PROGRAM THAT STARTED IT ALL

ENTER here BWT COMPRESSION FOR ENCRYPTION

ENTER here for dicussion of the "DSC" The data sensitive combiner

ENTER here for alternate ''one to one'' HUFFMAN CODING

ENTER here for my best''one to one'' conditioned HUFFMAN CODING

OUTDATED: ENTER here for conditioned HUFFMAN CODING

OUTDATED: ENTER here for ''one to one'' conditioned HUFFMAN CODING

ENTER here for "one to one" FINITELY ODD Transforms and "one to one adaptive Huffman Compression

UNADULTERATED RLE COMPRESSION page is here

ENTER here for rejected on first pass ACM paper

OTHER PEOPLE WEB SITES THAT UNDERSTAND HOW COMPRESSION SHOULD BE DONE IF ONE IS ENCYPTING AFTER COMPRESSION

ENTER here for Tim Tyler's site

ENTER here for Matt Timmermans site

David A. Scott, biject.bwts@gmail.com Locations of visitors to this page