# David Scott's FOF ONE TO ONE HUFFMAN COMPRESSION

files updated on December 20,1999

### My main concern is to create compression routines that are of great use to the person who some day may want to compress data before encryption is used. I feel very strongly that a compression routine should be "one to one" on the previous page I give the rules for h2com.exe; However this was for 256 symbol adaptive huffman compression. By FOF ONE TO ONE HUFFMAN COMPRESSION I mean using the Finitely Odd File format to an 8 bit byte transform. It is actaully very easy to do huffman compression and end the data stream in such a way that if one does not need worry about byte boundaries. It is rather easy to make it into a finitely odd bit stream and then use the Transforms to change it to files. Here is the trick to do it. And I wish to thank Matt Timmermans for his version of bijective arithmetic compression what follows is how would would do something similar in huffman compression. The correct way to do huffman compression and to waste not bits and make it a finitely odd bit stream ( means that at least one bit set and last bit that counts for anything is a 1). Take any huffman table but best to assign it in an almost normal form. Where the all zeros is on the right and is the lowest frequently occuring symbol and the all ones symbol is the highest frequency occurring symbol. normal the highest probability ones occur on the left and lowest occur on the right. All ones paths on tree come down on left and zero for the right paths. Here is the symbols results for an example tree: A = 11 B = 10 C = 011 D = 010 E = 001 F = 000 Expected probability of A > B > C > D > E >F for this example The Above is a stand tree lets assume static compression and you have sent the static table to a friend but you want to compress the file to 8 bit bytes files in a One to One way (bijective) You also are using the same table for this example The first thing one should do to save space is reoder the internal values that are pairs that don't include the all Zeroes symbol or the all Ones symbol. In a special way in this case there is only one internal pair C and D. These are an internal pair due to the fact they have the same path through the tree except the last split and do not contain the all zeroes (F) or all ones (A) symbol. To check if C and D should change places. Make sure that the higher probability one has a short short path in the tree. What do I mean by short path. I mean go down the tree and eliminte the trailing ones or zeros and see which has the shortest path in this Case the C has a short path of the single bit 0 while D has the short path 01 and since C occurs more often than D these two don't have to switch positions in the tree. Every symbol but the All zeros and all ones have a short path. We code the last symbol if it is the all zero symbol as all zeros no change. But for the internal symbols we only use the huffman code until there would be a tail of opposite zero-one-ness. If we don't stop at an internal node of which there is the same number as internal symbols -2 ( minus two because of the all zero and all one symbol) Then the last symbol is the all one symbol and no space is wasted. If we get to the leaf of the all ones symbol then it counts as a extra symbol if preceeding the series of the all ones token there is the start of file or any token other than the all zeros symbol. Same example as above Internal huffman values meaning another symbol is after it A = 11 B= 10 C = 011 D = 010 E = 001 F = 000 Now the table if it is the last symbol in the compressed file A = * special if F in front then 11 or groups of 11 count for each A A = * if not F in front than there is one more A than sets of 11 also the empty symbol at start means one A. We use the nothingness to indicate a lone A being compressed B= 1 C = 0 D = 01 E = 00 F = 000 note the all zero symbol has no changes To confuse you more lets add the symbol 1 as the last symbol in the file to make it a finitely odd file and go from starting up in binary. Note in this case since A nothing if only character compressed file then you would have the lone 1 1 means an A 0 1 means a C 1 1 means a B 00 1 means E 01 1 means D 10 1 means BA 11 1 means AA 000 1 means F 001 1 means EA 010 1 means DA 011 1 means CA 100 1 means BE 101 1 means BB 110 1 means AC 111 1 means AB .... 1111111 1 means AAAB 00000000 1 means FFE Now lets convert this to a 8 bit file using method to convert a finitely odd file to 8 bit byte files. This allows a person to use real files so that every possible value uniquely maps to a set of symbols in a bijective way so that infomation is not wasted as in most compression programs 00000000 is A 10000000 is C 01000000 is B 11000000 is E 10100000 is D 00100000 is BA 01100000 is AA 11100000 is F 11010000 is EA 10010000 is DA 10110000 is CA 00010000 is BE 00110000 is BB 01010000 is AC 01110000 is AB ... 01111111 is AAAB 11111111 is FFE Below is a the file showing some results converting to and from finitely odd files. Here is int2fin.exe, fin2int.exe these routine can be used to change bases in a "one to one" way. Of course you don't need this if using scott16u or scott19u since they use any byte multiple. At this point you may want the compression routines that go to from FOF files here they are

```

This is set of files to convert to and from finitely odd files.
A finitely  odd file can be thought of as a fraction that is .1
to .xxx...xxx10000 rest are zeros so the number is betweem
0 and 1 not including zero or one. Trailing zeros mean nothing
so a file as "34 00" is same as "34 00 00".
The programs are all run by  "program.exe  filein fileout"
PROGRAMS GO TO FINITELY ODD
int2fin.exe   Converts from files made of any number of bytes
even2fin.exe  Converts from even byte length files
odd2fin.exe   Converts from odd byte length files
idea2fin.exe  Converts from 8 bytes multiple file

PROGRAMS FROM FINITELY ODD FILES to
fin2int.exe  Converts to byte file of any length
fin2even.exe Converts to even numbered byte files
fin2odd.exe  Converts to odd numbered byte files
fin2idea.exe Converts to 8 byte multiple files

AS example lets say we have inputs file alwasy odd
and we want to use IDEA to encrypt.
0000  20 74 68 69 73 20 69 73 20 61 6E 20 6F 64 64 20  * this is an odd *
0010  66 69 6C 65 21  .  .  .  .  .  .  .  .  .  .  .  *file!*
number of bytes is 21

Step one is convert the odd bytes numbered file to FOF (finitely
odd fraction) format. I will make a scott16v shortly that
will work with fintely odd files so the encryption will run
on these strange length bit files and then comvert result to bytes
but for know lets go on.
run odd2fin.exe
0000  A0 F4 68 E9 73 A0 69 F3 20 E1 6E A0 6F E4 64 A0  *..h.s.i. .n.o.d.*
0010  66 E9 6C E5 21  .  .  .  .  .  .  .  .  .  .  .  *f.l.!*
number of bytes is 21

now convert the finitely odd file to one that is a multiple of
8.  I know for this example the results for file that not already
in a multiple of 8 tend to have 00 bytes at end of file. One could
run a whitening pass to change this. But you do have a unique mapping
of the odd files to and from the 8 byte multiple files.
run fin2idea.exe
0000  20 F4 68 E9 73 A0 69 F3 C0 E1 6E A0 6F E4 64 A0  * .h.s.i...n.o.d.*
0010  A6 E9 6C E5 21 00 00 00  .  .  .  .  .  .  .  .  *..l.!...*
number of bytes is 24

```