files updated on December 20,1999
By FOF ONE TO ONE HUFFMAN COMPRESSION I mean using the Finitely Odd File format to an 8 bit byte transform.
It is actaully very easy to do huffman compression and end the
data stream in such a way that if one does not need worry about
byte boundaries. It is rather easy to make it into a finitely odd bit
stream and then use the Transforms to change it to files. Here is
the trick to do it. And I wish to thank Matt Timmermans for his version
of bijective arithmetic compression what follows is how would would
do something similar in huffman compression.
The correct way to do huffman compression and to waste not bits
and make it a finitely odd bit stream ( means that at least one bit
set and last bit that counts for anything is a 1). Take any huffman
table but best to assign it in an almost normal form. Where the all
zeros is on the right and is the lowest frequently occuring symbol
and the all ones symbol is the highest frequency occurring symbol.
normal the highest probability ones occur on the left and lowest
occur on the right. All ones paths on tree come down on left and
zero for the right paths.
Here is the symbols results for an example tree:
A = 11
B = 10
C = 011
D = 010
E = 001
F = 000
Expected probability of A > B > C > D > E >F for this example
The Above is a stand tree lets assume static compression and you have sent the static table to a friend but you want to compress the file to 8 bit bytes files in a One to One way (bijective)
You also are using the same table for this example The first thing one should do to save space is reoder the internal values that are pairs that don't include the all Zeroes symbol or the all Ones symbol. In a special way in this case there is only one internal pair C and D. These are an internal pair due to the fact they have the same path through the tree except the last split and do not contain the all zeroes (F) or all ones (A) symbol. To check if C and D should change places. Make sure that the higher probability one has a short short path in the tree. What do I mean by short path. I mean go down the tree and eliminte the trailing ones or zeros and see which has the shortest path in this Case the C has a short path of the single bit 0 while D has the short path 01 and since C occurs more often than D these two don't have to switch positions in the tree. Every symbol but the All zeros and all ones have a short path. We code the last symbol if it is the all zero symbol as all zeros no change. But for the internal symbols we only use the huffman code until there would be a tail of opposite zero-one-ness. If we don't stop at an internal node of which there is the same number as internal symbols -2 ( minus two because of the all zero and all one symbol) Then the last symbol is the all one symbol and no space is wasted. If we get to the leaf of the all ones symbol then it counts as a extra symbol if preceeding the series of the all ones token there is the start of file or any token other than the all zeros symbol.
Same example as above
Internal huffman values meaning another symbol is after it
A = 11
B= 10
C = 011
D = 010
E = 001
F = 000
Now the table if it is the last symbol in the compressed file
A = * special if F in front then 11 or groups of 11 count for each A
A = * if not F in front than there is one more A than sets of 11 also the empty symbol
at start means one A. We use the nothingness to indicate a lone A being compressed
B= 1
C = 0
D = 01
E = 00
F = 000 note the all zero symbol has no changes
To confuse you more lets add the symbol 1 as the last symbol
in the file to make it a finitely odd file and go from starting up in
binary. Note in this case since A nothing if only character
compressed file then you would have the lone 1
1 means an A
0 1 means a C
1 1 means a B
00 1 means E
01 1 means D
10 1 means BA
11 1 means AA
000 1 means F
001 1 means EA
010 1 means DA
011 1 means CA
100 1 means BE
101 1 means BB
110 1 means AC
111 1 means AB
....
1111111 1 means AAAB
00000000 1 means FFE
Now lets convert this to a 8 bit file using method to convert a finitely odd file to 8 bit byte files.
This allows a person to use real files so that every possible value uniquely maps to a set
of symbols in a bijective way so that infomation is not wasted as in most compression programs
00000000 is A
10000000 is C
01000000 is B
11000000 is E
10100000 is D
00100000 is BA
01100000 is AA
11100000 is F
11010000 is EA
10010000 is DA
10110000 is CA
00010000 is BE
00110000 is BB
01010000 is AC
01110000 is AB
...
01111111 is AAAB
11111111 is FFE
Below is a the file showing some results converting to and from finitely odd files. Here is int2fin.exe, fin2int.exe these routine can be used to change bases in a "one to one" way. Of course you don't need this if using scott16u or scott19u since they use any byte multiple. At this point you may want the compression routines that go to from FOF files here they are
This is set of files to convert to and from finitely odd files. A finitely odd file can be thought of as a fraction that is .1 to .xxx...xxx10000 rest are zeros so the number is betweem 0 and 1 not including zero or one. Trailing zeros mean nothing so a file as "34 00" is same as "34 00 00". The programs are all run by "program.exe filein fileout" PROGRAMS GO TO FINITELY ODD int2fin.exe Converts from files made of any number of bytes even2fin.exe Converts from even byte length files odd2fin.exe Converts from odd byte length files idea2fin.exe Converts from 8 bytes multiple file PROGRAMS FROM FINITELY ODD FILES to fin2int.exe Converts to byte file of any length fin2even.exe Converts to even numbered byte files fin2odd.exe Converts to odd numbered byte files fin2idea.exe Converts to 8 byte multiple files AS example lets say we have inputs file alwasy odd and we want to use IDEA to encrypt. 0000 20 74 68 69 73 20 69 73 20 61 6E 20 6F 64 64 20 * this is an odd * 0010 66 69 6C 65 21 . . . . . . . . . . . *file!* number of bytes is 21 Step one is convert the odd bytes numbered file to FOF (finitely odd fraction) format. I will make a scott16v shortly that will work with fintely odd files so the encryption will run on these strange length bit files and then comvert result to bytes but for know lets go on. run odd2fin.exe 0000 A0 F4 68 E9 73 A0 69 F3 20 E1 6E A0 6F E4 64 A0 *..h.s.i. .n.o.d.* 0010 66 E9 6C E5 21 . . . . . . . . . . . *f.l.!* number of bytes is 21 now convert the finitely odd file to one that is a multiple of 8. I know for this example the results for file that not already in a multiple of 8 tend to have 00 bytes at end of file. One could run a whitening pass to change this. But you do have a unique mapping of the odd files to and from the 8 byte multiple files. run fin2idea.exe 0000 20 F4 68 E9 73 A0 69 F3 C0 E1 6E A0 6F E4 64 A0 * .h.s.i...n.o.d.* 0010 A6 E9 6C E5 21 00 00 00 . . . . . . . . *..l.!...* number of bytes is 24