files updated on June 25, 2002
This page describes a static entropy coder with
first-order frequencies based on English. The
complete package is a bijective first-order English
entropy compressor, so that any 8-bit binary file can
be thought of as a compressed file made entirely of
the letters A through Z.
The arithmetic compressor, as stated, was built to compress bijectively
and to produce close to optimal code when the symbols sit at the huffman
positions in the interval. The choice and technique may need
more tuning, but it is bijective. How the values are assigned to the huffman codes does make a difference
in the length of compressed files. Here is a simple demonstration. John Savard used:
H = 0001 L = 01011
I used an assignment in which the more probable symbols have the most trailing zeros, which leads to:
H = 0110 L = 10110
If I use his table, "HL" or "LH" compresses to a two-byte file. If my choice is used,
it takes only one byte. If one looks only at the 7 most commonly used symbols and takes
every ordered pair of them (49 cases), my choice of huffman codes compresses all the cases to a one-byte file. His
compresses all but 13 cases to one byte, so right out of the gate it's a bad choice.
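The byte counts above can be checked with a short sketch. It assumes a simplified model: the codes are concatenated, trailing zero bits are dropped, and the rest is packed into bytes. The bijective ending rules are ignored here. The names `packed_bytes`, `SAVARD`, `MINE` and `TOP7` are made up for this illustration, and only the H and L codes quoted above are taken from Savard's table.

```python
import math

# Only the two codes quoted above; the rest of Savard's table is not reproduced.
SAVARD = {"H": "0001", "L": "01011"}
MINE = {"H": "0110", "L": "10110"}

# The 7 most common symbols and their codes from the table on this page.
TOP7 = {"E": "100", "T": "010", "A": "1100", "O": "0010",
        "I": "1010", "H": "0110", "N": "1110"}


def packed_bytes(text, table):
    """Bytes needed after concatenating codes and dropping trailing zero bits."""
    bits = "".join(table[ch] for ch in text).rstrip("0")
    return math.ceil(len(bits) / 8)


for text in ("HL", "LH"):
    print(text, "Savard:", packed_bytes(text, SAVARD),
          "mine:", packed_bytes(text, MINE))

# Every ordered pair of the 7 most common symbols, 49 cases in all.
pairs = [a + b for a in TOP7 for b in TOP7]
print(len(pairs), "pairs, worst case bytes:",
      max(packed_bytes(p, TOP7) for p in pairs))
```

With the trailing zeros dropped, "HL" is 9 bits under Savard's two codes but only 8 bits under the codes used here, which is the whole difference between two bytes and one.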
I first line up the symbols in order from most common to least common. Then I assign codes
from the following sequences, cutting each one to the appropriate length.
1000000.... tail of infinite zeros
0100000.....
1100000.....
0010000.....
1010000.....
0110000.....
1110000.....
0001000..... and so on.
This makes the most common symbol 1000...., a leading 1 followed by all zeros.
It makes the least common symbol 000...., all zeros.
The next-to-last symbol is the same as the last symbol, except that its last bit is a 1.
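One way to read the rule above is as a greedy walk through the sequences: count 1, 2, 3 and so on, write each count with its lowest bit first, pad with zeros to the wanted length, and skip any candidate that begins with a code already handed out. The sketch below is only that reading, not the author's program. The code lengths are read off the table on this page, and the name `assign_codes` is invented here.

```python
ORDER = "ETAOIHNSRDLUMWCYFGPBVKXJQZ"           # most common first
LENGTHS = [3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5,
           6, 6, 6, 6, 6, 7, 8, 9, 10, 10]     # read off the table


def assign_codes(order, lengths):
    """Greedy assignment from bit-reversed counters (an assumed reading)."""
    codes = {}
    n = 1
    for sym, length in zip(order[:-1], lengths[:-1]):
        while True:
            # n written low bit first, followed by the tail of zeros
            cand = format(n, "b")[::-1].ljust(length, "0")
            n += 1
            if len(cand) > length:
                raise ValueError("ran out of sequences for length %d" % length)
            # lengths never decrease, so only earlier codes can be prefixes
            if not any(cand.startswith(c) for c in codes.values()):
                codes[sym] = cand
                break
    # the least common symbol takes the all-zero code
    codes[order[-1]] = "0" * lengths[-1]
    return codes


CODES = assign_codes(ORDER, LENGTHS)
for sym in ORDER:
    print(sym, "=", CODES[sym])
```

Skipping candidates that collide with an earlier code is why, for example, R lands on 1101: the two sequences before it would start with the codes already given to E and T.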
Here is the optimal static huffman table used in the code:
 0 E = 100
 1 T = 010
 2 A = 1100
 3 O = 0010
 4 I = 1010
 5 H = 0110
 6 N = 1110         Note the most common symbols have trailing zeros
 7 S = 0001
 8 R = 1101
 9 D = 00110
10 L = 10110
11 U = 01110
12 M = 11110
13 W = 00001
14 C = 00111
15 Y = 10111
16 F = 011110
17 G = 111110
18 P = 000001
19 B = 011111
20 V = 111111
21 K = 0000001
22 X = 00000001
23 J = 000000001
24 Q = 0000000001   Note only Q and Z can't be compressed to a single byte
25 Z = 0000000000
Some notes about bijective huffman compression: if the compressed file is one byte long, it could contain any letter but Q or Z.
The huffman table was made from the following weights:
E 12.32   S 6.28   C 2.48   K 0.80
T  9.05   R 5.72   Y 2.11   X 0.15
A  8.17   D 4.31   F 2.09   J 0.10
O  7.81   L 3.97   G 1.82   Q 0.09
I  6.89   U 3.04   P 1.56   Z 0.05
H  6.68   M 2.77   B 1.45
N  6.62   W 2.64   V 1.02
From John Savard's site and attributed to Jim Gillogly.
Since integer weights were needed, these were multiplied by 100, and they add up to 9999.
When the symbol huf_huf is defined and ari_huf is not, I used power-of-two weights to get it to behave as a huffman:
the bottom weights were 1, the next level 2, and so on, for a total weight of 1024.
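The two sets of weights can be compared with a few lines of arithmetic. This is a sketch under stated assumptions: the power-of-two weight of a symbol is taken to be 2 to the power (10 minus its code length), which matches "bottom weights 1, next level 2", and the ideal cost of a symbol is taken as minus log2 of its share of the total. The names `WEIGHTS_X100`, `POW2` and `bits_per_symbol` are hypothetical.

```python
import math

CODES = {
    "E": "100", "T": "010", "A": "1100", "O": "0010", "I": "1010",
    "H": "0110", "N": "1110", "S": "0001", "R": "1101", "D": "00110",
    "L": "10110", "U": "01110", "M": "11110", "W": "00001", "C": "00111",
    "Y": "10111", "F": "011110", "G": "111110", "P": "000001",
    "B": "011111", "V": "111111", "K": "0000001", "X": "00000001",
    "J": "000000001", "Q": "0000000001", "Z": "0000000000",
}

# The listed percentages multiplied by 100.
WEIGHTS_X100 = {
    "E": 1232, "T": 905, "A": 817, "O": 781, "I": 689, "H": 668, "N": 662,
    "S": 628, "R": 572, "D": 431, "L": 397, "U": 304, "M": 277, "W": 264,
    "C": 248, "Y": 211, "F": 209, "G": 182, "P": 156, "B": 145, "V": 102,
    "K": 80, "X": 15, "J": 10, "Q": 9, "Z": 5,
}

# Assumed power-of-two weights: 1 for the longest codes, doubling per level.
MAX_LEN = max(len(code) for code in CODES.values())
POW2 = {sym: 1 << (MAX_LEN - len(code)) for sym, code in CODES.items()}


def bits_per_symbol(weight, total):
    """Ideal cost in bits of a symbol holding weight/total of the interval."""
    return -math.log2(weight / total)


print("integer weights total:", sum(WEIGHTS_X100.values()))
print("power-of-two weights total:", sum(POW2.values()))
for sym in "EFZ":
    print(sym,
          "arithmetic %.2f bits" % bits_per_symbol(WEIGHTS_X100[sym], 9999),
          "huffman %.2f bits" % bits_per_symbol(POW2[sym], 1024))
```

With these numbers a "Z" costs close to 11 bits under the 9999-total weights but exactly 10 under the power-of-two weights, so which coder wins on a given file depends on which symbols it contains.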
There are several ways to make a file bijective. Here is one way to do it.
Assign the huffman codes in the optimal order from the previous table. First,
let's pretend the file to compress does not contain a "Z". Then every
symbol, when compressed, has at least one "1" bit. You just write the
file and leave off the trailing "0"s. That way the last byte of the file is never
the all-zero byte. This is in fact what happens to a file when it does
not contain a "Z", so it's easy to check. Next, what if the file contains a "Z"?
I could (but this is wrong) just add an EOF symbol that is all zeros,
so the huffman table would be as before, but the "Z" entry would be one bit
longer and end in a "1" bit. This would mean that every time "Z" is used, the file length
increases by one bit. Eight "Z"s and you have added another byte to the output
file, so this is hardly optimal.
Instead, you use the table as is. Only if the all-zero compression token occurs last
do you tack on an extra bit. This saves a lot of space, but it's not done yet. There is
still the problem that this can't be bijective, since the last byte will always have at least one
bit set. To be bijective, the output must include all files, even those that end in all zeros.
At this point I will explain two schemes: (i) the one I used and (ii) a better one that I might
add to a new package, but which will take more code.
First, the one I used, from the Huffman point of view:
If the input file ends in the all-zero token, the "Z", I add an "E" to the file.
If the file ends in a string of "Z"s followed by one or more "E"s, I add an extra
"E". That way the all-zero token is never last, and the operation is so far reversible.
Next, I have to write the output file. I do the same thing in reverse, but work on full bytes.
I group the bits in byte chunks with any trailing zeros dropped. At this point the file
can't end in the all-zero byte. If it ends in something other than 0x80, you're done.
If it ends in a tail of all-zero bytes followed by one or more 0x80 bytes, you drop the
last 0x80 byte and you're done.
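The two ending rules above can be sketched as a small compressor and decompressor for strings of A through Z. This is an illustration written from the description on this page, not the code in the package: it works on Python strings of bits rather than through the arithmetic coder, it packs bits most significant first (an assumption), and the names `compress` and `decompress` are invented here.

```python
CODES = {
    "E": "100", "T": "010", "A": "1100", "O": "0010", "I": "1010",
    "H": "0110", "N": "1110", "S": "0001", "R": "1101", "D": "00110",
    "L": "10110", "U": "01110", "M": "11110", "W": "00001", "C": "00111",
    "Y": "10111", "F": "011110", "G": "111110", "P": "000001",
    "B": "011111", "V": "111111", "K": "0000001", "X": "00000001",
    "J": "000000001", "Q": "0000000001", "Z": "0000000000",
}
DECODE = {code: sym for sym, code in CODES.items()}


def compress(text):
    # Symbol rule: a tail of "Z" followed by zero or more "E"s gains one "E".
    if text.rstrip("E").endswith("Z"):
        text += "E"
    bits = "".join(CODES[ch] for ch in text).rstrip("0")
    bits += "0" * (-len(bits) % 8)              # pad the last byte with zeros
    out = bytearray(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    # Byte rule: zero byte(s) followed by one or more 0x80 lose the last 0x80.
    if out and out[-1] == 0x80 and out.rstrip(b"\x80").endswith(b"\x00"):
        out.pop()
    return bytes(out)


def decompress(data):
    out = bytearray(data)
    # Undo the byte rule: a tail of 0x00 plus zero or more 0x80 regains a 0x80.
    if out.rstrip(b"\x80").endswith(b"\x00"):
        out.append(0x80)
    bits = "".join(format(b, "08b") for b in out).rstrip("0")
    symbols, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in DECODE:
            symbols.append(DECODE[cur])
            cur = ""
    while cur:                                   # finish with implied zeros
        cur += "0"
        if cur in DECODE:
            symbols.append(DECODE[cur])
            cur = ""
    text = "".join(symbols)
    # Undo the symbol rule: "Z" followed by one or more "E"s loses one "E".
    if text.endswith("E") and text.rstrip("E").endswith("Z"):
        text = text[:-1]
    return text


print(compress("ABCDEFGHIJKLMNOPQRSTUVWXYZ").hex())
print(len(compress("Z" * 80)), len(compress("Z" * 81)))
print(decompress(bytes(111))[-9:])
```

Under these assumptions every byte string decompresses to some text and compresses back to itself, which is the bijective property. The alphabet file and the runs of 80 and 81 "Z"s can be compared against the huffman dumps of TW1, TW2 and TW3 on this page.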
Now, from the arithmetic point of view:
It's just like the Huffman, except that most of the time the "Z" symbol is not the all-zero
symbol. It's just the way arithmetic goes about updating "high" and "low".
The "Z" symbol sits at a spot where it would be assigned the all-zero symbol, but this is generally
not the case with pure arithmetic, since after each symbol is processed the high and
low are random-looking values, while in the pure huffman the high ends up at
maxhigh and the low at zero. This is where the two methods appear different, because of the
non-power-of-two weights. You will see the effect of this later. Anyway, in the Huffman,
when the all-zero symbol occurs at the end, an "E" is forced to follow.
It's the same in pure arithmetic, where the all-zero symbol may no longer be "Z",
except that in huffman the "E" was the 3 bits "100". In arithmetic, I force the most common symbol,
E, to be the last, but on average it's not 3 bits. Sometimes 2, sometimes 4, it varies,
and it's seldom "100"; it's more like "CC".
Another, more optimal way to end the static huffman file is as follows:
if it ends in "Z", go to "ZE" as before, but if it ends in "ZE", go to "ZT".
That is, instead of tacking on an extra "E" each time, bump up to the next
most common symbol. After a while, "ZX" goes to "ZQ" and
"ZQ" goes to "ZQE", and the bad tail "ZQQQE" goes to "ZQQQT",
and so on. The "flag" for doing something is only on if a string of "Q"s
follows the "Z". This greatly adds to the complexity of the code. I would
do the same thing to bytes on the final conversion. This code is meant only
as a learning tool. The alternate ending, though more optimal, only saves
space some of the time: it helps if the file ends in "ZEE", but it loses some space
in cases like "ZK", which maps to "ZX" instead of being left alone.
I plan to write code using what I think are more optimal endings for the huffman at a later time.
The goal here was to write a simple bijective huffman-arithmetic coder.
One of the internal differences in this set of code is the fact that in the
high-low updating routines everything is done with 62-bit arithmetic.
The huffman always ends up outputting the tokens "0" and "1",
and after each symbol the high ends up at max and the low ends up at zero. The
free end value tacked on is "00". On the other hand, in the arithmetic
the inner routine outputs the tokens "0", "1" and "C". The "C"
is like halfway in between and will become either a "0" or a "1"
at a later time. It's as if the coding of symbols yet to come
affects what it will actually become. Sometimes it becomes a "0"
and sometimes a "1". The high and low states take on random-looking
values after each symbol is processed. The high will always start with
a "1" bit and the low with a "0" bit. The difference between the two
values will vary from the full length of the table to just a tiny bit larger than
one fourth of the window. So the state of compression at any bit is
the place in the output string and the values of the high and low.
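The behaviour of the "0", "1" and "C" tokens can be illustrated with the textbook integer arithmetic-coder update (the usual range narrowing with pending bits). This is a generic sketch, not the author's routine: it uses a 16-bit window instead of 62 bits, a toy alphabet with made-up weights, no bijective ending, and invented names `encode_tokens` and `resolve`.

```python
PRECISION = 16                      # toy window; the page describes 62 bits
FULL = 1 << PRECISION
HALF = FULL >> 1
QUARTER = FULL >> 2


def encode_tokens(symbols, weights):
    """Return (tokens, low, high) after coding symbol indexes in order."""
    total = sum(weights)
    cum = [0]
    for w in weights:
        cum.append(cum[-1] + w)
    low, high = 0, FULL - 1
    tokens = []
    for s in symbols:
        span = high - low + 1
        high = low + span * cum[s + 1] // total - 1
        low = low + span * cum[s] // total
        while True:
            if high < HALF:                      # both in lower half
                tokens.append("0")
            elif low >= HALF:                    # both in upper half
                tokens.append("1")
                low -= HALF
                high -= HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                tokens.append("C")               # straddles the middle
                low -= QUARTER
                high -= QUARTER
            else:
                break
            low = 2 * low
            high = 2 * high + 1
    return "".join(tokens), low, high


def resolve(tokens):
    """Turn pending "C"s into bits: each becomes the opposite of the next bit."""
    out, pending = [], 0
    for t in tokens:
        if t == "C":
            pending += 1
        else:
            out.append(t + ("1" if t == "0" else "0") * pending)
            pending = 0
    return "".join(out), pending                 # leftover "C"s are unresolved


# Power-of-two weights in matching slots behave like a huffman code.
print(encode_tokens([0, 1, 2, 3], [4, 2, 1, 1]))
# Other weights leave a "C" and a state that is neither zero nor max.
print(encode_tokens([1], [1, 1, 1]))
```

In the first call only "0" and "1" appear and the state returns to zero and max after every symbol. In the second, the low and high are left somewhere in the middle of the window, which is the "random looking" state described above.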
The free ends used with this arithmetic are "00", "10", "01" and "11".
These should be all that are needed, since the most common
symbol is small enough. If it had a larger weight, more ends or a
different ending method would be needed. I feel that for arithmetic
you're better off with 25 two-state 62-bit bijective encoders. Yes, I will
eventually make a web page for that too. This is just a simple exercise
to get a matching true huffman, so one can see the real differences
between what we call huffman and arithmetic, and yet do it in a clean
bijective way. The way it should be done. It also
means pure arithmetic is better suited for compression before encryption.
If the first part of a huffman compressed file is lost, it would be easy for an enemy
to recover the lost part, since the string is only a function of trailing bits,
while in arithmetic it is also a function of the hidden "high" and "low" at
that particular point in the file, and they could be almost anything. So it's
like having an extra secret key worth over one hundred bits of key space.
Another point is that this huffman compression is very, very slow. It's done only to
show that you could use an arithmetic compressor to do the huffman,
and it's for comparison purposes.
test file TW1 These are the only characters these compressors work with 0000 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 *ABCDEFGHIJKLMNOP* 0010 51 52 53 54 55 56 57 58 59 5A . . . . . . *QRSTUVWXYZ* number of bytes is 26 huffman compress of TW1 0000 C7 CE 68 F7 CD 40 10 36 F7 10 20 0E 8A 77 E1 01 *..h..@.6.. ..w..* 0010 B8 01 . . . . . . . . . . . . . . *..* number of bytes is 18 huffman uncompress of TW1 0000 54 57 54 53 4F 53 49 4F 4F 4F 49 44 54 43 54 54 *TWTSOSIOOOIDTCTT* 0010 53 4F 54 45 49 54 54 52 44 4F 48 49 55 54 46 49 *SOTEITTRDOHIUTFI* 0020 57 54 4F 49 54 54 45 52 54 45 54 49 49 4C 54 59 *WTOITTERTETIILTY* 0030 54 41 4F 41 49 52 . . . . . . . . . . *TAOAIR* number of bytes is 54 arithmetic compress of TW1 0000 C7 0F 29 B0 7B 29 D7 A3 A0 C9 0E 38 2F 68 70 96 *..).{).....8/hp.* 0010 21 BC 40 . . . . . . . . . . . . . *!.@* number of bytes is 19 arithmetic uncompress of TW1 0000 43 4C 4F 52 57 52 4E 45 54 47 48 54 55 41 4E 57 *CLORWRNETGHTUANW* 0010 57 52 45 48 54 48 45 48 52 53 54 54 54 57 54 4C *WREHTHEHRSTTTWTL* 0020 4E 50 45 4C 48 48 41 49 54 4F 4F 41 4E 4F 41 52 *NPELHHAITOOANOAR* 0030 4C 45 48 43 . . . . . . . . . . . . 
*LEHC* number of bytes is 52 test file TW2 on compression the last byte a function of number of Z's 0000 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0010 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0020 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0030 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0040 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* number of bytes is 80 huffman compress of TW2 matches "Z" = 0000000000 10 zeros 0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0060 00 00 00 00 . . . . . . . . . . . . *....* number of bytes is 100 arithmetic compress of TW2 note last byte not zero 0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0060 00 00 00 00 00 00 00 00 00 00 00 00 00 04 . . 
*..............* number of bytes is 110 arithmetic uncompress of the huffman compress of TW2 done to show last byte not Z 0000 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0010 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0020 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0030 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0040 5A 5A 5A 5A 5A 5A 5A 5A 5A 4C . . . . . . *ZZZZZZZZZL* number of bytes is 74 test file TW3 like TW2 but one byte more zeros 0000 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0010 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0020 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0030 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0040 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0050 5A . . . . . . . . . . . . . . . *Z* number of bytes is 81 huffman compress of TW3 note last byte not zero 0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0060 00 00 00 00 00 20 . . . . . . . . . . *..... 
* number of bytes is 102 arithmetic compress of TW3 0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 . *...............* number of bytes is 111 huffman uncompress of the arithmetic compress of TW3 done to show last byte not Z 0000 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0010 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0020 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0030 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0040 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0050 5A 5A 5A 5A 5A 5A 5A 5A 4A . . . . . . . *ZZZZZZZZJ* number of bytes is 89 **note that the huffman compress a stream of Z's smaller than pure arithmetic that's because huffman used a weight of 1/1024 = .00098 while arithmetic used a weight of 5/9999 = .0005 so it uses more space to write the zero since weights used in arithmetic assume Z rarer than in the huffman for weights used in huffman code. If you change the weights you get different answers test file TW4 it's the one that tends to compress to all 1's but again ending depends on number of bytes of all 1's and again the arithmetic assumes V is rarer so huffman beats it. 
0000 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV*
0010 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV*
0020 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV*
0030 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV*
0040 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV*
number of bytes is 80

huffman compress of TW4 (note V = 111111, which is six 1's)
0000 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................*
0010 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................*
0020 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................*
0030 FF FF FF FF FF FF FF FF FF FF FF FF . . . . *............*
number of bytes is 60

arithmetic compress of TW4
0000 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................*
0010 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................*
0020 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................*
0030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................*
0040 FF FF C0 . . . . . . . . . . . . . *...*
number of bytes is 67

This next file, TW5, is all F's, and it shows two points. First, arithmetic weights F higher than huffman does, so its output will be shorter.
Second, since F is not at the end of the interval like Z (the all-zero case) or V (the all-one case), it will look nice when compressed by huffman but will look random in the arithmetic, due to the internal state of high and low.
0000 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 *FFFFFFFFFFFFFFFF*
0010 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 *FFFFFFFFFFFFFFFF*
0020 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 *FFFFFFFFFFFFFFFF*
0030 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 *FFFFFFFFFFFFFFFF*
0040 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 *FFFFFFFFFFFFFFFF*
number of bytes is 80

huffman compress of TW5 (note how redundant)
0000 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 *y..y..y..y..y..y*
0010 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 *..y..y..y..y..y.*
0020 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E *.y..y..y..y..y..*
0030 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E . . . . *y..y..y..y..*
number of bytes is 60

arithmetic compress of TW5 (note the whole thing is random looking)
0000 75 C6 93 95 F0 6A 45 24 B7 73 44 EF 03 4E 4E 1C *u....jE$.sD..NN.*
0010 C6 74 63 DE 70 FC 4F 0F 79 6A A2 0F F7 33 AD 98 *.tc.p.O.yj...3..*
0020 3B 7A 20 6B 02 99 09 E1 3B C8 41 36 17 75 A2 64 *;z k....;.A6.u.d*
0030 09 C7 7B 45 49 21 EE B2 . . . . . . . . *..{EI!..*
number of bytes is 56

The next two sets show what happens if Z (the all-zero case) or V (the all-one case) occurs in a long stretch but is not the first character. It's really meant to demonstrate the random look of the arithmetic, which is due to the continually changing internal high and low state. When the weights are pure huffman-like (powers of 2 and placed in the correct slots), the high is always max at the end of a character and the low is always zero.
This is not the case with arithmetic in general test file TW6 0000 46 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *FZZZZZZZZZZZZZZZ* 0010 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0020 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0030 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* 0040 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A *ZZZZZZZZZZZZZZZZ* number of bytes is 80 huffman compress of TW6 note only first byte not zero 0000 78 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *x...............* 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0060 00 00 00 08 . . . . . . . . . . . . *....* number of bytes is 100 arithmetic compress of TW6 note not till 9th byte did it settle to zero 0000 73 50 5D CE CA B8 98 30 00 00 00 00 00 00 00 00 *sP]....0........* 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 *................* 0060 00 00 00 00 00 00 00 00 00 00 00 00 00 A0 . . 
*..............* number of bytes is 110 arithmetic uncompress of huffman compress of TW6 0000 46 52 4D 41 41 54 4F 4E 4E 4C 4C 57 53 48 48 41 *FRMAATONNLLWSHHA* 0010 49 48 49 4C 4E 54 4D 45 45 48 4F 45 50 45 45 53 *IHILNTMEEHOEPEES* 0020 4E 46 54 4F 4F 52 49 41 52 4F 44 41 52 45 45 57 *NFTOORIARODAREEW* 0030 41 54 4B 48 48 48 45 56 45 59 45 4C 41 50 49 4F *ATKHHHEVEYELAPIO* 0040 46 41 4E 4E 48 54 4D 47 54 41 4E 4E 54 41 53 54 *FANNHTMGTANNTAST* 0050 57 44 4E 4F 54 45 4E 59 54 4D 49 4B 4C 54 45 46 *WDNOTENYTMIKLTEF* 0060 45 45 4E 57 53 49 54 4D 49 45 53 4C 50 41 48 43 *EENWSITMIESLPAHC* 0070 4F 53 57 49 42 49 55 45 45 54 48 54 41 41 52 49 *OSWIBIUEETHTAARI* 0080 45 45 55 41 55 55 4F 4C 4E 54 41 4F 49 54 45 48 *EEUAUUOLNTAOITEH* 0090 44 4E 47 4E 49 46 4C 49 52 44 54 47 4C 54 42 53 *DNGNIFLIRDTGLTBS* 00A0 45 4C 55 52 45 41 54 48 56 49 53 49 49 45 4D 55 *ELUREATHVISIIEMU* 00B0 4F 53 4C 45 52 41 4C 45 41 52 58 54 53 53 53 49 *OSLERALEARXTSSSI* 00C0 4E . . . . . . . . . . . . . . . *N* number of bytes is 193 The above looks nasty, but if you compress it, it then goes back to exactly the huffman compressed file of TW6 TW7 0000 46 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *FVVVVVVVVVVVVVVV* 0010 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV* 0020 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV* 0030 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV* 0040 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 *VVVVVVVVVVVVVVVV* number of bytes is 80 huffman compress of TW7 0000 7B FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *{...............* 0010 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................* 0020 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................* 0030 FF FF FF FF FF FF FF FF FF FF FF FF . . . . 
*............* number of bytes is 60 arithmetic compress of TW7 0000 78 AA 34 B0 A8 5C C8 9F FF FF FF FF FF FF FF FF *x.4..\..........* 0010 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................* 0020 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................* 0030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF *................* 0040 FF FF 80 . . . . . . . . . . . . . *...* number of bytes is 67 arithmetic uncompress of huffman compress of TW7 0000 42 4E 54 45 45 54 4E 41 4E 44 50 43 54 4E 4C 4F *BNTEETNANDPCTNLO* 0010 54 4E 41 4F 4E 54 42 4E 44 55 45 48 43 56 43 47 *TNAONTBNDUEHCVCG* 0020 47 4F 49 56 44 54 41 44 45 48 54 49 45 4C 54 44 *GOIVDTADEHTIELTD* 0030 56 44 55 4E 59 49 43 49 54 59 45 41 54 57 4F 42 *VDUNYICITYEATWOB* 0040 4F 43 48 47 54 54 53 47 45 41 49 50 44 49 48 45 *OCHGTTSGEAIPDIHE* 0050 49 45 45 49 53 48 54 54 45 4E 49 52 54 47 41 54 *IEEISHTTENIRTGAT* 0060 4F 47 41 4D 54 45 4F 59 43 41 52 52 46 59 54 49 *OGAMTEOYCARRFYTI* 0070 45 4C . . . . . . . . . . . . . . *EL* number of bytes is 114 The last test is a test of error recovery. With arithmetic forget it. With huffman you've got a shot at it. 
test file TW8 0000 54 48 45 20 51 55 43 49 4B 20 42 52 4F 57 4E 20 *THE QUCIK BROWN * 0010 46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20 54 *FOX JUMPS OVER T* 0020 48 45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45 *HE LAZY DOG..THE* 0030 20 51 55 43 49 4B 20 42 52 4F 57 4E 20 46 4F 58 * QUCIK BROWN FOX* 0040 20 4A 55 4D 50 53 20 4F 56 45 52 20 54 48 45 20 * JUMPS OVER THE * 0050 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45 20 51 55 *LAZY DOG..THE QU* 0060 43 49 4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A 55 *CIK BROWN FOX JU* 0070 4D 50 53 20 4F 56 45 52 20 54 48 45 20 4C 41 5A *MPS OVER THE LAZ* 0080 59 20 44 4F 47 0D 0A 54 48 45 20 51 55 43 49 4B *Y DOG..THE QUCIK* 0090 20 42 52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50 53 * BROWN FOX JUMPS* 00A0 20 4F 56 45 52 20 54 48 45 20 4C 41 5A 59 20 44 * OVER THE LAZY D* 00B0 4F 47 0D 0A 54 48 45 20 51 55 43 49 4B 20 42 52 *OG..THE QUCIK BR* 00C0 4F 57 4E 20 46 4F 58 20 4A 55 4D 50 53 20 4F 56 *OWN FOX JUMPS OV* 00D0 45 52 20 54 48 45 20 4C 41 5A 59 20 44 4F 47 0D *ER THE LAZY DOG.* 00E0 0A 54 48 45 20 51 55 43 49 4B 20 42 52 4F 57 4E *.THE QUCIK BROWN* 00F0 20 46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20 * FOX JUMPS OVER * 0100 54 48 45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 *THE LAZY DOG..TH* 0110 45 20 51 55 43 49 4B 20 42 52 4F 57 4E 20 46 4F *E QUCIK BROWN FO* 0120 58 20 4A 55 4D 50 53 20 4F 56 45 52 20 54 48 45 *X JUMPS OVER THE* 0130 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45 20 51 * LAZY DOG..THE Q* 0140 55 43 49 4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A *UCIK BROWN FOX J* 0150 55 4D 50 53 20 4F 56 45 52 20 54 48 45 20 4C 41 *UMPS OVER THE LA* 0160 5A 59 20 44 4F 47 0D 0A 54 48 45 20 51 55 43 49 *ZY DOG..THE QUCI* 0170 4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50 *K BROWN FOX JUMP* 0180 53 20 4F 56 45 52 20 54 48 45 20 4C 41 5A 59 20 *S OVER THE LAZY * 0190 44 4F 47 0D 0A 54 48 45 20 51 55 43 49 4B 20 42 *DOG..THE QUCIK B* 01A0 52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50 53 20 4F *ROWN FOX JUMPS O* 01B0 56 45 52 20 54 48 45 20 4C 41 5A 59 
20 44 4F 47 *VER THE LAZY DOG* 01C0 0D 0A 54 48 45 20 51 55 43 49 4B 20 42 52 4F 57 *..THE QUCIK BROW* 01D0 4E 20 46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 *N FOX JUMPS OVER* 01E0 20 54 48 45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 * THE LAZY DOG..T* 01F0 48 45 20 51 55 43 49 4B 20 42 52 4F 57 4E 20 46 *HE QUCIK BROWN F* 0200 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20 54 48 *OX JUMPS OVER TH* 0210 45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45 20 *E LAZY DOG..THE * 0220 51 55 43 49 4B 20 42 52 4F 57 4E 20 46 4F 58 20 *QUCIK BROWN FOX * 0230 4A 55 4D 50 53 20 4F 56 45 52 20 54 48 45 20 4C *JUMPS OVER THE L* 0240 41 5A 59 20 44 4F 47 0D 0A . . . . . . . *AZY DOG..* number of bytes is 585 This is TW8 with only characters "A through Z" rest not used and this is what comes back after you compress then uncompress 0000 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 *THEQUCIKBROWNFOX* 0010 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 *JUMPSOVERTHELAZY* 0020 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E *DOGTHEQUCIKBROWN* 0030 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C *FOXJUMPSOVERTHEL* 0040 41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 *AZYDOGTHEQUCIKBR* 0050 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 *OWNFOXJUMPSOVERT* 0060 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49 *HELAZYDOGTHEQUCI* 0070 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 *KBROWNFOXJUMPSOV* 0080 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 *ERTHELAZYDOGTHEQ* 0090 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 *UCIKBROWNFOXJUMP* 00A0 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 *SOVERTHELAZYDOGT* 00B0 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A *HEQUCIKBROWNFOXJ* 00C0 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 *UMPSOVERTHELAZYD* 00D0 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 *OGTHEQUCIKBROWNF* 00E0 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 *OXJUMPSOVERTHELA* 00F0 5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F *ZYDOGTHEQUCIKBRO* 0100 57 4E 46 4F 58 4A 55 4D 50 53 
4F 56 45 52 54 48 *WNFOXJUMPSOVERTH* 0110 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B *ELAZYDOGTHEQUCIK* 0120 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 *BROWNFOXJUMPSOVE* 0130 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 *RTHELAZYDOGTHEQU* 0140 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 *CIKBROWNFOXJUMPS* 0150 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 *OVERTHELAZYDOGTH* 0160 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 *EQUCIKBROWNFOXJU* 0170 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F *MPSOVERTHELAZYDO* 0180 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F *GTHEQUCIKBROWNFO* 0190 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A *XJUMPSOVERTHELAZ* 01A0 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 *YDOGTHEQUCIKBROW* 01B0 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 *NFOXJUMPSOVERTHE* 01C0 4C 41 5A 59 44 4F 47 . . . . . . . . . *LAZYDOG* number of bytes is 455 arithmetic compress of TW8 note looks random 0000 4C 31 D9 CC 49 B0 24 9F A0 21 DD 03 1B AB F9 B1 *L1..I.$..!......* 0010 A0 9C 79 22 70 A2 2D A3 27 9A ED 52 BF 0E 1E 36 *..y"p.-.'..R...6* 0020 19 EF DB 4C 4B 21 4C F6 73 9B 95 84 F9 45 81 DA *...LK!L.s....E..* 0030 31 8F 37 B0 21 CA 55 EE 74 A4 F7 83 55 0B EB 23 *1.7.!.U.t...U..#* 0040 D6 CD A1 FF 07 50 5F 93 F9 07 5D B0 71 3C 04 77 *.....P_...].q<.w* 0050 E4 18 1E 1D 04 DC A1 64 B4 87 51 BB 03 AC EF 76 *.......d..Q....v* 0060 2F 53 3E 77 CD 3F C6 CD 59 13 90 FE 79 06 30 38 */S>w.?..Y...y.08* 0070 28 F5 07 DC F6 22 93 36 CA 75 F1 BD 4E 00 63 8B *(....".6.u..N.c.* 0080 11 FF FF 10 B4 12 30 48 1A A5 A4 BA 9E 05 10 D2 *......0H........* 0090 C7 80 94 81 79 AB 55 7C 54 97 D2 AF 51 70 FE 9C *....y.U|T...Qp..* 00A0 9A C3 AD BE 67 4F 70 54 99 F8 CA 79 A5 E7 C8 FF *....gOpT...y....* 00B0 39 ED E3 3C D0 47 9C B4 FD 10 80 11 21 7A 19 25 *9..<.G......!z.%* 00C0 B4 9B B6 0E 51 AC DF EA 44 DB E2 E1 B0 C8 F2 6A *....Q...D......j* 00D0 DB 45 4C 25 2D 98 F3 99 72 F1 26 A5 21 49 F5 7E *.EL%-...r.&.!I.~* 00E0 31 3A ED C2 9F E2 BF 5C 71 68 86 22 
22 38 39 18 *1:.....\qh.""89.*
00F0 F1 88 91 C0 01 09 9E 9F 95 9C 23 76 F5 9E B2 3B *..........#v...;*
0100 09 E3 66 BD 94 53 DC 5D EB DF D5 4A 26 0B C7 87 *..f..S.]...J&...*
0110 2C C1 79 68 B3 78 2C D4 E2 A2 07 C7 F8 C8 B3 33 *,.yh.x,........3*
0120 D2 F7 82 92 62 . . . . . . . . . . . *....b*
number of bytes is 293

Same file as the previous one, but with A0 in the second row changed to A1, to see the effect of a one-bit change on decompression:
0000 4C 31 D9 CC 49 B0 24 9F A0 21 DD 03 1B AB F9 B1 *L1..I.$..!......*
0010 A1 9C 79 22 70 A2 2D A3 27 9A ED 52 BF 0E 1E 36 *..y"p.-.'..R...6*
0020 19 EF DB 4C 4B 21 4C F6 73 9B 95 84 F9 45 81 DA *...LK!L.s....E..*
0030 31 8F 37 B0 21 CA 55 EE 74 A4 F7 83 55 0B EB 23 *1.7.!.U.t...U..#*
0040 D6 CD A1 FF 07 50 5F 93 F9 07 5D B0 71 3C 04 77 *.....P_...].q<.w*
0050 E4 18 1E 1D 04 DC A1 64 B4 87 51 BB 03 AC EF 76 *.......d..Q....v*
0060 2F 53 3E 77 CD 3F C6 CD 59 13 90 FE 79 06 30 38 */S>w.?..Y...y.08*
0070 28 F5 07 DC F6 22 93 36 CA 75 F1 BD 4E 00 63 8B *(....".6.u..N.c.*
0080 11 FF FF 10 B4 12 30 48 1A A5 A4 BA 9E 05 10 D2 *......0H........*
0090 C7 80 94 81 79 AB 55 7C 54 97 D2 AF 51 70 FE 9C *....y.U|T...Qp..*
00A0 9A C3 AD BE 67 4F 70 54 99 F8 CA 79 A5 E7 C8 FF *....gOpT...y....*
00B0 39 ED E3 3C D0 47 9C B4 FD 10 80 11 21 7A 19 25 *9..<.G......!z.%*
00C0 B4 9B B6 0E 51 AC DF EA 44 DB E2 E1 B0 C8 F2 6A *....Q...D......j*
00D0 DB 45 4C 25 2D 98 F3 99 72 F1 26 A5 21 49 F5 7E *.EL%-...r.&.!I.~*
00E0 31 3A ED C2 9F E2 BF 5C 71 68 86 22 22 38 39 18 *1:.....\qh.""89.*
00F0 F1 88 91 C0 01 09 9E 9F 95 9C 23 76 F5 9E B2 3B *..........#v...;*
0100 09 E3 66 BD 94 53 DC 5D EB DF D5 4A 26 0B C7 87 *..f..S.]...J&...*
0110 2C C1 79 68 B3 78 2C D4 E2 A2 07 C7 F8 C8 B3 33 *,.yh.x,........3*
0120 D2 F7 82 92 62 . . . . . . . . . . . *....b*
number of bytes is 293

Decompression of the previous file. Note that there is no error recovery after a one-bit error, so if error recovery is needed it's dangerous:
0000 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 *THEQUCIKBROWNFOX*
0010 4A 55 4D 50 53 4F 56 45 52 54 41 53 48 47 57 53 *JUMPSOVERTASHGWS*
0020 52 4C 4F 54 45 55 54 4F 46 45 4F 4D 48 4F 57 57 *RLOTEUTOFEOMHOWW*
0030 4C 48 42 54 47 57 4C 50 49 4F 4E 4F 55 54 54 41 *LHBTGWLPIONOUTTA*
0040 4C 42 4F 48 53 49 52 46 49 42 41 47 4F 44 41 52 *LBOHSIRFIBAGODAR*
0050 48 54 48 4F 45 4D 56 45 4E 46 4F 47 54 41 45 54 *HTHOEMVENFOGTAET*
0060 49 48 45 4E 4F 53 4E 54 49 48 49 59 4E 45 49 57 *IHENOSNTIHIYNEIW*
0070 54 47 43 50 59 52 4F 4C 46 50 57 4E 57 45 4C 54 *TGCPYROLFPWNWELT*
0080 50 44 48 44 4E 45 57 4C 45 52 41 49 44 49 53 49 *PDHDNEWLERAIDISI*
0090 52 50 57 59 4F 43 45 53 4E 49 4E 43 56 59 41 54 *RPWYOCESNINCVYAT*
00A0 4E 54 42 54 49 43 4E 54 57 4E 53 47 44 41 55 45 *NTBTICNTWNSGDAUE*
00B0 4E 45 49 4B 4B 44 47 59 49 4E 54 4F 4B 55 49 55 *NEIKKDGYINTOKUIU*
00C0 41 4E 4F 45 41 4E 52 46 49 54 56 45 57 48 4C 45 *ANOEANRFITVEWHLE*
00D0 41 41 4F 41 56 41 45 47 49 45 41 4C 4E 47 59 54 *AAOAVAEGIEALNGYT*
00E0 48 52 41 4F 43 44 57 48 42 52 48 4D 50 4F 42 57 *HRAOCDWHBRHMPOBW*
00F0 45 4F 49 4C 45 41 45 41 4E 4F 4F 41 4B 4B 49 4F *EOILEAEANOOAKKIO*
0100 4F 4B 52 50 4F 4E 54 52 4F 4D 53 52 49 41 49 49 *OKRPONTROMSRIAII*
0110 49 4E 54 48 4C 54 4E 41 4E 54 59 54 54 49 54 43 *INTHLTNANTYTTITC*
0120 50 53 46 54 4F 53 54 4C 49 47 49 49 4E 45 41 54 *PSFTOSTLIGIINEAT*
0130 45 53 4C 59 53 54 50 41 4B 48 4F 54 4E 52 4E 4B *ESLYSTPAKHOTNRNK*
0140 4C 52 4E 52 52 4F 54 54 49 44 4F 54 53 4F 4E 41 *LRNRROTTIDOTSONA*
0150 57 46 55 48 53 44 47 43 45 41 44 54 44 44 49 45 *WFUHSDGCEADTDDIE*
0160 45 49 54 48 4F 57 45 41 45 4F 4D 54 44 48 41 4D *EITHOWEAEOMTDHAM*
0170 52 4E 4F 41 4E 52 4C 54 53 45 52 54 42 54 44 41 *RNOANRLTSERTBTDA*
0180 57 41 45 48 4F 4F 4E 42 50 4C 4C 4C 45 4E 4F 46 *WAEHOONBPLLLENOF*
0190 4F 4F 53 49 45 57 41 45 48 45 54 4D 4C 48 49 4C *OOSIEWAEHETMLHIL*
01A0 41 45 45 4D 46 48 46 41 45 50 44 4F 52 4D 45 54 *AEEMFHFAEPDORMET*
01B0 49 55 50 45 55 54 4C 57 48 4E 52 49 43 4A 53 49 *IUPEUTLWHNRICJSI*
01C0 4C 4E 4C 41 54 52 45 44 52 45 45 41 4E 52 43 55 *LNLATREDREEANRCU*
01D0 49 52 49 59 45 42 4E 45 54 4D 55 52 45 48 4C 48 *IRIYEBNETMUREHLH*
01E0 44 4F 4E 45 54 4B 44 49 52 54 4F 54 41 52 54 52 *DONETKDIRTOTARTR*
01F0 45 46 41 43 41 59 45 50 49 41 42 45 49 4B 4C 4F *EFACAYEPIABEIKLO*
0200 48 49 4C 52 4F 41 54 54 4F 49 41 55 49 54 4F 57 *HILROATTOIAUITOW*
0210 4F 48 54 4D 43 45 4F 53 49 55 45 4E 56 50 50 4C *OHTMCEOSIUENVPPL*
0220 53 48 48 . . . . . . . . . . . . . *SHH*
number of bytes is 547

huffman compress of TW8. Note that it repeats, so any normal compressor could make it smaller on a second pass:
0000 4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F CD *M.......x.@.."_.*
0010 4D 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C 40 *M-..._&...@_..<@*
0020 20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05 C7 * .x./....../.@..*
0030 A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60 01 *./... ......SK`.*
0040 73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE 04 *s...............*
0050 4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA 41 *K..........q...A*
0060 E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F2 68 *......%.....\..h*
0070 00 B8 F4 05 FD 20 F3 C4 02 01 77 81 12 FE 6A 69 *..... ....w...ji*
0080 6C 00 2E 62 F9 34 00 5C 7A 02 FE 90 79 E2 01 00 *l..b.4.\z...y...*
0090 BB C0 89 7F 35 34 B6 00 17 31 7C 9A 00 2E 3D 01 *....54...1|...=.*
00A0 7F 48 3C F1 00 80 5D E0 44 BF 9A 9A 5B 00 0B 98 *.H<...].D...[...*
00B0 BE 4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F *.M.......x.@.."_*
00C0 CD 4D 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C *.M-..._&...@_..<*
00D0 40 20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05 *@ .x./....../.@.*
00E0 C7 A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60 *../... ......SK`*
00F0 01 73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE *.s..............*
0100 04 4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA *.K..........q...*
0110 41 E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F0 *A......%.....\..*
number of bytes is 288

Same as the file above, with the first byte in the second row changed from 4D to 4E:
0000 4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F CD *M.......x.@.."_.*
0010 4E 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C 40 *N-..._&...@_..<@*
0020 20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05 C7 * .x./....../.@..*
0030 A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60 01 *./... ......SK`.*
0040 73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE 04 *s...............*
0050 4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA 41 *K..........q...A*
0060 E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F2 68 *......%.....\..h*
0070 00 B8 F4 05 FD 20 F3 C4 02 01 77 81 12 FE 6A 69 *..... ....w...ji*
0080 6C 00 2E 62 F9 34 00 5C 7A 02 FE 90 79 E2 01 00 *l..b.4.\z...y...*
0090 BB C0 89 7F 35 34 B6 00 17 31 7C 9A 00 2E 3D 01 *....54...1|...=.*
00A0 7F 48 3C F1 00 80 5D E0 44 BF 9A 9A 5B 00 0B 98 *.H<...].D...[...*
00B0 BE 4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F *.M.......x.@.."_*
00C0 CD 4D 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C *.M-..._&...@_..<*
00D0 40 20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05 *@ .x./....../.@.*
00E0 C7 A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60 *../... ......SK`*
00F0 01 73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE *.s..............*
0100 04 4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA *.K..........q...*
0110 41 E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F0 *A......%.....\..*
number of bytes is 288

This shows that huffman has some error recovery:
0000 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 *THEQUCIKBROWNFOX*
0010 4A 55 4D 50 53 4F 56 45 52 54 55 4F 52 45 5A 59 *JUMPSOVERTUOREZY*
0020 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E *DOGTHEQUCIKBROWN*
0030 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C *FOXJUMPSOVERTHEL*
0040 41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 *AZYDOGTHEQUCIKBR*
0050 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 *OWNFOXJUMPSOVERT*
0060 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49 *HELAZYDOGTHEQUCI*
0070 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 *KBROWNFOXJUMPSOV*
0080 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 *ERTHELAZYDOGTHEQ*
0090 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 *UCIKBROWNFOXJUMP*
00A0 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 *SOVERTHELAZYDOGT*
00B0 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A *HEQUCIKBROWNFOXJ*
00C0 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 *UMPSOVERTHELAZYD*
00D0 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 *OGTHEQUCIKBROWNF*
00E0 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 *OXJUMPSOVERTHELA*
00F0 5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F *ZYDOGTHEQUCIKBRO*
0100 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 *WNFOXJUMPSOVERTH*
0110 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B *ELAZYDOGTHEQUCIK*
0120 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 *BROWNFOXJUMPSOVE*
0130 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 *RTHELAZYDOGTHEQU*
0140 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 *CIKBROWNFOXJUMPS*
0150 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 *OVERTHELAZYDOGTH*
0160 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 *EQUCIKBROWNFOXJU*
0170 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F *MPSOVERTHELAZYDO*
0180 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F *GTHEQUCIKBROWNFO*
0190 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A *XJUMPSOVERTHELAZ*
01A0 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 *YDOGTHEQUCIKBROW*
01B0 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 *NFOXJUMPSOVERTHE*
01C0 4C 41 5A 59 44 4F 47 . . . . . . . . . *LAZYDOG*
number of bytes is 455

Bottom line - If you're going to compress English text with a static huffman compressor, the bijective ones will always save you more space. If you don't care about encryption and want fast compression, use huffman over arithmetic.
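The error recovery of the huffman coder can be tried out with a small sketch. It uses the code table listed earlier on this page, but it is an assumption-laden toy: it works on a plain string of '0' and '1' characters with an ordinary prefix decoder, and it does not reproduce the bijective file format or the byte packing of the real package. The names CODES, encode, decode and flip are made up for this illustration.

```python
# Toy illustration only: plain prefix coding on a string of '0'/'1'
# characters, NOT the bijective file format of the actual package.
# The table is the static huffman table given on this page.
CODES = {
    "E": "100", "T": "010", "A": "1100", "O": "0010", "I": "1010",
    "H": "0110", "N": "1110", "S": "0001", "R": "1101", "D": "00110",
    "L": "10110", "U": "01110", "M": "11110", "W": "00001", "C": "00111",
    "Y": "10111", "F": "011110", "G": "111110", "P": "000001",
    "B": "011111", "V": "111111", "K": "0000001", "X": "00000001",
    "J": "000000001", "Q": "0000000001", "Z": "0000000000",
}
DECODE = {bits: sym for sym, bits in CODES.items()}


def encode(text):
    """Concatenate the code words of each letter."""
    return "".join(CODES[ch] for ch in text)


def decode(bits):
    """Greedy prefix decoding; an incomplete trailing code word is dropped."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in DECODE:
            out.append(DECODE[cur])
            cur = ""
    return "".join(out)


def flip(bits, i):
    """Return bits with the bit at position i inverted."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]


text = "THEQUCIKBROWNFOXJUMPSOVERTHELAZYDOG"  # spelling as in the dumps
bits = encode(text)
damaged = decode(flip(bits, 0))  # invert the very first bit

print(text)
print(damaged)
```

In this toy run the first few letters come out wrong, after which the decoder falls back into step with the original code word boundaries and the rest of the text is intact. That resynchronization is typical for a prefix code like this one but is not guaranteed for every input and every bit position.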
good luck,
David A. Scott