David Scott's BIJECTIVE STATIC ENTROPY COMPRESSION for FIRST ORDER ENGLISH WITHOUT SPACES
BIJECTIVE STATIC HUFFMAN and ARITHMETIC in ONE PROGRAM

files updated on June 25, 2002
This page describes a static entropy coder that uses first order letter frequencies for ENGLISH. The complete package is a bijective first order English entropy compressor, so any 8-bit binary file can be thought of as a compressed file made entirely of the letters:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"

This showcases how huffman is really just a special case of arithmetic, and that the only real difference is in the weight given to each symbol. If the weights are all powers of 2, huffman compression occurs; if some weights are not powers of 2, arithmetic compression results. The code is not meant for speed but only to show that the two methods are essentially the same. If ari_huf is defined in the code and huf_huf is not, it uses the weights based on the table from John Savard's site attributed to Jim Gillogly. The program only changes the weights of the symbols used, and it's pure static arithmetic. The code does not test to see which set of weights is in use. In fact, if you want, you can change the code a small bit to use the huffman weights for a few characters and then switch back to the other set. As long as the uncompressor switches tables at the same time, it still works.
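The claim that huffman is arithmetic with power of two weights can be checked with a few lines of exact rational arithmetic. The sketch below is textbook interval narrowing written for illustration; it is not the 62 bit high/low code of the package, and the names CUM and narrow are made up for this example. It gives each symbol the weight 2^(10 - code length), lays the symbols out in codeword order, and confirms that the interval reached for a message starts exactly at the binary fraction spelled by the concatenated huffman codes.

```python
from fractions import Fraction

# Huffman table copied from this page.
CODES = {
    "E": "100", "T": "010", "A": "1100", "O": "0010", "I": "1010",
    "H": "0110", "N": "1110", "S": "0001", "R": "1101", "D": "00110",
    "L": "10110", "U": "01110", "M": "11110", "W": "00001", "C": "00111",
    "Y": "10111", "F": "011110", "G": "111110", "P": "000001",
    "B": "011111", "V": "111111", "K": "0000001", "X": "00000001",
    "J": "000000001", "Q": "0000000001", "Z": "0000000000",
}

# Power of two weights: the longest codes (Q, Z) weigh 1, each level up doubles.
WEIGHT = {s: 2 ** (10 - len(c)) for s, c in CODES.items()}
TOTAL = sum(WEIGHT.values())  # 1024

# Cumulative weight below each symbol when symbols are laid out in
# codeword order (string order equals numeric order for a prefix code).
CUM = {}
acc = 0
for s in sorted(CODES, key=CODES.get):
    CUM[s] = acc
    acc += WEIGHT[s]

def narrow(message):
    """Plain arithmetic-coding interval narrowing with exact fractions."""
    low, width = Fraction(0), Fraction(1)
    for s in message:
        low += width * Fraction(CUM[s], TOTAL)
        width *= Fraction(WEIGHT[s], TOTAL)
    return low, width

bits = "".join(CODES[s] for s in "THE")          # 0100110100
low, width = narrow("THE")
print(low == Fraction(int(bits, 2), 2 ** len(bits)))   # True
print(width == Fraction(1, 2 ** len(bits)))            # True
```

With weights that are not powers of two the same loop still works, but low and width stop being tidy binary fractions, which is the arithmetic case.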

The arithmetic compressor, as stated, was based on trying to compress bijectively and to produce close to optimal code when the symbols sit at the huffman positions in the interval. The choice and technique may need more tuning, but it is bijective. The assignment of the values for the huffman code does make a difference in the length of compressed files. Here is a simple proof. John Savard used:
H = 0001 L = 01011
I used an assignment in which the more probable symbols have the most trailing zeros, which leads to:
H = 0110 L = 10110
If I use his table, "HL" or "LH" would compress to a two byte file. With my choice, it only takes one byte. If one looks only at the 7 most commonly used symbols and takes any pair of them (49 cases), my choice of huffman codes would compress every case to a one byte file. His would compress all but 13 cases, so right out of the gate it's a bad choice.
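The byte counts above are easy to reproduce. The sketch below is only an illustration: packed_len is a made-up helper that concatenates the codes, drops trailing zero bits and rounds up to whole bytes. It ignores the special ending rules described further down, which do not come into play for these short messages. Only the two Savard codes quoted above are used, since the rest of his table is not reproduced on this page.

```python
import math
from itertools import product

# Codes from this page for the 7 most common letters, plus L.
SCOTT = {"E": "100", "T": "010", "A": "1100", "O": "0010",
         "I": "1010", "H": "0110", "N": "1110", "L": "10110"}
# The two codes quoted above from John Savard's table.
SAVARD = {"H": "0001", "L": "01011"}

def packed_len(codes, message):
    """Bytes needed once trailing zero bits are dropped."""
    bits = "".join(codes[s] for s in message).rstrip("0")
    return math.ceil(len(bits) / 8)

print(packed_len(SCOTT, "HL"), packed_len(SCOTT, "LH"))    # 1 1
print(packed_len(SAVARD, "HL"), packed_len(SAVARD, "LH"))  # 2 2

# All 49 ordered pairs of the 7 most common letters fit in one byte.
pairs = ["".join(p) for p in product("ETAOIHN", repeat=2)]
print(all(packed_len(SCOTT, p) == 1 for p in pairs))       # True
```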

I first line up the symbols in order from most common to least common. Then I assign codes from the following sequences, cut to the appropriate lengths.
1000000.... tail of infinite zeros
0100000.....
1100000.....
0010000.....
1010000.....
0110000.....
1110000.....
0001000..... and etc.
This makes the most common symbol 1000...., a leading 1 followed by all zeros.
This makes the least common symbol 000...., all zeros.
The next to last symbol is the same as the last symbol except that its last bit is a 1.
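The listed sequences are the integers 1, 2, 3, ... written with the least significant bit first. The sketch below generates them that way; lsb_first is a name invented for this example. Cutting the first eight sequences to the code lengths 3, 3, 4, 4, 4, 4, 4, 4 gives the codes this page assigns to E, T, A, O, I, H, N and S. Later entries also have to avoid prefixes that are already taken (R is 1101, not 1001), and this sketch does not attempt that part.

```python
def lsb_first(n, width):
    """Write n in binary, least significant bit first, padded to width bits."""
    return "".join(str((n >> i) & 1) for i in range(width))

seqs = [lsb_first(n, 7) for n in range(1, 9)]
for s in seqs:
    print(s + "....")

# Cut the first eight sequences to the lengths of the eight most common letters.
lengths = [3, 3, 4, 4, 4, 4, 4, 4]
first_codes = dict(zip("ETAOIHNS", (s[:k] for s, k in zip(seqs, lengths))))
print(first_codes)
```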
Here is the optimal static huffman table used in the code:

0  E = 100
1  T = 010
2  A = 1100
3  O = 0010
4  I = 1010
5  H = 0110
6  N = 1110   Note the most common symbols have trailing zeros
7  S = 0001
8  R = 1101
9  D = 00110
10 L = 10110
11 U = 01110
12 M = 11110
13 W = 00001
14 C = 00111
15 Y = 10111
16 F = 011110
17 G = 111110
18 P = 000001
19 B = 011111
20 V = 111111
21 K = 0000001
22 X = 00000001
23 J = 000000001  
24 Q = 0000000001 Note only Q and Z can't be compressed to a single byte
25 Z = 0000000000
Some notes about bijective huffman compression: if the compressed
file is one byte long, the text it stands for can contain any letter but Q or Z.

 The huffman table was made from the following weights


E 12.32    S  6.28    C  2.48    K  0.80
T  9.05    R  5.72    Y  2.11    X  0.15
A  8.17    D  4.31    F  2.09    J  0.10
O  7.81    L  3.97    G  1.82    Q  0.09
I  6.89    U  3.04    P  1.56    Z  0.05
H  6.68    M  2.77    B  1.45
N  6.62    W  2.64    V  1.02

From John Savard's site and attributed to Jim Gillogly.
Since integer weights are needed, these were multiplied by 100, and
they add up to 9999. When the symbol huf_huf is defined and ari_huf is not, I use
the power of two weights to get it to behave as a huffman.
The bottom weights were 1, the next level 2, and so on, for
a total weight of 1024.
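Both totals can be checked directly. The sketch below scales the frequency table above by 100 and derives the power of two weights from the code lengths in the huffman table; the variable names are only for this example.

```python
# Frequencies from the table above, already multiplied by 100.
ORDER = "ETAOIHNSRDLUMWCYFGPBVKXJQZ"
FREQ100 = [1232, 905, 817, 781, 689, 668, 662, 628, 572, 431, 397, 304, 277,
           264, 248, 211, 209, 182, 156, 145, 102, 80, 15, 10, 9, 5]
# Code lengths from the huffman table, in the same order.
LENGTHS = [3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5,
           6, 6, 6, 6, 6, 7, 8, 9, 10, 10]

ari_total = sum(FREQ100)
huf_weights = [2 ** (10 - n) for n in LENGTHS]   # Q and Z get 1, J gets 2, ...
huf_total = sum(huf_weights)
print(ari_total, huf_total)                      # 9999 1024

# Share of the total each method gives a few letters, in percent.
for sym in "EZ":
    i = ORDER.index(sym)
    print(sym, round(100 * FREQ100[i] / ari_total, 3),
          round(100 * huf_weights[i] / huf_total, 3))
```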

There are several ways to make a file bijective. Here is one way to do it:
Assign the huffman codes in the optimal order from the previous table. First, let's pretend the file to compress does not contain a "Z". Then every symbol, when compressed, will have at least one "1" bit. You just write the file and leave off the trailing "0"s. That way the last byte of the file will never be the all zero byte. This is in fact what happens to the file when it does not contain a "Z", so it's easy to check. Next, what if the file contains a "Z"?

I could (but this is wrong) just add an EOF symbol that is all zeros, so the huffman table would be as before, but the "Z" entry would be one bit longer and end in a "1" bit. This would mean every time "Z" is used, the file length increases by one bit. Eight "Z"s and you have added another byte to the output file, so this is hardly optimal.
Instead you use the table as is. Only if the all zero compression token occurs last do you tack on an extra bit. This saves a lot of space, but it's not done yet. There is still the problem that this can't be bijective, since the last byte will always have at least one bit set. To be bijective, the compressed files must include all files, even those that end in all zero bytes.
At this point, I will explain two schemes: (i) the one I used and (ii) a better one that I might add to a new package but will take more code.

First, the one I used from the Huffman point of view:
If the input file ends in the all zero token, the "Z", I add an "E" to the file. If the file ends in a string of "Z" followed by one or more "E", I add an extra "E". That way the all zero token is never last and the operation is so far reversible.
Next, I have to write the output file. I do the same thing in reverse, but working on full bytes. I group the bits into byte chunks with any trailing zeros dropped. At this point the file can't end in the all zero byte. If it ends in something other than 0x80, you're done. If it ends in a tail of all zero bytes followed by one or more 0x80 bytes, you drop the last 0x80 byte and you're done.
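The two steps above can be sketched in a few lines. This is a reconstruction from the description on this page, not the package's own code, and compress and decompress are names chosen for the example. Two assumptions are made: the input contains only the letters A to Z, and the byte level rule fires only when at least one zero byte comes directly before the run of 0x80 bytes, which is what keeps the mapping one to one. With those assumptions the sketch reproduces the huffman results for TW1 through TW7 shown further down this page.

```python
CODES = {
    "E": "100", "T": "010", "A": "1100", "O": "0010", "I": "1010",
    "H": "0110", "N": "1110", "S": "0001", "R": "1101", "D": "00110",
    "L": "10110", "U": "01110", "M": "11110", "W": "00001", "C": "00111",
    "Y": "10111", "F": "011110", "G": "111110", "P": "000001",
    "B": "011111", "V": "111111", "K": "0000001", "X": "00000001",
    "J": "000000001", "Q": "0000000001", "Z": "0000000000",
}
DECODE = {bits: sym for sym, bits in CODES.items()}

def compress(text: str) -> bytes:
    # Symbol level: a tail of Z followed by zero or more E gets one more E.
    if text.rstrip("E").endswith("Z"):
        text += "E"
    # Concatenate codes, drop trailing zero bits, pad back to whole bytes.
    bits = "".join(CODES[ch] for ch in text).rstrip("0")
    bits += "0" * (-len(bits) % 8)
    out = bytearray(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    # Byte level: zero byte(s) followed by one or more 0x80 loses one 0x80.
    body = out.rstrip(b"\x80")
    if len(body) < len(out) and body.endswith(b"\x00"):
        out.pop()
    return bytes(out)

def decompress(data: bytes) -> str:
    buf = bytearray(data)
    # Undo the byte level step: a zero byte followed by zero or more 0x80
    # gets one 0x80 put back.
    if buf.rstrip(b"\x80").endswith(b"\x00"):
        buf.append(0x80)
    bits = "".join(f"{b:08b}" for b in buf).rstrip("0")
    out, cur, i = [], "", 0
    while i < len(bits) or cur:
        # Past the last 1 bit, finish the current code with zero bits.
        cur += bits[i] if i < len(bits) else "0"
        i += 1
        if cur in DECODE:
            out.append(DECODE[cur])
            cur = ""
    text = "".join(out)
    # Undo the symbol level step: Z followed by one or more E loses one E.
    body = text.rstrip("E")
    if len(body) < len(text) and body.endswith("Z"):
        text = text[:-1]
    return text

print(compress("Z" * 80) == bytes(100))               # TW2: 100 zero bytes
print(compress("Z" * 81) == bytes(101) + b"\x20")     # TW3: ends in 0x20
print(decompress(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")[:16])  # TWTSOSIOOOIDTCTT
```

Because every byte string is a valid compressed file, the round trip works in both directions: decompress(compress(t)) gives back t, and compress(decompress(b)) gives back b.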

Now, from the arithmetic point of view:
It's just like the Huffman except that the "Z" symbol is usually not the "all zero symbol". That is just the way Arithmetic goes about updating "high" and "low". The "Z" symbol appears at the spot where it's assigned the all zero symbol, but this is generally not what comes out with pure Arithmetic, since after each symbol is processed the high and low are random looking values, while in the pure huffman the high ends up at maxhigh and the low at zero. This is where the 2 methods appear to differ, because of the non power of two weights. You will see the effect of this later. Anyway, in the Huffman, when "the all zero symbol" (which now may not be "Z") occurs at the end, an "E" is forced to follow. It's the same in pure arithmetic, except that in huffman the "E" was the 3 bits "100". In arithmetic, I force the most common symbol E to be last, but on average it's not 3 bits. Sometimes 2, sometimes 4; it varies, and it's seldom "100", it's more like "CC".

Another more optimal way to end the static huffman file is as follows: if it ends in "Z" go to "ZE" as before, but if it ends in "ZE" go to "ZT". That is, instead of tacking on an extra "E" each time, bump up to the next most common symbol. After a while, "ZX" goes to "ZQ" and "ZQ" goes to "ZQE", and the bad tail "ZQQQE" goes to "ZQQQT" and so on. The "flag" of doing something is only on if a string of "Q"s follows the "Z". This greatly adds to the complexity of the code. I would do the same thing to the bytes on the final conversion. This code is meant only as a learning tool. The alternate ending, though more optimal, only saves space sometimes. If the file ends in "ZEE" it saves space, but it loses some in cases like "ZK", which maps to "ZX" instead of being left alone. I plan to write code using what I think are more optimal endings for the huffman at a later time. The goal here was to write a simple bijective huffman-arithmetic coder.

One of the internal differences in this set of code is that in the high-low updating routines everything is done with 62 bit arithmetic. The huffman always ends up outputting the tokens "0" and "1", the high ends up at max and the low ends up at zero, and the free end value tacked on is "00". In the arithmetic, on the other hand, the inner routine outputs the tokens "0", "1" and "C". The "C" is like halfway in between and will become either a "0" or a "1" at a later time. It's as if the coding of symbols yet to come affects what it will actually become. Sometimes it becomes a "0" and sometimes a "1". The high and low states take on random looking values after each symbol is processed. The high will always start with a "1" bit and the low with a "0" bit; the difference between the two values varies from the full length of the table to just a tiny bit larger than one fourth of the window. So the state of compression at any bit is the place in the output string plus the values of the high and low. The free ends used with this arithmetic are "00", "10", "01" and "11"; these should be all that are needed since the most common symbol is small enough. If it had a larger weight, more ends or a different ending method would be needed. I feel that for arithmetic you're better off with 25 2 state 62 bit bijective encoders. Yes, I will eventually make a web page for that too. This is just a simple exercise to get a matching true huffman so one can see the real differences between what we call huffman and arithmetic, and yet do it in a clean bijective way. The way it should be done. This means pure arithmetic is better suited for compression before encryption. If the first part of a huffman compressed file is lost, it would be easy for an enemy to recover the rest, since the string is only a function of the trailing bits, while in arithmetic it is also a function of the hidden "high" and "low" at that particular point in the file, and they could be almost anything.
So it's like having an extra secret key worth over one hundred bits of key space.

Another point is that this huffman compression is very, very slow. It's done only to show that you can use an arithmetic compressor to do the huffman, and it's for comparison purposes.

Which is Better?

It depends on what you want. They both pack fully to the output file and are fully bijective. Which is best as far as length goes depends on which set of weights more closely matches the data you are compressing. If you want error recovery, then the huffman is for you. If you want speed, the huffman is also for you; just use a faster program than this one. If you want to encrypt later down the line, the arithmetic is for you. As it stands, the huffman table values used allow for very nice optimal endings, while the basic arithmetic ending is only the fallout, and better bijective endings are possible. I have other bijective English encoders that use N-1 2 state 62 bit arithmetic encoders. This code is based on a single N state 62 bit arithmetic encoder for comparison purposes.

test file TW1  These are the only characters these compressors work with
0000  41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50  *ABCDEFGHIJKLMNOP*
0010  51 52 53 54 55 56 57 58 59 5A  .  .  .  .  .  .  *QRSTUVWXYZ*
 number of bytes is 26 
huffman compress of TW1 
0000  C7 CE 68 F7 CD 40 10 36 F7 10 20 0E 8A 77 E1 01  *..h..@.6.. ..w..*
0010  B8 01  .  .  .  .  .  .  .  .  .  .  .  .  .  .  *..*
 number of bytes is 18 
huffman uncompress of TW1 
0000  54 57 54 53 4F 53 49 4F 4F 4F 49 44 54 43 54 54  *TWTSOSIOOOIDTCTT*
0010  53 4F 54 45 49 54 54 52 44 4F 48 49 55 54 46 49  *SOTEITTRDOHIUTFI*
0020  57 54 4F 49 54 54 45 52 54 45 54 49 49 4C 54 59  *WTOITTERTETIILTY*
0030  54 41 4F 41 49 52  .  .  .  .  .  .  .  .  .  .  *TAOAIR*
 number of bytes is 54 
arithmetic compress of TW1 
0000  C7 0F 29 B0 7B 29 D7 A3 A0 C9 0E 38 2F 68 70 96  *..).{).....8/hp.*
0010  21 BC 40  .  .  .  .  .  .  .  .  .  .  .  .  .  *!.@*
 number of bytes is 19 
arithmetic uncompress of TW1 
0000  43 4C 4F 52 57 52 4E 45 54 47 48 54 55 41 4E 57  *CLORWRNETGHTUANW*
0010  57 52 45 48 54 48 45 48 52 53 54 54 54 57 54 4C  *WREHTHEHRSTTTWTL*
0020  4E 50 45 4C 48 48 41 49 54 4F 4F 41 4E 4F 41 52  *NPELHHAITOOANOAR*
0030  4C 45 48 43  .  .  .  .  .  .  .  .  .  .  .  .  *LEHC*
 number of bytes is 52 


test file TW2  on compression the last byte is a function of the number of Z's
0000  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0010  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0020  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0030  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0040  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
 number of bytes is 80 
huffman compress of TW2 matches "Z" = 0000000000  10 zeros
0000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0060  00 00 00 00  .  .  .  .  .  .  .  .  .  .  .  .  *....*
 number of bytes is 100 
arithmetic compress of TW2 note last byte not zero
0000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0060  00 00 00 00 00 00 00 00 00 00 00 00 00 04  .  .  *..............*
 number of bytes is 110 
arithmetic uncompress of the huffman compress of TW2
done to show last byte not Z
0000  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0010  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0020  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0030  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0040  5A 5A 5A 5A 5A 5A 5A 5A 5A 4C  .  .  .  .  .  .  *ZZZZZZZZZL*
 number of bytes is 74 

test file TW3 like TW2 but one byte longer (one more Z)
0000  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0010  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0020  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0030  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0040  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0050  5A  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  *Z*
 number of bytes is 81 
huffman compress of TW3 note last byte not zero
0000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0060  00 00 00 00 00 20  .  .  .  .  .  .  .  .  .  .  *..... *
 number of bytes is 102 
arithmetic compress of TW3
0000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .  *...............*
 number of bytes is 111 
huffman uncompress of the arithmetic compress of TW3
done to show last byte not Z
0000  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0010  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0020  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0030  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0040  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0050  5A 5A 5A 5A 5A 5A 5A 5A 4A  .  .  .  .  .  .  .  *ZZZZZZZZJ*
 number of bytes is 89 
**Note that the huffman compresses a stream of Z's
smaller than pure arithmetic does. That's because huffman used
a weight of 1/1024 = .00098 while arithmetic used
a weight of 5/9999 = .0005, so the arithmetic uses more space to
write each Z, since the weights used in arithmetic assume
Z is rarer than the weights used in the huffman code do.
If you change the weights you get different answers.
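The size difference follows from the weights alone. A symbol with weight w out of a total T costs about log2(T/w) bits, so a rough size estimate for a run of one letter is the count times that cost, rounded up to whole bytes. The sketch below is only that estimate, with a made-up helper est_bytes; it ignores the ending rules, so it can be off by a byte for other lengths, but it lands on the byte counts reported on this page for the 80 letter runs of Z (TW2), V (TW4) and F (TW5).

```python
import math

def est_bytes(count, weight, total):
    """Rough size of `count` copies of one symbol: count * log2(total/weight) bits."""
    bits = count * math.log2(total / weight)
    return math.ceil(bits / 8)

# (huffman weight out of 1024, arithmetic weight out of 9999)
WEIGHTS = {"Z": (1, 5), "V": (16, 102), "F": (16, 209)}

for sym, (huf_w, ari_w) in WEIGHTS.items():
    huf = est_bytes(80, huf_w, 1024)
    ari = est_bytes(80, ari_w, 9999)
    print(sym, "huffman", huf, "arithmetic", ari)
# Z huffman 100 arithmetic 110
# V huffman 60 arithmetic 67
# F huffman 60 arithmetic 56
```

Z and V are rarer under the arithmetic weights, so the huffman wins there; F is more common under the arithmetic weights, so the arithmetic wins.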

test file TW4: this is the one that tends to compress to all 1's,
but again the ending depends on the number of bytes of all 1's,
and again the arithmetic assumes V is rarer so huffman
beats it.
0000  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
0010  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
0020  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
0030  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
0040  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
 number of bytes is 80 
huffman compress of TW4 note V = 111111 which is 6 1's
0000  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0010  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0020  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0030  FF FF FF FF FF FF FF FF FF FF FF FF  .  .  .  .  *............*
 number of bytes is 60 
arithmetic compress of TW4
0000  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0010  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0020  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0030  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0040  FF FF C0  .  .  .  .  .  .  .  .  .  .  .  .  .  *...*
 number of bytes is 67 

This next file TW5 is all F's, and it shows two points.
First, arithmetic values F higher than huffman does, so its output will
be shorter. Second, since F is not at an end of the interval like
Z (the all zero case) or V (the all one case), it
will look nice when compressed by huffman but will look
random in the arithmetic due to the internal state of high and low.
0000  46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46  *FFFFFFFFFFFFFFFF*
0010  46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46  *FFFFFFFFFFFFFFFF*
0020  46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46  *FFFFFFFFFFFFFFFF*
0030  46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46  *FFFFFFFFFFFFFFFF*
0040  46 46 46 46 46 46 46 46 46 46 46 46 46 46 46 46  *FFFFFFFFFFFFFFFF*
 number of bytes is 80 
huffman compress of TW5 note how redundant
0000  79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79  *y..y..y..y..y..y*
0010  E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7  *..y..y..y..y..y.*
0020  9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E  *.y..y..y..y..y..*
0030  79 E7 9E 79 E7 9E 79 E7 9E 79 E7 9E  .  .  .  .  *y..y..y..y..*
 number of bytes is 60 
arithmetic compress of TW5 note whole thing random looking
0000  75 C6 93 95 F0 6A 45 24 B7 73 44 EF 03 4E 4E 1C  *u....jE$.sD..NN.*
0010  C6 74 63 DE 70 FC 4F 0F 79 6A A2 0F F7 33 AD 98  *.tc.p.O.yj...3..*
0020  3B 7A 20 6B 02 99 09 E1 3B C8 41 36 17 75 A2 64  *;z k....;.A6.u.d*
0030  09 C7 7B 45 49 21 EE B2  .  .  .  .  .  .  .  .  *..{EI!..*
 number of bytes is 56 

The next two sets show what happens if Z (the all zero case)
or V (the all one case) occurs in a long stretch but is
not the first character. It's really meant to demonstrate the
random look of the arithmetic, which is due to the
continually changing internal high and low state. When the weights are
pure huffman like (powers of 2 and placed in the correct slots),
the high is always max at the end of a character and the low is always
zero. This is not the case with arithmetic in general.
test file TW6 
0000  46 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *FZZZZZZZZZZZZZZZ*
0010  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0020  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0030  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
0040  5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A  *ZZZZZZZZZZZZZZZZ*
 number of bytes is 80 
huffman compress of TW6 note only first byte not zero
0000  78 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *x...............*
0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0060  00 00 00 08  .  .  .  .  .  .  .  .  .  .  .  .  *....*
 number of bytes is 100 
arithmetic compress of TW6 note not till 9th byte did it settle to zero
0000  73 50 5D CE CA B8 98 30 00 00 00 00 00 00 00 00  *sP]....0........*
0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  *................*
0060  00 00 00 00 00 00 00 00 00 00 00 00 00 A0  .  .  *..............*
 number of bytes is 110 
arithmetic uncompress of huffman compress of TW6
0000  46 52 4D 41 41 54 4F 4E 4E 4C 4C 57 53 48 48 41  *FRMAATONNLLWSHHA*
0010  49 48 49 4C 4E 54 4D 45 45 48 4F 45 50 45 45 53  *IHILNTMEEHOEPEES*
0020  4E 46 54 4F 4F 52 49 41 52 4F 44 41 52 45 45 57  *NFTOORIARODAREEW*
0030  41 54 4B 48 48 48 45 56 45 59 45 4C 41 50 49 4F  *ATKHHHEVEYELAPIO*
0040  46 41 4E 4E 48 54 4D 47 54 41 4E 4E 54 41 53 54  *FANNHTMGTANNTAST*
0050  57 44 4E 4F 54 45 4E 59 54 4D 49 4B 4C 54 45 46  *WDNOTENYTMIKLTEF*
0060  45 45 4E 57 53 49 54 4D 49 45 53 4C 50 41 48 43  *EENWSITMIESLPAHC*
0070  4F 53 57 49 42 49 55 45 45 54 48 54 41 41 52 49  *OSWIBIUEETHTAARI*
0080  45 45 55 41 55 55 4F 4C 4E 54 41 4F 49 54 45 48  *EEUAUUOLNTAOITEH*
0090  44 4E 47 4E 49 46 4C 49 52 44 54 47 4C 54 42 53  *DNGNIFLIRDTGLTBS*
00A0  45 4C 55 52 45 41 54 48 56 49 53 49 49 45 4D 55  *ELUREATHVISIIEMU*
00B0  4F 53 4C 45 52 41 4C 45 41 52 58 54 53 53 53 49  *OSLERALEARXTSSSI*
00C0  4E  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  *N*
 number of bytes is 193 
The above looks nasty, but if you compress it, it then 
goes back to exactly the huffman compressed file of TW6

TW7
0000  46 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *FVVVVVVVVVVVVVVV*
0010  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
0020  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
0030  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
0040  56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56  *VVVVVVVVVVVVVVVV*
 number of bytes is 80 
huffman compress of TW7
0000  7B FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *{...............*
0010  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0020  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0030  FF FF FF FF FF FF FF FF FF FF FF FF  .  .  .  .  *............*
 number of bytes is 60 
arithmetic compress of TW7
0000  78 AA 34 B0 A8 5C C8 9F FF FF FF FF FF FF FF FF  *x.4..\..........*
0010  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0020  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0030  FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF  *................*
0040  FF FF 80  .  .  .  .  .  .  .  .  .  .  .  .  .  *...*
 number of bytes is 67 
arithmetic uncompress of huffman compress of TW7
0000  42 4E 54 45 45 54 4E 41 4E 44 50 43 54 4E 4C 4F  *BNTEETNANDPCTNLO*
0010  54 4E 41 4F 4E 54 42 4E 44 55 45 48 43 56 43 47  *TNAONTBNDUEHCVCG*
0020  47 4F 49 56 44 54 41 44 45 48 54 49 45 4C 54 44  *GOIVDTADEHTIELTD*
0030  56 44 55 4E 59 49 43 49 54 59 45 41 54 57 4F 42  *VDUNYICITYEATWOB*
0040  4F 43 48 47 54 54 53 47 45 41 49 50 44 49 48 45  *OCHGTTSGEAIPDIHE*
0050  49 45 45 49 53 48 54 54 45 4E 49 52 54 47 41 54  *IEEISHTTENIRTGAT*
0060  4F 47 41 4D 54 45 4F 59 43 41 52 52 46 59 54 49  *OGAMTEOYCARRFYTI*
0070  45 4C  .  .  .  .  .  .  .  .  .  .  .  .  .  .  *EL*
 number of bytes is 114 

The last test is a test of error recovery. With arithmetic
forget it. With huffman you've got a shot at it. 
test file TW8
0000  54 48 45 20 51 55 43 49 4B 20 42 52 4F 57 4E 20  *THE QUCIK BROWN *
0010  46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20 54  *FOX JUMPS OVER T*
0020  48 45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45  *HE LAZY DOG..THE*
0030  20 51 55 43 49 4B 20 42 52 4F 57 4E 20 46 4F 58  * QUCIK BROWN FOX*
0040  20 4A 55 4D 50 53 20 4F 56 45 52 20 54 48 45 20  * JUMPS OVER THE *
0050  4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45 20 51 55  *LAZY DOG..THE QU*
0060  43 49 4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A 55  *CIK BROWN FOX JU*
0070  4D 50 53 20 4F 56 45 52 20 54 48 45 20 4C 41 5A  *MPS OVER THE LAZ*
0080  59 20 44 4F 47 0D 0A 54 48 45 20 51 55 43 49 4B  *Y DOG..THE QUCIK*
0090  20 42 52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50 53  * BROWN FOX JUMPS*
00A0  20 4F 56 45 52 20 54 48 45 20 4C 41 5A 59 20 44  * OVER THE LAZY D*
00B0  4F 47 0D 0A 54 48 45 20 51 55 43 49 4B 20 42 52  *OG..THE QUCIK BR*
00C0  4F 57 4E 20 46 4F 58 20 4A 55 4D 50 53 20 4F 56  *OWN FOX JUMPS OV*
00D0  45 52 20 54 48 45 20 4C 41 5A 59 20 44 4F 47 0D  *ER THE LAZY DOG.*
00E0  0A 54 48 45 20 51 55 43 49 4B 20 42 52 4F 57 4E  *.THE QUCIK BROWN*
00F0  20 46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20  * FOX JUMPS OVER *
0100  54 48 45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48  *THE LAZY DOG..TH*
0110  45 20 51 55 43 49 4B 20 42 52 4F 57 4E 20 46 4F  *E QUCIK BROWN FO*
0120  58 20 4A 55 4D 50 53 20 4F 56 45 52 20 54 48 45  *X JUMPS OVER THE*
0130  20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45 20 51  * LAZY DOG..THE Q*
0140  55 43 49 4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A  *UCIK BROWN FOX J*
0150  55 4D 50 53 20 4F 56 45 52 20 54 48 45 20 4C 41  *UMPS OVER THE LA*
0160  5A 59 20 44 4F 47 0D 0A 54 48 45 20 51 55 43 49  *ZY DOG..THE QUCI*
0170  4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50  *K BROWN FOX JUMP*
0180  53 20 4F 56 45 52 20 54 48 45 20 4C 41 5A 59 20  *S OVER THE LAZY *
0190  44 4F 47 0D 0A 54 48 45 20 51 55 43 49 4B 20 42  *DOG..THE QUCIK B*
01A0  52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50 53 20 4F  *ROWN FOX JUMPS O*
01B0  56 45 52 20 54 48 45 20 4C 41 5A 59 20 44 4F 47  *VER THE LAZY DOG*
01C0  0D 0A 54 48 45 20 51 55 43 49 4B 20 42 52 4F 57  *..THE QUCIK BROW*
01D0  4E 20 46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52  *N FOX JUMPS OVER*
01E0  20 54 48 45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54  * THE LAZY DOG..T*
01F0  48 45 20 51 55 43 49 4B 20 42 52 4F 57 4E 20 46  *HE QUCIK BROWN F*
0200  4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20 54 48  *OX JUMPS OVER TH*
0210  45 20 4C 41 5A 59 20 44 4F 47 0D 0A 54 48 45 20  *E LAZY DOG..THE *
0220  51 55 43 49 4B 20 42 52 4F 57 4E 20 46 4F 58 20  *QUCIK BROWN FOX *
0230  4A 55 4D 50 53 20 4F 56 45 52 20 54 48 45 20 4C  *JUMPS OVER THE L*
0240  41 5A 59 20 44 4F 47 0D 0A  .  .  .  .  .  .  .  *AZY DOG..*
 number of bytes is 585 
This is TW8 with only characters "A through Z" rest not used
and this is what comes back after you compress then uncompress
0000  54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58  *THEQUCIKBROWNFOX*
0010  4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59  *JUMPSOVERTHELAZY*
0020  44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E  *DOGTHEQUCIKBROWN*
0030  46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C  *FOXJUMPSOVERTHEL*
0040  41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52  *AZYDOGTHEQUCIKBR*
0050  4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54  *OWNFOXJUMPSOVERT*
0060  48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49  *HELAZYDOGTHEQUCI*
0070  4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56  *KBROWNFOXJUMPSOV*
0080  45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51  *ERTHELAZYDOGTHEQ*
0090  55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50  *UCIKBROWNFOXJUMP*
00A0  53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54  *SOVERTHELAZYDOGT*
00B0  48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A  *HEQUCIKBROWNFOXJ*
00C0  55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44  *UMPSOVERTHELAZYD*
00D0  4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46  *OGTHEQUCIKBROWNF*
00E0  4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41  *OXJUMPSOVERTHELA*
00F0  5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F  *ZYDOGTHEQUCIKBRO*
0100  57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48  *WNFOXJUMPSOVERTH*
0110  45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B  *ELAZYDOGTHEQUCIK*
0120  42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45  *BROWNFOXJUMPSOVE*
0130  52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55  *RTHELAZYDOGTHEQU*
0140  43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53  *CIKBROWNFOXJUMPS*
0150  4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48  *OVERTHELAZYDOGTH*
0160  45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55  *EQUCIKBROWNFOXJU*
0170  4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F  *MPSOVERTHELAZYDO*
0180  47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F  *GTHEQUCIKBROWNFO*
0190  58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A  *XJUMPSOVERTHELAZ*
01A0  59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57  *YDOGTHEQUCIKBROW*
01B0  4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45  *NFOXJUMPSOVERTHE*
01C0  4C 41 5A 59 44 4F 47  .  .  .  .  .  .  .  .  .  *LAZYDOG*
 number of bytes is 455 
arithmetic compress of TW8 note looks random
0000  4C 31 D9 CC 49 B0 24 9F A0 21 DD 03 1B AB F9 B1  *L1..I.$..!......*
0010  A0 9C 79 22 70 A2 2D A3 27 9A ED 52 BF 0E 1E 36  *..y"p.-.'..R...6*
0020  19 EF DB 4C 4B 21 4C F6 73 9B 95 84 F9 45 81 DA  *...LK!L.s....E..*
0030  31 8F 37 B0 21 CA 55 EE 74 A4 F7 83 55 0B EB 23  *1.7.!.U.t...U..#*
0040  D6 CD A1 FF 07 50 5F 93 F9 07 5D B0 71 3C 04 77  *.....P_...].q<.w*
0050  E4 18 1E 1D 04 DC A1 64 B4 87 51 BB 03 AC EF 76  *.......d..Q....v*
0060  2F 53 3E 77 CD 3F C6 CD 59 13 90 FE 79 06 30 38  */S>w.?..Y...y.08*
0070  28 F5 07 DC F6 22 93 36 CA 75 F1 BD 4E 00 63 8B  *(....".6.u..N.c.*
0080  11 FF FF 10 B4 12 30 48 1A A5 A4 BA 9E 05 10 D2  *......0H........*
0090  C7 80 94 81 79 AB 55 7C 54 97 D2 AF 51 70 FE 9C  *....y.U|T...Qp..*
00A0  9A C3 AD BE 67 4F 70 54 99 F8 CA 79 A5 E7 C8 FF  *....gOpT...y....*
00B0  39 ED E3 3C D0 47 9C B4 FD 10 80 11 21 7A 19 25  *9..<.G......!z.%*
00C0  B4 9B B6 0E 51 AC DF EA 44 DB E2 E1 B0 C8 F2 6A  *....Q...D......j*
00D0  DB 45 4C 25 2D 98 F3 99 72 F1 26 A5 21 49 F5 7E  *.EL%-...r.&.!I.~*
00E0  31 3A ED C2 9F E2 BF 5C 71 68 86 22 22 38 39 18  *1:.....\qh.""89.*
00F0  F1 88 91 C0 01 09 9E 9F 95 9C 23 76 F5 9E B2 3B  *..........#v...;*
0100  09 E3 66 BD 94 53 DC 5D EB DF D5 4A 26 0B C7 87  *..f..S.]...J&...*
0110  2C C1 79 68 B3 78 2C D4 E2 A2 07 C7 F8 C8 B3 33  *,.yh.x,........3*
0120  D2 F7 82 92 62  .  .  .  .  .  .  .  .  .  .  .  *....b*
 number of bytes is 293 
file just like previous but A0 in second row changed to
A1 to see effect of one bit change on decompression 
0000  4C 31 D9 CC 49 B0 24 9F A0 21 DD 03 1B AB F9 B1  *L1..I.$..!......*
0010  A1 9C 79 22 70 A2 2D A3 27 9A ED 52 BF 0E 1E 36  *..y"p.-.'..R...6*
0020  19 EF DB 4C 4B 21 4C F6 73 9B 95 84 F9 45 81 DA  *...LK!L.s....E..*
0030  31 8F 37 B0 21 CA 55 EE 74 A4 F7 83 55 0B EB 23  *1.7.!.U.t...U..#*
0040  D6 CD A1 FF 07 50 5F 93 F9 07 5D B0 71 3C 04 77  *.....P_...].q<.w*
0050  E4 18 1E 1D 04 DC A1 64 B4 87 51 BB 03 AC EF 76  *.......d..Q....v*
0060  2F 53 3E 77 CD 3F C6 CD 59 13 90 FE 79 06 30 38  */S>w.?..Y...y.08*
0070  28 F5 07 DC F6 22 93 36 CA 75 F1 BD 4E 00 63 8B  *(....".6.u..N.c.*
0080  11 FF FF 10 B4 12 30 48 1A A5 A4 BA 9E 05 10 D2  *......0H........*
0090  C7 80 94 81 79 AB 55 7C 54 97 D2 AF 51 70 FE 9C  *....y.U|T...Qp..*
00A0  9A C3 AD BE 67 4F 70 54 99 F8 CA 79 A5 E7 C8 FF  *....gOpT...y....*
00B0  39 ED E3 3C D0 47 9C B4 FD 10 80 11 21 7A 19 25  *9..<.G......!z.%*
00C0  B4 9B B6 0E 51 AC DF EA 44 DB E2 E1 B0 C8 F2 6A  *....Q...D......j*
00D0  DB 45 4C 25 2D 98 F3 99 72 F1 26 A5 21 49 F5 7E  *.EL%-...r.&.!I.~*
00E0  31 3A ED C2 9F E2 BF 5C 71 68 86 22 22 38 39 18  *1:.....\qh.""89.*
00F0  F1 88 91 C0 01 09 9E 9F 95 9C 23 76 F5 9E B2 3B  *..........#v...;*
0100  09 E3 66 BD 94 53 DC 5D EB DF D5 4A 26 0B C7 87  *..f..S.]...J&...*
0110  2C C1 79 68 B3 78 2C D4 E2 A2 07 C7 F8 C8 B3 33  *,.yh.x,........3*
0120  D2 F7 82 92 62  .  .  .  .  .  .  .  .  .  .  .  *....b*
 number of bytes is 293 
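The one-bit change above can be reproduced with a few lines of Python. This is only a sketch: the helper name flip_bit is made up for illustration, and only the first two rows of the dump are typed in.

```python
def flip_bit(data: bytes, offset: int, bit: int = 0) -> bytes:
    """Return a copy of data with a single bit toggled at the given byte offset."""
    out = bytearray(data)
    out[offset] ^= 1 << bit  # XOR with a one-bit mask toggles exactly that bit
    return bytes(out)

# First two rows (32 bytes) of the compressed file shown above.
original = bytes.fromhex(
    "4C31D9CC49B0249FA021DD031BABF9B1"
    "A09C792270A22DA3279AED52BF0E1E36"
)

# Offset 0x10 is the first byte of the second row: A0 becomes A1.
damaged = flip_bit(original, 0x10)
print(f"{original[0x10]:02X} -> {damaged[0x10]:02X}")
```

Everything else in the file is left untouched, so any difference in the decompressed output comes from that single bit.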
Decompression of the previous file. Note that there is no error recovery after
a one-bit error, so the arithmetic coder is dangerous if error recovery is needed:
0000  54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58  *THEQUCIKBROWNFOX*
0010  4A 55 4D 50 53 4F 56 45 52 54 41 53 48 47 57 53  *JUMPSOVERTASHGWS*
0020  52 4C 4F 54 45 55 54 4F 46 45 4F 4D 48 4F 57 57  *RLOTEUTOFEOMHOWW*
0030  4C 48 42 54 47 57 4C 50 49 4F 4E 4F 55 54 54 41  *LHBTGWLPIONOUTTA*
0040  4C 42 4F 48 53 49 52 46 49 42 41 47 4F 44 41 52  *LBOHSIRFIBAGODAR*
0050  48 54 48 4F 45 4D 56 45 4E 46 4F 47 54 41 45 54  *HTHOEMVENFOGTAET*
0060  49 48 45 4E 4F 53 4E 54 49 48 49 59 4E 45 49 57  *IHENOSNTIHIYNEIW*
0070  54 47 43 50 59 52 4F 4C 46 50 57 4E 57 45 4C 54  *TGCPYROLFPWNWELT*
0080  50 44 48 44 4E 45 57 4C 45 52 41 49 44 49 53 49  *PDHDNEWLERAIDISI*
0090  52 50 57 59 4F 43 45 53 4E 49 4E 43 56 59 41 54  *RPWYOCESNINCVYAT*
00A0  4E 54 42 54 49 43 4E 54 57 4E 53 47 44 41 55 45  *NTBTICNTWNSGDAUE*
00B0  4E 45 49 4B 4B 44 47 59 49 4E 54 4F 4B 55 49 55  *NEIKKDGYINTOKUIU*
00C0  41 4E 4F 45 41 4E 52 46 49 54 56 45 57 48 4C 45  *ANOEANRFITVEWHLE*
00D0  41 41 4F 41 56 41 45 47 49 45 41 4C 4E 47 59 54  *AAOAVAEGIEALNGYT*
00E0  48 52 41 4F 43 44 57 48 42 52 48 4D 50 4F 42 57  *HRAOCDWHBRHMPOBW*
00F0  45 4F 49 4C 45 41 45 41 4E 4F 4F 41 4B 4B 49 4F  *EOILEAEANOOAKKIO*
0100  4F 4B 52 50 4F 4E 54 52 4F 4D 53 52 49 41 49 49  *OKRPONTROMSRIAII*
0110  49 4E 54 48 4C 54 4E 41 4E 54 59 54 54 49 54 43  *INTHLTNANTYTTITC*
0120  50 53 46 54 4F 53 54 4C 49 47 49 49 4E 45 41 54  *PSFTOSTLIGIINEAT*
0130  45 53 4C 59 53 54 50 41 4B 48 4F 54 4E 52 4E 4B  *ESLYSTPAKHOTNRNK*
0140  4C 52 4E 52 52 4F 54 54 49 44 4F 54 53 4F 4E 41  *LRNRROTTIDOTSONA*
0150  57 46 55 48 53 44 47 43 45 41 44 54 44 44 49 45  *WFUHSDGCEADTDDIE*
0160  45 49 54 48 4F 57 45 41 45 4F 4D 54 44 48 41 4D  *EITHOWEAEOMTDHAM*
0170  52 4E 4F 41 4E 52 4C 54 53 45 52 54 42 54 44 41  *RNOANRLTSERTBTDA*
0180  57 41 45 48 4F 4F 4E 42 50 4C 4C 4C 45 4E 4F 46  *WAEHOONBPLLLENOF*
0190  4F 4F 53 49 45 57 41 45 48 45 54 4D 4C 48 49 4C  *OOSIEWAEHETMLHIL*
01A0  41 45 45 4D 46 48 46 41 45 50 44 4F 52 4D 45 54  *AEEMFHFAEPDORMET*
01B0  49 55 50 45 55 54 4C 57 48 4E 52 49 43 4A 53 49  *IUPEUTLWHNRICJSI*
01C0  4C 4E 4C 41 54 52 45 44 52 45 45 41 4E 52 43 55  *LNLATREDREEANRCU*
01D0  49 52 49 59 45 42 4E 45 54 4D 55 52 45 48 4C 48  *IRIYEBNETMUREHLH*
01E0  44 4F 4E 45 54 4B 44 49 52 54 4F 54 41 52 54 52  *DONETKDIRTOTARTR*
01F0  45 46 41 43 41 59 45 50 49 41 42 45 49 4B 4C 4F  *EFACAYEPIABEIKLO*
0200  48 49 4C 52 4F 41 54 54 4F 49 41 55 49 54 4F 57  *HILROATTOIAUITOW*
0210  4F 48 54 4D 43 45 4F 53 49 55 45 4E 56 50 50 4C  *OHTMCEOSIUENVPPL*
0220  53 48 48  .  .  .  .  .  .  .  .  .  .  .  .  .  *SHH*
 number of bytes is 547 
huffman compression of TW8. Note that the output repeats, so
any normal compressor could make it smaller on a second pass:
0000  4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F CD  *M.......x.@.."_.*
0010  4D 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C 40  *M-..._&...@_..<@*
0020  20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05 C7  * .x./....../.@..*
0030  A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60 01  *./... ......SK`.*
0040  73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE 04  *s...............*
0050  4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA 41  *K..........q...A*
0060  E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F2 68  *......%.....\..h*
0070  00 B8 F4 05 FD 20 F3 C4 02 01 77 81 12 FE 6A 69  *..... ....w...ji*
0080  6C 00 2E 62 F9 34 00 5C 7A 02 FE 90 79 E2 01 00  *l..b.4.\z...y...*
0090  BB C0 89 7F 35 34 B6 00 17 31 7C 9A 00 2E 3D 01  *....54...1|...=.*
00A0  7F 48 3C F1 00 80 5D E0 44 BF 9A 9A 5B 00 0B 98  *.H<...].D...[...*
00B0  BE 4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F  *.M.......x.@.."_*
00C0  CD 4D 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C  *.M-..._&...@_..<*
00D0  40 20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05  *@ .x./....../.@.*
00E0  C7 A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60  *../... ......SK`*
00F0  01 73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE  *.s..............*
0100  04 4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA  *.K..........q...*
0110  41 E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F0  *A......%.....\..*
 number of bytes is 288 
Same as the file above, but with the first byte in the second row changed from
4D to 4E:
0000  4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F CD  *M.......x.@.."_.*
0010  4E 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C 40  *N-..._&...@_..<@*
0020  20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05 C7  * .x./....../.@..*
0030  A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60 01  *./... ......SK`.*
0040  73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE 04  *s...............*
0050  4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA 41  *K..........q...A*
0060  E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F2 68  *......%.....\..h*
0070  00 B8 F4 05 FD 20 F3 C4 02 01 77 81 12 FE 6A 69  *..... ....w...ji*
0080  6C 00 2E 62 F9 34 00 5C 7A 02 FE 90 79 E2 01 00  *l..b.4.\z...y...*
0090  BB C0 89 7F 35 34 B6 00 17 31 7C 9A 00 2E 3D 01  *....54...1|...=.*
00A0  7F 48 3C F1 00 80 5D E0 44 BF 9A 9A 5B 00 0B 98  *.H<...].D...[...*
00B0  BE 4D 00 17 1E 80 BF A4 1E 78 80 40 2E F0 22 5F  *.M.......x.@.."_*
00C0  CD 4D 2D 80 05 CC 5F 26 80 0B 8F 40 5F D2 0F 3C  *.M-..._&...@_..<*
00D0  40 20 17 78 11 2F E6 A6 96 C0 02 E6 2F 93 40 05  *@ .x./....../.@.*
00E0  C7 A0 2F E9 07 9E 20 10 0B BC 08 97 F3 53 4B 60  *../... ......SK`*
00F0  01 73 17 C9 A0 02 E3 D0 17 F4 83 CF 10 08 05 DE  *.s..............*
0100  04 4B F9 A9 A5 B0 00 B9 8B E4 D0 01 71 E8 0B FA  *.K..........q...*
0110  41 E7 88 04 02 EF 02 25 FC D4 D2 D8 00 5C C5 F0  *A......%.....\..*
 number of bytes is 288 
Decompression of the previous file. This shows that huffman has some error recovery:
0000  54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58  *THEQUCIKBROWNFOX*
0010  4A 55 4D 50 53 4F 56 45 52 54 55 4F 52 45 5A 59  *JUMPSOVERTUOREZY*
0020  44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E  *DOGTHEQUCIKBROWN*
0030  46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C  *FOXJUMPSOVERTHEL*
0040  41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52  *AZYDOGTHEQUCIKBR*
0050  4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54  *OWNFOXJUMPSOVERT*
0060  48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49  *HELAZYDOGTHEQUCI*
0070  4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56  *KBROWNFOXJUMPSOV*
0080  45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51  *ERTHELAZYDOGTHEQ*
0090  55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50  *UCIKBROWNFOXJUMP*
00A0  53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54  *SOVERTHELAZYDOGT*
00B0  48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A  *HEQUCIKBROWNFOXJ*
00C0  55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44  *UMPSOVERTHELAZYD*
00D0  4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46  *OGTHEQUCIKBROWNF*
00E0  4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41  *OXJUMPSOVERTHELA*
00F0  5A 59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F  *ZYDOGTHEQUCIKBRO*
0100  57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48  *WNFOXJUMPSOVERTH*
0110  45 4C 41 5A 59 44 4F 47 54 48 45 51 55 43 49 4B  *ELAZYDOGTHEQUCIK*
0120  42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53 4F 56 45  *BROWNFOXJUMPSOVE*
0130  52 54 48 45 4C 41 5A 59 44 4F 47 54 48 45 51 55  *RTHELAZYDOGTHEQU*
0140  43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55 4D 50 53  *CIKBROWNFOXJUMPS*
0150  4F 56 45 52 54 48 45 4C 41 5A 59 44 4F 47 54 48  *OVERTHELAZYDOGTH*
0160  45 51 55 43 49 4B 42 52 4F 57 4E 46 4F 58 4A 55  *EQUCIKBROWNFOXJU*
0170  4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59 44 4F  *MPSOVERTHELAZYDO*
0180  47 54 48 45 51 55 43 49 4B 42 52 4F 57 4E 46 4F  *GTHEQUCIKBROWNFO*
0190  58 4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A  *XJUMPSOVERTHELAZ*
01A0  59 44 4F 47 54 48 45 51 55 43 49 4B 42 52 4F 57  *YDOGTHEQUCIKBROW*
01B0  4E 46 4F 58 4A 55 4D 50 53 4F 56 45 52 54 48 45  *NFOXJUMPSOVERTHE*
01C0  4C 41 5A 59 44 4F 47  .  .  .  .  .  .  .  .  .  *LAZYDOG*
 number of bytes is 455 
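The recovery seen above is the usual self-synchronizing behavior of a prefix code. The sketch below shows it on a toy four-symbol code. The code table is invented for illustration and is NOT the table used by the program, and the bijective end-of-file handling is not modeled.

```python
# Toy prefix code, assumed only for this illustration (not the program's table).
CODE = {"E": "0", "T": "10", "A": "110", "O": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(text: str) -> str:
    """Concatenate the codeword of every symbol into one bit string."""
    return "".join(CODE[ch] for ch in text)

def decode(bits: str) -> str:
    """Greedy prefix decoding; an incomplete trailing codeword is dropped."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in DECODE:
            out.append(DECODE[cur])
            cur = ""
    return "".join(out)

def flip(bits: str, i: int) -> str:
    """Toggle the bit at position i."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]

def common_suffix_len(a: str, b: str) -> int:
    """Number of trailing characters on which a and b agree."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

message = "TEAO" * 3
damaged = decode(flip(encode(message), 1))  # corrupt one bit near the start
print(message)
print(damaged)
print(common_suffix_len(message, damaged), "trailing symbols survive")
```

Only the symbols near the flipped bit are wrong. Once the decoder lands on a codeword boundary again, the rest of the stream decodes as before.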


Bottom line: If you're going to compress English text with a static huffman compressor, the bijective ones will always save you more space. If you don't care about encryption and want fast compression, use huffman over arithmetic.
However, the arithmetic coder will compress better if you use the correct weights. It will also run a little slower, but it has tremendous potential for encryption, and one can design rather trivial encryption schemes that are extremely strong but slow.
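To see why the weights matter, compare the average huffman code length with the entropy of the source, which is the figure an ideal arithmetic coder approaches. This is a rough sketch with made-up probabilities, not the English table used by the program. When every probability is a power of 2 the two figures agree; otherwise huffman pays for rounding each code length to a whole number of bits.

```python
import heapq
import math

def huffman_lengths(probs):
    """Code length in bits for each symbol under a huffman code."""
    # Heap items: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # every merge adds one bit to each symbol below it
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

def average_bits(probs, lengths):
    """Expected number of bits per symbol."""
    return sum(p * n for p, n in zip(probs, lengths))

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

for probs in ([0.5, 0.25, 0.25], [0.9, 0.05, 0.05]):
    lengths = huffman_lengths(probs)
    print(probs, "huffman:", average_bits(probs, lengths), "entropy:", entropy(probs))
```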
Example (inspired by Shaw's work on the Grandview method):
  1. reverse the file
  2. decompress the file with unaria, the static arithmetic uncompressor. The file does not have to be text.
  3. change the first character by some secret amount from 1 to 26. Example: if the amount is 3, Z becomes C
  4. at the start of the file insert a random character from "A to Z" - no need to remember it
  5. use aria, the static arithmetic compressor, to compress the result
Repeat steps 1 to 5 many times, depending on the length of your PATH key; the path key supplies the offsets. A key letter of C means an offset of 3; a key letter of A means an offset of 1. When you send the final file it will be binary, so you could add a final step where you uncompress it using unarix, the static huffman uncompressor. This will give you a file of the characters A to Z. You can insert spaces if you wish, to make it look like simple encryption. It will even match English to a first order approximation, and will confuse the enemy by letting them be smug in falsely assuming it's weak crypto.
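The five steps can be sketched as follows. This is only an outline under stated assumptions: the real aria and unaria programs are not reimplemented here, so they are passed in as caller-supplied functions (unaria taking bytes and returning a string of the letters A to Z, aria doing the reverse). The names one_round and encrypt are made up for illustration, and running one round per key letter is my reading of the description above.

```python
import secrets
import string

LETTERS = string.ascii_uppercase  # "A" to "Z"

def shift_letter(ch: str, amount: int) -> str:
    """Advance a letter by amount places, wrapping around (3 turns Z into C)."""
    return LETTERS[(LETTERS.index(ch) + amount) % 26]

def one_round(data: bytes, amount: int, unaria, aria) -> bytes:
    """One pass of steps 1 to 5; unaria and aria are hypothetical stand-ins."""
    reversed_data = data[::-1]                           # step 1: reverse the file
    text = unaria(reversed_data)                         # step 2: bytes -> letters
    if text:
        text = shift_letter(text[0], amount) + text[1:]  # step 3: secret shift
    text = secrets.choice(LETTERS) + text                # step 4: random first letter
    return aria(text)                                    # step 5: letters -> bytes

def encrypt(data: bytes, path_key: str, unaria, aria) -> bytes:
    """Assumed interpretation: one round per key letter, A = 1 ... Z = 26."""
    for key_letter in path_key:
        amount = LETTERS.index(key_letter) + 1
        data = one_round(data, amount, unaria, aria)
    return data
```

Because both coders are bijective, every intermediate file is a valid input to the next step, which is what lets the rounds be chained without any padding or headers.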
A program to reverse a file is included in my package "First bijective adaptive huffman compressor".

good luck,
David A. Scott
