files updated on August 9,2004
This is loosely based on a discussion with David Cary see his site site
The problem is this one has a set of binary string messages but instead of being able to write ones and zeros say some other set of symbols is used say A B C Where one wants to minimize the cost of the message for some cost to A B C. One way to solve the problem is to find the probability that would give an expansion such that (weight or cost of symbol ) / ( expansion length of symbol ) is a constant. I choose 3 symbols since its the first interesting case. Its actually a mod to ARB255 it would be easy to expand to more than 3 symbols or even do 2 symbols
To use this
unarb3w filein fileout weightfile
filein file of ascii 1's and 0's only
fileout file of A's B's C's bijective replacement of filein
weightfile 3 line file of relative costs of A B C
arb3w filen fileout weightfile
reverse of previous
NEW STABLE ARB255 BIJECTIVE COMPRESSOR arb255.zip
When the wrights are 1 2 2 you actually have the standard huffman
expansion of
A = 1
B = 01
C = 00
for most of the 0's and 1's when not at end of string
however when EOS occurs there are many ways to make a bijection
choosing a diffrent method will change the ends and for the program here
it actually might be better off. You can change bit_byts if a different transform
is to be used.
What the code does is it actually go through two bijective transforms.
The first changes the 1 0 string to an infinite finitely odd file.
then you use the arithmetic decompressor to get another infinite finitely
odd file and bijectively transform that to the symbol space.
Since at each stage your always trying to map things of different
length something has to give and it can occur twice.
Example level 3 strings fom the unique binary 1 1 1 to 0 1 1
go to level 3 and level 4 of the finitely odd file.
the level 3 finitely odd map to levels 2 and 3 in w122
and level 4 fintiely odd map to levels 3 and 4 in w122
The point is every sting of ones and zeros has a unique expansion
in A B C where the weights except for the level shifts at the very
end of string match the way one wants for messages order.
W122 is the old classical huffman weighting
W111 is the classical 3 symbols of all equal weight.
if you have binary in 1 0 and wishs to change to a 3 symbol
code this is what you would use.
W 12 38 55 is just a random possible weighting one might come across.
Now for a practical example of where Reverse Arithmetic would be used
its a bit stream example for those who are interested in that sort of thing.
Suppose one is building a Mars Rover and that its trying to communicate
to earh with a long stream of data. Suppose it can only send 0's and 1's
suppose when your ready to seend the data its highly compressed and appears
to all statistical tests to be a random string of 0's and 1's Suppose the cost
of sending a one is 1 unit and the cost of sending a 0 is 2 units. What is best
way to uncompress the stream using an arithemtic compressor like arb2x
to send the data for the least cost. I wrote the unarb3w for a 3 symbol case
this is the much easier 2 symbol case. But I think its an instructive example
one could easily mod arb2x to do the whole thing with arb3w as a guide.
The poor solution many noble people might first try is to weight the arithmetic
uncompressor for sending such that 1 uses 2/3 the space and 0 uses 1/3
the space. Call this method one. It the wrong way of doing it but lets look
at it.
look at the transmitted 0 and 1
The length of a 0 in original input bits is ln(3)/ln(2) = 1.58496250072115618145373894394782
The length of a 1 in original input bits is ln(3/2)/ln(2) = 0.584962500721156181453738943947817
note since random one expects 1 to occur 2/3 of time in output bit stream
and 0 to ocurr 1/3 of the time so the actually this give in output bit stream
a length of output bits for every input bit to 1/0.91829583405448951478707227728
this means for every million bits in one expects to send 1088973.68681807866307862797911068
call this X characters out the cost for these characters is
1(2/3)( X) + 2(1/3)(X) is (4/3(X) = 1451964.91575743821743817063881333
to summare if cost of sending 1 is 1 and cost of send 0 is 2 if one use the
2: 1 split. and get 1/3 of time a zero and 2/3 of time a one.
For every 1,000,000 character in you send 1,088,974 character out
at a cost of 1,451,965
Know lets look at the opitmal serial solution.
let p1 = 0.618033988749894848204586834365638 for a 1
and p2 = 0.381966011250105151795413165634362 for a 0
the length of 0 in original input bits is ln(1/p2)/ln2 = 1.38848382726123460347758053379719
the length of 1 in original input bits is ln(1/p1)ln(2) = 0.694241913630617301738790266900085
multiply by probability of occurance and get
length of output bits for every input bit = 1/0.959418728222744199142863005822959
this means for every million bit in one expects to send 1042297.76903816517762117429172565
now using probabilityes the cost to send is 1288350.89532754775244959416772444
in this case the optimal solution case 2 allows you to send the 1,000,000 bits
in 1,042,297 bits which is done with not only less chararters by several thousand
bits. But by a lower cost. of 1,288,351
In case your wondering if no reverse arithmetic compression done in one million bits you
would expect half zero and half one so the case cost would be
1/2( 1 millon ) + 2(1/2)( 1 million) 1.5(1million) which is 1,500,000
if no chanage the cost is
1,500,000.
if use the noble guess cost is
1,451,965
if you do it right the cost is
1,288,351
This is an example of reverse arithmetic.
This is also a real world example to I say this since
in Ham radio morse code is based on sending dots and dashes.
a dot is length X of a burst of RF followed by slience of the same
length. A dash is a burst of RF 3 times the length of a dot and
the same amount of silence this means the dot is half the time
of a dash. If one sending a true stream of dots and dashes one
does not need the inter character gap. That occurs when one
is sending individual characters. looking at the number
ratio of the ones and zeros ( dots and dashes ) one could get
an idea how much thought was given to the design. If
there is roughly an equal number of dots and dashes then
no thought of saving transmitting power was used. If the
ratio is two to one. Then what I call an average amount of
consideration was used. If the stream is such that it
closely matches the ratio of dots and dashes by the weights
done in a reverse arithmetic compression
then there is a high degress of thought given to the trasmission
since if it came from battery system they are conserving power
this is something a spy might use from an intelligent nation.
I say this since its the right way to do it. I don't actually think
the US smart enough to look for such signals or use such signals.
We are not hungry enough to bother with it.
Let me explain this thought a little farther. If your a spy
you want to limit the time of the transmission. If the device
sends dots and dashes in a burst with the timing above.
if one occurs roughly 0.618 of the time and the other
0.312 of the time then you have some one using state of
the art equipment. For this is the ratio to send the most
information in the shortest time when the dot dash
ratio is 2:1.
MORE LATER
if i get enough feed back
Postion infinite finitely odd value Unque Binary string weights 1,2,2 weights 1,1,1 weights 12,38,55 1. [1]100000... (1) 1 (1)A (1)A (12)A 2. [2]010000... (1) 0 (2)B (2)A A (38)B 3. [2]110000... (2) 1 1 (2)A A (1)B (24)A A 4. [3]001000... (2) 0 0 (2)C (2)A B (55)C 5. [3]101000... (2) 1 0 (3)A B (2)B B (36)A A A 6. [3]011000... (2) 0 1 (3)B A (2)B A (50)A B 7. [3]111000... (3) 1 1 1 (3)A A A (1)C (48)A A A A 8. [4]000100... (3) 0 0 0 (4)C B (3)A A A (67)C A 9. [4]100100... (3) 1 0 0 (3)A C (2)B C (62)A B A 10. [4]01010... (3) 0 1 0 (4)B B (3)A B A (62)B A A 11. [4]110100.. (3) 1 1 0 (4)A A B (2)C B (74)A A A B 12. [4]001100.. (3) 0 0 1 (3)C A (2)A C (50)B A 13. [4]101100.. (3) 1 0 1 (4)A B A (2)C A (62)A A B 14. [4]011100.. (3) 0 1 1 (4)B A A (3)B A A (67)A C 15. [4]111100.. (4) 1 1 1 1 (4)A A A A (3)C B A (60)A A A A A 16. [5]000010.. (5) 0 0 0 0 (4)C C (4)A A A A (93)C B 17. [5]100010.. (5) 1 0 0 0 (5)A C B (3)B B B (86)A B A A A