David' Scott's Reverse Arthmetic

files updated on August 9,2004

This is loosely based on a discussion with David Cary see his site site

The problem is this one has a set of binary string messages but instead of being able to write ones and zeros say some other set of symbols is used say A B C Where one wants to minimize the cost of the message for some cost to A B C. One way to solve the problem is to find the probability that would give an expansion such that (weight or cost of symbol ) / ( expansion length of symbol ) is a constant. I choose 3 symbols since its the first interesting case. Its actually a mod to ARB255 it would be easy to expand to more than 3 symbols or even do 2 symbols

To use this

unarb3w filein fileout weightfile

filein file of ascii 1's and 0's only
fileout file of A's B's C's bijective replacement of filein
weightfile 3 line file of relative costs of A B C

arb3w filen fileout weightfile

reverse of previous

REVERSE ARITHMETIC USING ARB2X CELL arb3w.zip

NEW STABLE ARB255 BIJECTIVE COMPRESSOR arb255.zip


Table 1: The encoding process
Postion infinite finitely odd value Unque Binary stringweights 1,2,2weights 1,1,1 weights 12,38,55
1.[1]100000...(1) 1(1)A(1)A(12)A
2.[2]010000...(1) 0(2)B(2)A A(38)B
3.[2]110000...(2) 1 1(2)A A(1)B(24)A A
4.[3]001000...(2) 0 0 (2)C(2)A B(55)C
5.[3]101000...(2) 1 0 (3)A B(2)B B(36)A A A
6.[3]011000...(2) 0 1 (3)B A(2)B A(50)A B
7.[3]111000...(3) 1 1 1 (3)A A A(1)C(48)A A A A
8.[4]000100...(3) 0 0 0 (4)C B(3)A A A(67)C A
9.[4]100100...(3) 1 0 0 (3)A C(2)B C(62)A B A
10.[4]01010...(3) 0 1 0 (4)B B(3)A B A(62)B A A
11.[4]110100..(3) 1 1 0 (4)A A B(2)C B(74)A A A B
12.[4]001100..(3) 0 0 1 (3)C A(2)A C(50)B A
13.[4]101100..(3) 1 0 1 (4)A B A(2)C A(62)A A B
14.[4]011100..(3) 0 1 1 (4)B A A(3)B A A(67)A C
15.[4]111100..(4) 1 1 1 1 (4)A A A A(3)C B A(60)A A A A A
16.[5]000010..(5) 0 0 0 0 (4)C C(4)A A A A(93)C B
17.[5]100010..(5) 1 0 0 0 (5)A C B(3)B B B(86)A B A A A
When the wrights are 1 2 2 you actually have the standard huffman expansion of A = 1 B = 01 C = 00 for most of the 0's and 1's when not at end of string however when EOS occurs there are many ways to make a bijection choosing a diffrent method will change the ends and for the program here it actually might be better off. You can change bit_byts if a different transform is to be used. What the code does is it actually go through two bijective transforms. The first changes the 1 0 string to an infinite finitely odd file. then you use the arithmetic decompressor to get another infinite finitely odd file and bijectively transform that to the symbol space. Since at each stage your always trying to map things of different length something has to give and it can occur twice. Example level 3 strings fom the unique binary 1 1 1 to 0 1 1 go to level 3 and level 4 of the finitely odd file. the level 3 finitely odd map to levels 2 and 3 in w122 and level 4 fintiely odd map to levels 3 and 4 in w122 The point is every sting of ones and zeros has a unique expansion in A B C where the weights except for the level shifts at the very end of string match the way one wants for messages order. W122 is the old classical huffman weighting W111 is the classical 3 symbols of all equal weight. if you have binary in 1 0 and wishs to change to a 3 symbol code this is what you would use. W 12 38 55 is just a random possible weighting one might come across. Now for a practical example of where Reverse Arithmetic would be used its a bit stream example for those who are interested in that sort of thing. Suppose one is building a Mars Rover and that its trying to communicate to earh with a long stream of data. Suppose it can only send 0's and 1's suppose when your ready to seend the data its highly compressed and appears to all statistical tests to be a random string of 0's and 1's Suppose the cost of sending a one is 1 unit and the cost of sending a 0 is 2 units. What is best way to uncompress the stream using an arithemtic compressor like arb2x to send the data for the least cost. I wrote the unarb3w for a 3 symbol case this is the much easier 2 symbol case. But I think its an instructive example one could easily mod arb2x to do the whole thing with arb3w as a guide. The poor solution many noble people might first try is to weight the arithmetic uncompressor for sending such that 1 uses 2/3 the space and 0 uses 1/3 the space. Call this method one. It the wrong way of doing it but lets look at it. look at the transmitted 0 and 1 The length of a 0 in original input bits is ln(3)/ln(2) = 1.58496250072115618145373894394782 The length of a 1 in original input bits is ln(3/2)/ln(2) = 0.584962500721156181453738943947817 note since random one expects 1 to occur 2/3 of time in output bit stream and 0 to ocurr 1/3 of the time so the actually this give in output bit stream a length of output bits for every input bit to 1/0.91829583405448951478707227728 this means for every million bits in one expects to send 1088973.68681807866307862797911068 call this X characters out the cost for these characters is 1(2/3)( X) + 2(1/3)(X) is (4/3(X) = 1451964.91575743821743817063881333 to summare if cost of sending 1 is 1 and cost of send 0 is 2 if one use the 2: 1 split. and get 1/3 of time a zero and 2/3 of time a one. For every 1,000,000 character in you send 1,088,974 character out at a cost of 1,451,965 Know lets look at the opitmal serial solution. let p1 = 0.618033988749894848204586834365638 for a 1 and p2 = 0.381966011250105151795413165634362 for a 0 the length of 0 in original input bits is ln(1/p2)/ln2 = 1.38848382726123460347758053379719 the length of 1 in original input bits is ln(1/p1)ln(2) = 0.694241913630617301738790266900085 multiply by probability of occurance and get length of output bits for every input bit = 1/0.959418728222744199142863005822959 this means for every million bit in one expects to send 1042297.76903816517762117429172565 now using probabilityes the cost to send is 1288350.89532754775244959416772444 in this case the optimal solution case 2 allows you to send the 1,000,000 bits in 1,042,297 bits which is done with not only less chararters by several thousand bits. But by a lower cost. of 1,288,351 In case your wondering if no reverse arithmetic compression done in one million bits you would expect half zero and half one so the case cost would be 1/2( 1 millon ) + 2(1/2)( 1 million) 1.5(1million) which is 1,500,000 if no chanage the cost is 1,500,000. if use the noble guess cost is 1,451,965 if you do it right the cost is 1,288,351 This is an example of reverse arithmetic. This is also a real world example to I say this since in Ham radio morse code is based on sending dots and dashes. a dot is length X of a burst of RF followed by slience of the same length. A dash is a burst of RF 3 times the length of a dot and the same amount of silence this means the dot is half the time of a dash. If one sending a true stream of dots and dashes one does not need the inter character gap. That occurs when one is sending individual characters. looking at the number ratio of the ones and zeros ( dots and dashes ) one could get an idea how much thought was given to the design. If there is roughly an equal number of dots and dashes then no thought of saving transmitting power was used. If the ratio is two to one. Then what I call an average amount of consideration was used. If the stream is such that it closely matches the ratio of dots and dashes by the weights done in a reverse arithmetic compression then there is a high degress of thought given to the trasmission since if it came from battery system they are conserving power this is something a spy might use from an intelligent nation. I say this since its the right way to do it. I don't actually think the US smart enough to look for such signals or use such signals. We are not hungry enough to bother with it. Let me explain this thought a little farther. If your a spy you want to limit the time of the transmission. If the device sends dots and dashes in a burst with the timing above. if one occurs roughly 0.618 of the time and the other 0.312 of the time then you have some one using state of the art equipment. For this is the ratio to send the most information in the shortest time when the dot dash ratio is 2:1.
MORE LATER if i get enough feed back
ENTER here for MY Home Page