David Scott's FOCUSED HUFFMAN COMPRESSION

Files updated on September 23, 1999

My main concern is to create compression routines that are of real use to someone who may some day want to compress data before encrypting it. I feel very strongly that a compression routine should be "one to one". On the previous page I give the rules for h2com.exe; however, this is not the only set of rules. In this section I discuss h2coma.exe (focused adaptive Huffman compression for cryptographic uses). The main difference between h2com.exe and h2coma.exe is that the second program requires 2 output files. These files will be the same length as each other. The first output file is the same as the output of h2com.exe, but the second output file is what I call focused. You can do a source compare of the two programs to see the difference. The main advantage of this is that after a second pass of Huffman coding the focused file can come out a different length than the unfocused one. If you are using h3com.exe for the second pass you can sometimes get shorter files on this second pass.
What is more interesting is that one can actually do "lossy text compression" and still recover the whole text. Let me explain. You can run h2coma.exe, reverse both output files, run h3com.exe on each, see which result is shortest, and then encrypt only the shortest file. The advantage is that an attacker has to do more work to break your method, since he cannot be sure which of the two files was used.

It also makes the job of the person you are sending the message to a little harder, since when he decompresses he gets 2 different candidate files. Because the sender always uses the file that is shortest when compressed, the receiver can first check with software whether running the same procedure on each candidate leads back to the compressed text he received. There are 3 possibilities.

One: only one of the candidate files compresses to the compressed text used. Great, you are done; you know which file it is.

Two: both files compress to the same text. In this case both candidate files have to be examined, and the user may have to eyeball which one is correct. Well, I don't call it lossy for nothing: there is a very slim chance that two reasonable messages could both seem valid. But the chance is so close to zero that one should not worry about it. If you do worry, check before you send the message and change it slightly until there can be no confusion.

Three: someone is guessing a key, and the decompression leads to two files, neither of which compresses by this method back to the text that was decompressed. In this case the attacker knows that his key was invalid. Well, I said this method is not "one to one", but I felt like discussing the concept.
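The selection and checking steps described above can be sketched roughly as follows. This is only an illustration in Python, not the code of the original programs: `second_pass` and `sender_pipeline` are hypothetical stand-ins for h3com.exe and for the sender's whole procedure, and the function names are made up for this sketch. Nothing here reproduces the actual h2coma.exe or h3com.exe algorithms.

```python
from typing import Callable, List, Tuple

# Any bytes-to-bytes step (hypothetical stand-in for a real compressor).
Transform = Callable[[bytes], bytes]


def choose_shortest(first_pass_outputs: List[bytes], second_pass: Transform) -> bytes:
    # Sender side: reverse each first-pass file, apply the second pass,
    # reverse again, then keep whichever result is shortest.
    # On a tie, min() keeps the earlier file in the list.
    finished = [second_pass(data[::-1])[::-1] for data in first_pass_outputs]
    return min(finished, key=len)


def classify_candidates(received: bytes, candidates: List[bytes],
                        sender_pipeline: Transform) -> Tuple[str, List[bytes]]:
    # Receiver side: recompress each candidate with the sender's full
    # procedure and keep the candidates that reproduce the received text.
    matches = [c for c in candidates if sender_pipeline(c) == received]
    if len(matches) == 1:
        return "unique", matches      # case one: the message is known
    if matches:
        return "ambiguous", matches   # case two: a human has to decide
    return "invalid", matches         # case three: e.g. a wrong key guess
```

The receiver would pass in the two candidate files obtained by decompression. A result of "invalid" corresponds to the case where a guessed key can be rejected without reading either candidate.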

file hex dumps follow:

FILE that is input to h2coma.exe
0000  0D 0A 54 48 45 20 51 55 49 43 4B 20 42 52 4F 57  *..THE QUICK BROW*
0010  4E 20 46 4F 58 20 4A 55 4D 50 45 44 20 4F 56 45  *N FOX JUMPED OVE*
0020  52 20 54 48 45 20 4C 41 5A 59 20 53 4C 4F 57 20  *R THE LAZY SLOW *
0030  44 4F 47 0D 0A 54 48 45 20 51 55 49 43 4B 20 42  *DOG..THE QUICK B*
0040  52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50 45 44 20  *ROWN FOX JUMPED *
0050  4F 56 45 52 20 54 48 45 20 4C 41 5A 59 20 53 4C  *OVER THE LAZY SL*
0060  4F 57 20 44 4F 47 0D 0A 54 48 45 20 51 55 49 43  *OW DOG..THE QUIC*
0070  4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A 55 4D 50  *K BROWN FOX JUMP*
0080  45 44 20 4F 56 45 52 20 54 48 45 20 4C 41 5A 59  *ED OVER THE LAZY*
0090  20 53 4C 4F 57 20 44 4F 47 0D 0A 54 48 45 20 51  * SLOW DOG..THE Q*
00A0  55 49 43 4B 20 42 52 4F 57 4E 20 46 4F 58 20 4A  *UICK BROWN FOX J*
00B0  55 4D 50 45 44 20 4F 56 45 52 20 54 48 45 20 4C  *UMPED OVER THE L*
00C0  41 5A 59 20 53 4C 4F 57 20 44 4F 47 0D 0A  .  .  *AZY SLOW DOG..*
 number of bytes is 206 
FILE X: the first output file of h2coma.exe
0000  F2 04 1A 57 30 65 47 51 C7 5A 86 89 0F A6 FD 83  *...W0eGQ.Z......*
0010  8A 88 C3 BE DB 06 C3 13 E7 93 49 A0 FB A0 5B 25  *..........I...[%*
0020  CF A2 A3 3C 2A 5D 8B 30 25 06 40 8D 10 7C 4A F4  *...<*].0%.@..|J.*
0030  B1 1A 1B E0 C5 E0 94 22 01 88 C8 0D 41 40 4C 0D  *......."....A@L.*
0040  17 B8 5A 14 8B 02 71 C5 79 18 15 45 29 56 F7 20  *..Z...q.y..E)V. *
0050  80 C2 F3 87 0B 06 08 0C 95 30 26 08 20 30 27 BE  *.........0&. 0'.*
0060  03 07 80 C0 28 20 57 9F 02 A8 A5 2A 5F 04 10 70  *....( W....*_..p*
0070  BE C3 85 83 04 06 4A F0 26 08 20 30 27 BE 03 07  *......J.&. 0'...*
0080  80 C0 28 20 57 9B 02 A8 A5 2B BE 08 20 E1 7D 87  *..( W....+.. .}.*
0090  0B 06 08 00  .  .  .  .  .  .  .  .  .  .  .  .  *....*
 number of bytes is 148 
FILE Y: the second (focused) output file of h2coma.exe
0000  F2 38 16 56 B0 65 4B 50 C7 7A 85 89 17 A7 FD 85  *.8.V.eKP.z......*
0010  8A C8 C5 BE BB 05 C3 13 E7 93 A9 A0 FB A0 1B 21  *...............!*
0020  CF A2 A3 3C 0A 5D 8B 30 21 06 40 95 10 7C 4A F4  *...<.].0!.@..|J.*
0030  31 9A 1B E0 C5 E0 94 22 01 B1 5A 0D 00 40 5C 05  *1......"..Z..@\.*
0040  17 A8 82 14 98 02 70 44 79 18 1D 45 A9 56 D6 20  *......pDy..E.V. *
0050  81 C2 F3 DD 3B 06 08 8C 11 71 26 E9 BE 73 26 BF  *....;....q&..s&.*
0060  E7 75 F9 DC 2F BE 47 9B B3 A8 B5 2A 5B B6 DB 70  *.u../.G....*[..p*
0070  BF 6E DD 83 04 46 08 F1 27 EB BE 73 26 BF E7 75  *.n...F..'..s&..u*
0080  F9 DC 2F BE 47 9D B3 A8 B5 2B 37 6D B6 E1 7E DD  *../.G....+7m..~.*
0090  BB 06 08 80  .  .  .  .  .  .  .  .  .  .  .  .  *....*
 number of bytes is 148 
FILE X after (reverse) (h3com) and (reverse)
0000  FF F7 FE FC 1E 20 87 B7 F9 A0 EA 2D 2B BC 32 54  *..... .....-+.2T*
0010  63 F8 9F BF FA FC E1 E8 7F 77 F7 CF 0F F5 DF F1  *c........w......*
0020  7C 7A 3C D1 BB 41 C3 C8 C5 83 4A B3 04 F7 26 49  *|z<..A....J...&I*
0030  8B AF 4B B4 A1 3E DE 59 BA A1 E7 31 B4 7A 18 7B  *..K..>.Y...1.z.{*
0040  4E 30 8C FE 5E 97 96 69 C4 C9 70 2B 4E 3A 00 0F  *N0..^..i..p+N:..*
0050  DE 7F 3E 5C 8D 69 BF 9F 62 DB BD AC DF 51 F8 F1  *..>\.i..b....Q..*
0060  C6 1E 99 10 18 D1 73 06 3C F3 7C 7D 3F 6C CD E2  *......s.<.|}?l..*
0070  31 2D 04 6E B0 14 C7 3B 8C 42 E0 43 D5 EC 81 EC  *1-.n...;.B.C....*
0080  6F 51 67 51 60 63 38 98 2B F0 30 23 25 19 7C D5  *oQgQ`c8.+.0#%.|.*
0090  D2 CA 91  .  .  .  .  .  .  .  .  .  .  .  .  .  *...*
 number of bytes is 147 
FILE Y after (reverse) (h3com) and (reverse)
0000  7F F7 FE A2 11 40 8F 24 C9 64 6A 25 2B A6 31 5C  *.....@.$.dj%+.1\*
0010  20 F5 08 C1 A2 86 10 01 A3 2F 0A 6A 87 4F DE A2  * ......../.j.O..*
0020  3E 5B C4 D2 FA 92 48 D0 0C 51 81 2D 92 36 78 CF  *>[....H..Q.-.6x.*
0030  9E 46 97 75 F4 66 C5 A6 7D 3B F7 B8 DD 5A 2F 86  *.F.u.f..};...Z/.*
0040  1E FE 33 25 3F 88 40 C7 1C 27 C9 4A 8A E5 DD D1  *..3%?.@..'.J....*
0050  B4 D5 E5 FB 98 58 CC 7C DB 30 0C 5C FD 8E 35 E9  *.....X.|.0.\..5.*
0060  EB C3 21 C9 C4 C9 A0 A2 6C 17 C5 5D 51 28 79 43  *..!.....l..]Q(yC*
0070  B5 4C 26 72 05 CE C6 55 1A BE E3 B3 4A 6D 0B 7A  *.L&r...U....Jm.z*
0080  BF FB 54 4D 0F 66 8A C6 66 BF E9 02 43 87 A5 E2  *..TM.f..f...C...*
0090  FB 89 04 86 51  .  .  .  .  .  .  .  .  .  .  .  *....Q*
 number of bytes is 149 
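The dumps above show how focusing can change the length of the
file that results from a second Huffman pass: both first-pass
files are 148 bytes, but after the second pass FILE X is 147
bytes and FILE Y is 149 bytes.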
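
The byte counts reported with the dumps can be compared directly. The short Python sketch below only restates those counts; the labels and the helper name `percent_of_input` are made up for illustration.

```python
# Byte counts copied from the "number of bytes" lines of the hex dumps.
INPUT_SIZE = 206
sizes = {
    "FILE X, first pass": 148,
    "FILE Y, first pass": 148,
    "FILE X, second pass": 147,
    "FILE Y, second pass": 149,
}


def percent_of_input(size: int, original: int = INPUT_SIZE) -> float:
    # Size of a compressed file as a percentage of the original input.
    return round(100.0 * size / original, 1)


for label, size in sizes.items():
    print(f"{label:20s} {size:4d} bytes  {percent_of_input(size):5.1f}% of input")
```

The two first-pass files are tied, and the tie is only broken by the second pass, which is what makes picking the shortest file possible.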
