David Scott's BIJECTIVE STATIC ARITHMETIC COMPRESSION for SECOND ORDER ENGLISH with SPACES


files updated on June 10, 2006


This page describes a static entropy coder that uses second order frequencies for ENGLISH. The complete package is a bijective second order English arithmetic compressor, so any 8-bit binary file can be treated as the compressed form of a text made entirely of the letters:
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
and single embedded spaces, with no leading or trailing spaces.

This is a lightly edited version of my first order bijective entropy compressor. Go to that page for further information. The tables for this compressor came from http://www.data-compression.com/english.html. I made it because some people wanted spaces, and I felt that I might as well make it second order too.

HERE ARE SOME EXAMPLES:
TEST FILE 1 TAKEN FROM John Savard's Site, with added spaces
0000  20 4E 4F 57 20 49 53 20 20 54 48 45 20 54 49 4D  * NOW IS  THE TIM*
0010  45 20 0D 0A  .  .  .  .  .  .  .  .  .  .  .  .  *E ..*
 number of bytes is 20 
THIS IS FIRST ORDER BIJECTIVE ARITHMETIC COMPRESSION OF FILE 1
0000  E2 B9 5C FF 06 B4  .  .  .  .  .  .  .  .  .  .  *..\...*
 number of bytes is 6 
THIS IS THE UNCOMPRESS. NOTE IT ONLY USES LETTERS "A-Z"
0000  4E 4F 57 49 53 54 48 45 54 49 4D 45  .  .  .  .  *NOWISTHETIME*
 number of bytes is 12 
THIS IS SECOND ORDER BIJECTIVE ARITHMETIC COMPRESSION OF FILE 1
NOTE 6 BYTES LIKE THE FIRST ORDER BUT INCLUDES 3 SPACES
0000  8A B9 8A 46 E5 28  .  .  .  .  .  .  .  .  .  .  *...F.(*
 number of bytes is 6 
THIS IS THE UNCOMPRESS OF THE ABOVE. IT USES LETTERS "A-Z" AND SINGLE INTERNAL SPACES
0000  4E 4F 57 20 49 53 20 54 48 45 20 54 49 4D 45  .  *NOW IS THE TIME*
 number of bytes is 15 
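
Comparing the input of FILE 1 with the second order uncompress, the round trip acts like a clean-up of the text: only "A-Z" survives, runs of spaces collapse to one, and leading and trailing spaces go away. Here is a small Python sketch of that clean-up. It is not the compressor itself, and the function name normalize is made up for this illustration. It ASSUMES lower case is folded to upper case and that every other character (such as the CR LF at the end) acts like a space; the examples on this page do not say which of those the real programs do.

```python
def normalize(text: str) -> str:
    """Reduce text to A-Z words joined by single spaces."""
    # Assumption: fold to upper case, and treat every character
    # outside A-Z as a word separator.
    cleaned = "".join(ch if "A" <= ch <= "Z" else " " for ch in text.upper())
    # split() with no argument drops leading, trailing and repeated
    # whitespace, so joining with one space gives single internal spaces.
    return " ".join(cleaned.split())


sample = " NOW IS  THE TIME \r\n"
print(normalize(sample))  # NOW IS THE TIME
```

The 20 byte input becomes the same 15 byte text shown in the dump above.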

FILE 2 A TEST OF ALL THE CHARACTERS "A-Z"
0000  20 54 48 45 20 51 55 49 43 4B 20 42 52 4F 57 4E  * THE QUICK BROWN*
0010  20 46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20  * FOX JUMPS OVER *
0020  54 48 45 20 4C 41 5A 59 20 44 4F 47 0D 0A  .  .  *THE LAZY DOG..*
 number of bytes is 46 
THIS IS FIRST ORDER BIJECTIVE ARITHMETIC COMPRESSION OF FILE 2
0000  4C 31 D9 EC E6 2E FA AB 1C 29 10 53 97 56 98 1E  *L1.......).S.V..*
0010  A0 9C 3F 9C 10 C6 E8  .  .  .  .  .  .  .  .  .  *..?....*
 number of bytes is 23 
THE FIRST ORDER UNCOMPRESS. NOTE NO SPACES
0000  54 48 45 51 55 49 43 4B 42 52 4F 57 4E 46 4F 58  *THEQUICKBROWNFOX*
0010  4A 55 4D 50 53 4F 56 45 52 54 48 45 4C 41 5A 59  *JUMPSOVERTHELAZY*
0020  44 4F 47  .  .  .  .  .  .  .  .  .  .  .  .  .  *DOG*
 number of bytes is 35 
THIS IS SECOND ORDER BIJECTIVE ARITHMETIC COMPRESSION OF FILE 2
NOTE SHORTER THAN FIRST ORDER AND INCLUDES 8 SPACE CHARACTERS
0000  CC D6 0B 7E 4F 0C 96 81 4D 45 FE E1 E1 1B DF 1C  *...~O...ME......*
0010  54 82 AE 91 30  .  .  .  .  .  .  .  .  .  .  .  *T...0*
 number of bytes is 21 
THIS IS THE UNCOMPRESS OF THE ABOVE. IT USES LETTERS "A-Z" AND SINGLE INTERNAL SPACES
0000  54 48 45 20 51 55 49 43 4B 20 42 52 4F 57 4E 20  *THE QUICK BROWN *
0010  46 4F 58 20 4A 55 4D 50 53 20 4F 56 45 52 20 54  *FOX JUMPS OVER T*
0020  48 45 20 4C 41 5A 59 20 44 4F 47  .  .  .  .  .  *HE LAZY DOG*
 number of bytes is 43 

THIS IS A TEST MESSAGE BETWEEN TWO LOVERS USING SIMPLE STEALTH ENCRYPTION.
IT WILL LOOK LIKE AN EASIER ENCRYPTION, BUT UNLESS ONE KNOWS WHAT YOU
DID IT IS VERY HARD TO BREAK
0000  20 49 20 57 49 4C 4C 20 53 45 45 20 20 59 4F 55  * I WILL SEE  YOU*
0010  20 41 54 20 4E 4F 4F 4E 20 20 44 4F 20 4E 4F 54  * AT NOON  DO NOT*
0020  20 54 45 4C 4C 20 59 4F 55 52 20 44 41 44 0D 0A  * TELL YOUR DAD..*
0030  0D 0A  .  .  .  .  .  .  .  .  .  .  .  .  .  .  *..*
 number of bytes is 50 
THIS IS FILE AFTER ARI2A.EXE 
0000  73 46 A2 F7 86 95 1C 35 34 DC D9 8E C2 2D 87 78  *sF.....54....-.x*
0010  86 FC CB CC  .  .  .  .  .  .  .  .  .  .  .  .  *....*
 number of bytes is 20 
THIS IS FILE AFTER UNARIA.EXE
0000  55 56 45 54 54 4F 48 42 45 57 45 53 53 4C 45 41  *UVETTOHBEWESSLEA*
0010  4E 52 54 41 53 4E 45 4E 46 4E 49 45 49 4F 4C 46  *NRTASNENFNIEIOLF*
0020  4D 46 52 41 54 48 49  .  .  .  .  .  .  .  .  .  *MFRATHI*
 number of bytes is 39 
THIS IS AFTER THEIR SECRET EDIT: ADD "SEX" AT THE FRONT AND MOVE THE FIRST LETTER UP BY 3.
THAT MEANS THE U GOES TO X. AN A WOULD GO TO D, A Z TO C, AND SO ON
0000  53 45 58 58 56 45 54 54 4F 48 42 45 57 45 53 53  *SEXXVETTOHBEWESS*
0010  4C 45 41 4E 52 54 41 53 4E 45 4E 46 4E 49 45 49  *LEANRTASNENFNIEI*
0020  4F 4C 46 4D 46 52 41 54 48 49 0D 0A  .  .  .  .  *OLFMFRATHI..*
 number of bytes is 44 
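
The secret edit is simple enough to write down as code. This is a Python sketch only; the names hide, reveal and shift_letter are made up for this illustration and are not part of the package. It works on the letters alone and leaves out the CR LF that the text editor added to the file in the dump above.

```python
A = ord("A")


def shift_letter(ch: str, k: int) -> str:
    """Move one letter A-Z up by k places, wrapping Z back around to A."""
    return chr((ord(ch) - A + k) % 26 + A)


def hide(letters: str, word: str = "SEX", k: int = 3) -> str:
    """Sender side: put the secret word in front and raise the first letter."""
    return word + shift_letter(letters[0], k) + letters[1:]


def reveal(letters: str, word: str = "SEX", k: int = 3) -> str:
    """Receiver side: drop the secret word and lower the first letter."""
    body = letters[len(word):]
    return shift_letter(body[0], -k) + body[1:]


plain = "UVETTOHBEWESSLEANRTASNENFNIEIOLFMFRATHI"
edited = hide(plain)
print(edited)                   # SEXXVETTOHBEWESSLEANRTASNENFNIEIOLFMFRATHI
print(reveal(edited) == plain)  # True
```

Any other word and shift would work the same way, as long as both sides agree on them.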
AFTER ARIA.EXE
0000  15 9D C0 7A D3 A3 3E 8F 41 2E FB DE 06 B4 26 A7  *...z..>.A.....&.*
0010  57 8D 62 44 B1 70 A0  .  .  .  .  .  .  .  .  .  *W.bD.p.*
 number of bytes is 23 
**** FOLLOWING IS WHAT IS SENT *****
AFTER UNARI2A.EXE. SEND THIS IN AN EMAIL. THE REST OF THE MESSAGE COULD BE LOWER CASE
0000  41 54 48 45 47 20 53 4E 43 52 45 56 45 53 20 4E  *ATHEG SNCREVES N*
0010  20 54 48 41 47 4C 49 4E 47 20 49 4E 44 20 52 4C  * THAGLING IND RL*
0020  59 53 48 45 52 41 53 20 4D 4D 4F 57 20 48 45 20  *YSHERAS MMOW HE *
0030  53 20 53 20 49 41  .  .  .  .  .  .  .  .  .  .  *S S IA*
 number of bytes is 54 
THIS IS AFTER ARI2A.EXE. YOU RUN THIS WHEN YOU GET THE MESSAGE
0000  15 9D C0 7A D3 A3 3E 8F 41 2E FB DE 06 B4 26 A7  *...z..>.A.....&.*
0010  57 8D 62 44 B1 70 A0  .  .  .  .  .  .  .  .  .  *W.bD.p.*
 number of bytes is 23 
AFTER UNARIA.EXE
0000  53 45 58 58 56 45 54 54 4F 48 42 45 57 45 53 53  *SEXXVETTOHBEWESS*
0010  4C 45 41 4E 52 54 41 53 4E 45 4E 46 4E 49 45 49  *LEANRTASNENFNIEI*
0020  4F 4C 46 4D 46 52 41 54 48 49  .  .  .  .  .  .  *OLFMFRATHI*
 number of bytes is 42 
AFTER YOU DO THE SECRET EDITING "DROP SEX AND LOWER FIRST LETTER BY 3"
0000  55 56 45 54 54 4F 48 42 45 57 45 53 53 4C 45 41  *UVETTOHBEWESSLEA*
0010  4E 52 54 41 53 4E 45 4E 46 4E 49 45 49 4F 4C 46  *NRTASNENFNIEIOLF*
0020  4D 46 52 41 54 48 49 0D 0A  .  .  .  .  .  .  .  *MFRATHI..*
 number of bytes is 41 
AFTER ARIA.EXE
0000  73 46 A2 F7 86 95 1C 35 34 DC D9 8E C2 2D 87 78  *sF.....54....-.x*
0010  86 FC CB CC  .  .  .  .  .  .  .  .  .  .  .  .  *....*
 number of bytes is 20 
AFTER UNARI2A.EXE **NOTE NO LEADING OR TRAILING SPACES**
**ALSO ONLY SINGLE SPACES BETWEEN WORDS**
0000  49 20 57 49 4C 4C 20 53 45 45 20 59 4F 55 20 41  *I WILL SEE YOU A*
0010  54 20 4E 4F 4F 4E 20 44 4F 20 4E 4F 54 20 54 45  *T NOON DO NOT TE*
0020  4C 4C 20 59 4F 55 52 20 44 41 44  .  .  .  .  .  *LL YOUR DAD*
 number of bytes is 43 
IF YOU WANT REAL ENCRYPTION YOU CAN CHANGE THE ARRAYS IN THE SOURCE CODE
AND THE POSITIONS OF THE SLOTS WHERE THE LETTERS ARE. ALSO, YOU CAN USE
DIFFERENT STATS FOR THE FREQUENCIES.
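
If you want to try different stats, the second order numbers are just counts of which symbol follows which. Below is a Python sketch that counts them from a sample text. The names SYMBOLS and second_order_counts are made up for this illustration, and the layout of the result is NOT the layout of the arrays in the source code; you would still have to scale the counts and put them into whatever form the source expects.

```python
from collections import Counter

# The 26 letters plus space, as used by the second order coder.
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def second_order_counts(text: str) -> dict:
    """Count how often each symbol follows each other symbol."""
    counts = {prev: Counter() for prev in SYMBOLS}
    # Walk over every pair of neighbouring characters in the text.
    for prev, cur in zip(text, text[1:]):
        if prev in SYMBOLS and cur in SYMBOLS:
            counts[prev][cur] += 1
    return counts


corpus = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
table = second_order_counts(corpus)
print(table["T"]["H"])  # 2, "TH" occurs twice
print(table[" "]["T"])  # 1, a space is followed by T once
```

A real table needs far more text than one sentence, and both sides must use the same table or the uncompress will not match.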
good luck
David A. Scott