the code for BWTS is HERE
One way to explain it. Is to contrast it with the old BWT that
uses an index.
BANANAS is a clasic example in old way
you make a list of all possible rotations then sort
the list smallest to largest The BWT is the last character in each row
The last column is BNNSAAA
BANANAS
ANANASB
ANASBAN
ASBANAN
BANANAS
NANASBA
NASBANA
SBANANA
Since any rotation of the word BANANAS sorts to same thing your have to pass an index along with the sorted string making it longer so the output would be
BNNSAAA[index]
so that the file would actaully have an increase in length if you tack the index on.
BWTS for one roations gets exactuly the same string however there is no index
ANANASB the first row is the string that in BWTS matched by the raw BWT except no INDEX
For the next step look at UNBWT for the string BNNSAAA
sort it
add output sort add ouput sort ..REPEAT TILL
A BA AN BAN ANA ANANASB
A NA AN NAN ANA ANASBAN
A NA AS NAS ASB ASBANAN
B SB BA SBA BAN BANANAS
N AN NA ANA NAN NANASBA
N AN NA ANA NAS NASBANA
S AS SB ASB SBA SBANANA
This shows for many strings the forward and reverse transformations the
same in BWT UNBWT and BWTS and UNBWTS except you save index
know to show what happens in other cases.
When I do a BWTS of BANANAS I get SNNBAAA no index
I claim that is based on how how I get the UNBWTS to go to
BANANAS and then BWTS is just reverse of UNBWTS so explaining
UNBWTS does it all
For the next step look at UNBWT for the string SNNBAAA
sort it
add output sort add ouput sort ..REPEAT TILL
A SA AN SAN ANA (ANANAS)(ANANAS)...
A NA AN NAN ANA (ANASAN)(ANASAN)...
A NA AS NAS ASB (ASANAN)(ASANAN)
B BB BB BBB BBB (B)(B)(B)(B)(B)(B)(B)...
N AN NA ANA NAN (NANASA)(NANASA)...
N AN NA ANA NAS (NASANA)(NASANA)...
S AS SA ASA SAN SANANA)(SANANA)...
The main problem with BWT is that it adds unwanted information to file
when the BWT is done. The result is that the file is always expanded and
that an index must somehow be tacked on to the file.
The above fact leads to problems if one ever wants to someday
challenge the PPM style compressors which can be made bijective.
BTW style of compressors can't be truely bijective and that means
they will never in the long run compress as well as a good PPM
style compressor.
The main problem with BWT is that for any random file over
a hundred bytes has in 99% of cases no possible inverse regardless of
what index you might want to use.
I for one believe in compression before encryption. As quantum computers
become more in the main stream its foolish to have compression encryption
methods that carry the info in the Shanon sense to the breaking of themselves.
Better to use a bijective compressor where anything can be uncompressed.
There are several bijective compressors out there and the best simple bijective
compression encryption program I am aware of is
BICOM
Before BWTS the closest one could obtain a bijective BWT was doing a
standard full BWT followed by my DSC which not adds less bits than using
Universal Numbers but sometimes shortened the file. I did BWTS since DSC
still fell short of the goal of making a truely bijective BWT.
| program | original | marks | BWTS |
| BIB | 111,261 | 29,567 | 29,563 |
| BOOK1 | 768,771 | 275,831 | 275,779 |
| BOOK2 | 610,856 | 186,592 | 186,555 |
| GEO | 102,400 | 62,120 | 62,113 |
| NEWS | 377,109 | 134,174 | 134,166 |
| OBJ1 | 21,504 | 10,857 | 10,845 |
| OBJ2 | 246,814 | 81,948 | 81,930 |
| PAPER1 | 53,161 | 17,724 | 17,710 |
| PAPER2 | 82,199 | 26,956 | 26,938 |
| PAPER3 | 46,526 | 16,995 | 16,979 |
| PAPER4 | 13,286 | 5,529 | 5,515 |
| PAPER5 | 11,954 | 5,136 | 5,122 |
| PAPER6 | 38,105 | 13,159 | 13,147 |
| PIC | 513,216 | 50,829 | 50,815 |
| PROGC | 39,611 | 13,312 | 13,302 |
| PROGL | 71,646 | 16,688 | 16,682 |
| PROGP | 49,379 | 11,404 | 11,395 |
| TRANS | 93,695 | 19,301 | 19,298 |
| total = | 3,251,493 | 978,122 | 977,854 |
HERE is the dos run stream to create the files ..\rle %1. %1.1 ..\bwt %1.1 %1.2 ..\mtf %1.2 %1.3 ..\rle %1.3 %1.4 ..\ari %1.4 %1.m ..\unari %1.m %1.4m ..\unrle %1.4m %1.3m ..\unmtf %1.3m %1.2m ..\unbwt %1.2m %1.1m ..\unrle %1.1m %1.mm ..\bwts -b200000 %1.1 %1.5 ..\mtf %1.5 %1.6 ..\rle %1.6 %1.7 ..\ari %1.7 %1.s ..\unari %1.s %1.4s ..\unrle %1.4s %1.3s ..\unmtf %1.3s %1.2s ..\unbwts -b200000 %1.2s %1.1s ..\unrle %1.1s %1.ss fc /b %1. %1.mm >> arb.log fc /b %1. %1.ss >> arb.log
This is discusion is based on Mark' BWTCODE.ZIP the file and Mark's
Page is here
here is CALGORY with simple bijective stages
29,060 BIB
252,700 BOOK1
170,573 BOOK2
62,784 GEO
125,020 NEWS
10,677 OBJ1
79,508 OBJ2
17,319 PAPER1
26,447 PAPER2
16,544 PAPER3
5,236 PAPER4
4,848 PAPER5
12,735 PAPER6
53,781 PIC
13,020 PROGC
16,515 PROGL
11,259 PROGP
18,589 TRANS
18 File(s) 926,615 bytes
The bat file used above is:
g:bwts %1. %1.1
g:mtfq %1.1 %1.2
g:rleq %1.2 %1.3
g:arb255 %1.3 %1.4
g:unarb255 %1.4 %1.5
g:unrleq %1.5 %1.6
g:unmtfq %1.6 %1.7
g:unbwts %1.7 %1.8
fc /b %1. %1.8 >> arb.log
the code for MTFQ and etc...is HERE
taking the simple 'hello word" test file and uncompressing uing
unarb255
unrleq
unmtf
unbwts
0000 48 45 4C 4C 4F 20 57 4F 52 4C 44 21 . . . . *HELLO WORLD!*
number of bytes is 12
the result is this
0000 E2 FB C3 81 26 F6 E1 B8 F7 C0 FA FA A8 . . . *....&........*
number of bytes is 13
compressing gets original file back
David A. Scott, biject.bwts@gmail.com