Compressing strings.

The Pro version of PPL provides several compression and decompression methods through two functions: Compress() and UnCompress(). The compression methods supported by PPL are:

_RLE

RLE, or Run Length Encoding, is a very simple method for lossless compression. It simply replaces repeated bytes with a short description of which byte to repeat, and how many times to repeat it.

Though simple and obviously very inefficient for general-purpose compression, it can be very useful at times (a form of it is used in JPEG compression, for instance).
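To make the idea concrete, here is a minimal sketch of run-length encoding written in Python. It is an illustration only and not PPL's implementation: the (count, byte) pair layout and the function names rle_encode and rle_decode are assumptions, and the byte format PPL actually produces for _RLE may differ.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, byte) pairs; each run is capped at 255 bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte repeats and the count fits in one byte.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, byte) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)
```

With this pair layout, b"aaaabbc" (7 bytes) becomes 6 bytes, while b"abc" (3 bytes, no runs) grows to 6 bytes. Input without repeated bytes can therefore get larger, which is why an output buffer for compression is usually allocated with some headroom.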

_HUFFMAN

Huffman encoding is one of the best methods for lossless compression. It replaces each symbol with an alternate binary representation, whose length is determined by the frequency of the particular symbol.

Common symbols are represented by few bits, while uncommon symbols are represented by many bits.

The Huffman algorithm is optimal in the sense that changing any of the binary codings of any of the symbols will result in a less compact representation. However, it does not deal with the ordering or repetition of symbols or sequences of symbols.
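The following Python sketch shows how such a code can be built from symbol frequencies. It is an illustration under assumptions, not the coder PPL uses: the function name huffman_codes and the representation of codes as strings of "0" and "1" are chosen here for readability only.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Return a {symbol: bit string} prefix code built from symbol frequencies."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; their symbols gain one leading bit.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For the input b"aaaaaaaabbbc", the frequent symbol "a" receives a 1-bit code and the rarer "b" and "c" receive 2-bit codes, so the 12 symbols need 16 bits instead of 96. The code depends only on how often each symbol occurs, not on where it occurs, which is the limitation noted above.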

_LZ

There are many different variants of the Lempel-Ziv compression scheme. The Basic Compression Library has a fairly straightforward implementation of the LZ77 algorithm (Lempel-Ziv, 1977) that performs very well, while the source code should be quite easy to follow. The LZ coder can be used for general-purpose compression, and performs exceptionally well for compressing text. It can also be used in combination with the provided RLE and Huffman coders (in the order: RLE, LZ, Huffman) to gain some extra compression in most situations.
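The core LZ77 idea, replacing a repeated sequence with a reference to an earlier copy, can be sketched in Python as follows. This is a simplified illustration and not the Basic Compression Library or PPL coder: the (offset, length, next_byte) token layout, the window and max_len parameters, and the function names are all assumptions, and a real coder would pack the tokens into bytes.

```python
def lz77_encode(data: bytes, window: int = 255, max_len: int = 15) -> list:
    """Return a list of (offset, length, next_byte) tokens.

    offset == 0 and length == 0 mean "no match, emit next_byte as a literal".
    """
    tokens = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window behind position i for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # Leave at least one byte after the match to serve as next_byte.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the original bytes from (offset, length, next_byte) tokens."""
    out = bytearray()
    for offset, length, next_byte in tokens:
        for _ in range(length):
            out.append(out[-offset])  # byte by byte, so overlapping copies work
        out.append(next_byte)
    return bytes(out)
```

Encoding b"abcabcabcabcx" with this sketch yields three literal tokens for "a", "b" and "c", followed by a single token (3, 9, ord("x")) that copies nine bytes from three positions back. Repetitive input such as text tends to produce many long matches of this kind.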

Let's take the following code:

in$ = LoadStr(AppPath$ + "MyFile.txt", insize$);
New(out$, insize$ * 2);
outsize$ = Compress(_RLE, in$, out$, insize$);


This code loads the file MyFile.txt into the variable in$ and returns its size in bytes in the variable insize$. We then create an output buffer that is at least as large as, and preferably larger than, the original input buffer; here it is twice the input size. Finally, we apply the RLE compression method to in$, writing the result to the out$ variable and returning the compressed size in outsize$.

You can then decompress the out$ buffer into a new newin$ buffer with UnCompress(). The new buffer must be large enough to hold the decompressed data, which is the original insize$ bytes:

New(newin$, insize$);
newinsize$ = UnCompress(_RLE, out$, newin$, outsize$);