Data-Compression.org

data compression link collection

Burrows-Wheeler Transform/Block Sorting

Block Sorting treats a body of text as a large block of strings. The strings can be sorted using a reversible algorithm of some kind, yielding a more compressible dataset. The best known example of this technique is the Burrows-Wheeler Transform, or BWT. The Burrows-Wheeler Transform refers to a reversible transformation that can make a given text more compressible. Once this transform has been applied, the text can be compressed with a combination of Move to Front encoding followed by an entropy encoder.


bsdtar, libarchive

Libarchive is a programming library that can create and read several different streaming archive formats, including most popular tar variants and the POSIX cpio format. The bsdtar program is an implementation of tar(1) that is built on top of libarchive. It started as a test harness, but is quickly moving toward becoming a candidate system tar for FreeBSD

http://people.freebsd.org/~kientzle/libarchive/

         

Posted in May 22nd, 2004

12Ghosts Zip

This package includes 12Zip and 12Zip2. The first version uses Zip compatible compression, and the second uses a BWT variant.

Version 7.0 of the package is shipping as of May, 2004

http://www.12ghosts.com/ghosts/zip.htm

         

Posted in May 14th, 2004

BWT in Matlab

Imran Akthar’s implementation of the BWT transform in Matlab. Free.

http://www.geocities.com/imran_akthar/files/bwt_matlab_code.zip

         

Posted in May 14th, 2004

Embedded BWT Compression

Tom has posted his source code for embedded BWT compression. Basically, he’s trying to pull it off with low amounts of RAM.

http://tom.iahu.ca:8080/bwt.html

         

Posted in May 4th, 2004

BAR

BAR Archiver is a free, cross-platform file compression and archiving tool written in Java. It outperforms most of the popular file
compression tools available as commercial utilities.
Compression is based on Burrows Wheeler Transformation and modified Huffman entropy coder. The compression ratio is in par
with most of the commercial grade compression utilities.

http://fermatjen.tripod.com/bar/

         

Posted in April 11th, 2004

Full-Text Searching & the Burrows-Wheeler Transform

In this article, I examine an indexing method that lets you find any character sequence in the source text in time only proportional to the sequence length using a structure that can compress the entire source text and index into less space than the text alone. This technique is exceptionally fast at detecting and counting occurrences of any string in the source text.

http://www.ddj.com/documents/s=8944/ddj0312d/0312d.htm?temp=jiRiV3fTQc

         

Posted in March 20th, 2004

WinBZip2

WinBZip2 is an small freeware utility for working with files produced by bzip2 utility (usually it’s files with .bz2 extension) and also create such files. Main features include the compressing, decompressing and testing the integrity of bzip2 files.

http://www.irnis.net/soft/winbzip2/

         

Posted in November 24th, 2003

Transform

Transform is a BWT compressor written by Michael Bone. It supports a number of very interesting features, such as automatic Base 64 representation and image output. Shareware.

Version 1.02 shipped in October, 2003.

http://users.senet.com.au/~mjbone/

         

Posted in November 8th, 2003

zipstream, bzip2stream: iostream wrappers for the zlib and bzip2 libraries

This article describes STL-compliant iostream implementations that compress and decompress using the deflate and bzip2 algorithms. It makes it really easy to use compressed streams in your C++ app.

Updated July, 2003, to fix a gzip header problem.

Updated August, 2003 to fix a couple of minor problems.

http://www.codeproject.com/vcpp/stl/zipstream.asp

         

Posted in September 28th, 2003

UHBC 1.0

UHBC is a blocksorting compressor optimized for high compression ratios. The program uses recent research results by S. Deorowicz and J. Abel for improved second stage processing after the BWT. Some extensions and sophisticated modeling provide top compression ratios, giving the best known result for Calgary Corpus (2.208 bps average). UHBC is free for non-commercial use. No source available.

Reader Grebnov I. had this to say: Compression is very good, but too slow + no source code..

ftp://ftp.elf.stuba.sk/pub/pc/pack/uhbc10.zip

* * *    

Posted in July 19th, 2003

GRZip

GRZip - is a high-performance file compressor based on Burrows-Wheeler Transform and Advanced Weighted Frequency Counting. It uses The Block-Sorting Lossless Data Compression Algorithm, which has received considerable attention over recent years for both its simplicity and effectiveness. This implementation has compression rate of 2.364 bps on the Calgary Corpus. The site has Windows, Linux, and Dos executables along with source.

http://magics.h10.ru/main.php?page=GRZip&folder=projects

* * * * *

Posted in July 19th, 2003

Text Preprocessing for Data Compression

by Jürgen Abel and William Teahan. This paper looks at a few different techniques for preprocessing data before performing text compression, and compares the gains achieved when combining the preprocessors with PPM, BWT, and LZ algorithms.

http://www.data-compression.info/JuergenAbel/Preprints/Preprint_Text_Preprocessing.pdf

* * * * *

Posted in July 19th, 2003

Xceed Streaming Compression Library

The Xceed Streaming Compression Library is a high-performance “raw” compression library. It offers the ability to compress and decompress streaming data, buffers, strings or single files and supports multiple compressed data formats. Unlike the Xceed Zip Compression Library, this ultralight library doesn’t offer Zip file handling capabilities.

http://www.xceedsoft.com/products/Stream/index.htm

* * * * *

Posted in June 21st, 2003

UFUP

UFUP advertises itself as a “decompression utility that handles common Unix package formats such as bzip2, gzip, and tar.”

http://sourceforge.net/projects/ufup/

         

Posted in May 30th, 2003

ABC - Advanced Blocksorting Compressor

ABC is a free data compression program based on the Burrows-Wheeler transformation. The source code is free for academic, research and educational use as depicted in the Abel Public License (APL). The program is developed in DELPHI as a command line program just like GZIP.

Update: Jürgen has released the source code for ABC at long last! The Delphi source is available for download from the web site and can be used under his own APL.

http://www.data-compression.info/ABC/

* * * * *

Posted in April 25th, 2003

Grundlagen des Burrows-Wheeler-Kompressionsalgorithmus

This is another German preprint of Jürgen Abel describing the principles of the Burrows-Wheeler Compression Algorithm. An
implementation of a BWT based compressor with a compression rate of 2.25 bps on the Calgary Corpus is presented. The paper will be
published in the German journal “Informatik - Forschung und Entwicklung”, Springer-Verlag Heidelberg, Association for Informatics
(Gesellschaft für Informatik eV.)

http://www.data-compression.info/JuergenAbel/Preprints/Preprint_Grundlagen_BWCA.pdf

* * * * *

Posted in April 14th, 2003

Michael Walden’s Compression Pointers

A comprehensive set of compression pointers. Unfortunately, Michael is using some sort of software that makes bookmarking into his index impossible. So instead, you must link to the main page, shown here, and locate the links to “Data Compression”. Under that he has links to General Resources, Software, and Theory.

http://www.users.voicenet.com/~mwalden/

         

Posted in March 23rd, 2003

Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages

This preprint of Jürgen Abel gives a short introduction into the BWCA field and proposes several improvements for the WFC stage and the IF stage.
It further introduces a new RLE scheme for bypassing the run length
information around the WFC stage. The paper is the basis of the
BWCA program ABC and the revealed approach achieves a
compression rate of 2.238 bps, which is the best result for a pure
BWCA without any preprocessing before the BWT to date (March 2003).

http://www.data-compression.info/JuergenAbel/Preprints/Preprint_After_BWT_Stages.pdf

* * * * *

Posted in March 22nd, 2003

Verlustlose Datenkompression auf Grundlage der Burrows-Wheeler-Transformation

This is a preprint of a paper by Jürgen Abel describing the functionality
of a basic but quite fast Burrows-Wheeler compressor. Jürgen reports that this will be published in PIK, a German-language journal on Communications and Information Theory.

http://www.data-compression.info/JuergenAbel/Preprints/Preprint_Verlustlose_Datenkompressi
on_BWT.pdf

* * * * *

Posted in March 21st, 2003

Burrows-Wheeler Transformation / Block Sorting (BWT)

Jürgen Abel has done an enormous amount of research on the Burrows-Wheeler Transform, and has published the results on his web site. On this page you will find:

  • A summary of this compression technique.
  • Links to over 70 online papers.
  • Links to at least that many people involved in BWT research or development.
  • Extensive links to BWT source code.

This web page may now be the definitive source of information for this field.

http://www.data-compression.info/Algorithms/BWT

* * * * *

Posted in March 18th, 2003

Bzip2 classes

Gilad Novik created a pair of classes to compress and decompress data using the bzip2 format.

http://www.codeproject.com/useritems/BZ2Class.asp

         

Posted in February 25th, 2003

Compressia

Compressia is a BWT-based archive utility with particularly high compression ratios. On Calgary, Canterbury, ACT-Text and ACT-Exe it surpasses all other BWT utilities. On Canterbury corpus it also surpasses all PPM utilities. Beta version 1.0 is available on the web site as of February, 2003.

DataCompression.info reader Juan L. said It is by far the best I’ve tried.

http://www.Compressia.com

* * * * *

Posted in February 17th, 2003

BZip2 for Java

Aftex Software makes a Java version of Bzip2. This includes input and output stream classes, which can be used in other Java applications. The program has both a GUI and command line interface.

http://www.aftexsw.com/bzip.html

         

Posted in February 3rd, 2003

bwtzip: A Linear-Time Portable Research-Grade Universal Data Compressor

bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.

Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here

http://stl.caltech.edu/bwtzip.html

         

Posted in December 11th, 2002

Jürgen Abel

Jürgen is the proprietor of

www.data-compression.info
,
an excellent resource for developers and researchers. Jürgen has a good supply of links to papers, conferences, books, etc. on the site, as well as executables and source for ABC, a freeware BWT compressor he wote in Delphi.

http://www.data-compression.info/JuergenAbel/

* * * * *

Posted in December 10th, 2002

BWTCoder: Industrial strength BWT compression

This is a preliminary shot at creating an open source BWT compression engine. Things look very preliminary at this point with just a couple of files available for download and not much message traffic.

http://sourceforge.net/projects/bwtcoder/

         

Posted in December 9th, 2002

szip homepage

Szip is a freeware portable general purpose lossless compression program. It has a high speed and compression, but high memory demands (up to 20MB) too. The compression is done using a variant of blocksorting, which explains its rather high memory requirements.

Update: Michael Schindler has at long last posted the source code for szip.

http://www.compressconsult.com/szip/

* * * * *

Posted in October 31st, 2002

Embedded BWT Compression

An implementation of BWT designed with the goal of minimizing memory usage. Source code and a documentation page.

http://www.iahu.ca:8080/bwt.html

         

Posted in October 31st, 2002

Karl Malbrain - BWT Source

Karl has created a complete BWT package, and has posted the source on this site. He also has an adaptation of N. Jesper Larsson’s
Burrows-Wheeler Suffix Sorting for your perusal.

http://www.geocities.com/malbrain

         

Posted in October 30th, 2002

Sax.net Streaming Compression

Sax.net Streaming Compression helps you keep your data small and fast. Use high-performance compression and data compression code, using a class library that was designed from the ground up for integration with the Microsoft .NET framework.

In addition to being able to specify whether to prefer speed over size, Sax.net Compression offers you a choice of two compression algorithms: Industry-standard Deflate (ZIP) compression, and the newer Burrows-Wheeler (BZip2) transform, which generates especially great results when compressing XML data.

http://www.sax.net/dotnet/compression/

* * * * *

Posted in October 30th, 2002