Suffix trees are a data structure that makes it convenient to do string matching against an entire data set in O(N) time. This is really wonderful, but creaing the suffix tree isn’t always that easy.

# Adaptive Huffman Compression

C# implementation of adaptive Huffman coding. Implements both the FGK and Vitter algorithm variations. Compression provided through two public classes, AdaptiveHuffmanProvider and AdaptiveHuffmanStream. Good compression ratios for text-based data

# SuffixTree

An Open Source Suffx Tree implementation. Looks as though this might be oriented towards teaching applications, as it is an interactive application, not just an implementation of the data structure.

http://bioinformatics.rit.edu/~tex/publications/suffixtree/

# ANSI C implementation of a Suffix Tree

A C implementation of a Suffix Tree. This project also includes a Perl module that can use the C code.

http://cs.haifa.ac.il/~shlomo/suffix_tree/

# Extended Application of Suffix Trees to Data Compression

By Jesper Larsson, Published in Proceedings of the IEEE Data Compression Conference 1996. Hey Jesper, how about putting a PDF version on line?

http://www.larsson.dogma.net/dccpaper.pdf

# The Context Trees of Block Sorting Compression

By Jesper Larsson. Published in Proceedings of the IEEE Data Compression Conference 1998, PDF version of the paper.

http://www.larsson.dogma.net/lev3.pdf

# Attack of the Mutant Suffix Trees

By Jesper Larsson. Published in Proceedings of the IEEE Data Compression Conference 1998, a PDF version of the paper.

http://www.larsson.dogma.net/lic.pdf

# Notes on Suffix Sorting

by Jesper Larsson. Technical report. LU-CS-TR:98-204, LUNFD6/(NFCS-3134)/1-6/(1998). A PDF version of the paper.

http://www.larsson.dogma.net/tr204.pdf

# Suffix Trees on Words

By Jesper Larsson, Arne Andersson and Kurt Swanson. Technical report. LU-CS-TR:95-158, LUNFD6/(NFCS-3107)/1-14/(1995). Early version of a paper published in Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching 1996; to appear in Algotrithmica

http://www.larsson.dogma.net/words-alg.pdf

# Jesper Larsson’s Research Links

I am a graduate student (doktorand) in algorithms and data structures at the Department of Computer Science at Lund University. Special interests are text searching and data compression.

I use the links on this page to papers, but don’t index the page itself

http://www.larsson.dogma.net/research.html

# Jesper Larsson

Jesper Larsson spent a fair amount of his years in academia

studying Suffix Trees. His home page has links to his thesis and a few other papers on suffix trees and other string matching/Data Compression topics.

# Suffix Tree

The definition from the NIST Dictionary of Algorithms and Data Structures.

http://www.nist.gov/dads/HTML/suffixtree.html

# Suffix Array

The definition from the NIST Dictionary of Algorithms and Data Structures.

http://www.nist.gov/dads/HTML/suffixarray.html

# bwtzip: A Linear-Time Portable Research-Grade Universal Data Compressor

bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.

Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here

http://stl.caltech.edu/bwtzip.html

# From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction

1997, Robert Giegerich, Stefan Kurtz. *We review the linear time suffix tree constructions by Weiner, McCreight, and Ukkonen. We use the terminology of the most recent algorithm, Ukkonen’s online construction, to explain its historic predecessors. * The submitter of this paper indicates that it has user-friendly terminology, always welcome in Journal papers.

http://citeseer.nj.nec.com/giegerich97from.html

# Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching

R. Grossi and J. S. Vitter. “Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching,” Proceedings of the 32nd Annual ACM Symposium on Theory of Computing (STOC ‘00), Portland, OR, May 2000, 397-406.

http://www.cs.duke.edu/~jsv/Papers/catalog/node79.html

# Suffix Trees

A short and sweet tutorial on suffix trees. What they are and how to construct them.

http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix.html

# Opportunistic data structures with applications

Two papers in which it is show how to combine the BWT with the suffix array

data structure, in order to build a sort of compressed suffix array.

In the first paper it is proven that

the space occupancy of the compressed suffix array can be bounded in

terms of the entropy of the input string. In the second paper it is

proposed and tested a practical implementation of this data structure.

The first paper will appear in the Proc. of 41st IEEE Symposium on

Foundations of Computer Science; the second one in the Proc. of the 12th

SIAM-ACM Symposium on Discrete Algorithms.

http://www.mfn.unipmn.it/~manzini/papers/focs00.html

# Strmat

A collection of C programs that do string matching and pattern discovery. This appears to be free code by D. Gusfield, who also has a book called “Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology”.

One DCL reader commented *The strmat package is wonderful.*

http://www.cs.ucdavis.edu/~gusfield/strmat.html

# Doug McIlroy’s source code page

Links to a documented implementation of a suffix sort. This may not be a compression topic per se, but suffix trees are useful for compressing data.

http://cm.bell-labs.com/cm/cs/who/doug/source.html

# Notes on suffix tree construction

Notes on suffix tree construction, some course notes for COMP 612: Graduate Seminar in Compiler Construction, includes some pointers to important papers.

DataCompression.info user David D. had this complaint: *None of the links work on this page, all there is is a short paragraph on suffix trees. I have to agree, it’s a rather strange page.*

*Posted in November 7th, 1999
*

*
*