is an editor developed to assist in the preparation of TeX/LaTeX documents.
I wrote it together with Adam Skórczyński.
DSRC: DNA Sequence Reads Compressor
DNA Sequence Reads Compressor is an application designed for compressing files that contain reads from DNA sequencing. Such files can be huge, e.g., a few or even tens of gigabytes, so the need for a robust compression tool is clear. General-purpose compressors such as gzip or bzip2 are usually used for this purpose, but a specialized tool that exploits the structure of the data can be expected to work better.
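As a rough illustration of why a specialized tool has an edge, the sketch below packs each nucleotide A, C, G, T into 2 bits, a quarter of the 8 bits per symbol that a plain-text file spends before any general-purpose compression is applied. It is a hypothetical baseline written for this note, not the method DSRC uses, and it assumes the reads contain only these four bases (real read files also carry N symbols, read identifiers, and quality scores, which are ignored here).

```python
# Hypothetical 2-bit packing of DNA bases (assumption: only A, C, G, T occur).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, 4 bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)  # base j of the group goes to bits 2j..2j+1
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


read = "ACGTTGCAAC"
packed = pack(read)
print(len(read), "bases ->", len(packed), "bytes")  # 10 bases -> 3 bytes
assert unpack(packed, len(read)) == read
```

The length must be stored separately because the last byte may be only partly used.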
GDC: Genome Differential Compressor
Genome Differential Compressor is a utility designed for compressing collections of genomes from the same species. Such collections can be huge, e.g., a few or even tens of gigabytes, so the need for a robust compression tool is clear. General-purpose compressors such as gzip or bzip2 could be used for this purpose, but a specialized tool can work much better, since a general-purpose compressor does not exploit the properties of such data sets, e.g., long approximate repeats occurring at long distances.
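To make the idea of exploiting long repeats concrete, the sketch below encodes a target genome as a sequence of matches against a reference sequence, written as (position, length) pairs, plus literal symbols, so a genome that differs from the reference by a single substitution shrinks to three tokens. The greedy k-mer-seeded parser, its seed length `k`, and its `min_match` threshold are assumptions made for illustration; this is not the algorithm GDC implements.

```python
def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer of the reference to the positions where it starts."""
    index: dict[str, list[int]] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def diff_encode(ref: str, target: str, k: int = 4, min_match: int = 8):
    """Greedily parse `target` into ("M", ref_pos, length) and ("L", symbol) tokens."""
    index = build_index(ref, k)
    tokens = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Use the k-mer at position i as a seed and extend each candidate match.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (pos + length < len(ref) and i + length < len(target)
                   and ref[pos + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= min_match:
            tokens.append(("M", best_pos, best_len))
            i += best_len
        else:
            tokens.append(("L", target[i]))  # no long enough match: emit a literal
            i += 1
    return tokens


def diff_decode(ref: str, tokens) -> str:
    """Rebuild the target from the reference and the token list."""
    parts = []
    for tok in tokens:
        if tok[0] == "M":
            _, pos, length = tok
            parts.append(ref[pos:pos + length])
        else:
            parts.append(tok[1])
    return "".join(parts)


ref = "ACGTACGTTTGACCATGCAGGCTAGCTAACGT"
target = ref[:14] + "T" + ref[15:]  # one substitution at position 14
tokens = diff_encode(ref, target)
print(tokens)
assert diff_decode(ref, tokens) == target
```

Real collections would then entropy-code the tokens; the point here is only that long shared stretches collapse into single match tokens, which a byte-oriented compressor with a small window cannot see at long distances.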
TGC: Thousands Genome Compressor
Thousands Genome Compressor is a tool for estimating bounds on the compression ratio attainable in human genome compression.
is a utility designed for counting k-mers (sequences of k consecutive symbols) in a set of reads from genome sequencing projects. K-mer counting is important for many bioinformatics applications, e.g., de Bruijn graph assemblers. Building a de Bruijn graph is a commonly used approach to genome assembly from second-generation sequencing data. Unfortunately, sequencing errors, which are frequent in practice, result in huge memory requirements and long construction times for de Bruijn graphs. A popular way to handle this problem is to filter the input reads so that unique k-mers (which very likely result from errors) are discarded.
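The counting-and-filtering step described above can be sketched in a few lines. This toy version keeps all counts in an in-memory `Counter`, merges each k-mer with its reverse complement (a common convention, assumed here), skips k-mers containing `N`, and drops k-mers seen fewer than `min_count` times. The function names and the threshold are hypothetical, and the utility above is not claimed to work this way; counters for real data sets typically need far more compact, often disk-based, structures.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers with ambiguous bases
                continue
            counts[canonical(kmer)] += 1
    return counts


def filter_solid(counts: Counter, min_count: int = 2) -> dict:
    """Keep k-mers seen at least min_count times; rarer ones are likely errors."""
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


reads = ["ACGTACG", "CGTACGT", "ACGTTCG"]
counts = count_kmers(reads, 4)
print(filter_solid(counts))  # {'ACGT': 3, 'CGTA': 4, 'GTAC': 2}
```

With `min_count=2`, exactly the k-mers that occur once (here `AACG`, `GAAC`, `CGAA`, all from the third read) are discarded, which is the "remove unique k-mers" filter in its simplest form.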
Finite state automata construction
Source code accompanying the paper
Ciura M., Deorowicz S., How to squeeze a lexicon.
In this paper, we describe a compact way of storing a lexicon using a finite-state automaton.
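The paper's own construction is not reproduced here. As a hedged illustration of the general idea, the sketch below builds a minimal acyclic deterministic automaton from a sorted word list using the incremental algorithm for sorted input described by Daciuk et al. (2000): after each word is added, states that can no longer change are merged with equivalent registered states, so words that share suffixes also share states. The class and method names are hypothetical.

```python
class Node:
    def __init__(self) -> None:
        self.final = False
        self.edges: dict[str, "Node"] = {}

    def key(self) -> tuple:
        # Children are already canonical, so two nodes are equivalent iff they
        # agree on finality and on each labelled edge's target object.
        return (self.final,
                tuple((ch, id(child)) for ch, child in sorted(self.edges.items())))


class MinimalDAFSA:
    def __init__(self, words) -> None:
        self.root = Node()
        self.register: dict[tuple, Node] = {}
        # Path of the previously inserted word: (parent, letter, child) edges.
        self.unchecked: list[tuple[Node, str, Node]] = []
        previous = None
        for word in words:
            if previous is not None and word <= previous:
                raise ValueError("words must be sorted and unique")
            self._insert(word, previous or "")
            previous = word
        self._minimize(0)

    def _insert(self, word: str, previous: str) -> None:
        common = 0
        for a, b in zip(word, previous):
            if a != b:
                break
            common += 1
        # States below the common prefix are final in shape: merge them now.
        self._minimize(common)
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[common:]:
            child = Node()
            node.edges[ch] = child
            self.unchecked.append((node, ch, child))
            node = child
        node.final = True

    def _minimize(self, down_to: int) -> None:
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            key = child.key()
            if key in self.register:
                parent.edges[ch] = self.register[key]  # reuse equivalent state
            else:
                self.register[key] = child

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def node_count(self) -> int:
        seen, stack = set(), [self.root]
        while stack:
            n = stack.pop()
            if id(n) not in seen:
                seen.add(id(n))
                stack.extend(n.edges.values())
        return len(seen)


lexicon = MinimalDAFSA(["tap", "taps", "top", "tops"])
print(lexicon.node_count())  # 5 states, versus 8 in the equivalent trie
print("tops" in lexicon, "ta" in lexicon)  # True False
```

Because the input is sorted, only the path of the most recent word can still change, which keeps the construction a single pass over the lexicon.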