REFRESH Bioinformatics Group

FQSqueezer

FQSqueezer—What is it?

FQSqueezer is an experimental high-end compressor for FASTQ files. Its main goal is to offer the best possible compression ratio while keeping running times low enough to process even human whole-genome sequencing (WGS) datasets.

How good is FQSqueezer?

FQSqueezer usually offers compression ratios tens of percent better than those of state-of-the-art tools such as FaStore, Minicom, and Spring. Its running times are, however, significantly longer.

Terms of use of FQSqueezer

FQSqueezer is, in general, a free compression program released as source code. More details can be found on the download page.

Publications

Deorowicz, S., FQSqueezer: k-mer-based compression of sequencing data, bioRxiv.org, 2019. Abstract:

Motivation: The amount of genomic data that needs to be stored is huge. Therefore it is not surprising that a lot of work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect and the best tools produce quite large archives.
Results: We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on the ideas from the famous prediction by partial matching and dynamic Markov coder algorithms known from the general-purpose-compressors world. The compression ratios are often tens of percent better than offered by the state-of-the-art tools.
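
The abstract names prediction by partial matching (PPM) as one source of ideas. As a rough illustration of that general idea only, and not of FQSqueezer's actual algorithm, the following Python sketch estimates how many bits an adaptive order-k context model would need for a set of reads. The function name `estimate_bits`, the default `k`, the four-letter alphabet, the simple back-off rule, and the add-one smoothing are all assumptions made for this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: reads contain only these four bases


def estimate_bits(reads, k=4):
    """Ideal code length in bits under a toy adaptive order-k context model.

    Hypothetical illustration of PPM-style prediction; not FQSqueezer's method.
    """
    # counts[context][symbol] = how often `symbol` followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for read in reads:
        for i, sym in enumerate(read):
            full_ctx = read[max(0, i - k):i]
            # Back off to shorter contexts until one has been seen before;
            # the empty context is always allowed.
            ctx = full_ctx
            while ctx and ctx not in counts:
                ctx = ctx[1:]
            seen = counts[ctx]
            n = sum(seen.values())
            # Add-one (Laplace) smoothing keeps every probability non-zero.
            p = (seen[sym] + 1) / (n + len(ALPHABET))
            total_bits += -math.log2(p)
            # Update statistics for the full context and all its suffixes.
            for j in range(len(full_ctx) + 1):
                counts[full_ctx[j:]][sym] += 1
    return total_bits


if __name__ == "__main__":
    reads = ["ACGT" * 25] * 4  # 400 highly repetitive bases
    bits = estimate_bits(reads)
    print(f"{bits:.1f} bits vs. {2 * 400} bits at a flat 2 bits per base")
```

The more predictable the reads are from their preceding k-mers, the further the estimate falls below the flat 2 bits per base. A real compressor would feed such probabilities to an entropy coder and would also handle quality scores, read identifiers, and the symbol N.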