The data used in our experiments described in the paper [KaDe]:

Because the data on the NCBI web site are continuously updated, we provide the exact data that we used in our research.

################
Taxonomy linked data:
################
* taxdump.tar.gz - this file was downloaded on August 6, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy
  It includes two files:
    * nodes.dmp - represents taxonomy nodes
    * names.dmp - includes taxonomy names
* gi_taxid_nucl.dmp.gz - this file was downloaded on December 19, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy
  It contains two columns: the GenBank identifier (gi) and the taxonomy identifier (taxid).

################
Reference sequences:
################
* nt.xx.tar (xx: from 00 to 12) - compressed archives containing the reference sequences. These files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/. The update was dated Jul 27, 2012.

################
Metagenomic sets:
################
* facs_full.fa - contains 100,000 reads with an average length of 269 bp. These reads were acquired from the FACS web site http://facs.scilifelab.se/ and were originally used by Stranneheim et al. [Str].
* facs_reduced.fa - the facs_full set after reduction; contains 93,653 reads with an average length of 269 bp.
* carma.fa - contains 25,000 reads with an average length of 265 bp. These reads were acquired from the WebCARMA web site http://wwww.cebitec.uni-bielefeld.de/webcarma.cebitec.uni-bielefeld.de/ and were originally used by Gerlach and Stoye [GeSt].
* metaphyler.fa - contains 66,841 reads of 300 bp length. The set was originally used by Liu et al. [Liu] and contained 73,086 reads, some of which had no information about their origin.
* phylopythia.fa - contains 114,457 unique reads with an average length of 961 bp. The set was originally used by Patil et al. [Pat] and contained 124,941 reads, some of which were repeated.

The MetaPhyler and PhyloPythia sets are courtesy of Adam Bazinet, who used them in his paper [BaCu].
We added a taxonomy identifier (taxid) to each set.
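
As an illustration of the two-column layout of gi_taxid_nucl.dmp.gz described above, the following minimal Python sketch reads gi/taxid pairs into a dictionary. It is not part of the original tools. The function names read_gi_taxid and load_gi_taxid are hypothetical, the columns are assumed to be whitespace-separated integers, and the sample rows are made up for demonstration only.

```python
import gzip
import io


def read_gi_taxid(stream):
    """Yield (gi, taxid) integer pairs from a two-column text stream.

    Assumption: columns are separated by whitespace (tab or spaces).
    """
    for line in stream:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        yield int(fields[0]), int(fields[1])


def load_gi_taxid(path):
    """Load a gzip-compressed gi/taxid file into a dict (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        return dict(read_gi_taxid(handle))


# Made-up sample rows standing in for the real file contents.
sample = io.StringIO("2\t9913\n3\t9913\n\n5\t9913\n")
mapping = dict(read_gi_taxid(sample))
```

For the real file one would call load_gi_taxid("gi_taxid_nucl.dmp.gz"). The full mapping may be too large to hold comfortably in memory, so filtering the pairs down to the gi values of interest before building the dictionary may be preferable.
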
For a more detailed description of the data, see the article [KaDe].

The HiSeq and MiSeq datasets used in the paper [KaDe] were downloaded from the https://ccb.jhu.edu/software/kraken website. These data were originally used by Wood and Salzberg [WoSa].

################################################################################################################################
[KaDe] Kawulok J, Deorowicz S. CoMeta: Classification of metagenomes using k-mers. Submitted to journal.
[GeSt] Gerlach W, Stoye J (2011) Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Research 39.
[Str] Stranneheim H, Käller M, Allander T, Andersson B, Arvestad L, Lundeberg J (2010) Classification of DNA sequences using Bloom filters. Bioinformatics 26: 1595-1600.
[Liu] Liu B, Gibbons T, Ghodsi M, Pop M (2010) MetaPhyler: Taxonomic profiling for metagenomic sequences. In: Proceedings of the 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010. pp. 95-100.
[Pat] Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, McHardy AC (2011) Taxonomic metagenome sequence assignment with structured output models. Nature Methods 8: 191-192.
[BaCu] Bazinet A, Cummings M (2012) A comparative evaluation of sequence classification programs. BMC Bioinformatics 13: 1-13.
[WoSa] Wood DE, Salzberg SL (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 15: R46.