Jakub Nalepa, Michal Kawulok

Selecting training data for support vector machines—datasets

Artificially generated datasets of points positioned on a 500x500 surface:
- 2D—random^α.bmp (other names: 2D—random)
- 2D—random^β.bmp (other names: 2D—sparse)
- 2D—chessboard^α.bmp
- 2D—chessboard^β.bmp (other names: 2D—chessboard)
- 2D—points^β.bmp
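
The .bmp files listed above store labelled points as pixels on the 500x500 surface. The following is a minimal sketch of how such a bitmap might be turned into coordinate lists once it has been decoded with any image library. The color encoding is an assumption made for illustration only (white background, one distinct color per class); it is not documented here, so check it against the actual files. The function name `pixels_to_points` and the demo grid are hypothetical.

```python
from collections import defaultdict


def pixels_to_points(grid, background=(255, 255, 255)):
    """Group non-background pixels by color.

    grid is a list of rows, each row a list of (R, G, B) tuples.
    Returns a dict mapping each color to a list of (x, y) coordinates.
    ASSUMPTION: every distinct non-background color is one class label.
    """
    points = defaultdict(list)
    for y, row in enumerate(grid):
        for x, color in enumerate(row):
            if color != background:
                points[color].append((x, y))
    return dict(points)


# Tiny 3x2 stand-in for a decoded 500x500 bitmap (hypothetical data).
WHITE, RED, BLUE = (255, 255, 255), (255, 0, 0), (0, 0, 255)
demo = [
    [WHITE, RED, WHITE],
    [BLUE, WHITE, RED],
]
points = pixels_to_points(demo)
```

Each key of `points` is a color and each value is the list of pixel coordinates with that color, which can then be used as 2D feature vectors for one class.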

Skin dataset (real-world set derived from the ECU skin image database):
- Training set
- Validation set (sampled to the appropriate size as given in the corresponding paper)

Skin-10k dataset (real-world set derived from the ECU skin image database):
- Training set
- Test set

These datasets are free to use for non-commercial research purposes, provided that at least one of the following papers is cited in any work that uses them:

Nalepa, J., Kawulok, M.: Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs. Neurocomputing, vol. 185, pp. 113-132 (2016). DOI: 10.1016/j.neucom.2015.12.046 (http://www.sciencedirect.com/science/article/pii/S0925231215019839)
Nalepa, J., Kawulok, M.: A memetic algorithm to select training data for support vector machines. In: Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO '14), pp. 573-580. ACM, New York, NY, USA (2014) (http://dl.acm.org/citation.cfm?id=2598370)
Kawulok, M., Nalepa, J.: Support vector machines training data selection using a genetic algorithm. In: Gimelfarb, G., Hancock, E., Imiya, A., Kuijper, A., Kudo, M., Omachi, S., Windeatt, T., Yamada, K. (eds.) Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science, vol. 7626, pp. 557-565. Springer, Berlin (2012) (http://link.springer.com/chapter/10.1007%2F978-3-642-34166-3_61)

Updated: May 12, 2017