Jakub Nalepa, Michal Kawulok
Selecting training data for support vector machines—datasets
- Artificially generated datasets of points positioned on a 500x500 surface:
- Skin dataset (real-world set derived from the ECU skin image database):
- Training set
- Validation set (sampled to the appropriate size as given in the corresponding paper)
Every binary data file consists of:
- Integer number (4 bytes): number of input space dimensions (N) //N = 3
- Vectors:
- 1 byte: class label //0 or 1
- N doubles, i.e. (8·N) bytes: Vector data
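The layout above can be read with a short Python sketch such as the one below. It is not the authors' loader. It assumes that the values are little-endian, that the header is a signed 32-bit integer, that the doubles are IEEE 754, and that the vector records repeat back to back until the end of the file; none of these points is stated on this page. The function name `read_dataset` and its `byteorder` parameter are hypothetical.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

Vector = Tuple[int, Tuple[float, ...]]  # (class label, N coordinates)


def read_dataset(stream: BinaryIO, byteorder: str = "<") -> Tuple[int, List[Vector]]:
    """Read one binary data file: a 4-byte N, then (1 + 8*N)-byte records.

    byteorder is a struct prefix: "<" (little-endian, assumed here) or ">".
    """
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("file too short to contain the 4-byte dimension header")
    (n_dims,) = struct.unpack(byteorder + "i", header)
    if n_dims <= 0:
        raise ValueError(f"implausible number of dimensions: {n_dims}")

    # With an explicit byte order, struct adds no padding,
    # so each record is exactly 1 + 8*N bytes.
    record = struct.Struct(f"{byteorder}B{n_dims}d")
    vectors: List[Vector] = []
    while True:
        chunk = stream.read(record.size)
        if not chunk:
            break  # clean end of file
        if len(chunk) != record.size:
            raise ValueError("truncated vector record at end of file")
        label, *coords = record.unpack(chunk)
        vectors.append((label, tuple(coords)))
    return n_dims, vectors


if __name__ == "__main__":
    # Round trip on a tiny in-memory file with N = 3 and two vectors.
    raw = struct.pack("<i", 3)
    raw += struct.pack("<B3d", 1, 1.0, 2.5, 3.0)
    raw += struct.pack("<B3d", 0, 4.0, 5.0, 6.5)
    n, data = read_dataset(io.BytesIO(raw))
    print(n, data)
```

For a file on disk, open it in binary mode, for example `with open(path, "rb") as f: n, data = read_dataset(f)`. If the value read for N is implausibly large, the byte order assumption is probably wrong; try `byteorder=">"`.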
- Skin-10k dataset (real-world set derived from the ECU skin image database):
These datasets are free to use for research, non-commercial purposes, provided that at least one paper from the following list is cited in the works that exploit them:
- Nalepa, J., Kawulok, M.: Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs, Neurocomputing, Volume 185, pp. 113-132, DOI: 10.1016/j.neucom.2015.12.046, 2016. (http://www.sciencedirect.com/science/article/pii/S0925231215019839)
- Nalepa, J., Kawulok, M.: A memetic algorithm to select training data for support vector machines, in: Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO '14), pp. 573-580. ACM, New York, NY, USA (2014) (http://dl.acm.org/citation.cfm?id=2598370)
- Kawulok, M., Nalepa, J.: Support vector machines training data selection using a genetic algorithm. In: Gimelfarb, G., Hancock, E., Imiya, A., Kuijper, A., Kudo, M., Omachi, S., Windeatt, T., Yamada, K. (eds.) Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science, vol. 7626, pp. 557-565. Springer, Berlin (2012) (http://link.springer.com/chapter/10.1007%2F978-3-642-34166-3_61)
Updated: May 12, 2017