Dawid Suchy, Krzysztof Siminski
Abstract:
Density-based spatial clustering of applications with noise (DBSCAN) is a commonly known and used algorithm for data clustering. It applies a density-based approach and can produce clusters of any shape. However, it has a drawback—its worst-case computational complexity is O(n²) with regard to the number of data items n. The paper presents GrDBSCAN: a granular modification of DBSCAN with reduced complexity. The proposed GrDBSCAN first granulates data into fuzzy granules and then runs density-based clustering on the resulting granules. The complexity of GrDBSCAN is linear with regard to the input data size and higher only for the number of granules. That number is, however, a parameter of the GrDBSCAN algorithm and is (significantly) lower than that of input data items. This results in shorter clustering time than in the case of DBSCAN. The paper is accompanied by numerical experiments. The implementation of GrDBSCAN is freely available from a public repository.
Reference:
Dawid Suchy, Krzysztof Siminski, GrDBSCAN: A granular density–based clustering algorithm, [in] International Journal of Applied Mathematics and Computer Science, 2023, volume 33, number 2, pp. 297–312.
Bibtex Entry:
@Article{id:Suchy2023GrDBSCAN,
author = {Dawid Suchy and Krzysztof Siminski},
journal = {International Journal of Applied Mathematics and Computer Science},
title = {GrDBSCAN: A granular density–based clustering algorithm},
year = {2023},
pages = {297–312},
volume = {33},
number = {2},
doi = {10.34768/amcs-2023-0022},
abstract = {Density-based spatial clustering of applications with noise (DBSCAN)
is a commonly known and used algorithm for data clustering.
It applies a density-based approach and can produce clusters
of any shape. However, it has a drawback—its worst-case computational
complexity is O(n²) with regard to the number of data items n.
The paper presents GrDBSCAN: a granular modification of DBSCAN
with reduced complexity. The proposed GrDBSCAN first granulates
data into fuzzy granules and then runs density-based clustering
on the resulting granules. The complexity of GrDBSCAN is linear with
regard to the input data size and higher only for the number of granules.
That number is, however, a parameter of the GrDBSCAN algorithm
and is (significantly) lower than that of input data items. This
results in shorter clustering time than in the case of DBSCAN.
The paper is accompanied by numerical experiments. The implementation
of GrDBSCAN is freely available from a public repository.},
}