GrDBSCAN: A granular density–based clustering algorithm

Suchy, Dawid; Siminski, Krzysztof

doi:10.34768/amcs-2023-0022

Dawid Suchy, Krzysztof Siminski

Abstract:

Density-based spatial clustering of applications with noise (DBSCAN) is a commonly known and used algorithm for data clustering. It applies a density-based approach and can produce clusters of any shape. However, it has a drawback—its worst-case computational complexity is O(n²) with regard to the number of data items n. The paper presents GrDBSCAN: a granular modification of DBSCAN with reduced complexity. The proposed GrDBSCAN first granulates data into fuzzy granules and then runs density-based clustering on the resulting granules. The complexity of GrDBSCAN is linear with regard to the input data size and higher only for the number of granules. That number is, however, a parameter of the GrDBSCAN algorithm and is (significantly) lower than that of input data items. This results in shorter clustering time than in the case of DBSCAN. The paper is accompanied by numerical experiments. The implementation of GrDBSCAN is freely available from a public repository.

Reference:

Dawid Suchy, Krzysztof Siminski, GrDBSCAN: A granular density–based clustering algorithm, [in] International Journal of Applied Mathematics and Computer Science, 2023, volume 33, number 2, pp. 297–312.

Bibtex Entry:

@Article{id:Suchy2023GrDBSCAN,
  author   = {Dawid Suchy and Krzysztof Siminski},
  journal  = {International Journal of Applied Mathematics and Computer Science},
  title    = {GrDBSCAN: A granular density–based clustering algorithm},
  year     = {2023},
  pages    = {297–312},
  volume   = {33},
  number   = {2},
  doi      = {10.34768/amcs-2023-0022},
  abstract = {Density-based spatial clustering of applications with noise (DBSCAN) 
  is a commonly known and used algorithm for data clustering. 
  It applies a density-based approach and can produce clusters 
  of any shape. However, it has a drawback—its worst-case computational 
  complexity is O(n²) with regard to the number of data items n. 
  The paper presents GrDBSCAN: a granular modification of DBSCAN 
  with reduced complexity. The proposed GrDBSCAN first granulates 
  data into fuzzy granules and then runs density-based clustering 
  on the resulting granules. The complexity of GrDBSCAN is linear with
  regard to the input data size and higher only for the number of granules. 
  That number is, however, a parameter of the GrDBSCAN algorithm 
  and is (significantly) lower than that of input data items. This 
  results in shorter clustering time than in the case of DBSCAN. 
  The paper is accompanied by numerical experiments. The implementation 
  of GrDBSCAN is freely available from a public repository.},
}