THE EFFECT OF DIMENSIONALITY REDUCTION ON CENTROID-BASED AND DENSITY-BASED CLUSTERING METHODS IN TEXT DOCUMENT CLUSTERING

Authors

DOI:

https://doi.org/10.21927/ijubi.v4i2.1918

Keywords:

Dimension Reduction, Clustering, k-Means, DBSCAN

Abstract

Density-based clustering is usually more effective when processing data of varying density. This approach was pioneered by the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. k-Means and DBSCAN differ significantly in how they handle data that contains noise. This research therefore studies the impact of dimensionality reduction on high-dimensional data for both the k-Means algorithm, representing the centroid-based method, and the DBSCAN algorithm, representing the density-based method. For k-Means, dimensionality reduction by Singular Value Decomposition (SVD) improved clustering quality, lowering the average distance from 1.04136 to 0.003, but the change is not statistically significant. It can therefore be concluded that, statistically, SVD has no effect on the quality of k-Means clustering results. For DBSCAN, by contrast, the effect of SVD is highly significant: the average intra-cluster distance fell from 76.13480 to 13.71130, reported as a 555.27% improvement in quality. SVD also has a significant effect on computation time for both SVD + k-Means and SVD + DBSCAN. It reduced k-Means computation time from 3.68182 seconds to 2.09091 seconds, a 1.76-fold speedup, and DBSCAN computation time from 19.40000 seconds to 0.97500 seconds, a 19.89-fold speedup.
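The abstract does not define how the average intra-cluster distance was computed, so the following is only a minimal sketch under stated assumptions: distance is Euclidean, it is measured from each point to the centroid of its cluster, and noise points carry the label -1 and are skipped (a convention borrowed from scikit-learn's DBSCAN, not confirmed by the paper). The helper names `avg_intra_cluster_distance` and `ratio` and the toy data are hypothetical. The last lines recompute the ratios from the figures quoted in the abstract.

```python
from collections import defaultdict
from math import dist, fsum

NOISE = -1  # assumption: noise points carry label -1, as in scikit-learn's DBSCAN


def avg_intra_cluster_distance(points, labels):
    """Mean Euclidean distance from each clustered point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        if label != NOISE:  # noise belongs to no cluster, so it is skipped
            clusters[label].append(point)

    distances = []
    for members in clusters.values():
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [fsum(axis) / len(members) for axis in zip(*members)]
        distances.extend(dist(p, centroid) for p in members)
    if not distances:
        raise ValueError("no clustered points")
    return fsum(distances) / len(distances)


def ratio(before, after):
    """How many times larger `before` is than `after`."""
    return before / after


# Toy data (hypothetical): two tight clusters plus one noise point.
points = [(0, 0), (2, 0), (10, 0), (10, 4), (100, 100)]
labels = [0, 0, 1, 1, NOISE]
print(avg_intra_cluster_distance(points, labels))  # 1.5

# Ratios recomputed from the figures quoted in the abstract.
print(round(ratio(3.68182, 2.09091), 2))    # k-Means computation time
print(round(ratio(19.40000, 0.97500), 2))   # DBSCAN computation time
print(round(ratio(76.13480, 13.71130), 4))  # DBSCAN intra-cluster distance
```

Under these assumptions a lower value means tighter clusters. The distance ratio for DBSCAN comes out at roughly 5.55, which suggests that the 555.27% figure expresses the before/after ratio as a percentage; the DBSCAN time ratio comes out at roughly 19.897, so the quoted 19.89 appears to be truncated rather than rounded.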

Author Biographies

Muhammad Ihsan Jambak, Universitas Sriwijaya

Informatics Management Study Program

Faculty of Computer Science

Rusdi Efendi, Universitas Sriwijaya

Informatics Management Study Program

Published

2021-12-31