THE EFFECT OF DIMENSIONALITY REDUCTION ON CENTROID-BASED AND DENSITY-BASED CLUSTERING METHODS IN TEXT DOCUMENT CLUSTERING

Muhammad Ihsan Jambak, Rusdi Efendi

Abstract


Density-based clustering is usually more effective than centroid-based clustering when the data contain regions of different densities. The approach was pioneered by the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. k-Means and DBSCAN differ significantly in how they handle data that contain noise. This research therefore studies the impact of dimensionality reduction of high-dimensional data on the clustering results of k-Means, representing the centroid-based method, and of DBSCAN, representing the density-based method. Although the quality of the k-Means clustering results improved after dimensionality reduction by Singular Value Decomposition (SVD), from an initial average distance of 1.04136 to 0.003, the change is not statistically significant. It can therefore be concluded statistically that SVD has no effect on the quality of k-Means clustering results. For DBSCAN, on the other hand, the effect of SVD dimensionality reduction is highly significant: the average intra-cluster distance falls from 76.13480 to 13.71130, a reduction by a factor of about 5.55 (reported as a 555.27% improvement in quality). SVD also has a significant impact on the computation time of both SVD + k-Means and SVD + DBSCAN. SVD reduces the k-Means computation time from 3.68182 seconds to 2.09091 seconds, a speedup of 1.76 times, and reduces the DBSCAN computation time from 19.40000 seconds to 0.97500 seconds, a speedup of 19.89 times.
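The ratios quoted in the abstract can be recomputed directly from the reported before and after values. The short Python sketch below does only that arithmetic; the dictionary keys and the helper name `fold_change` are illustrative choices, not names from the paper, and the last digit of a printed ratio may differ from the abstract depending on whether a value is rounded or truncated.

```python
# Before/after-SVD figures exactly as reported in the abstract.
# The labels are illustrative names, not identifiers from the paper.
reported = {
    "k-Means time (s)": (3.68182, 2.09091),
    "DBSCAN time (s)": (19.40000, 0.97500),
    "DBSCAN avg. intra-cluster distance": (76.13480, 13.71130),
}


def fold_change(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


# One ratio per reported quantity.
ratios = {name: fold_change(b, a) for name, (b, a) in reported.items()}

for name, ratio in ratios.items():
    # Show the ratio both as a factor and as a percentage of the new value.
    print(f"{name}: {ratio:.4f}x ({ratio * 100:.2f}%)")
```

Read this way, the 555.27% figure for DBSCAN is the ratio of the old distance to the new one expressed as a percentage, that is, the average intra-cluster distance after SVD is roughly 5.55 times smaller than before.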

Keywords


Dimension Reduction, Clustering, k-Means, DBSCAN

DOI: http://dx.doi.org/10.21927/ijubi.v4i2.1918



Copyright (c) 2021 Indonesian Journal of Business Intelligence (IJUBI)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


 

Indonesian Journal of Business Intelligence (IJUBI)
Department of Information System
Alma Ata University
Email: ijubi@almaata.ac.id