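THE EFFECT OF DIMENSIONALITY REDUCTION ON CENTROID-BASED AND DENSITY-BASED CLUSTERING METHODS IN TEXT DOCUMENT CLUSTERING (original title: PENGARUH REDUKSI DIMENSI TERHADAP METODE PENGKLASTERAN BERBASIS CENTROID DAN METODE PENGKLASTERAN BERBASIS DENSITY DALAM PENGKLASTERAN DOKUMEN TEKS)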

Muhammad Ihsan Jambak; Rusdi Efendi

doi:10.21927/ijubi.v4i2.1918

Authors

Muhammad Ihsan Jambak Universitas Sriwijaya http://orcid.org/0000-0003-3053-4743
Rusdi Efendi Universitas Sriwijaya

DOI:

https://doi.org/10.21927/ijubi.v4i2.1918

Keywords:

Dimension Reduction, Clustering, k-Means, DBSCAN

Abstract

Density-based clustering is usually more effective when processing data of varying densities. The approach was pioneered by the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. k-Means and DBSCAN differ significantly in how they handle data that contains noise. This research therefore studies the impact of dimensionality reduction of high-dimensional data on the clustering results of k-Means, representing centroid-based methods, and of DBSCAN, representing density-based methods. Although the quality of the k-Means clustering results improved after dimensionality reduction by Singular Value Decomposition (SVD), from an initial average distance of 1.04136 to 0.003, the change is not statistically significant, so the results are considered the same. It can therefore be concluded statistically that SVD has no effect on the quality of k-Means clustering results. In DBSCAN, on the other hand, the effect of SVD dimensionality reduction is highly significant: the average intra-cluster distance drops from 76.13480 to 13.71130, a roughly 5.55-fold reduction that the authors report as a 555.27% improvement in quality. SVD also has a significant effect on computation time for both SVD + k-Means and SVD + DBSCAN. It reduces the k-Means computation time from 3.68182 seconds to 2.09091 seconds, a speedup of 1.76 times, and the DBSCAN computation time from 19.40000 seconds to 0.97500 seconds, a speedup of 19.89 times.
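The ratios quoted in the abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the helper names `ratio` and `avg_intra_cluster_distance` are hypothetical, the toy points are invented, and the metric is assumed to be the mean Euclidean distance from each point to its cluster centroid, which may differ from the definition used in the paper.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def ratio(before: float, after: float) -> float:
    """Return how many times larger `before` is than `after`."""
    return before / after


def avg_intra_cluster_distance(points, labels):
    """Mean Euclidean distance from each point to its cluster centroid.

    Assumption: one common definition of the metric; the paper may use
    another. Points labelled -1 (DBSCAN's noise label) are skipped.
    """
    clusters = {}
    for point, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(point)
    total, count = 0.0, 0
    for members in clusters.values():
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(dist(p, centroid) for p in members)
        count += len(members)
    return total / count if count else 0.0


# Figures quoted in the abstract.
kmeans_speedup = ratio(3.68182, 2.09091)    # about 1.76
dbscan_speedup = ratio(19.40000, 0.97500)   # about 19.90 (abstract: 19.89)
dbscan_quality = ratio(76.13480, 13.71130)  # about 5.5527, i.e. 555.27%

# Hypothetical toy data: two small 2-D clusters and one noise point.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0), (50.0, 50.0)]
toy_labels = [0, 0, 1, 1, -1]
toy_score = avg_intra_cluster_distance(toy_points, toy_labels)

print(f"k-Means speedup: {kmeans_speedup:.2f}x")
print(f"DBSCAN speedup:  {dbscan_speedup:.2f}x")
print(f"DBSCAN distance ratio: {dbscan_quality:.4f}")
print(f"Toy average intra-cluster distance: {toy_score}")
```

A lower average intra-cluster distance means tighter clusters, so the "555.27%" figure is the before-to-after distance ratio expressed as a percentage.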

Author Biographies

Muhammad Ihsan Jambak, Universitas Sriwijaya

Informatics Management Study Program

Faculty of Computer Science

Rusdi Efendi, Universitas Sriwijaya

Informatics Management Study Program

Published

2021-12-31

Issue

Vol. 4 No. 2 (2021): Indonesian Journal of Business Intelligence (IJUBI)

Section

Articles

License

COPYRIGHT TRANSFER FORM

The copyright to this article is transferred to Alma Ata University Press if and when the article is accepted for publication. The undersigned hereby transfers any and all rights in and to the paper including without limitation all copyrights to AAU Press. The undersigned hereby represents and warrants that the paper is original and that he/she is the author of the paper, except for material that is clearly identified as to its original source, with permission notices from the copyright owners where required. The undersigned represents that he/she has the power and authority to make and execute this assignment.

We declare that:

1. This paper has not been published in the same form elsewhere.

2. It will not be submitted anywhere else for publication prior to acceptance/rejection by this Journal.

3. A copyright permission is obtained for materials published elsewhere and which require this permission for reproduction.

Furthermore, I/We hereby transfer the unlimited rights of publication of the above-mentioned paper in whole to AAU Press. The copyright transfer covers the exclusive right to reproduce and distribute the article, including reprints, translations, photographic reproductions, microform, electronic form (offline, online) or any other reproductions of similar nature.

The corresponding author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors. This agreement is to be signed by at least one of the authors who have obtained the assent of the co-author(s) where applicable. After submission of this agreement signed by the corresponding author, changes of authorship or in the order of the authors listed will not be accepted.

Retained Rights/Terms and Conditions

1. Authors retain all proprietary rights in any process, procedure, or article of manufacture described in the Work.

2. Authors may reproduce or authorize others to reproduce the Work or derivative works for the authors' personal use or for company use, provided that the source and the AAU Press copyright notice are indicated, the copies are not used in any way that implies AAU Press endorsement of a product or service of any employer, and the copies themselves are not offered for sale.

3. Although authors are permitted to re-use all or portions of the Work in other works, this does not include granting third-party requests for reprinting, republishing, or other types of re-use.
