Determining the optimal number of clusters using distance based k-means algorithm
Loading...
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Faculty of Applied Sciences, South Eastern University of Sri Lanka, Sammanthura
Abstract
In the current digital era, data is generated enormously with fast growth from
different sources, and managing such huge data is a big challenge. Clustering
algorithm is able to find hidden patterns and extract useful information from
huge datasets. Among the clustering techniques, k-means clustering algorithm
is the most commonly used unsupervised classification technique to determine
the optimal number of clusters (k). However, the choice of the optimal number
of clusters (k) is a prominent problem in the process of the k-means clustering
algorithm. In most cases, clustering huge data, k is pre-determined by
researcher, and incorrectly chosen k leads to increase computational cost. In
order to obtain the optimal number of clusters, a distance-based k-mean
algorithm was proposed with a simulated dataset. In the k-means algorithm,
two distance measures were considered namely Euclidean and Manhattan
distances. The results based on simulated data reveal that the k-means
algorithm with Euclidean distance yields the optimal number of clusters
compared to Manhattan distance. Testing on real datasets shows consistent
results as the simulated ones.
Description
Keywords
Citation
Proceedings of the 11th Annual Science Research Sessions, FAS, SEUSL, Sri Lanka 15th November 2022 Scientific Engagement for Sustainable Futuristic Innovations pp. 60.