Big data is a collection of structured, semi-structured, and unstructured information gathered by businesses, and it is used in predictive modelling, machine learning initiatives, and other advanced analytics applications. In healthcare, where medical information may be either structured or unstructured, analysts use big data to inform decisions about prevention, intervention, and health management. The evolution of big data has driven technological innovation, progress, and modernization, but the massive volumes of data generated in real time are difficult for conventional approaches to manage. A variety of machine learning approaches, both supervised and unsupervised, are employed to organize large datasets and find patterns in them. Clustering, a form of unsupervised learning, divides data points into groups so that the points within each group are more similar to one another than to points in other groups. In this study, the Framingham Heart Study dataset was analysed using several clustering techniques: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Mean Shift clustering, Agglomerative clustering, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). Clustering performance was evaluated using the Silhouette Score and the Davies-Bouldin Index, where a higher Silhouette Score and a lower Davies-Bouldin Index each indicate better clustering quality. Both evaluation metrics were computed and reported.
Volume 10 | Issue 9
Pages: 373-380
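
The evaluation workflow described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. The Framingham Heart Study data are not reproduced here, so the example uses synthetic data from `make_blobs` as a stand-in. The hyperparameters (`eps`, `min_samples`, `n_clusters`) and the helper name `evaluate` are illustrative assumptions, not the settings or code used in the study.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, Birch, MeanShift
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): NOT the Framingham Heart Study dataset.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)
X = StandardScaler().fit_transform(X)

# Hyperparameters below are illustrative assumptions, not the study's settings.
models = {
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    "MeanShift": MeanShift(),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    "BIRCH": Birch(n_clusters=3),
}


def evaluate(X, models):
    """Fit each model and return both metrics, or None if fewer than 2 clusters."""
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        # DBSCAN labels noise points as -1; exclude them before scoring.
        mask = labels != -1
        n_clusters = len(set(labels[mask]))
        if n_clusters < 2:
            # Both metrics are undefined for a single cluster.
            results[name] = None
            continue
        results[name] = {
            "n_clusters": n_clusters,
            "silhouette": silhouette_score(X[mask], labels[mask]),
            "davies_bouldin": davies_bouldin_score(X[mask], labels[mask]),
        }
    return results


results = evaluate(X, models)
for name, scores in results.items():
    print(name, scores)
```

The Silhouette Score lies in the range [-1, 1], and the Davies-Bouldin Index is non-negative, so the two metrics are read in opposite directions when comparing algorithms. Excluding DBSCAN noise points before scoring is one possible design choice; scoring them as a separate cluster would give different values.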