The information technology revolution has led to the generation of large amounts of unstructured and semi-structured information, much of it available as web documents. The number of web articles and offline records grows day by day, so a large volume of text data has accumulated over time, and organizing and analysing it has become difficult. Many methodologies have been proposed to address this problem, but not every methodology suits every kind of text repository. At the same time, the need to cluster similar text documents and records has grown, both to remove multiple versions of the same document and to separate similar documents from very large repositories. Such clustering plays an important role in document search, information retrieval, web mining, and the presentation of search engine results. In this work we use text clustering, the application of cluster analysis to text documents so that large collections can be organized into meaningful, topic-specific clusters. To perform this task, we use two existing document clustering algorithms, K-means and DBSCAN. We studied the complete implementation of the DBSCAN algorithm and propose a modified version of it. Our experimental results show that the modified DBSCAN implementation gives better clustering accuracy than the existing algorithms.
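As a rough illustration of density-based document clustering, the following minimal sketch applies the textbook DBSCAN procedure to bag-of-words vectors compared by cosine distance. It is not the modified algorithm proposed in this work; the function names (`tf_vector`, `cosine_distance`, `dbscan`), the toy documents, and the parameter values `eps=0.5` and `min_pts=2` are assumptions chosen only for the example.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts for one document (lowercased word tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_pts):
    """Textbook DBSCAN; returns one label per vector, -1 meaning noise."""
    n = len(vectors)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # Brute-force eps-neighbourhood; includes the point itself.
        return [j for j in range(n)
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical toy corpus, used only to exercise the sketch.
docs = [
    "clustering groups similar text documents",
    "text documents clustering groups similar records",
    "search engine results ranking",
    "ranking search engine results pages",
    "banana smoothie recipe",
]
labels = dbscan([tf_vector(d) for d in docs], eps=0.5, min_pts=2)
print(labels)
```

Unlike K-means, this procedure does not need the number of clusters in advance and can mark documents that fit no dense group as noise (label -1); its results depend on the choice of `eps` and `min_pts`.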
Volume 12 | Issue 2