American Journal of Information Systems. 2013, 1(1), 26-30
DOI: 10.12691/ajis-1-1-4
Adapt Clustering Methods for Arabic Documents

Boumedyen Shannaq1

1Computer Science and Information Technology Department, Mazoon College, “University College”, Muscat, Sultanate of Oman

Pub. Date: November 26, 2013

This paper develops a new clustering method (FWC) and proposes a new approach to filtering data collected from internet resources. Clustering groups data instances into subsets such that similar instances are grouped together while dissimilar instances fall into different groups. The instances are thereby organized into an efficient representation that characterizes the population being sampled, which reduces the very large volume of retrieved data. This is done by removing dissimilar text files and grouping similar documents into homogeneous clusters. Arabic text files totaling 974 MB were collected, processed, analyzed, and filtered using common clustering methods. These clustering methods are presented in five categories: hierarchical, partitioning, density-based, model-based, and soft-computing methods. The challenges of clustering large data sets are then discussed and tested with the proposed method. Two experiments were conducted to establish the effectiveness of the FWC method, and the results show that the FWC method proposed in this paper produced better results than existing clustering methods.
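
The general idea of grouping similar documents and removing dissimilar ones can be illustrated with a minimal sketch. This is not the FWC method, whose details are not given in the abstract. It assumes a bag-of-words representation, cosine similarity, and a simple single-pass "leader" clustering scheme. The function names, the similarity threshold, the minimum cluster size, and the sample documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts. Whitespace tokenization is a simplification;
    real Arabic text would normally need normalization and stemming."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(docs, threshold=0.5):
    """Single-pass clustering (illustrative, not FWC): each document joins the
    first cluster whose leader is similar enough, else it starts a new cluster."""
    leaders = []   # term vector of the first document in each cluster
    clusters = []  # lists of document indices
    for i, text in enumerate(docs):
        vec = tf_vector(text)
        for k, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[k].append(i)
                break
        else:
            leaders.append(vec)
            clusters.append([i])
    return clusters


def drop_outliers(clusters, min_size=2):
    """Filter step: discard clusters too small to be considered homogeneous groups."""
    return [c for c in clusters if len(c) >= min_size]


# Hypothetical sample documents (sports, oil market, cooking).
docs = [
    "كرة القدم مباراة الفريق",
    "مباراة كرة القدم اليوم",
    "سعر النفط السوق",
    "السوق سعر النفط اليوم",
    "وصفة طبخ الأرز",
]
clusters = leader_cluster(docs, threshold=0.5)
kept = drop_outliers(clusters)
print(clusters)
print(kept)
```

In this sketch the single cooking document forms a cluster of its own and is removed by the filter step, while the two topical pairs are kept. The outcome of leader clustering depends on document order and on the threshold, which is one reason more robust methods are generally preferred in practice.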

