Article citations

Shannaq B., Aleksandrov V., “Clustering the Arabic Documents (CAD)”, Universal Journal of Applied Computer Science and Technology (UNIASCIT), Vol. 1 No. 3, pp. 90-94, 2011.

has been cited by the following article:

Article

Adapt Clustering Methods for Arabic Documents

Computer science and Information Technology Department, Mazoon College, “University College”, Muscat, Sultanate of Oman


American Journal of Information Systems. 2013, Vol. 1 No. 1, 26-30
DOI: 10.12691/ajis-1-1-4
Copyright © 2013 Science and Education Publishing

Cite this paper:
Boumedyen Shannaq. Adapt Clustering Methods for Arabic Documents. American Journal of Information Systems. 2013; 1(1):26-30. doi: 10.12691/ajis-1-1-4.

Correspondence to: Boumedyen Shannaq, Computer science and Information Technology Department, Mazoon College, “University College”, Muscat, Sultanate of Oman. Email: aboumedyen@gmail.com

Abstract

This research paper develops a new clustering method (FWC) and proposes a new approach to filtering data collected from internet resources. Clustering groups data instances into subsets such that similar instances are grouped together, while different instances belong to different groups. The instances are thereby organized into an efficient representation that characterizes the sampled population, which reduces the large volume of retrieved data. This is done by removing dissimilar text files and grouping similar documents into homogeneous clusters. Arabic text files totaling 974 MB were collected, processed, analyzed and filtered using common clustering methods. Clustering methods are presented, divided into hierarchical, partitioning, density-based, model-based and soft-computing methods. The challenges of clustering large data sets are then discussed and tested with the proposed method. Two experiments were conducted to establish the effectiveness of the FWC method, and the results show that it produced better results than, and outperformed, existing clustering methods.

Keywords