Ha, Le Quan, Sicilia-Garcia, E. I., Ming, Ji, and Smith, F. J. (2002), Extension of Zipf’s law to words and phrases, in Proceedings of COLING 2002, Taipei, Taiwan.

has been cited by the following article:

Article

Investigating the Distribution of Arabic and English Keywords and Their Progress Over Different Text File Formats

Computer Science and Information Technology Department, Mazoon University College, Muscat, Sultanate of Oman


American Journal of Computing Research Repository. 2013, Vol. 1 No. 1, 1-5
DOI: 10.12691/ajcrr-1-1-1
Copyright © 2013 Science and Education Publishing

Cite this paper:
Boumedyen Shannaq. Investigating the Distribution of Arabic and English Keywords and Their Progress Over Different Text File Formats. American Journal of Computing Research Repository. 2013; 1(1):1-5. doi: 10.12691/ajcrr-1-1-1.

Correspondence to: Boumedyen Shannaq, Computer Science and Information Technology Department, Mazoon University College, Muscat, Sultanate of Oman. Email: aboumedyen@gmail.com

Abstract

This paper presents a systematic approach to text format categorization. It draws on corpus linguistics to show how text files in HTML, PDF, DOC, and TXT formats can be analyzed. The work compares Arabic and English text across these file formats by calculating a distribution factor for the keyword distribution in Arabic and English documents. All of the selected text comes from the computer technology domain. The text categorization process is applied to a text collection consisting of two main corpora, one Arabic and one English. The results show that Arabic text documents are well distributed in DOC files, whereas English text documents are well distributed in XML files. These results should contribute to building and managing an effective electronic learning system for Arabic and English texts. The results and conclusions are presented with graphical outputs for clarity.

Keywords