Article citations

Qu, W. G., Chen, X. H., Ji, G. L. Automatic Extraction of Word Collocation Based on Frame. Computer Engineering, 2004, 30(23), pp. 22-24. (in Chinese)

has been cited by the following article:

Article

DACE: Extracting and Exploring Large Scale Chinese Web Collocations with Distributed Computing

College of Computer Science, Yangtze University, Jingzhou, Hubei, China


American Journal of Information Systems. 2017, Vol. 5 No. 1, 27-32
DOI: 10.12691/ajis-5-1-4
Copyright © 2017 Science and Education Publishing

Cite this paper:
Lan Huang, Juan Zhou, Jing Xue, Yongxing Li, Youfu Du. DACE: Extracting and Exploring Large Scale Chinese Web Collocations with Distributed Computing. American Journal of Information Systems. 2017; 5(1):27-32. doi: 10.12691/ajis-5-1-4.

Correspondence to: Lan Huang, College of Computer Science, Yangtze University, Jingzhou, Hubei, China. Email: lanhuang@yangtzeu.edu.cn

Abstract

Words that often occur together form collocations. Collocations are important language components and have been used to facilitate many natural language processing tasks, including natural language generation, machine translation, information retrieval, sentiment analysis and language learning. However, collocations are difficult to acquire, especially for second language learners, and new collocations now develop quickly, driven in part by the abundant user-generated content on the Web. In this paper we present DACE, an automatic collocation extraction and exploration system for the Chinese language. We identify collocations using three measures: frequency, mutual information and the χ²-test. The system is built on distributed computing frameworks so that it can process large-scale corpora efficiently. Empirical evaluation and analysis of the system showed the effectiveness of the collocation measures and the efficiency of the distributed computing processes.
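
The abstract names the three measures but does not define them. As a minimal sketch, assuming the standard textbook definitions (raw co-occurrence frequency, pointwise mutual information, and Pearson's χ² on a 2×2 contingency table of adjacent word pairs), the scores for one candidate pair could be computed from counts as follows. The function name, its parameters and the counts in the usage lines are hypothetical illustrations, not taken from the DACE system.

```python
import math


def collocation_scores(pair_count, w1_count, w2_count, total_pairs):
    """Score one candidate pair (w1, w2) from raw counts.

    Hypothetical helper: assumes every count is taken over the same set
    of adjacent word pairs, so the four cells below sum to total_pairs.
    """
    n = total_pairs
    # Observed 2x2 contingency table.
    o11 = pair_count             # w1 followed by w2
    o12 = w1_count - pair_count  # w1 followed by another word
    o21 = w2_count - pair_count  # another word followed by w2
    o22 = n - o11 - o12 - o21    # neither w1 first nor w2 second

    # Pointwise mutual information: log2( P(w1, w2) / (P(w1) * P(w2)) ).
    pmi = math.log2(o11 * n / (w1_count * w2_count))

    # Pearson chi-square statistic, closed form for a 2x2 table.
    chi2 = (n * (o11 * o22 - o12 * o21) ** 2) / (
        (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    )
    return {"frequency": o11, "pmi": pmi, "chi2": chi2}


# Illustrative counts only: a pair that co-occurs five times more often
# than independence would predict.
scores = collocation_scores(pair_count=50, w1_count=100, w2_count=100,
                            total_pairs=1000)
print(scores)
```

Under these definitions a pair whose words are statistically independent scores zero on both mutual information and χ², so larger values suggest a stronger collocation candidate. Frequency alone tends to favor pairs of common words, which is one reason to combine it with the other two measures.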

Keywords