American Journal of Software Engineering
ISSN (Print): 2379-5271; ISSN (Online): 2379-528X; Website: http://www.sciepub.com/journal/ajse; Editor-in-chief: Vicente Garcia Diaz
American Journal of Software Engineering. 2017, 5(1), 20-26
DOI: 10.12691/ajse-5-1-3

A Distributed Multi-facet Search Engine of Microblogs Based on SolrCloud

Lan Huang¹ and Juan Zhou¹

¹College of Computer Science, Yangtze University, Jingzhou, Hubei, China

Pub. Date: September 06, 2017

Cite this paper:
Lan Huang and Juan Zhou. A Distributed Multi-facet Search Engine of Microblogs Based on SolrCloud. American Journal of Software Engineering. 2017; 5(1):20-26. doi: 10.12691/ajse-5-1-3

Abstract

Microblog services, such as Twitter and, in China, Weibo, have become a new yet powerful channel for disseminating information. More than 500 million tweets are sent every day. This extraordinarily large volume of messages brings new challenges to conventional search paradigms: a message may be relevant to a query in several respects, for example its content, time, and location, and there may be a large number of such relevant messages. To address these challenges, we designed a multi-facet distributed microblog search system using off-the-shelf open-source frameworks, including SolrCloud, Hadoop, and ZooKeeper. The system was then populated with real-world messages collected from Sina Weibo, the most popular microblog website in China. We compared the performance of the standalone and distributed versions of the system. Empirical results showed both the effectiveness and the efficiency of the proposed system in retrieving large-scale microblog messages.
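As a rough illustration of what a multi-facet request looks like, the Python sketch below builds a standard Solr `select` URL that matches message text and, in the same request, asks for counts per location (a field facet via `facet.field`) and per day (a range facet via `facet.range`). The host, the collection name `weibo`, the field names `content`, `created_at` and `location`, and the helpers `build_facet_query` and `facet_pairs` are all hypothetical assumptions, not the schema or code used in the paper. In SolrCloud, a request sent to any node for a collection is distributed across that collection's shards, so the client does not address shards individually.

```python
from urllib.parse import urlencode

# Hypothetical SolrCloud endpoint and collection (assumption, not from the paper).
SOLR_SELECT = "http://localhost:8983/solr/weibo/select"


def build_facet_query(keywords, location=None, days=7, rows=10):
    """Build a Solr select URL that searches message text and facets on
    location and time. Field names are hypothetical."""
    params = [
        ("q", f"content:({keywords})"),   # relevance is driven by message content
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        # Field facet: number of matching messages per location value.
        ("facet.field", "location"),
        # Range facet: number of matching messages per day over the last `days` days.
        ("facet.range", "created_at"),
        ("facet.range.start", f"NOW/DAY-{days}DAYS"),
        ("facet.range.end", "NOW/DAY+1DAY"),
        ("facet.range.gap", "+1DAY"),
    ]
    if location is not None:
        # A filter query restricts results without changing relevance scores.
        params.append(("fq", f'location:"{location}"'))
    return SOLR_SELECT + "?" + urlencode(params)


def facet_pairs(flat):
    """Solr's default JSON output lists field facets as a flat
    [value1, count1, value2, count2, ...] list; convert it to a dict."""
    return dict(zip(flat[::2], flat[1::2]))


if __name__ == "__main__":
    print(build_facet_query("地震", location="荆州"))
```

Keeping the text match in `q` and the location restriction in `fq` separates "what the message is about" from "where it was posted", which mirrors the content, time and location facets named above.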

Keywords:
Solr; SolrCloud; multi-facet retrieval; information retrieval; microblog

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

