American Journal of Systems and Software

ISSN (Print): 2372-708X

ISSN (Online): 2372-7071

Website: http://www.sciepub.com/journal/AJSS

Volume 3, Number 2 (2015)

Article

A Comparative Analysis of Browsing Behavior of Human Visitors and Automatic Software Agents

1Department of CS & E, NIT Raipur, Raipur, India

2Department of E & TC, NIT Raipur, Raipur, India

3Department of IT, IIIT Allahabad, Allahabad, India


American Journal of Systems and Software. 2015, 3(2), 31-35
DOI: 10.12691/ajss-3-2-1
Copyright © 2015 Science and Education Publishing

Cite this paper:
Dilip Singh Sisodia, Shrish Verma, Om Prakash Vyas. A Comparative Analysis of Browsing Behavior of Human Visitors and Automatic Software Agents. American Journal of Systems and Software. 2015; 3(2):31-35. doi: 10.12691/ajss-3-2-1.

Correspondence to: Dilip Singh Sisodia, Department of CS & E, NIT Raipur, Raipur, India. Email: dssisodia.cs@nitrr.ac.in

Abstract

In this paper, we compare the access behavior of human visitors and automatic software agents (i.e., web robots) using the access logs of a web portal. We perform an exhaustive investigation of resource acquisition trends, hourly activity, entry and exit patterns, geographic origin, user agents, and the distributions of response sizes and response codes for human visitors and web robots. Web robots continue to proliferate and grow in sophistication, for both non-malicious and malicious reasons. A significant share of web traffic is attributed to robots, and this fraction is likely to grow over time. The presence of web robot entries in web server logs makes it challenging to extract meaningful knowledge about the browsing behavior of actual visitors. This knowledge is useful for improving services for genuine visitors or for optimizing server resources.
