American Journal of Systems and Software

ISSN (Print): 2372-708X

ISSN (Online): 2372-7071

Website: http://www.sciepub.com/journal/AJSS

Current Issue: Volume 3, Number 2 (2015)

Article

World towards Advance Web Mining: A Review

1M.Tech-Computer Science and Engineering, Lakshmi Narain College of Technology-Indore (RGPV, Bhopal), MP, India


American Journal of Systems and Software. 2015, 3(2), 44-61
DOI: 10.12691/ajss-3-2-3
Copyright © 2015 Science and Education Publishing

Cite this paper:
Shyam Nandan Kumar. World towards Advance Web Mining: A Review. American Journal of Systems and Software. 2015; 3(2):44-61. doi: 10.12691/ajss-3-2-3.

Correspondence to: Shyam Nandan Kumar, M.Tech-Computer Science and Engineering, Lakshmi Narain College of Technology-Indore (RGPV, Bhopal), MP, India. Email: shyamnandan.mec@gmail.com

Abstract

With the advent of the World Wide Web and the emergence of e-commerce applications and social networks, organizations across the Web generate a large amount of data every day. The abundant unstructured or semi-structured information on the Web poses a great challenge both for users, who seek valuable information buried in billions of web pages, and for businesses, which need to provide personalized service to individual consumers. To overcome these problems, data mining techniques must be applied to the Web. This article reviews in detail the various web mining techniques used to discover useful patterns from the Web. New concepts are also introduced, in a broad sense, for Optimal Web Mining. The paper also discusses the state of the art and surveys Web Mining as used in knowledge discovery over the Web.

Article

Theory of Systems, Systems Metaphysics and Neoplatonism

1Department of Applied Mathematics. University of Alicante. Alicante. Spain.

2Department of Philosophy. University Rey Juan Carlos. Madrid. Spain


American Journal of Systems and Software. 2015, 3(2), 36-43
DOI: 10.12691/ajss-3-2-2
Copyright © 2015 Science and Education Publishing

Cite this paper:
J.L. Usó-Doménech, J.A. Nescolarde-Selva, M.J. Sabán. Theory of Systems, Systems Metaphysics and Neoplatonism. American Journal of Systems and Software. 2015; 3(2):36-43. doi: 10.12691/ajss-3-2-2.

Correspondence to: J.L. Usó-Doménech, Department of Applied Mathematics. University of Alicante. Alicante. Spain. Email: josue.selva@ua.es

Abstract

Science has developed from rational-empirical methods and, as a consequence, represents existing phenomena without understanding their root causes. The question currently at stake is the sense of being. Put simply, one can say that dogmatic religion leads to misinterpretations, whereas the empirical sciences contain exact rational representations of phenomena. Thus, science has been able to free itself from dogmatic religion. The project for the sciences of being seeks to return to reality its essential foundations; under the plan of the theory of systems, this necessarily involves a search for the meaning of Reality.

Article

A Comparative Analysis of Browsing Behavior of Human Visitors and Automatic Software Agents

1Department of CS & E, NIT Raipur, Raipur, India

2Department of E & TC, NIT Raipur, Raipur, India

3Department of IT, IIIT Allahabad, Allahabad, India


American Journal of Systems and Software. 2015, 3(2), 31-35
DOI: 10.12691/ajss-3-2-1
Copyright © 2015 Science and Education Publishing

Cite this paper:
Dilip Singh Sisodia, Shrish Verma, Om Prakash Vyas. A Comparative Analysis of Browsing Behavior of Human Visitors and Automatic Software Agents. American Journal of Systems and Software. 2015; 3(2):31-35. doi: 10.12691/ajss-3-2-1.

Correspondence to: Dilip Singh Sisodia, Department of CS & E, NIT Raipur, Raipur, India. Email: dssisodia.cs@nitrr.ac.in

Abstract

In this paper, we investigate the comparative access behavior of human visitors and automatic software agents, i.e., web robots, through the access logs of a web portal. We perform an exhaustive investigation of resource acquisition trends, hourly activity, entry and exit patterns, geographic origin, user agents, and the distribution of response sizes and response codes for human visitors and web robots. Web robots continue to proliferate and grow in sophistication, for both non-malicious and malicious reasons. A significant share of web traffic is attributed to robots, and this fraction is likely to grow over time. The presence of web robot traffic entries in web server log repositories makes it very challenging to extract meaningful knowledge about the browsing behavior of actual visitors. This knowledge is useful for improving services for genuine visitors or for optimizing server resources.
