American Journal of Systems and Software
ISSN (Print): 2372-708X; ISSN (Online): 2372-7071
Website: http://www.sciepub.com/journal/ajss
Editor-in-chief: Josué-Antonio Nescolarde-Selva
Open Access
American Journal of Systems and Software. 2015, 3(2), 44-61
DOI: 10.12691/ajss-3-2-3
Open Access Article

World towards Advance Web Mining: A Review

Shyam Nandan Kumar¹

¹M.Tech, Computer Science and Engineering, Lakshmi Narain College of Technology, Indore (RGPV, Bhopal), MP, India

Pub. Date: April 16, 2015

Cite this paper:
Shyam Nandan Kumar. World towards Advance Web Mining: A Review. American Journal of Systems and Software. 2015; 3(2):44-61. doi: 10.12691/ajss-3-2-3

Abstract

With the advent of the World Wide Web and the emergence of e-commerce applications and social networks, organizations across the Web generate a large amount of data every day. The abundance of unstructured and semi-structured information buried in billions of web pages poses a great challenge both for users, who seek valuable information effectively, and for businesses, which need to provide personalized services to individual consumers. To overcome these problems, data mining techniques must be applied to the Web. This article reviews in detail the various web mining techniques used to discover useful patterns on the Web. New concepts for Optimal Web Mining are also introduced in a broad sense. The paper also surveys the state of the art in Web Mining as it is used for knowledge discovery over the Web.

Keywords:
data mining; www; web mining; cloud mining; web usage mining; web content mining; web structure mining; semantic web mining; web mining algorithm; knowledge discovery; information retrieval

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
