Journal of City and Development
ISSN (Print): Pending
ISSN (Online): Pending
Website: https://www.sciepub.com/journal/jcd
Editor-in-chief: Guangming Yu
Open Access
Journal of City and Development. 2021, 3(1), 12-30
DOI: 10.12691/jcd-3-1-3

Incorporating K-means, Hierarchical Clustering and PCA in Customer Segmentation

Azad Abdulhafedh

University of Missouri, USA

Pub. Date: February 02, 2021

Cite this paper:
Azad Abdulhafedh. Incorporating K-means, Hierarchical Clustering and PCA in Customer Segmentation. Journal of City and Development. 2021; 3(1):12-30. doi: 10.12691/jcd-3-1-3

Abstract

This paper examines the use of clustering algorithms in customer segmentation to define a marketing strategy for a credit card company. Customer segmentation divides customers into groups based on common characteristics, which helps banks, businesses, and other companies improve their products and service opportunities. The analysis explores the application of K-means, hierarchical clustering, and Principal Component Analysis (PCA) to identifying a company's customer segments from their credit card transaction history. The dataset summarizes the usage behavior of 8950 active credit card holders over the last 6 months, and the aim is to segment these customers as accurately as possible using clustering techniques. The project takes two approaches to customer segmentation. The first considers all variables in the clustering algorithms, using hierarchical clustering and K-means. The second applies dimensionality reduction through PCA to the dataset, then identifies the optimal number of clusters and repeats the clustering analysis with the updated number of clusters. The results show that PCA can be used effectively in the clustering process as a check on K-means and hierarchical clustering.
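
The two approaches summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes NumPy is available, replaces the credit card dataset with a small synthetic table of three planted groups, uses a hand-written K-means with a deterministic farthest-point start, and selects the number of clusters by average silhouette width only. The hierarchical clustering step is omitted, and all function and variable names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale every column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(X, n_components):
    """Project centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)  # share of variance per component
    return Xc @ Vt[:n_components].T, explained

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def silhouette(X, labels):
    """Average silhouette width; values near 1 indicate compact, separated clusters."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    ids = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() < 2:
            continue  # a singleton cluster scores 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

def best_k_by_silhouette(X, candidates):
    """Return the candidate k with the highest average silhouette width."""
    widths = {k: silhouette(X, kmeans(X, k)) for k in candidates}
    return max(widths, key=widths.get), widths

# Synthetic stand-in for the customer table: 3 planted groups, 4 features.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [4, 4, 4, 4], [8, 0, 8, 0]], dtype=float)
raw = np.vstack([rng.normal(loc=m, scale=0.5, size=(30, 4)) for m in means])
X = standardize(raw)

# Approach 1: cluster on all standardized variables.
best_k, widths_full = best_k_by_silhouette(X, range(2, 6))
labels_full = kmeans(X, best_k)

# Approach 2: reduce with PCA first, then re-select k and re-cluster.
scores, explained = pca_scores(X, n_components=2)
best_k_pca, widths_pca = best_k_by_silhouette(scores, range(2, 6))
labels_pca = kmeans(scores, best_k_pca)

print("k (all variables):", best_k, "| k (after PCA):", best_k_pca)
print("variance explained by 2 components:", round(float(explained[:2].sum()), 3))
```

Comparing `labels_full` with `labels_pca` mirrors the role the abstract gives PCA as a check on the clustering: if the reduced data supports the same number of clusters and the same grouping, the segmentation is less likely to be an artifact of redundant variables.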

Keywords:
K-means, Hierarchical Clustering, Principal Component Analysis, Agglomerative hierarchical clustering, scree plot, Silhouette average width, Davies-Bouldin Index, Dunn index, customer segmentation

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
