APPLYING CLUSTERING METHODS TO CLASSIFY CUSTOMERS BASED ON SHOPPING BEHAVIOUR
DOI:
https://doi.org/10.61591/jslhu.22.713
Keywords:
Customer segmentation, Shopping behaviour, K-Means, Hierarchical Clustering, Gaussian Mixture Model (GMM)
Abstract
Customer segmentation is crucial for optimizing marketing strategies. This study applies three common clustering algorithms, K-Means, Hierarchical Clustering, and Gaussian Mixture Models (GMM), to group customers by shopping behaviour and demographics (age, gender, total spending), and compares their effectiveness. Using three retail datasets (two from Kaggle, one from Sling Academy), the research preprocesses the data, applies each algorithm, and evaluates the resulting clusters with the Silhouette Score, the Davies-Bouldin Index, and the Calinski-Harabasz Index. The results indicate that GMM is the most effective for segmentation by total spending and gender, producing distinct clusters. Hierarchical Clustering proves suitable for detailed age-based analysis on specific datasets, while K-Means offers a balanced solution that is particularly effective when cluster structures are clear or rapid results are needed. The study recommends selecting an algorithm according to the specific business objectives and data characteristics, so that businesses can develop more effective personalized marketing strategies.
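The comparison described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes scikit-learn and NumPy, uses synthetic data in place of the Kaggle and Sling Academy datasets, and the names `customers`, `cluster_all`, `evaluate`, and `N_CLUSTERS` are hypothetical. The choice of three clusters and of standardization as the preprocessing step are also assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a retail table (assumption): columns are
# age and total spending, drawn from three separated groups.
rng = np.random.default_rng(0)
customers = np.vstack([
    rng.normal(loc=[25, 200], scale=[3, 30], size=(50, 2)),
    rng.normal(loc=[45, 800], scale=[4, 60], size=(50, 2)),
    rng.normal(loc=[65, 400], scale=[3, 40], size=(50, 2)),
])

# Put both features on a comparable scale before distance-based clustering.
X = StandardScaler().fit_transform(customers)

N_CLUSTERS = 3  # hypothetical choice for this sketch


def cluster_all(X, n_clusters):
    """Return cluster labels from the three algorithms named in the abstract."""
    return {
        "K-Means": KMeans(
            n_clusters=n_clusters, n_init=10, random_state=0
        ).fit_predict(X),
        "Hierarchical": AgglomerativeClustering(
            n_clusters=n_clusters
        ).fit_predict(X),
        "GMM": GaussianMixture(
            n_components=n_clusters, random_state=0
        ).fit_predict(X),
    }


def evaluate(X, labels):
    """Compute the three internal validation metrics for one labelling."""
    return {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
    }


results = {
    name: evaluate(X, labels)
    for name, labels in cluster_all(X, N_CLUSTERS).items()
}
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because the three metrics point in different directions (higher Silhouette and Calinski-Harabasz values are better, while a lower Davies-Bouldin value is better), they are best read side by side per algorithm rather than combined into a single number.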