An Improved Machine Learning-Based Model for Phishing Website and URL Detection

Authors

  • Warveen M. Eido, Department of Information Technology, Technical College of Informatics, Akre University for Applied Sciences, Duhok, Kurdistan Region, Iraq. ORCID: https://orcid.org/0009-0001-7269-975X
  • Omar S. Kareem, Department of Public Health, College of Health and Medical Technology, Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq. ORCID: https://orcid.org/0000-0002-3063-2119

DOI:

https://doi.org/10.65542/djei.v2i2.40

Keywords:

XGBoost, ANOVA F-test, Machine Learning, Cybersecurity, Phishing Website, URL Detection

Abstract

Cybersecurity experts regard malicious URLs and phishing websites as among the most dangerous current threats, because attackers use them to exploit both technical weaknesses and user trust in order to steal sensitive data. Existing detection methods based on blacklists and hand-written rules are increasingly ineffective against new and previously unseen phishing attacks, which creates demand for detection systems that adapt to changing threats. This study presents an improved machine learning framework for multi-class detection of phishing websites and malicious URLs, evaluated on the ISCX-URL-2016 benchmark dataset. The framework combines data preprocessing, statistical feature engineering, and feature selection based on the ANOVA F-test to increase discriminative power while reducing feature redundancy. XGBoost serves as the primary classifier because it handles high-dimensional structured URL features and captures complex nonlinear relationships. Hyperparameters are tuned with randomized search under stratified cross-validation so that performance is balanced across traffic types. The improved XGBoost model achieves high precision, recall, and F1-scores across all five classes (benign, phishing, malware, defacement, and spam), with an overall classification accuracy of 98.42%. For the phishing class, it reaches an F1-score of 0.96. Confusion matrix analysis shows very few misclassifications, indicating that the classes are well separated. Compared with deep learning-based methods, the proposed architecture offers competitive performance at lower computational complexity.
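
The abstract names ANOVA F-test feature selection but does not give implementation details, so the following is only a minimal sketch of the general technique, not the authors' code. It computes the one-way ANOVA F-statistic for each feature (between-class variance divided by within-class variance) and keeps the highest-scoring features. The function names `anova_f_score` and `select_top_k`, the feature names, and the toy data are all hypothetical and chosen for illustration; the two-class toy labels stand in for the five classes of the actual dataset.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic for a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_samples = len(values)
    n_classes = len(groups)
    if n_classes < 2 or n_samples <= n_classes:
        raise ValueError("need at least two classes and more samples than classes")

    grand_mean = mean(values)
    # Between-class sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def select_top_k(feature_columns, labels, k):
    """Return the k feature names with the highest F-scores, plus all scores."""
    scores = {name: anova_f_score(col, labels) for name, col in feature_columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy, made-up data: three hypothetical URL features for six URLs.
labels = ["benign", "benign", "benign", "phishing", "phishing", "phishing"]
features = {
    "url_length": [20, 22, 24, 60, 62, 64],
    "digit_count": [1, 5, 3, 2, 4, 6],
    "dot_count": [2, 2, 3, 4, 5, 5],
}
selected, scores = select_top_k(features, labels, k=2)
print(selected)
```

On this toy data, `digit_count` is dropped because its class means barely differ relative to its spread within each class. In practice, the same scoring is available in scikit-learn as `f_classif`, typically combined with `SelectKBest`; whether the authors used that library is not stated in the abstract.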

Published

2026-04-12

How to Cite

Eido, W. M., & Kareem, O. S. (2026). An Improved Machine Learning-Based Model for Phishing Website and URL Detection. Dasinya Journal for Engineering and Informatics, 2(2). https://doi.org/10.65542/djei.v2i2.40