An Improved Machine Learning-Based Model for Phishing Website and URL Detection
DOI:
https://doi.org/10.65542/djei.v2i2.40
Keywords:
XGBoost, ANOVA F-test, Machine Learning, Cybersecurity, Phishing Website, URL Detection
Abstract
Malicious URLs and phishing websites are among the most dangerous cybersecurity threats because attackers use them to exploit both technical weaknesses and user trust to steal sensitive data. Existing blacklist-based and rule-based detection methods are increasingly ineffective against new and previously unseen phishing attacks, which creates a demand for detection systems that can adapt to changing threats. This study presents an improved machine learning framework that detects multiple types of phishing websites and malicious URLs using the ISCX-URL-2016 benchmark dataset. The framework combines data preprocessing, statistical feature engineering, and ANOVA F-test-based feature selection to enhance discriminative power while reducing feature redundancy. XGBoost serves as the primary classifier because it handles high-dimensional structured URL features and captures complex nonlinear relationships. Hyperparameters are tuned with stratified cross-validation and randomized search to keep learning performance balanced across traffic types. The improved XGBoost model achieves high precision, recall, and F1-scores across all five classes (benign, phishing, malware, defacement, and spam), with an overall classification accuracy of 98.42%. For phishing URLs specifically, it reaches an F1-score of 0.96. Confusion matrix analysis shows that the system separates the classes effectively, with very few misclassifications. The proposed framework offers competitive performance at lower computational complexity than deep learning-based methods.
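The abstract names ANOVA F-test-based feature selection but does not show how the score is computed. The following is a minimal sketch of the one-way ANOVA F statistic used to rank features, written in plain Python. It is not the authors' implementation: the function names, the feature names (`url_length`, `digit_count`), and the toy values are all hypothetical and are not taken from ISCX-URL-2016. Libraries such as scikit-learn provide the same score through `f_classif` combined with `SelectKBest`.

```python
from collections import defaultdict
from statistics import fmean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one numeric feature grouped by class label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n_samples = len(values)
    n_classes = len(groups)
    if n_classes < 2 or n_samples <= n_classes:
        raise ValueError("need at least two classes and more samples than classes")

    grand_mean = fmean(values)
    # Between-class sum of squares: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups.values() for x in g)

    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def select_top_k(feature_columns, labels, k):
    """Return the names of the k features with the highest F score."""
    scores = {name: anova_f_score(col, labels) for name, col in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: hypothetical feature names and values, for illustration only.
labels = ["benign", "benign", "benign", "phishing", "phishing", "phishing"]
features = {
    "url_length": [20, 22, 24, 60, 62, 64],
    "digit_count": [3, 9, 5, 4, 8, 6],
}
top_features = select_top_k(features, labels, k=1)
```

In this toy example `url_length` separates the two classes cleanly, so it receives a far higher F score than `digit_count` and is the feature kept. A high score means the class means differ strongly relative to the within-class spread, which is the sense in which the selection step favours discriminative features and discards redundant ones.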
License
Copyright (c) 2026 Dasinya Journal for Engineering and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.














