A Machine Learning Framework for Detecting Phishing Websites

Danat Gutema, Ravikumar S., S. Vinoth Kumar

Abstract


Background: Phishing has become a major threat in the digital age, and distinguishing legitimate websites from fake ones is increasingly difficult. A recent poll conducted by Intel found that 97% of security professionals were unsure how to differentiate between the two. Machine learning, which offers many methods for URL classification, is a viable answer to this problem.

Objectives: The main purpose of this research is to examine how well machine learning identifies potentially fake websites. To this end, several machine learning approaches are developed and compared. The study also assesses the most accurate model against existing solutions in the literature.

Methods: We train machine learning algorithms on a large dataset that includes both legitimate and phishing websites. Given labeled examples of both types, the algorithms learn to distinguish between them. The resulting models can support phishing detection systems that warn users immediately when they visit a malicious website.
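The training step described above can be illustrated with a minimal sketch. The abstract does not list the features, dataset, or exact models used, so everything here is an assumption: the lexical features in `url_features`, the four toy URLs (built on reserved example domains), and the helper names are hypothetical. For brevity the sketch fits a one-level decision tree (a decision stump) in pure Python rather than the full decision tree or random forest models named in the keywords.

```python
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical lexical features; the paper's real feature set is not given."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": float(len(url)),
        "num_dots_in_host": float(host.count(".")),
        "has_at_symbol": float("@" in url),
        "uses_https": float(parsed.scheme == "https"),
        "host_is_ip": float(host.replace(".", "").isdigit()),
    }


def predict(stump, row):
    """Apply a stump: one feature, one threshold, one label for each side."""
    name, threshold, label_if_above = stump
    return label_if_above if row[name] > threshold else 1 - label_if_above


def fit_stump(rows, labels):
    """Pick the (feature, threshold, orientation) with the fewest training errors."""
    best = None  # (errors, stump)
    for name in rows[0]:
        for threshold in sorted({r[name] for r in rows}):
            for label_if_above in (0, 1):
                stump = (name, threshold, label_if_above)
                errors = sum(predict(stump, r) != y for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, stump)
    return best[1]


# Toy labeled data (assumption): 0 = legitimate, 1 = phishing.
labeled_urls = [
    ("https://www.example.com/about", 0),
    ("https://docs.example.org/guide/start", 0),
    ("http://192.0.2.10/secure/login.php", 1),
    ("http://login.example.com.verify.invalid/update", 1),
]
rows = [url_features(u) for u, _ in labeled_urls]
labels = [y for _, y in labeled_urls]

stump = fit_stump(rows, labels)
print("chosen split:", stump)
print("training predictions:", [predict(stump, r) for r in rows])
```

A full decision tree applies the same split search recursively to each side of a split, and a random forest combines the votes of many such trees, each trained on a resampled copy of the data. In practice a library implementation would replace the hand-written stump.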

Statistical Analysis: This part presents and statistically analyzes the algorithms, datasets, and performance indicators used for the machine learning methods, and provides an impartial assessment of the procedures.
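The abstract does not name the performance indicators it reports. As an illustration only, the sketch below computes four indicators commonly used for binary classifiers (accuracy, precision, recall, and F1) from true and predicted labels. The function name `classification_metrics` and the sample labels are hypothetical, with 1 standing for phishing and 0 for legitimate.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # phishing correctly flagged
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # legitimate correctly passed
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # legitimate wrongly flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # phishing missed

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration; these are not results from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters for phishing detection because the two kinds of error differ in cost: a missed phishing site exposes the user, while a false alarm blocks a legitimate site.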

Findings: This section summarizes the outcomes of applying machine learning methods to identify fraudulent websites and highlights how the algorithms compare in accuracy and efficiency at detecting phishing websites.

Applications and Enhancements: We consider how the research can be applied in practice to make the Internet safer for everyone. We also discuss potential enhancements and future research directions for improving the accuracy of the machine learning models and the safety of users.

Keywords


website phishing, random forest, decision tree, dataset, identity theft.

DOI: http://dx.doi.org/10.5281/zenodo.14069089

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.