Journal of Cancer Sciences

Research Article

Predictive Analytics for Disease Diagnosis: A Study on Healthcare Data with Machine Learning Algorithms and Big Data

Purna Chandra Rao Chinta1*, Chethan Sriharsha Moore2, Laxmana Murthy Karaka3, Manikanth Sakuru4, Varun Bodepudi5 and Srinivasa Rao Maka6

1Microsoft, Sr Technical Support Engineer
2Microsoft, Sr Technical Support Engineer
3Code Ace Solutions Inc, Software Engineer
4JP Morgan Chase, Lead Software Engineer
5Deloitte Consulting LLP, Senior Solution Specialist
6North Star Group Inc, Software Engineer
*Address for Correspondence: Purna Chandra Rao Chinta, Microsoft, Sr Technical Support Engineer; Email: chpurnachandrarao@gmail.com
Submission: 04 January, 2025; Accepted: 31 January, 2025; Published: 03 February, 2025
Copyright: © 2025 Chinta PCR, et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: World Health Organization (WHO); Breast Cancer; Tumour; Machine Learning; Healthcare; Disease Diagnosis; Feedforward Neural Network (FFNN); Random Forest (RF); Decision Tree (DT); Convolutional Neural Network (CNN)

Abstract

Breast cancer currently ranks second among causes of cancer-related death in women, making it a major epidemiological issue. The illness is often not caught early enough, and half of the one million women diagnosed with breast cancer annually die from the condition. This research aims to predict the occurrence of breast cancer using several machine learning (ML) algorithms, namely a Feedforward Neural Network (FNN), Random Forest (RF), and Decision Tree (DT), with the goal of reducing the risk of death from this disease. The models are trained and assessed on the Breast Cancer Wisconsin (Diagnostic) dataset. The FNN model outperformed RF and DT, achieving the best overall performance, with precision, recall, and accuracy each at 97.18%. These results highlight the FNN's robustness in minimising false positives and maximising true positives, making it a reliable tool for breast cancer diagnosis. To further improve feature extraction and classification accuracy, future research may look at incorporating stronger deep learning models such as transformer architectures and Convolutional Neural Networks (CNNs). The model's generalisability and clinical usefulness might be further validated by using bigger and more varied datasets.
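The three metrics reported above can be made concrete with a small sketch. The code below is an illustration only, not the authors' evaluation pipeline: the names `ConfusionCounts` and `count_outcomes` and the toy labels are hypothetical, and the label 1 is assumed to stand for a malignant case. It shows how precision, recall, and accuracy follow from the counts of true and false positives and negatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign


def count_outcomes(y_true, y_pred, positive=1):
    """Tally the four outcome types from paired true and predicted labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def precision(c):
    """Share of predicted positives that are truly positive."""
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c):
    """Share of actual positives that the model found."""
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


def accuracy(c):
    """Share of all cases classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Toy labels for illustration only; these are not the study's data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = count_outcomes(y_true, y_pred)
print(counts, precision(counts), recall(counts), accuracy(counts))
```

Precision falls as false positives rise, while recall falls as false negatives rise, so a model with high values on both makes few errors of either kind in the positive class.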