Cervical cancer perceived risk factors behavior using logistic regression technique
Abstract
Cervical cancer remains a considerable global health challenge, mainly because of ineffective screening programs in middle-income countries. This study aimed to predict cervical cancer from behavioral risk factors using logistic regression, supported by feature engineering techniques such as principal component analysis (PCA). PCA condensed the dataset into ten principal components that capture 89% of the variance, and stratified K-fold cross-validation ensured a balanced representation of classes in each fold. With L1 regularization, the logistic regression model achieved an accuracy of 97.2%, an AUC of 98.1%, an F1 score of 97.2%, a specificity of 96.1%, and a log loss of 0.17. In a comparative evaluation, this model had the highest accuracy, ahead of decision trees (93.33%), random forest (93.33%), XGBoost (93.33%), Naive Bayes (91.67%), and non-regularized logistic regression (87.55%). This research underscores the importance of early prediction of cervical cancer based on behavioral risk factors and suggests a robust, easily implementable workflow to improve classification accuracy. Future research should concentrate on refining these predictive tools to overcome social and behavioral barriers to prevention, particularly within underserved populations.
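The workflow summarized above (PCA to ten components, stratified K-fold cross-validation, L1-regularized logistic regression) could be sketched roughly as follows with scikit-learn. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic stand-ins generated with `make_classification`, and the scaling step, the number of folds, the `liblinear` solver, and `C=1.0` are illustrative choices that the abstract does not specify. The scores it produces therefore say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the study's dataset is not reproduced here.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)

# Scale -> PCA (ten components, as in the abstract) -> L1-regularized logistic regression.
# liblinear is one solver that supports the L1 penalty; C=1.0 is an assumed default.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)

# Stratified folds keep the class proportions similar in every split.
# Five folds is an assumption; the abstract does not state K.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Specificity is the recall of the negative class (label 0).
scoring = {
    "accuracy": "accuracy",
    "auc": "roc_auc",
    "f1": "f1",
    "specificity": make_scorer(recall_score, pos_label=0),
    "neg_log_loss": "neg_log_loss",
}

scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

for name in scoring:
    print(f"{name}: {scores['test_' + name].mean():.3f}")
```

Wrapping the scaler and PCA in the same pipeline as the classifier means both are refit on the training portion of each fold, so no information from the held-out fold leaks into the components.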