Machine Learning Framework for Fertility Determinants with Integrated Feature Selection and Data Imputation Techniques

Name of the Presenting Author: 
Dr Rakesh Kumar Saroj
Abstract Content (not more than 300 words; should include: Introduction, Objective, Methodology, Critical Findings & Conclusion): 
Introduction and Objective: Declining fertility rates pose critical demographic and socioeconomic challenges. Identifying their key determinants requires methods that capture complex, non-linear relationships. This study applies machine learning with data imputation and feature selection to identify major fertility drivers.

Methodology: Using NFHS-5 data for Sikkim (n = 3,271; 45 variables), missing data were classified as MCAR, MAR, or MNAR using Chi-square tests and logistic regression. Imputation was performed using Mode, KNN, and MICE methods. Feature selection integrated filter (χ², Mutual Information, Pearson’s r), wrapper (RFE, Stepwise Regression), and embedded approaches (Random Forest, XGBoost, CatBoost, LightGBM, LASSO, SHAP, Permutation Importance). Models were trained with a 70:30 split and evaluated using MAE, RMSE, and R².

Results: The dataset contained 18.9% missing values, which were addressed through systematic classification and imputation. Model performance improved after imputation and feature reduction. Using all features post-imputation, XGBoost and CatBoost reached R² = 0.8029; with the top 15 predictors, LightGBM achieved R² = 0.8561 and CatBoost R² = 0.8551, with RMSE reduced to 0.47. Overall, post-imputation improvements were statistically significant (ΔR² ≈ +0.10; p < 0.01), supporting the effectiveness of integrating imputation with feature selection in predicting fertility outcomes.

Conclusion: Combining imputation strategies with feature selection enhances model accuracy and interpretability and yields robust fertility determinants. The findings indicate that delayed childbearing, rising education levels, contraceptive use, and fertility preferences are the primary drivers of low fertility. These insights can guide policy interventions, reproductive health programs, and sustainable demographic planning.
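Illustrative sketch (not the study's code): the Python snippet below shows, under stated assumptions, three of the steps named in the Methodology: mode imputation, a 70:30 split, and the MAE, RMSE, and R² metrics. The function names (mode_impute, split_70_30, mae, rmse, r2) and the toy values are hypothetical and are not taken from the NFHS-5 data. It uses only the standard library; KNN, MICE, and the tree-based models would need additional packages.

```python
import math
import random
from collections import Counter


def mode_impute(column):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def split_70_30(rows, seed=0):
    """Shuffle a copy of the rows and split it 70:30 (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy categorical column with one missing entry (hypothetical values).
    print(mode_impute(["urban", None, "rural", "urban"]))

    # A 70:30 split of as many row indices as the sample size in the abstract.
    train, test = split_70_30(range(3271))
    print(len(train), len(test))

    # Toy observed vs. predicted outcomes (hypothetical values).
    y_true, y_pred = [1, 2, 3, 4], [1, 2, 3, 6]
    print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Mode imputation is the simplest of the three imputation methods listed and is shown only because it needs no external dependency; it is not necessarily the method behind the reported results.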
If not selected for oral presentation, do you want to be considered for a poster presentation?: 
No
Do you require financial support to attend the seminar? (Not applicable for virtual meet): 
Yes-full
Email of the Presenting Author: 
Gender: 
Male
Mobile number of the Presenting Author: 
9454196475
Address & Pincode of the Presenting Author: 
Room No. 34, School of Computational and Integrative Sciences (SC&IS), Jawaharlal Nehru University, New Delhi-110070, India
Evaluation Status: 
No