Predictive Modeling of Student Academic Performance in Higher Education: A Machine Learning Framework for Learning Analytics

Authors

Xin Li, School of Computer and Software, Chengdu Jincheng College, Chengdu, Sichuan 611731

DOI:

https://doi.org/10.53469/wjimt.2025.08(07).05

Keywords:

Intelligent education, Big data, A portrait of schoolwork, Pearson correlation

Abstract

This study proposes an integrated machine learning framework for predicting student academic performance in higher education, with the aim of supporting learning analytics and timely educational interventions. The framework combines multi-source data (historical grades, learning behaviors, socioeconomic factors, and online engagement metrics) and trains several machine learning classifiers on it. In the early academic warning task, the Random Forest model predicts Week 6 academic outcomes with 97.03% accuracy (sensitivity: 95.26%; specificity: 98.80%). Applying SMOTETomek resampling to address class imbalance, together with feature scaling, raised the Gradient Boosting classifier's performance to 85.30%. A stacked ensemble of Random Forest, Gradient Boosting, and Support Vector Classifier models (RF-GB-SVC) raised cross-institutional prediction accuracy to 86.38%. SHAP value analysis identified class attendance (SHAP value: +0.71) and familial background (e.g., maternal occupation contribution: 1.992) as key determinants.
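For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, and specificity relate to a binary confusion matrix. The counts are hypothetical and chosen only so that the ratios match the reported Random Forest figures; they are not taken from the paper's data. The function name `binary_metrics` and the treatment of "at-risk" as the positive class are likewise assumptions made for illustration.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of actual positives correctly flagged
    specificity = tn / (tn + fp)   # share of actual negatives correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the paper): 10,000 students in each class.
acc, sens, spec = binary_metrics(tp=9526, fn=474, tn=9880, fp=120)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
# prints: accuracy=0.9703 sensitivity=0.9526 specificity=0.9880
```

With equally sized classes, accuracy is the simple average of sensitivity and specificity, and (95.26% + 98.80%) / 2 = 97.03%. The reported figures are therefore consistent with evaluation on a balanced set, although the abstract does not state the class distribution.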

Published

2025-07-30

Issue

Vol. 8 No. 7 (2025)

Section

Articles