An IoT-Ready Framework for Predictive Healthcare Using INGA Feature Selection and Six-Classifier Assessment
DOI: https://doi.org/10.64229/y7c97d86
Keywords: Healthcare Analytics, Machine Learning, Predictive Modelling, Random Forest (RF), Support Vector Machine (SVM), Big Data
Abstract
Healthcare’s rapid digitization through Electronic Health Records and IoT-enabled sensing has produced heterogeneous Medical Big Data whose volume, velocity, variety, and variable veracity strain conventional analytics and impede scalable prediction in clinical workflows. This work presents an integrated predictive healthcare analytics framework that couples rigorous preprocessing with an Improved Niche Genetic Algorithm (INGA) for feature selection, followed by a comparative evaluation of six supervised classifiers, to enable automated, reliable pre-diagnosis in resource-constrained, real-time settings. The preprocessing pipeline repairs missing and erroneous entries through statistical and semantic corrections, normalizes numeric attributes, encodes categorical variables, and applies a 70:30 train-test split to support unbiased assessment across models and metrics. INGA encodes candidate feature subsets as binary chromosomes, optimizes a fitness based on prediction error under niche-preserving evolution, and reduces the UCI Heart Disease dataset from 76 attributes to an optimal subset of 10. This is a dimensionality reduction of about 87% that maintains diagnostic fidelity while lowering the computational overhead that is critical for edge deployment. On the INGA-selected features, Support Vector Machine (quadratic kernel), K-Nearest Neighbor, Gaussian Naïve Bayes, Logistic Regression, Decision Tree, and Random Forest are benchmarked using accuracy, precision, recall, F1-score, and confusion matrices to capture clinically relevant trade-offs. Random Forest attains the highest accuracy, 91.8%, with balanced precision and recall, while SVM achieves 88% for classification and 83% in a prognostic case, highlighting the complementary strengths of ensemble and kernel methods on compact feature sets.
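The INGA step described above can be illustrated with a generic niche genetic algorithm for feature selection. The abstract does not specify what makes INGA "improved", so this is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes fitness sharing over Hamming distance as the niche-preserving mechanism, tournament selection, single-point crossover, bit-flip mutation and elitism, and it replaces the classifier's prediction error with a hypothetical `toy_error` function in which ten of 76 features are informative. The names `niche_ga_select`, `shared_fitness` and `toy_error`, and all parameter values, are illustrative assumptions. For reference, keeping 10 of 76 attributes discards 66/76 ≈ 86.8% of them, which matches the roughly 87% reduction reported.

```python
# Hypothetical sketch: a plain niche GA with fitness sharing, NOT the published INGA.
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]  # gene i == 1 means feature i is kept


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(pop: List[Chromosome], raw: List[float], sigma: float) -> List[float]:
    """Fitness sharing: divide raw fitness by the niche count so crowded niches are penalised."""
    shared = []
    for a, f in zip(pop, raw):
        # Triangular sharing kernel; the individual itself contributes 1, so niche >= 1.
        niche = sum(max(0.0, 1.0 - hamming(a, b) / sigma) for b in pop)
        shared.append(f / niche)
    return shared


def niche_ga_select(
    n_features: int,
    error_fn: Callable[[Chromosome], float],
    pop_size: int = 30,
    generations: int = 40,
    seed: int = 0,
) -> Tuple[Chromosome, float, List[float]]:
    """Return (best chromosome, its error, best error after each generation).

    Assumes n_features >= 2 and error values in [0, 1].
    """
    rng = random.Random(seed)
    p_mut = 1.0 / n_features   # roughly one bit flip per child (assumed)
    sigma = n_features / 4.0   # niche radius in Hamming distance (assumed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best: Chromosome = pop[0][:]
    best_err = float("inf")
    history: List[float] = []
    for gen in range(generations + 1):
        errors = [error_fn(c) for c in pop]
        for c, e in zip(pop, errors):
            if e < best_err:
                best, best_err = c[:], e
        history.append(best_err)
        if gen == generations:
            break  # final population evaluated; stop before breeding again
        raw = [1.0 / (1.0 + e) for e in errors]  # lower error -> higher fitness
        shared = shared_fitness(pop, raw, sigma)

        def tournament() -> Chromosome:
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if shared[i] >= shared[j] else pop[j]

        children = [best[:]]  # elitism: carry the best subset found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return best, best_err, history


# Hypothetical stand-in for a classifier's validation error: features 0-9 are
# "informative"; each one missing costs 0.05 and each kept feature costs 0.005.
INFORMATIVE = set(range(10))


def toy_error(c: Chromosome) -> float:
    missing = sum(1 for i in INFORMATIVE if c[i] == 0)
    return 0.05 * missing + 0.005 * sum(c)


best, err, hist = niche_ga_select(76, toy_error)
print(sum(best), "features kept, error", round(err, 3))
```

In practice `error_fn` would train one of the six classifiers on the columns where the chromosome holds a 1 and return its validation error. The sharing radius (`sigma` inside the function) controls how many distinct feature subsets survive side by side instead of the population collapsing onto a single one.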
Results confirm that combining robust preprocessing, evolutionary feature selection, and multi-model evaluation yields scalable, interpretable, and accurate decision support for IoT-driven healthcare, establishing a practical pathway from data ingestion to actionable clinical insights.
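The pathway from data ingestion to evaluation (preprocessing, a 70:30 train-test split, and metric-based comparison of classifiers) can be sketched with the Python standard library. This is a minimal sketch, not the authors' pipeline: mean imputation and min-max scaling stand in for the unspecified statistical repairs and normalization, binary labels with 1 meaning disease present are assumed, and the helper names `impute_and_scale`, `split_70_30`, `confusion` and `metrics` are hypothetical. The six classifiers themselves are out of scope; any model's test-set predictions can be passed to `metrics`.

```python
# Hypothetical helpers illustrating preprocessing, a 70:30 split and evaluation metrics.
import random
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def impute_and_scale(column: Sequence[Optional[float]]) -> List[float]:
    """Fill missing values (None) with the column mean, then min-max scale to [0, 1]."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in column]
    lo, hi = min(filled), max(filled)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in filled]


def split_70_30(rows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut at 70% for training and 30% for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Binary confusion counts in (tn, fp, fn, tp) order, with 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tn, fp, fn, tp = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


print(impute_and_scale([1.0, None, 3.0]))  # [0.0, 0.5, 1.0]
train, test = split_70_30(list(range(10)))
print(len(train), len(test))  # 7 3
m = metrics([1, 1, 1, 1, 0, 0, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0, 1, 1, 1, 0])
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.7, 'precision': 0.667, 'recall': 0.8, 'f1': 0.727}
```

For brevity, `impute_and_scale` computes the mean, minimum and maximum over whatever column it receives. In a real pipeline these statistics should be fitted on the 70% training portion only and then reused for the test portion, so that no information leaks from the held-out data into the reported metrics.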
License
Copyright (c) 2025 Sumit Kushwaha, Sarthak Vishnoi (Authors)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.