Feature Selection (FS) is a crucial method for addressing the dimensionality problem in Data Mining (DM) tasks, yet conventional FS techniques do not scale well to very large feature spaces. The existing HPSO-IKM approach suffers from long processing times, so its stages must be further refined to shorten detection time, and PSO's weak local search capability and slow convergence in the refinement phase prevent it from offsetting poor initialization and from minimizing intra-clustering (IC) errors. This paper proposes a novel approach to the dimensionality problem in which a good feature subset is produced by combining a correlation metric with clustering. After pre-processing with Z-Score Normalization (ZSN), a computational model identifies the relevant features under the given constraints, and a feature structure is built by extracting features with Principal Component Analysis (PCA). Next, Multi-Objective Glowworm Swarm Optimization with Improved Fuzzy C-Means Clustering (MOGWO-IFCM) removes unnecessary features and selects non-redundant features from each cluster based on correlation measures. In this scheme, the Glowworm Swarm Optimization (GSO) technique supplies its best solution as the initial clustering center, which the IFCM technique then refines. The selected subsets are evaluated with a Modified Long Short-Term Memory (MLSTM) classifier on UCI datasets, and the results are compared with those of other well-known FS methods. Percentage-based criteria are used to verify the accuracy of the proposed technique for varying numbers of relevant features. The experimental results demonstrate the accuracy and efficiency of the proposed technique.
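As a rough illustration of the clustering-plus-correlation selection idea, the minimal sketch below z-score normalizes the data, clusters the features by their pairwise correlation profiles, and keeps the feature most correlated with the class label from each cluster. Plain k-means and Pearson correlation stand in for MOGWO-IFCM and the paper's correlation measures; the function name `correlation_cluster_select` and all parameter choices are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of clustering-plus-correlation feature selection.
# Assumptions: k-means replaces MOGWO-IFCM, Pearson correlation with the
# class label replaces the paper's correlation measures.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def correlation_cluster_select(X, y, n_clusters=5, random_state=0):
    """Group similar features, then keep one non-redundant feature per group."""
    Xz = StandardScaler().fit_transform(X)          # Z-Score Normalization (ZSN)

    # Cluster the features (not the samples) using their correlation profiles.
    feature_corr = np.corrcoef(Xz, rowvar=False)    # (n_features, n_features)
    labels = KMeans(n_clusters=n_clusters,
                    random_state=random_state,
                    n_init=10).fit_predict(feature_corr)

    # From each cluster keep the feature most correlated with the target,
    # discarding redundant features that carry similar information.
    target_corr = np.abs([np.corrcoef(Xz[:, j], y)[0, 1]
                          for j in range(Xz.shape[1])])
    selected = [int(np.argmax(np.where(labels == c, target_corr, -np.inf)))
                for c in range(n_clusters)]
    return sorted(set(selected))

# Example on a random stand-in for a UCI-style dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
print(correlation_cluster_select(X, y, n_clusters=5))
```

In the proposed method, the cluster centers would instead be seeded with GSO's best solution and refined by IFCM before the per-cluster selection is applied, and the resulting subset would be evaluated with the MLSTM classifier.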