A ROBUST FEATURE ENGINEERING ARCHITECTURE INCORPORATING HYBRID SAMPLING AND SEMANTIC-STRUCTURAL CODE AUGMENTATION

P. Bhavani; N. Danapaquiame

doi:10.70102/afts.2025.1834.815

Original scientific article

Published: December 2025


Abstract

The growing complexity of modern code raises critical challenges for software defect prediction (SDP). These include severe class imbalance, high feature redundancy, and the failure of conventional techniques to capture the rich semantic and structural context of source code. This paper proposes HDA-SE-GFF, a model built around Semantic-Enriched Graph Feature Fusion. To address data imbalance, it applies a hybrid sampling method that combines SMOTE and Tomek links. SMOTE generates synthetic minority-class samples instead of duplicating existing ones, and Tomek links remove ambiguous majority-class samples at the decision boundary. Stacked Denoising Autoencoders (SDAEs) extract deep, noise-invariant features. In parallel, code context is captured by translating Abstract Syntax Trees (ASTs) and Program Dependency Graphs (PDGs) into semantic-structural embeddings. The deep and structural features are combined through a weighted interpolation factor and classified by a softmax layer. HDA-SE-GFF was trained with a 5-layer autoencoder of 256-512 hidden units, 20% dropout, and a learning rate of 5. Experimental results on object-oriented software repositories show that the architecture substantially outperforms baseline methods in classification accuracy and model robustness. The key lessons learned are that hybrid sampling effectively alleviates majority-class bias, and that the combined AST and PDG embedding representation captures code behavior at edge densities of 80-300. The HDA-SE-GFF model offers a scalable and precise quality assurance procedure for software, bridging the gap between unsupervised feature learning and structural code analysis in large-scale systems.
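The abstract names two concrete mechanisms: SMOTE-plus-Tomek-link resampling, and a weighted fusion of deep and structural features followed by softmax classification. The pure-Python sketch below illustrates both under stated assumptions. All function names are hypothetical. SMOTE is applied first and Tomek-link cleaning second. Only the majority-class member of each Tomek link is removed, which is one common variant. The interpolation factor is assumed to take the form alpha * deep + (1 - alpha) * structural. This is not the authors' implementation, and it omits the SDAE and AST/PDG embedding stages.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """SMOTE: create n_new synthetic minority samples, each a random point on the
    segment between a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation position in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def nearest(X, i):
    """Index of the nearest neighbour of X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i), key=lambda j: sq_dist(X[i], X[j]))


def remove_tomek_links(X, y, majority_label):
    """Binary labels assumed: drop the majority-class member of every Tomek link,
    i.e. every pair of mutual nearest neighbours that carry different labels."""
    nn = [nearest(X, i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if y[i] != y[j] and nn[j] == i:
            drop.add(i if y[i] == majority_label else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label, majority_label, k=5, seed=0):
    """Hybrid sampling: oversample the minority class with SMOTE up to the
    majority count, then clean the class boundary with Tomek links."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(y.count(majority_label) - len(minority), 0)
    synthetic = smote(minority, n_new, k=k, seed=seed)
    X_bal = X + synthetic
    y_bal = y + [minority_label] * len(synthetic)
    return remove_tomek_links(X_bal, y_bal, majority_label)


def fuse(deep, structural, alpha=0.5):
    """Assumed fusion rule: alpha * deep + (1 - alpha) * structural."""
    return [alpha * d + (1 - alpha) * s for d, s in zip(deep, structural)]


def softmax(logits):
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(z, weights, bias):
    """Linear layer followed by softmax: returns one probability per class."""
    logits = [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy data: label 0 = clean (majority), label 1 = defective (minority).
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.2], [1.0, 1.0], [0.9, 1.0]]
    y = [0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = smote_tomek(X, y, minority_label=1, majority_label=0)
    print(y_bal.count(0), y_bal.count(1))  # prints: 5 5
    z = fuse([0.8, 0.1], [0.4, 0.3], alpha=0.6)
    print(classify(z, weights=[[1.0, -1.0], [-1.0, 1.0]], bias=[0.0, 0.0]))
```

In practice, resampling is applied to the training split only, so that synthetic samples do not leak into the evaluation data.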

Keywords:

defect prediction,

feature engineering,

autoencoders,

SMOTE,

code semantics,

software quality.



Copyright

This is an open access article distributed under the Creative Commons Attribution-NonCommercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Issue 34, 2025
