Original scientific article

A ROBUST FEATURE ENGINEERING ARCHITECTURE INCORPORATING HYBRID SAMPLING AND SEMANTIC-STRUCTURAL CODE AUGMENTATION

By
P. Bhavani

Research Scholar, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

N. Danapaquiame

Professor and Head, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

Abstract

The growing complexity of modern code has confronted software defect prediction (SDP) with serious challenges: severe class imbalance, high feature redundancy, and the failure of conventional techniques to capture the rich semantic and structural context of source code. This paper proposes HDA-SE-GFF, a model built on Semantic-Enriched Graph Feature Fusion. To address data imbalance, a hybrid sampling method combining SMOTE and Tomek links generates synthetic minority samples without duplicating majority-class data near the decision boundary. Stacked Denoising Autoencoders (SDAEs) extract deep, noise-invariant features. Meanwhile, code context is captured by translating Abstract Syntax Trees (ASTs) and Program Dependency Graphs (PDGs) into semantic-structural embeddings. The deep and structural features are combined using a weighted interpolation factor and classified by a Softmax layer. HDA-SE-GFF was trained with a 5-layer autoencoder with 256-512 hidden units, 20% dropout, and a learning rate of 5. Experimental results on object-oriented software repositories show that the architecture substantially outperforms baseline methods in classification accuracy and model robustness. The key findings are that the hybrid sampling approach effectively alleviates majority-class bias and that, at edge densities of 80-300, the combined representation of AST and PDG embeddings captures code behavior. The HDA-SE-GFF model offers a scalable and precise quality assurance approach for software, and an effective way to bridge the gap between unsupervised feature learning and structural code analysis in large-scale systems.
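
The hybrid sampling step named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`smote`, `tomek_links`, `smote_tomek`), the neighbour count `k`, the binary-label assumption, the choice to oversample until the classes are balanced, and the choice to drop only the majority member of each Tomek link are all assumptions made for illustration.

```python
import math
import random

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class with SMOTE until the classes are
    balanced, then remove the majority member of every Tomek link
    (an assumed variant; other variants remove both members)."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max((len(X) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, k, rng)
    X_all = list(X) + new_points
    y_all = list(y) + [minority_label] * len(new_points)
    drop = set()
    for i, j in tomek_links(X_all, y_all):
        drop.add(i if y_all[i] != minority_label else j)
    keep = [i for i in range(len(X_all)) if i not in drop]
    return [X_all[i] for i in keep], [y_all[i] for i in keep]

# Toy usage: 8 majority points (label 0) and 3 minority points (label 1).
X_demo = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.4),
          (0.5, 0.1), (0.2, 0.5), (0.6, 0.3),
          (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
y_demo = [0] * 8 + [1] * 3
X_res, y_res = smote_tomek(X_demo, y_demo, minority_label=1, k=2)
print(len(X_res), y_res.count(0), y_res.count(1))
```

Because each synthetic point lies on a segment between two existing minority samples, the oversampling never copies a sample verbatim, and the Tomek-link pass only removes samples that sit right at the class boundary. The brute-force nearest-neighbour search is quadratic in the number of samples, so a sketch like this suits small illustrations rather than large repositories.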

This is an open access article distributed under the Creative Commons Attribution Non-Commercial License (CC BY-NC), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.