Original scientific article

A ROBUST FEATURE ENGINEERING ARCHITECTURE INCORPORATING HYBRID SAMPLING AND SEMANTIC-STRUCTURAL CODE AUGMENTATION

By
P. Bhavani

Research Scholar, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

N. Danapaquiame

Professor and Head, Department of Computer Science and Engineering, Sri Manakula Vinayagar Engineering College, Puducherry, India

Abstract

The growing complexity of contemporary code has confronted software defect prediction (SDP) with critical challenges: severe class imbalance, high feature redundancy, and the failure of conventional techniques to capture the rich semantic and structural context of source code. This paper proposes HDA-SE-GFF, a model built on Semantic-Enriched Graph Feature Fusion. To address data imbalance, a hybrid sampling method combining SMOTE and Tomek links is applied to generate synthetic minority samples without duplicating majority data near the decision boundary. Stacked Denoising Autoencoders (SDAEs) learn deep, noise-invariant features. Meanwhile, code context is captured by translating Abstract Syntax Trees (ASTs) and Program Dependency Graphs (PDGs) into semantic-structural embeddings. The deep and structural features are combined using a weighted interpolation factor and classified by a Softmax layer. HDA-SE-GFF was trained with a 5-layer autoencoder with 256-512 hidden units, 20% dropout, and a learning rate of 5. According to experimental results on object-oriented software repositories, the architecture clearly outperforms baseline methods in classification accuracy and model robustness. The key lessons learned are that the hybrid sampling approach effectively alleviates majority-class bias, and that the combined representation of AST and PDG embeddings is informative about code behavior at edge densities of 80-300. The HDA-SE-GFF model offers a scalable and precise software quality assurance procedure, and an effective way of bridging the gap between unsupervised feature learning and structural code analysis in large-scale systems.
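
The fusion and classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact fusion formula, so the sketch assumes a convex combination, fused = alpha * deep + (1 - alpha) * structural, with both feature vectors of equal length. The names `fuse`, `classify`, and `alpha`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def fuse(deep, structural, alpha):
    """Blend two equal-length feature vectors (assumed convex combination)."""
    if len(deep) != len(structural):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * d + (1.0 - alpha) * s for d, s in zip(deep, structural)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Linear layer followed by softmax; weights holds one row per class."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy inputs (hypothetical): a 3-dimensional SDAE feature vector and a
# 3-dimensional AST/PDG embedding for one code module.
deep = [0.2, 0.8, -0.1]
structural = [0.6, 0.0, 0.3]

fused = fuse(deep, structural, alpha=0.5)

# Hypothetical weights for two classes (defective, clean).
weights = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]
biases = [0.0, 0.0]

probs = classify(fused, weights, biases)
print(fused, probs)
```

With alpha = 1 the classifier sees only the deep features and with alpha = 0 only the structural embedding, so under this assumption the factor controls how much each representation contributes to the prediction.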

This is an open access article distributed under the Creative Commons Attribution Non-Commercial License (CC BY-NC), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
