College of Computer Science and Information Technology, Wasit University, Kut, Iraq
Faculty of Computer Science and Mathematics, Kufa University, Kufa, Iraq
Citation network analysis is growing rapidly. A central challenge, however, is detecting inaccurate citations, and many unreasonable citations occur today. The content of the cited paper may be irrelevant to the citing paper. Authors may form reciprocal citations and small cycles, in which author A cites author B and author B cites author A ("soviet citation"); this can also happen within small groups. Citations may be concentrated within an institution, with citing and cited authors affiliated with the same institution. Finally, the venue or scope may be inconsistent. Whereas prior work typically addresses only one or two of these situations, this paper studies all of them by combining three complementary blocks of evidence: (i) text features that capture the relation between the citing paper and its references, using SBERT cosine similarity and BM25 ranks; (ii) graph features extracted from the citation–author network, such as PageRank, clustering coefficient, reciprocity counts, same-institution reference share, and author overlap; and (iii) metadata features, such as length/size, author affiliations, and temporal context. Features are selected using SHapley Additive exPlanations (SHAP) of an ensemble classifier so that only stable features are retained. We propose SiG-Gate, a similarity-adaptive fusion head that identifies inaccurate citations by jointly leveraging text semantics, citation graph structure, and metadata. SiG-Gate learns per-instance weights that calibrate the contribution of each modality, combining calibrated unimodal outputs with explicit indicators of cross-modal disagreement, which yields decisions that are both more robust and more interpretable. Under stratified cross-validation, the approach attains an accuracy of 0.946, an F1-score of 0.91, a precision of 0.90, and an AUC of 0.96. These results indicate that consistency-aware, convergent evidence is essential for reliable and accurate citation detection.
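The text evidence block pairs an embedding similarity with a lexical ranking. The following is a minimal sketch, not the authors' implementation: it assumes SBERT embeddings have already been computed elsewhere (the model is not called here), scores each reference against the citing paper's tokens with standard Okapi BM25 using the common defaults k1 = 1.5 and b = 0.75 (an assumption; the abstract gives no parameters), and converts scores to ranks within the reference list. The function names `cosine`, `bm25_scores`, and `bm25_ranks` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two dense vectors, e.g. precomputed SBERT embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized reference in `docs` for the tokenized citing text."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def bm25_ranks(query, docs, **kw):
    """Rank of each reference (1 = best lexical match) within the citing paper's reference list."""
    scores = bm25_scores(query, docs, **kw)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(docs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


# Toy example: a citing paper and three references (tokenized by whitespace).
citing = "graph neural network citation analysis".split()
refs = [
    "graph neural network survey".split(),
    "protein folding structure".split(),
    "citation analysis with graph methods".split(),
]
```

Here `bm25_ranks(citing, refs)` gives `[1, 3, 2]`: the off-topic protein-folding reference ranks last, which is the kind of low-relevance signal the text block is meant to expose.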
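Several of the listed graph features can be computed per citation edge from paper-level citations, author lists, and affiliations. The sketch below is illustrative only: the data layout (dictionaries keyed by paper ID) and the exact definitions are assumptions, not taken from the paper. Reciprocity counts author pairs where an author of the cited paper also cites an author of the citing paper anywhere in the graph (the A→B, B→A pattern). Author overlap is the Jaccard index of the two author sets. Same-institution share is the fraction of the citing paper's references that share an institution with it. PageRank and clustering coefficients are not shown; a graph library such as NetworkX (`nx.pagerank`, `nx.clustering`) would typically supply them.

```python
def author_citation_edges(citations, authors):
    """Expand paper-level citations (citing, cited) into directed author-to-author edges."""
    edges = set()
    for citing, cited in citations:
        for a in authors[citing]:
            for b in authors[cited]:
                if a != b:
                    edges.add((a, b))
    return edges


def jaccard(a, b):
    """Jaccard index of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def citation_edge_features(citing, cited, citations, authors, institutions):
    """Hypothetical per-citation graph/metadata features for the edge citing -> cited."""
    edges = author_citation_edges(citations, authors)
    # Author pairs (a in citing, b in cited) where b also cites a somewhere in the graph.
    reciprocal = sum(
        1
        for a in authors[citing]
        for b in authors[cited]
        if a != b and (b, a) in edges
    )
    refs = [q for p, q in citations if p == citing]
    same_inst = sum(1 for q in refs if institutions[q] & institutions[citing])
    return {
        "reciprocity_count": reciprocal,
        "author_overlap": jaccard(authors[citing], authors[cited]),
        "same_institution_share": same_inst / len(refs) if refs else 0.0,
    }


# Toy graph: A (P1) cites B (P2) and D (P4); B and C (P3) cite A back.
authors = {"P1": ["A"], "P2": ["B"], "P3": ["B", "C"], "P4": ["D"]}
institutions = {"P1": {"inst_X"}, "P2": {"inst_X"}, "P3": {"inst_Y"}, "P4": {"inst_Z"}}
citations = [("P1", "P2"), ("P1", "P4"), ("P3", "P1")]
```

For the edge P1 → P2 this yields a reciprocity count of 1 (B cites A through P3), no author overlap, and a same-institution share of 0.5, since one of P1's two references comes from its own institution.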
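The abstract states that SHAP explanations of an ensemble classifier are used to keep only stable features, but it does not give the stability rule. One plausible criterion, shown here purely as an assumption, is to rank features by mean absolute SHAP value within each cross-validation fold and keep those that reach the top k in at least a given fraction of folds. SHAP values are assumed to be computed elsewhere (for example with the `shap` package on a tree ensemble); `stable_features`, `k`, and `min_frac` are hypothetical names.

```python
def stable_features(fold_shap, k, min_frac=1.0):
    """Keep features ranked in the top-k by mean |SHAP| in at least `min_frac` of folds.

    fold_shap: one dict per CV fold, mapping feature name -> SHAP values over that
    fold's validation rows (assumed precomputed by an explainer).
    """
    counts = {}
    for shap_vals in fold_shap:
        # Global importance of each feature in this fold: mean absolute SHAP value.
        importance = {f: sum(abs(v) for v in vals) / len(vals) for f, vals in shap_vals.items()}
        top = sorted(importance, key=importance.get, reverse=True)[:k]
        for f in top:
            counts[f] = counts.get(f, 0) + 1
    needed = min_frac * len(fold_shap)
    return sorted(f for f, c in counts.items() if c >= needed)


# Toy SHAP values for three folds and three features.
folds = [
    {"sbert_cos": [0.4, -0.5], "bm25_rank": [0.3, 0.2], "pagerank": [0.01, -0.02]},
    {"sbert_cos": [0.6, 0.4], "bm25_rank": [0.1, 0.05], "pagerank": [0.2, 0.3]},
    {"sbert_cos": [-0.5, 0.5], "bm25_rank": [0.2, -0.3], "pagerank": [0.0, 0.01]},
]
```

With `k=2`, requiring every fold (`min_frac=1.0`) keeps only `sbert_cos`, while relaxing to `min_frac=0.6` also keeps `bm25_rank`; `pagerank` is dropped because it ranks highly in just one fold.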
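The gated-fusion idea can be illustrated with a small sketch. It assumes, without confirmation from the abstract, that the gate is a softmax over the three modalities whose input concatenates the calibrated unimodal probabilities with pairwise absolute differences as disagreement indicators, and that the fused output is the gate-weighted average of those probabilities. In practice `W` and `bias` would be learned (for example by minimizing log loss); here they are fixed by hand. This is not the published SiG-Gate parameterization, and `gated_fusion` and `disagreement` are hypothetical names.

```python
import math
from itertools import combinations


def disagreement(probs):
    """Cross-modal disagreement indicators: pairwise |p_i - p_j| between modalities."""
    return [abs(a - b) for a, b in combinations(probs, 2)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def gated_fusion(probs, W, bias):
    """Fuse calibrated P(inaccurate) from the text, graph, and metadata models.

    probs: three calibrated unimodal probabilities.
    W: one row per modality over the gate input [probs + disagreement] (length 6).
    bias: one gate bias per modality.
    Returns (fused probability, per-instance modality weights).
    """
    x = list(probs) + disagreement(probs)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, bias)]
    weights = softmax(logits)
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused, weights
```

With an all-zero gate, `gated_fusion([0.9, 0.6, 0.3], [[0.0] * 6] * 3, [0.0] * 3)` reduces to the plain average (about 0.6), whereas a large bias on the first row shifts nearly all weight to the text model. Because the weights are explicit per instance, they can be inspected to see which modality drove a given decision, which is the interpretability benefit the abstract describes.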
This is an open access article distributed under the Creative Commons Attribution Non-Commercial License (CC BY-NC), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.