Hallucination Detection on Code Generation with SelfCheckGPT
Digital data available (Japan Science and Technology Agency)
Bibliographic Record
- Material Type
- Article
- Author Heading
- Publication Date
- 2025
- Publication Date (W3CDTF)
- 2025
- Periodical title
- Journal of Information Processing
- No. or year of volume/issue
- 33 0
- Volume
- 33
- Issue
- 0
- Pages
- 487-493
- Publication date of volume/issue (W3CDTF)
- 2025
- Publication (Periodical Title)
- Information Processing Society of Japan
- Text Language Code
- en
- Target Audience
- General
- DOI
- 10.2197/ipsjjip.33.487
- Related Material (URI)
- References
- DeepBugs: a learning approach to name-based bug detection
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- Out of the BLEU: How should we assess quality of the Code Generation models?
- LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
- IntelliCode compose: code generation using transformer
- CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
- Developer Testing in the IDE: Patterns, Beliefs, and Behavior
- Survey of Hallucination in Natural Language Generation
- CodeJudge: Evaluating Code Generation with Large Language Models
- Using LLMs in Software Requirements Specifications: An Empirical Evaluation
- Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs
- A Normalized Levenshtein Distance Metric
- Texygen
- BLEU
- Data Provider (Database)
- National Institute of Informatics : CiNii Research
- Original Data Provider (Database)
- Japan Link Center
- Crossref
- Summary, etc.
- Large language models (LLMs) are expected to bring automation and efficiency to software development, including programming. However, an LLM encounters a challenge known as “hallucination,” where it produces incorrect content or outputs that deviate from input requirements. SelfCheckGPT is one of the methods designed to detect hallucinations. Its key feature lies in its ability to infer the occurrence of hallucinations without requiring reference data or test cases. Although SelfCheckGPT has been evaluated and applied in natural language processing tasks such as text summarization and question answering, its performance in code generation has not yet been explored. In this study, we applied SelfCheckGPT to the HumanEval dataset, a standard benchmark for code generation, and investigated its evaluation performance by comparing it with execution-based evaluations. The results revealed that calculating similarity using BLEU, ROUGE-L, and EditSim is adequate for predicting the correctness of code or, in other words, hallucinations.
- DOI
- 10.2197/ipsjjip.33.487
- Access Restrictions
- Available on the Internet
- Data Provider (Database)
- Japan Science and Technology Agency : J-STAGE
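
The summary in this record describes scoring generated code by its similarity to other samples drawn from the same model, without reference solutions or test cases. The following is a minimal sketch of that idea using an edit-distance similarity only. It is not the authors' implementation: the function names (`levenshtein`, `edit_sim`, `consistency_score`), the character-level comparison, and the normalization by the longer string are all assumptions made for illustration, and the record does not state how the paper computes EditSim, BLEU, or ROUGE-L.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def edit_sim(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical strings.

    Assumption: distance is normalized by the longer string's length.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def consistency_score(candidate: str, samples: List[str]) -> float:
    """Mean similarity between a candidate and other sampled outputs.

    A low score means the samples disagree with the candidate, which a
    SelfCheckGPT-style check would read as a sign of possible hallucination.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(edit_sim(candidate, s) for s in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical outputs for one prompt; not taken from HumanEval.
    candidate = "def add(a, b):\n    return a + b"
    samples = [
        "def add(a, b):\n    return a + b",
        "def add(x, y):\n    return x + y",
        "def add(a, b):\n    return a - b",
    ]
    print(f"consistency: {consistency_score(candidate, samples):.3f}")
```

In practice a decision threshold on the score would have to be chosen against execution-based results, which is the comparison the summary reports.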