Hallucination Detection on Code Generation with SelfCheckGPT

Material type: Article

Author: Ito Waka et al.

Publisher: Information Processing Society of Japan

Publication date: 2025

Material Format: Digital

Journal name: Journal of Information Processing 33 0

Publication Page: p.487-493

Bibliographic Record

Material Type: Article
Title: Hallucination Detection on Code Generation with SelfCheckGPT
Author Heading: Ito Waka
Sato Miyu
Obara Yui
Kuramitsu Kimio
Publication Date: 2025
Publication Date (W3CDTF): 2025
Periodical title: Journal of Information Processing
No. or year of volume/issue: 33 0
Volume: 33
Issue: 0
Pages: 487-493
Publication date of volume/issue (W3CDTF): 2025
Publication (Periodical Title): Information Processing Society of Japan
Text Language Code: en
Subject Heading: LLMs
generative AI
code generation
hallucination
evaluation metrics
Target Audience: General
DOI: 10.2197/ipsjjip.33.487
https://doi.org/10.2197/ipsjjip.33.487
Related Material (URI): https://www.jstage.jst.go.jp/article/ipsjjip/33/0/33_487/_pdf
References: DeepBugs: a learning approach to name-based bug detection
https://cir.nii.ac.jp/crid/1360011145753572480
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
https://cir.nii.ac.jp/crid/1360020701022781952
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
https://cir.nii.ac.jp/crid/1360022501345155968
Out of the BLEU: How should we assess quality of the Code Generation models?
https://cir.nii.ac.jp/crid/1360024022340832128
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
https://cir.nii.ac.jp/crid/1360024025226188928
IntelliCode compose: code generation using transformer
https://cir.nii.ac.jp/crid/1360298345090256128
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
https://cir.nii.ac.jp/crid/1360305497316018304
Developer Testing in the IDE: Patterns, Beliefs, and Behavior
https://cir.nii.ac.jp/crid/1360305497604995456
Survey of Hallucination in Natural Language Generation
https://cir.nii.ac.jp/crid/1360579820494762752
CodeJudge: Evaluating Code Generation with Large Language Models
https://cir.nii.ac.jp/crid/1360586971786245248
Using LLMs in Software Requirements Specifications: An Empirical Evaluation
https://cir.nii.ac.jp/crid/1360586972544462208
Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs
https://cir.nii.ac.jp/crid/1360868448240871680
A Normalized Levenshtein Distance Metric
https://cir.nii.ac.jp/crid/1361699995767595392
Texygen
https://cir.nii.ac.jp/crid/1362544418386190976
BLEU
https://cir.nii.ac.jp/crid/1364233270606638080
Data Provider (Database): National Institute of Informatics : CiNii Research
https://cir.nii.ac.jp/
Original Data Provider (Database): Japan Link Center
https://japanlinkcenter.org/top
Crossref
https://www.crossref.org

Summary, etc.: Large language models (LLMs) are expected to bring automation and efficiency to software development, including programming. However, an LLM encounters a challenge known as “hallucination,” where it produces incorrect content or outputs that deviate from input requirements. SelfCheckGPT is one of the methods designed to detect hallucinations. Its key feature lies in its ability to infer the occurrence of hallucinations without requiring reference data or test cases. Although SelfCheckGPT has been evaluated and applied in natural language processing tasks such as text summarization and question answering, its performance in code generation has not yet been explored. In this study, we applied SelfCheckGPT to the HumanEval dataset, a standard benchmark for code generation, and investigated its evaluation performance by comparing it with execution-based evaluations. The results revealed that calculating similarity using BLEU, ROUGE-L, and EditSim is adequate for predicting the correctness of code or, in other words, hallucinations.
DOI: 10.2197/ipsjjip.33.487
Access Restrictions: Publicly available on the Internet
Data Provider (Database): Japan Science and Technology Agency : J-STAGE
http://www.jstage.jst.go.jp
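
The summary describes judging a generated program by how similar it is to other outputs sampled for the same prompt, without reference code or test cases. The following is a minimal sketch of that idea using an edit-distance similarity only. It is not the authors' implementation: the record does not state their tokenization, normalization, or decision rule, so the character-level comparison, the `1 - distance / max_length` normalization, the function names, and the `threshold=0.5` cutoff are all assumptions made for illustration.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_sim(a: str, b: str) -> float:
    """Similarity in [0, 1]; the normalization here is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def consistency_score(main: str, samples: List[str]) -> float:
    """Mean similarity between the main output and the resampled outputs."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(edit_sim(main, s) for s in samples) / len(samples)


def likely_hallucination(main: str, samples: List[str],
                         threshold: float = 0.5) -> bool:
    """Flag the main output when the samples disagree with it.

    The threshold is hypothetical and would need tuning against
    execution-based results.
    """
    return consistency_score(main, samples) < threshold
```

In use, `main` would be the code under evaluation and `samples` several additional generations for the same prompt; a low score suggests the model is not consistently producing this program. BLEU or ROUGE-L could be substituted for `edit_sim` in `consistency_score` without changing the rest of the sketch.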