Hallucination Detection on Code Generation with SelfCheckGPT

Material type
Article
Author
Ito Waka et al.
Publisher
Information Processing Society of Japan
Publication date
2025
Material Format
Digital
Journal name
Journal of Information Processing 33 0
Publication Page
p.487-493

Holdings of Libraries in Japan

This page shows libraries in Japan other than the National Diet Library that hold the material.

Please contact your local library for information on how to use materials or whether it is possible to request materials from the holding libraries.

other

  • J-STAGE

    Digital
  • CiNii Research

    Search Service
    Digital
    You can check the holdings of institutions and databases with which CiNii Research is linked at the site of CiNii Research.

Bibliographic Record

You can check the details of this material, its authority (keywords that refer to materials on the same subject, author's name, etc.), etc.

Digital

Material Type
Article
Publication Date
2025
Publication Date (W3CDTF)
2025
Periodical title
Journal of Information Processing
No. or year of volume/issue
33 0
Volume
33
Issue
0
Pages
487-493
Publication date of volume/issue (W3CDTF)
2025
Publication (Periodical Title)
Information Processing Society of Japan
Text Language Code
en
Target Audience
General
References
DeepBugs: a learning approach to name-based bug detection
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Out of the BLEU: How should we assess quality of the Code Generation models?
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
IntelliCode compose: code generation using transformer
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Developer Testing in the IDE: Patterns, Beliefs, and Behavior
Survey of Hallucination in Natural Language Generation
CodeJudge: Evaluating Code Generation with Large Language Models
Using LLMs in Software Requirements Specifications: An Empirical Evaluation
Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs
A Normalized Levenshtein Distance Metric
Texygen
BLEU
Data Provider (Database)
National Institute of Informatics: CiNii Research

Digital

Summary, etc.
Large language models (LLMs) are expected to bring automation and efficiency to software development, including programming. However, an LLM encounters a challenge known as "hallucination," where it produces incorrect content or outputs that deviate from input requirements. SelfCheckGPT is one of the methods designed to detect hallucinations. Its key feature lies in its ability to infer the occurrence of hallucinations without requiring reference data or test cases. Although SelfCheckGPT has been evaluated and applied in natural language processing tasks such as text summarization and question answering, its performance in code generation has not yet been explored. In this study, we applied SelfCheckGPT to the HumanEval dataset, a standard benchmark for code generation, and investigated its evaluation performance by comparing it with execution-based evaluations. The results revealed that calculating similarity using BLEU, ROUGE-L, and EditSim is adequate for predicting the correctness of code or, in other words, hallucinations.
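The idea summarized above, scoring one generated program by how closely it agrees with other samples drawn for the same prompt, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, only an edit-distance similarity is shown (BLEU and ROUGE-L are omitted), and the definition of EditSim as one minus the length-normalized Levenshtein distance is an assumption.

```python
from typing import Callable, List


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_sim(a: str, b: str) -> float:
    """Assumed EditSim: 1 - distance / longer length, a value in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def consistency_score(
    main: str,
    samples: List[str],
    sim: Callable[[str, str], float] = edit_sim,
) -> float:
    """Mean similarity between the main output and the extra samples.

    A low score means the samples disagree with the main output, which the
    self-consistency idea treats as a sign of possible hallucination.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(sim(main, s) for s in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical outputs for one prompt; no reference solution or test is used.
    main_output = "def add(a, b):\n    return a + b\n"
    sampled = [
        "def add(a, b):\n    return a + b\n",
        "def add(x, y):\n    return x + y\n",
        "def add(a, b):\n    return a - b\n",
    ]
    print(round(consistency_score(main_output, sampled), 3))
```

Any threshold that turns this score into a correct/incorrect decision would have to be calibrated against execution results; the abstract does not state one.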
DOI
10.2197/ipsjjip.33.487
Access Restrictions
Publicly available on the Internet
Data Provider (Database)
Japan Science and Technology Agency: J-STAGE