Hallucination Detection on Code Generation with SelfCheckGPT

Material type: Article

Authors: Ito Waka et al.

Publisher: Information Processing Society of Japan

Publication year: 2025

Format: Digital

Journal: Journal of Information Processing 33 0

Pages: p.487-493



Bibliographic information


Digital

Material type: Article
Title: Hallucination Detection on Code Generation with SelfCheckGPT
Author headings: Ito Waka
Sato Miyu
Obara Yui
Kuramitsu Kimio
Publication date: 2025
Publication year (W3CDTF): 2025
Title (journal): Journal of Information Processing
Volume, issue, and date (journal): 33 0
Volume: 33
Issue: 0
Pages: 487-493
Date of publication in journal (W3CDTF): 2025
Publication details (journal): Information Processing Society of Japan
Text language code: en
Subject headings: LLMs
generative AI
code generation
hallucination
evaluation metrics
Intended audience: General
DOI: 10.2197/ipsjjip.33.487
https://doi.org/10.2197/ipsjjip.33.487
Related information (URI): https://www.jstage.jst.go.jp/article/ipsjjip/33/0/33_487/_pdf
References: DeepBugs: a learning approach to name-based bug detection
https://cir.nii.ac.jp/crid/1360011145753572480
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
https://cir.nii.ac.jp/crid/1360020701022781952
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
https://cir.nii.ac.jp/crid/1360022501345155968
Out of the BLEU: How should we assess quality of the Code Generation models?
https://cir.nii.ac.jp/crid/1360024022340832128
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
https://cir.nii.ac.jp/crid/1360024025226188928
IntelliCode compose: code generation using transformer
https://cir.nii.ac.jp/crid/1360298345090256128
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
https://cir.nii.ac.jp/crid/1360305497316018304
Developer Testing in the IDE: Patterns, Beliefs, and Behavior
https://cir.nii.ac.jp/crid/1360305497604995456
Survey of Hallucination in Natural Language Generation
https://cir.nii.ac.jp/crid/1360579820494762752
CodeJudge: Evaluating Code Generation with Large Language Models
https://cir.nii.ac.jp/crid/1360586971786245248
Using LLMs in Software Requirements Specifications: An Empirical Evaluation
https://cir.nii.ac.jp/crid/1360586972544462208
Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs
https://cir.nii.ac.jp/crid/1360868448240871680
A Normalized Levenshtein Distance Metric
https://cir.nii.ac.jp/crid/1361699995767595392
Texygen
https://cir.nii.ac.jp/crid/1362544418386190976
BLEU
https://cir.nii.ac.jp/crid/1364233270606638080
Partner institution/database: National Institute of Informatics : CiNii Research
https://cir.nii.ac.jp/
Source institution/database: Japan Link Center
https://japanlinkcenter.org/top
Crossref
https://www.crossref.org

Digital

Abstract: <p>Large language models (LLMs) are expected to bring automation and efficiency to software development, including programming. However, an LLM encounters a challenge known as “hallucination,” where it produces incorrect content or outputs that deviate from input requirements. SelfCheckGPT is one of the methods designed to detect hallucinations. Its key feature is its ability to infer the occurrence of hallucinations without requiring reference data or test cases. Although SelfCheckGPT has been evaluated and applied in natural language processing tasks such as text summarization and question answering, its performance in code generation has not yet been explored. In this study, we applied SelfCheckGPT to the HumanEval dataset, a standard benchmark for code generation, and investigated its evaluation performance by comparing it with execution-based evaluations. The results revealed that calculating similarity using BLEU, ROUGE-L, and EditSim is adequate for predicting the correctness of code or, in other words, hallucinations.</p>
DOI: 10.2197/ipsjjip.33.487
Online access: Publicly available on the internet
Partner institution/database: Japan Science and Technology Agency : J-STAGE
http://www.jstage.jst.go.jp
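
Note on the abstract: the idea of scoring a generated program by its similarity to other sampled programs, without reference code or test cases, can be illustrated with a minimal sketch. The code below is not the authors' implementation. The whitespace tokenization, the normalization by the longer sequence, the simple averaging over samples, and all function names (`edit_sim`, `rouge_l_f1`, `consistency_score`) are assumptions made for illustration, and BLEU is left out to keep the example dependency-free.

```python
from typing import Callable, List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_sim(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def consistency_score(
    main_output: str,
    samples: List[str],
    metric: Callable[[Sequence[str], Sequence[str]], float] = edit_sim,
) -> float:
    """Mean similarity between the main output and the other sampled outputs.

    A low score means the samples disagree with the main output, which is
    read here as a sign of possible hallucination (illustrative assumption).
    """
    if not samples:
        raise ValueError("at least one sampled output is required")
    main_tokens = main_output.split()  # whitespace tokenization (assumption)
    scores = [metric(main_tokens, sample.split()) for sample in samples]
    return sum(scores) / len(scores)
```

In use, one would generate a main answer plus several additional samples for the same prompt, call `consistency_score(main, samples)` with either metric, and compare the score with the pass/fail result from executing the main answer. Any decision threshold would have to be chosen empirically.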
