Hallucination Detection on Code Generation with SelfCheckGPT
Digital data available (Japan Science and Technology Agency)
Bibliographic information
- Material type: Article
- Publication date: 2025
- Publication year (W3CDTF): 2025
- Journal title: Journal of Information Processing
- Volume, issue, and date (journal): 33 0
- Volume: 33
- Issue: 0
- Pages: 487-493
- Publication date (W3CDTF): 2025
- Publisher (journal): Information Processing Society of Japan
- Text language code: en
- Intended audience: General
- DOI: 10.2197/ipsjjip.33.487
- References:
  - DeepBugs: a learning approach to name-based bug detection
  - CodeBERT: A Pre-Trained Model for Programming and Natural Languages
  - SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
  - Out of the BLEU: How should we assess quality of the Code Generation models?
  - LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
  - IntelliCode compose: code generation using transformer
  - CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
  - Developer Testing in the IDE: Patterns, Beliefs, and Behavior
  - Survey of Hallucination in Natural Language Generation
  - CodeJudge: Evaluating Code Generation with Large Language Models
  - Using LLMs in Software Requirements Specifications: An Empirical Evaluation
  - Advancing Requirements Engineering Through Generative AI: Assessing the Role of LLMs
  - A Normalized Levenshtein Distance Metric
  - Texygen
  - BLEU
- Partner institution/database: National Institute of Informatics: CiNii Research
- Provider institution/database: Japan Link Center; Crossref
- Abstract: Large language models (LLMs) are expected to bring automation and efficiency to software development, including programming. However, LLMs suffer from a problem known as “hallucination,” in which they produce incorrect content or output that deviates from the input requirements. SelfCheckGPT is one method designed to detect hallucinations; its key feature is that it infers whether a hallucination has occurred without requiring reference data or test cases. Although SelfCheckGPT has been evaluated on natural language processing tasks such as text summarization and question answering, its performance on code generation has not yet been explored. In this study, we applied SelfCheckGPT to HumanEval, a standard code generation benchmark, and assessed its performance by comparing it with execution-based evaluation. The results show that computing similarity with BLEU, ROUGE-L, and EditSim is adequate for predicting the correctness of generated code, in other words, for detecting hallucinations.
- Online access: Publicly available on the Internet
- Partner institution/database: Japan Science and Technology Agency: J-STAGE
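The approach summarized in the abstract can be sketched as follows: sample several additional completions for the same prompt, measure how similar the main completion is to each sample, and treat low average similarity as a sign of hallucination. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The helper names (`levenshtein`, `edit_sim`, `rouge_l`, `selfcheck_score`) are hypothetical. EditSim is assumed to be 1 minus the character-level Levenshtein distance divided by the length of the longer string, ROUGE-L is computed as an LCS-based F1 over whitespace tokens, and the hallucination score is assumed to be 1 minus the mean similarity; the paper's exact tokenization and aggregation may differ. BLEU is omitted to keep the sketch dependency-free; a BLEU scorer such as NLTK's `sentence_bleu` could be wrapped to take two strings and passed as `similarity` in the same way.

```python
from statistics import mean
from typing import Callable, Sequence


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using a rolling single-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def edit_sim(a: str, b: str) -> float:
    """EditSim (assumed definition): 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (assumed tokenization), via the LCS length."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            if c[i - 1] == r[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def selfcheck_score(
    response: str,
    samples: Sequence[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Hypothetical aggregation: 1 - mean similarity of the response to each sample.

    Higher values mean the response disagrees more with independently sampled
    completions, which is taken as a sign of hallucination. `samples` must be non-empty.
    """
    return 1.0 - mean(similarity(response, s) for s in samples)


if __name__ == "__main__":
    # Toy example: one main completion and three hypothetical resampled completions.
    main = "def add(a, b):\n    return a + b"
    samples = [
        "def add(a, b):\n    return a + b",  # identical
        "def add(x, y):\n    return x + y",  # renamed, still correct
        "def add(a, b):\n    return a - b",  # one-character bug
    ]
    for name, sim in [("EditSim", edit_sim), ("ROUGE-L", rouge_l)]:
        print(name, round(selfcheck_score(main, samples, sim), 3))
```

In this toy case, under both metrics the renamed but still-correct sample is less similar to the main completion than the sample with the `a - b` bug, since these metrics measure textual rather than semantic agreement.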