A study on the advancement and application of gradient boosting (勾配ブースティングの高度化と応用に関する研究)

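Material type: Standards and technical reports

Author: Suzuki, Hideo (鈴木, 秀男)

Publisher: Keio University (慶應義塾大学)

Publication year: 2021

Format: Digital

Extent (pages, dimensions): -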

NDC: -



Bibliographic information


Author heading: Suzuki, Hideo (鈴木, 秀男)
Publication year (W3CDTF): 2021
Parallel titles: コウバイブースティングノコウドカトオウヨウニカンスルケンキュウ
Kōbai būsutingu no kōdoka to ōyō ni kansuru kenkyū
A study on the advancement and application of gradient boosting
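Title of source publication: 学事振興資金研究成果実績報告書 (Academic Development Funds research results report)
Intended audience: General
General note: Publication type: VoR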
type:text
This study examines two ways to improve the prediction accuracy and computational efficiency of gradient boosting: a loss function that includes a regularization term, and an efficient optimization algorithm that borrows the idea of Momentum SGD (stochastic gradient descent). Regularization suppresses overfitting by constraining the degrees of freedom of the constructed model. A regularization (penalty) term is computed as a number and added to the loss function obtained from the residuals between the predicted and observed values, so that the penalty and the model's loss are minimized together. In this study, L1 and L2 regularization terms are applied to the scores of all leaves of the decision trees. Conventional SGD shuffles the training data, randomly draws one example, computes its error, and updates the parameters with the gradient method so that the loss function decreases. SGD, however, converges slowly and can oscillate or get stuck at saddle points.
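The leaf-score regularization described above can be illustrated with a minimal sketch. It assumes squared-error loss and an XGBoost-style penalty α·Σ|w_j| + (λ/2)·Σw_j² on the leaf scores w_j; the report does not give its exact formulation, so the helper names `leaf_score` and `leaf_objective` and the parameters `l1` (α) and `l2` (λ) are hypothetical. For one leaf with gradient sum G and Hessian sum H, minimizing G·w + (H + λ)·w²/2 + α·|w| gives the soft-thresholded score w* = -sign(G)·max(|G| - α, 0)/(H + λ).

```python
def leaf_score(grads, hessians, l1=0.0, l2=1.0):
    """Regularized score of one leaf (hypothetical helper, XGBoost-style closed form)."""
    g_sum = sum(grads)
    h_sum = sum(hessians)
    # L1 term: soft-threshold the gradient sum; weak leaves are set exactly to zero
    if abs(g_sum) <= l1:
        return 0.0
    shrunk = g_sum - l1 if g_sum > 0 else g_sum + l1
    # L2 term: enlarges the denominator, shrinking the score toward zero
    return -shrunk / (h_sum + l2)


def leaf_objective(targets, base, w, l1=0.0, l2=1.0):
    """Squared-error loss after adding score w to the leaf, plus the L1/L2 penalty."""
    loss = sum(0.5 * (y - (base + w)) ** 2 for y in targets)
    penalty = l1 * abs(w) + 0.5 * l2 * w * w
    return loss + penalty


# Squared error 0.5 * (y - pred) ** 2 has gradient (pred - y) and Hessian 1.
targets = [3.0, 2.5, 4.0]  # toy observed values falling into one leaf
base = 0.0                 # current ensemble prediction for these rows
grads = [base - y for y in targets]
hessians = [1.0] * len(targets)

w_plain = leaf_score(grads, hessians, l1=0.0, l2=0.0)  # no penalty: mean residual
w_l2 = leaf_score(grads, hessians, l1=0.0, l2=1.0)     # L2 only
w_l1_l2 = leaf_score(grads, hessians, l1=1.0, l2=1.0)  # L1 + L2
print(w_plain, w_l2, w_l1_l2)
```

Without a penalty the leaf score is the mean residual (about 3.17); the L2 term shrinks it to 2.375, and adding the L1 term shrinks it further to 2.125. A leaf whose gradient sum does not exceed α in magnitude gets a score of exactly zero, which is how the L1 term prunes weak leaves.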
Momentum SGD, an improved version of SGD, damps these oscillations by reusing the gradient information from the previous step, which alleviates the problems of conventional SGD. To verify the effect of regularization and Momentum on gradient boosting, prediction accuracy and computational efficiency indicators were measured on several datasets from the UCI Machine Learning Repository, comparing four variants: conventional SGD, SGD (regularization), SGD (Momentum), and SGD (regularization + Momentum). The results show that SGD (regularization + Momentum) generally performed well in terms of both prediction accuracy and computational efficiency, suggesting a synergistic effect between regularization and Momentum. In addition, the study proposes an approach to recommender systems based on propensity score matching and shows its superiority over collaborative filtering, an existing method. Combining propensity score matching with gradient boosting is left for future work.
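The difference between the plain and Momentum updates can be sketched on a toy problem. This is generic momentum SGD in the heavy-ball form v ← μ·v - η·g, w ← w + v, fitting a one-parameter linear model y ≈ w·x. It is not the author's way of bringing Momentum into gradient boosting, which the abstract does not specify, and the function name `fit_sgd`, the learning rate, the momentum coefficient and the data are all hypothetical.

```python
import random


def fit_sgd(data, lr=0.01, momentum=0.0, epochs=20, seed=0):
    """Fit y ≈ w * x by per-example SGD (hypothetical toy helper).

    momentum=0.0 reduces to conventional SGD; momentum > 0 adds the heavy-ball term.
    """
    rng = random.Random(seed)
    samples = list(data)  # copy so shuffling does not modify the caller's list
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)        # shuffle the training data every epoch
        for x, y in samples:        # take one example at a time
            grad = (w * x - y) * x  # gradient of 0.5 * (w * x - y) ** 2 w.r.t. w
            # keep a decayed copy of the previous step and add the new gradient step
            velocity = momentum * velocity - lr * grad
            w += velocity
    return w


# Noise-free toy data generated from y = 2x, so the optimum is w = 2.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w_plain = fit_sgd(data, momentum=0.0)
w_momentum = fit_sgd(data, momentum=0.9)
print(f"plain SGD: {w_plain:.4f}, momentum SGD: {w_momentum:.4f}")
```

With the same learning rate and number of epochs, plain SGD is still well short of the optimum (about 1.56), while the momentum run, whose effective step size is roughly lr/(1 - μ), ends much closer to 2. On noisy or badly scaled data the momentum coefficient usually needs tuning, since too large a value can cause overshooting.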
Partner institution/database: National Institute of Informatics (国立情報学研究所): Institutional Repositories DataBase (IRDB)
https://irdb.nii.ac.jp
Source institution/database: Keio University (慶應義塾大学): Keio University Academic Information Repository (慶應義塾大学学術情報リポジトリ)