Automated Fault Localization in Large-Scale Computing Systems (大規模計算システム向け自動故障解析)

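Material type: Doctoral dissertation

Author: 丸山, 直也 (Maruyama, Naoya), et al.

Publisher: -

Publication year: 2008-03

Format: Digital

Extent/size: -

Degree-granting university and degree: Tokyo Institute of Technology, Doctor of Science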




Holdings

Tokyo Institute of Technology Research Repository (東京工業大学リサーチリポジトリ): digital, linked via the Institutional Repositories DataBase (IRDB)


Bibliographic information


Digital

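Material type: Doctoral dissertation
Title: 大規模計算システム向け自動故障解析
Author/editor: 丸山, 直也
Maruyama, Naoya
Author heading: 丸山, 直也
Maruyama, Naoya
Publication date: 2008-03
Publication year (W3CDTF): 2008-03
Parallel title: Automated Fault Localization in Large-Scale Computing Systems
Degree-granting institution: Tokyo Institute of Technology
Date conferred: 2008-03-26
Report number: A7145
Degree: Doctor of Science
Language of text: eng
Audience: General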
General note: This dissertation presents two scalable, automated approaches to simplifying fault localization in large-scale computing systems; both treat localization as anomaly detection in system behavior. Both approaches capture system behavior by collecting function call traces and identify anomalous behavior through automatic analysis of the collected traces. To find anomalies scalably and automatically, they assume that processes in typical distributed software systems behave similarly, and they report violations of that assumed similarity as anomalies. The first approach, outlier-detection-based localization, assumes that the target system consists of distributed processes with similar behaviors. Once a failure occurs, it identifies anomalous processes and functions by comparing the traces collected during the failure and finding outliers among them; traces are compared by their function-execution times. Because outliers are identified from these times, this approach can localize faults such as performance bugs, deadlocks, and livelocks.
The second approach, model-based localization, assumes that all processes behave similarly to how they behaved in the past. From traces collected during normal operation, it derives an execution model that estimates the call probability of each function. Once a failure occurs, it finds anomalous processes and function calls by comparing the failure traces against the derived model. Two cases are considered anomalous: 1) a high-probability function is not called, and 2) a low-probability function is called. By finding such functions, this approach is especially effective at localizing program logic bugs.
Experimental studies in real-world large-scale environments indicate the effectiveness of the proposed techniques. The outlier-detection-based localization found the causes of several nondeterministic failures in a distributed cluster middleware running on a 129-node production cluster almost automatically. The model-based localization substantially simplified the localization of a failure that occurred in a three-site, 78-node Grid environment.
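Identifier: oai:t2r2.star.titech.ac.jp:50091984
Record format (IMT): application/pdf
Link to primary source: fulltext
http://t2r2.star.titech.ac.jp/rrws/file/CTT100597095/ATD100000413/300064507.pdf
Online access: Publicly available on the Internet
Partner institution/database: National Institute of Informatics : Institutional Repositories DataBase (IRDB) (institutional repositories)
https://irdb.nii.ac.jp
Providing institution/database: Tokyo Institute of Technology : Tokyo Institute of Technology Research Repository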
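The outlier-detection-based localization described in the general note can be illustrated with a minimal Python sketch. It assumes each process's trace has already been reduced to a profile mapping function names to total execution time, scores each process by the Euclidean distance to its k-th nearest peer, and then ranks the outlier's functions by how far their times depart from the peers' median. The profile format, the distance metric, the nearest-neighbour score, and the example data (`EXAMPLE_PROFILES`, with a hypothetical `lock_wait` function) are all illustrative assumptions, not the dissertation's actual implementation.

```python
import math
from statistics import median

# Assumed input format: process id -> {function name -> total execution time in seconds}.
Profile = dict[str, float]


def distance(a: Profile, b: Profile) -> float:
    """Euclidean distance between two per-function time profiles (missing functions count as 0)."""
    funcs = set(a) | set(b)
    return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in funcs))


def outlier_scores(profiles: dict[str, Profile], k: int = 1) -> dict[str, float]:
    """Score each process by the distance to its k-th nearest peer; larger means more anomalous."""
    scores = {}
    for p, prof in profiles.items():
        dists = sorted(distance(prof, other) for q, other in profiles.items() if q != p)
        scores[p] = dists[min(k, len(dists)) - 1]
    return scores


def suspicious_functions(profiles: dict[str, Profile], process: str) -> list[str]:
    """Rank functions by how far `process` deviates from the median time of its peers."""
    target = profiles[process]
    peers = [prof for p, prof in profiles.items() if p != process]
    funcs = set(target).union(*peers)
    deviation = {
        f: abs(target.get(f, 0.0) - median(peer.get(f, 0.0) for peer in peers))
        for f in funcs
    }
    return sorted(deviation, key=deviation.get, reverse=True)


# Hypothetical data: node3 spends far longer waiting on a lock than its peers.
EXAMPLE_PROFILES: dict[str, Profile] = {
    "node0": {"compute": 8.0, "send": 1.0, "lock_wait": 1.0},
    "node1": {"compute": 8.2, "send": 0.9, "lock_wait": 0.9},
    "node2": {"compute": 7.9, "send": 1.1, "lock_wait": 1.0},
    "node3": {"compute": 8.0, "send": 1.0, "lock_wait": 6.0},
}

if __name__ == "__main__":
    scores = outlier_scores(EXAMPLE_PROFILES)
    worst = max(scores, key=scores.get)
    print(worst, suspicious_functions(EXAMPLE_PROFILES, worst)[0])  # node3 lock_wait
```

Using the distance to the nearest peer rather than to a global mean keeps the sketch tolerant of a system with a few legitimately different process groups, at the cost of an O(n²) comparison over processes.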
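The model-based localization described in the general note can likewise be sketched in Python under a deliberately simple reading of "call probability": the fraction of normal-run traces in which a function is called at least once. The sketch flags the two anomaly cases the note lists, high-probability functions that are missing and low-probability functions that appear. The trace format, the 0.9/0.1 thresholds, and the example traces (`NORMAL_TRACES`, `FAILURE_TRACE`) are hypothetical; this record does not specify the dissertation's execution model, which may condition on calling context or be considerably richer.

```python
from collections import Counter

# Assumed trace format: the set of function names one process called during one run.
Trace = set[str]


def derive_model(normal_traces: list[Trace]) -> dict[str, float]:
    """Estimate, per function, the fraction of normal traces in which it was called."""
    if not normal_traces:
        raise ValueError("at least one normal trace is required")
    # Each trace is a set, so a function is counted at most once per trace.
    counts = Counter(f for trace in normal_traces for f in trace)
    return {f: c / len(normal_traces) for f, c in counts.items()}


def find_anomalies(
    model: dict[str, float], failure_trace: Trace, high: float = 0.9, low: float = 0.1
) -> tuple[list[str], list[str]]:
    """Return (high-probability functions not called, low-probability functions called)."""
    missing = sorted(f for f, p in model.items() if p >= high and f not in failure_trace)
    # Functions never seen in normal runs get probability 0.0.
    unexpected = sorted(f for f in failure_trace if model.get(f, 0.0) <= low)
    return missing, unexpected


# Hypothetical traces from normal runs and from one failed process.
NORMAL_TRACES: list[Trace] = [
    {"init", "connect", "send", "recv", "close"},
    {"init", "connect", "send", "recv", "close"},
    {"init", "connect", "send", "recv", "retry", "close"},
    {"init", "connect", "send", "recv", "close"},
]
FAILURE_TRACE: Trace = {"init", "connect", "send", "handle_timeout"}

if __name__ == "__main__":
    model = derive_model(NORMAL_TRACES)
    print(find_anomalies(model, FAILURE_TRACE))  # (['close', 'recv'], ['handle_timeout'])
```

In this toy run, the missing `recv` and `close` calls point toward the point at which the failed process diverged from normal behavior, while the never-before-seen `handle_timeout` marks the unusual path it took instead.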