Framework for Experimental Information Extraction from Research Papers to Support Nanocrystal Device Development

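NDL Persistent Identifier: info:ndljp/pid/9768017

Material type: Doctoral dissertation

Author: Moustafa Dieb, Thaer

Publisher: Hokkaido University

Publication date: 2015-12-25

Format: Digital

Extent (pages, size): -

Degree-granting university and degree: Hokkaido University, Doctorate (Information Science)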

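Notes on use at the National Diet Library

The full text of this material may be freely available from the degree-granting institution's website or CiNii Dissertations, reachable through links such as the Source (URI) field.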

Bibliographic information

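Material type: Doctoral dissertation
Title: Framework for Experimental Information Extraction from Research Papers to Support Nanocrystal Device Development
Author/Editor: Moustafa Dieb, Thaer
Author heading: Moustafa Dieb, Thaer
Publisher: Hokkaido University
Date of publication: 2015-12-25
Publication year (W3CDTF): 2015-12-25
Parallel title: ナノ結晶デバイス開発支援のための論文からの実験情報抽出フレームワーク
Contributors: 吉岡, 真治
原口, 誠
有村, 博紀
Degree-granting institution: Hokkaido University
Date of degree conferral: 2015-12-25
Date of degree conferral (W3CDTF): 2015-12-25
Report number: 甲第12046号
Degree: Doctorate (Information Science)
Dissertation number: 甲第12046号
Language code of text: eng
NDC: 500
Intended audience: General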
General note: Nanocrystal device development is a nanoscale research domain in which researchers produce nanocrystals for electronic and optoelectronic devices (e.g., solar cells, light-emitting devices, and memory components). This process requires both engineering knowledge and craftsmanship skills. Since there is no well-systematized process for developing new nanocrystal devices, researchers have to conduct several experiments before arriving at an appropriate manufacturing process that produces the desired output. To support this process, analysis of the results of development experiments is necessary. Such analysis can provide insights for experiment planning, leading to a quicker and less costly development process. In this study, we discuss our approach to extracting experimental information related to nanocrystal devices from research papers using machine-learning techniques based on an annotated corpus. We defined the necessary information and designed an annotation guideline in collaboration with a domain expert. We checked the reliability of this guideline through corpus construction experiments with graduate students in this domain, and then evaluated the corpus with a domain expert. The finalized corpus, called "NaDev" (Nanocrystal Device Development corpus), was then used to build an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework), which automatically extracts the desired information from research papers on nanocrystal devices using machine learning and natural language processing techniques.
This thesis is divided into six chapters. Chapter 1 introduces the nanocrystal device development process and experiments, and discusses the motivation of the study. Chapter 2 gives an overview of efforts in nanoinformatics, where information technology is used to support nanoscale research. This chapter discusses other efforts to extract information from nanoscale research papers. We also review information extraction from research papers in bioinformatics.
In Chapter 3, we discuss in detail our methodology for constructing the annotated corpus (NaDev). A tag set was designed in collaboration with a domain expert to annotate the desired information categories, such as source material information, experimental parameters, evaluation parameters, and final product. Preliminary annotation experiments were conducted with two graduate students in the nanocrystal device development domain; the results of these experiments were used to build a corpus construction guideline that contains detailed definitions of the desired information categories and instructions for annotating them, with several real examples to avoid mismatches between annotators. The reliability of this guideline was checked through corpus construction experiments using inter-annotator agreement (IAA) between two different annotators. Even though the corpus construction guideline reached a reliable level under loose agreement (where two annotations agree on the information category but disagree on the boundary; in many cases an appropriate head noun can be found in loosely matching terms), it was necessary to evaluate the corpus and finalize it with a domain expert to ensure reliability. The corpus was finalized as the NaDev corpus, which includes 392 sentences and 2870 terms annotated using eight information categories.
In Chapter 4, we discuss the development of the automatic information extraction framework (NaDevEx) using machine-learning techniques. Since entities from different information categories overlap with each other in the nanocrystal device development domain, we use a step-by-step (cascading-style) information extraction system. In each step, NaDevEx extracts a group of information categories that do not overlap with each other, using tagging results from previous steps as clues for information extraction. We found that, for the information category with rich domain knowledge (source material), the system's performance is almost as good as that of human annotators. NaDevEx also uses domain knowledge features, such as chemical entity recognition and a list of physical quantities, to support the extraction of material information and parameter information, respectively. The evaluation of NaDevEx using the NaDev corpus is also discussed in detail, covering comparison with human annotators, the effect of paper type on system performance, and the effect of domain knowledge features. Since research papers related to nanocrystal devices contain a considerable number of chemical entities, chemical named entity recognition is helpful for NaDevEx. We discuss in further detail a chemical named entity recognition system that uses an ensemble-learning approach.
In Chapter 5, we present our preliminary efforts to utilize the extracted information to support nanocrystal device development. Finally, Chapter 6 concludes the study and discusses future work.
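(Chief examiner) Associate Professor 吉岡真治, Professor 原口誠, Professor 有村博紀
Graduate School of Information Science and Technology (Division of Computer Science)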
DOI: 10.14943/doctoral.k12046
https://doi.org/10.14943/doctoral.k12046
NDL Persistent Identifier: info:ndljp/pid/9768017
https://dl.ndl.go.jp/pid/9768017
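Collection (common): Materials for persons with disabilities
Collection (materials for persons with disabilities: level 1): Text data
Collection (individual): National Diet Library Digital Collections > Digitized materials > Doctoral dissertations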
https://dl.ndl.go.jp/collections/A00014
Basis for acquisition: Doctoral dissertation (automated collection)
Date accepted (W3CDTF): 2016-02-01T21:19:47+09:00
Date created (W3CDTF): 2015-11
File format (IMT): PDF
application/pdf
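Online access scope: Available only within the National Diet Library
Digitized material transmission: Not eligible for transmission to libraries or individuals
Remote photoduplication (NDL): Available
Source (URI): http://dx.doi.org/10.14943/doctoral.k12046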
http://hdl.handle.net/2115/60485
Reference (URI): http://hdl.handle.net/2115/60482
Partner institution / database: National Diet Library : National Diet Library Digital Collections
https://dl.ndl.go.jp
