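Improving Multilingual Automatic Cyberbullying Detection With Feature Density And Cross-lingual Zero-shot Transfer (素性密度及びクロスリンガルゼロショット転移学習による多言語のネットいじめ自動検出の改良に関する研究)

National Diet Library persistent identifier: info:ndljp/pid/12361370

Material type: Doctoral dissertation

Author: Eronen Juuso Kalevi Kristian

Publisher: -

Publication date: 2022-09

Format: Digital

Pages, size, etc.: -

Degree-granting university and degree: Kitami Institute of Technology, Doctor of Engineering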

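Notes on use at the National Diet Library

The full text of this material may be freely available from the website of the degree-granting institution or from CiNii Dissertations, via the links listed under Source (URI).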

Other holdings

Kitami Institute of Technology Academic Repository (KIT-R): digital

Bibliographic information

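Material type: Doctoral dissertation
Title: 素性密度及びクロスリンガルゼロショット転移学習による多言語のネットいじめ自動検出の改良に関する研究
Author/editor: Eronen Juuso Kalevi Kristian
Author heading: Eronen Juuso Kalevi Kristian
Publication date: 2022-09
Publication year (W3CDTF): 2022-09
Parallel title: Improving Multilingual Automatic Cyberbullying Detection With Feature Density And Cross-lingual Zero-shot Transfer
Degree-granting institution: Kitami Institute of Technology
Date degree granted: 2022-09-06
Date degree granted (W3CDTF): 2022-09-06
Report number: Kō No. 203 (甲第203号)
Degree: Doctor of Engineering
Dissertation degree number: Kō No. 203 (甲第203号)
Language code of text: eng
Author alternative name: エロネン　ユーソ　カレビ　クリスティアン
Target audience: General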
General note: In this thesis, I study two different methods for improving multilingual automatic cyberbullying detection. First, I study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods in order to estimate dataset complexity, which in turn is used to comparatively estimate the potential performance of machine learning (ML) classifiers prior to any training. I hypothesize that estimating dataset complexity allows for the reduction of the number of required experiment iterations, making it possible to optimize the resource-intensive training of ML models, which is becoming a serious issue due to the increases in available dataset sizes and the ever-rising popularity of models based on Deep Neural Networks (DNN). The problem of constantly increasing needs for more powerful computational resources is also affecting the environment due to the alarmingly growing amount of CO2 emissions caused by training of large-scale ML models. I use cyberbullying datasets collected for multiple languages, namely English, Japanese and Polish. The difference in linguistic complexity of datasets allows me to additionally discuss the efficacy of linguistically-backed word preprocessing. Second, I study the selection of transfer languages for automatic abusive language detection. I demonstrate the effectiveness of cross-lingual transfer learning for zero-shot abusive language detection. This way it is possible to use existing data from higher-resource languages to build better detection systems for languages lacking data. The datasets are from eight different languages from three language families. I measure the distance between the languages using several language similarity measures, especially by quantifying the World Atlas of Language Structures. I show that there is a correlation between linguistic similarity and classifier performance, making it possible to choose an optimal transfer language for zero-shot abusive language detection. Next, I demonstrate that this method is also generally applicable to multiple Natural Language Processing tasks, specifically sentiment analysis, named entity recognition and dependency parsing. I show that there is also a correlation between linguistic similarity and zero-shot cross-lingual transfer performance for these tasks, allowing me to select an ideal transfer language in order to aid with the problem of dealing with languages that do not currently have a sufficient amount of data. Lastly, I show that the World Atlas of Language Structures can be quantified into an effective linguistic similarity method.
DOI: 10.19000/0002000332
https://doi.org/10.19000/0002000332
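National Diet Library persistent identifier: info:ndljp/pid/12361370
https://dl.ndl.go.jp/pid/12361370
Collection (common): Materials for persons with disabilities
Collection (materials for persons with disabilities: level 1): Text data
Collection (individual): National Diet Library Digital Collections > Digitized materials > Doctoral dissertations
https://dl.ndl.go.jp/collections/A00014
Basis for acquisition: Doctoral dissertation (automatic collection)
Date received (W3CDTF): 2022-11-07T16:56:35+09:00
Recording format (IMT): PDF
Online access scope: Available only within the National Diet Library
Digitized material transmission: Not eligible for transmission to libraries or individuals
Remote photoduplication (NDL): Available
Source (URI): http://dx.doi.org/10.19000/0002000332
https://kitami-it.repo.nii.ac.jp/records/2000332
Partner institution/database: National Diet Library: National Diet Library Digital Collections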
https://dl.ndl.go.jp