Ontology Based Machine Translation for Bengali as Low-resource Language

Persistent ID (NDL): info:ndljp/pid/10120591

Material type: Doctoral dissertation

Author: Khan Md. Anwarus Salam

Publisher: The University of Electro-Communications

Publication date: 2014-03-24

Material Format: Digital

Capacity, size, etc.: -

Name of awarding university/degree: The University of Electro-Communications, Doctor of Engineering

Notes on use at the National Diet Library

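The full text of this material may be freely viewable on the degree-granting institution's website or on CiNii Dissertations, reachable through links such as the source publication (URI).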

Bibliographic Record

Material Type: Doctoral dissertation
Title: Ontology Based Machine Translation for Bengali as Low-resource Language
Author/Editor: Khan Md. Anwarus Salam
Publication, Distribution, etc.: The University of Electro-Communications
Publication Date: 2014-03-24
Publication Date (W3CDTF): 2014-03-24
Alternative Title: 低資源言語としてのベンガル語に対するオントロジーに基づく機械翻訳
Periodical title: Dissertation
Degree grantor/type: The University of Electro-Communications
Date Granted: 2014-03-24
Date Granted (W3CDTF): 2014-03-24
Dissertation Number: Kō No. 737 (甲第737号)
Degree Type: Doctor of Engineering
Conferring No. (Dissertation): 12612甲第737号
Text Language Code: eng
Note (General): In this research we propose ontology-based machine translation with the help of WordNet and the UNL Ontology. Example-Based Machine Translation (EBMT) for a low-resource language such as Bengali has low-coverage issues. Because parallel corpora are scarce, the system has a high probability of having to handle unknown words. We have implemented an EBMT system for a low-resource language pair. The EBMT architecture uses chunk-string templates (CSTs) and an unknown-word translation mechanism. CSTs consist of a chunk in the source language, a string in the target language, and word alignment information. CSTs are prepared automatically from an aligned parallel corpus and WordNet by using an English chunker. For unknown-word translation, we used the WordNet hypernym tree and an English-Bengali dictionary. The proposed system first tries to find semantically related English words in WordNet for the unknown word. From these related words, we choose the semantically closest word whose Bangla translation exists in the English-Bangla dictionary. If no Bangla translation exists, the system uses IPA-based transliteration. For proper nouns, the system uses the Akkhor transliteration mechanism. CSTs improved coverage by 57 points and quality by 48.81 points in human evaluation. Currently 64.29% of the test-set translations by the system were acceptable. The combined solution of CSTs and unknown-word handling generated 67.85% acceptable translations from the test set. The unknown-word mechanism improved translation quality by 3.56 points in human evaluation. This research also proposed a way to automatically generate an explanation of each concept using the semantic background provided by the UNL Ontology. These explanations are useful for improving the translation quality of unknown words.
Note (General), Japanese abstract in English translation: This research proposes ontology-based machine translation using WordNet and the UNL Ontology. For a low-resource language such as Bengali, example-based machine translation (EBMT) is not very effective, because the lack of parallel corpora forces the system to handle many unknown words. We implemented an EBMT system for a low-resource language pair. The implemented EBMT architecture uses chunk-string templates (CSTs) and an unknown-word translation mechanism. A CST consists of a source-language chunk, a target-language string, and word alignment information. CSTs are generated automatically from an aligned parallel corpus and WordNet by using an English chunker. First, source-language chunks are generated automatically with the OpenNLP chunker. Then an initial CST is generated for each source-language chunk, and CST alignments for all target sentences are generated from the parallel corpus. Next, the system generates combinations of CSTs using the word alignment information. Finally, the CSTs are generalized with WordNet to obtain wide coverage. For unknown-word translation, the system uses the WordNet hypernym tree and an English-Bengali dictionary. The proposed system first tries to find semantically related English words in WordNet for the unknown word. From these related words, it chooses the semantically closest word that has a Bengali translation in the English-Bengali dictionary. If no Bengali translation exists, the system performs IPA-based transliteration. For proper nouns, the system uses the Akkhor transliteration mechanism. CSTs improved coverage by 57 points, and quality in human evaluation by 48.81 points. Currently, 64.29% of the system's test-set translations are acceptable. The unknown-word mechanism improved translation quality by 3.56 points in human evaluation. The combined solution of CSTs and unknown-word handling generated 67.85% acceptable translations on the test set. This research also proposed a method for automatically generating an explanation of each concept using the semantic background provided by the UNL Ontology. The input to this system is one Universal Word (UW), and the output is an explanatory text for that UW in English or Japanese. For a given UW, the system first finds the SemanticWordMap, which contains all direct and indirect reference relations from the UNL Ontology for that particular UW. The input of this step is therefore one UW, and the output is a WordMap graph. In the next step, conversion rules are used to convert the WordMap graph into UNL. Depending on the user's request, the conversion rule can be specified as "From UWs only" or "From UNL Ontology". The input of this step is therefore the WordMap graph, and the output is a UNL expression. In the final step, the UNL DeConverter converts the UNL expression and describes it in natural language. These explanations were found to be effective in improving the translation quality of unknown words.
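The unknown-word fallback chain described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The hypernym map, the dictionary entries, the function names, and the placeholder transliterator are all hypothetical stand-ins, and "semantically closest" is approximated here by the nearest ancestor in a single hypernym chain, which is an assumption.

```python
from typing import Callable, Dict, Tuple

# Toy stand-in for the WordNet hypernym tree: word -> parent (hypothetical data).
TOY_HYPERNYMS: Dict[str, str] = {
    "spaniel": "dog",
    "dog": "canine",
    "canine": "animal",
}

# Toy stand-in for the English-Bengali dictionary (hypothetical entries).
TOY_DICTIONARY: Dict[str, str] = {
    "dog": "কুকুর",
    "animal": "প্রাণী",
}


def placeholder_transliterate(word: str) -> str:
    """Placeholder only: the abstract names IPA-based and Akkhor
    transliteration, neither of which is reproduced here."""
    return f"<translit:{word}>"


def translate_unknown_word(
    word: str,
    hypernyms: Dict[str, str],
    dictionary: Dict[str, str],
    transliterate: Callable[[str], str],
    is_proper_noun: bool = False,
    max_depth: int = 10,
) -> Tuple[str, str]:
    """Return (translation, strategy used) for a word missing from the templates."""
    # Proper nouns go straight to transliteration.
    if is_proper_noun:
        return transliterate(word), "transliteration"
    # A direct dictionary hit needs no ontology lookup.
    if word in dictionary:
        return dictionary[word], "dictionary"
    # Walk up the hypernym chain and stop at the nearest ancestor
    # that has a dictionary translation.
    current = word
    for _ in range(max_depth):
        parent = hypernyms.get(current)
        if parent is None:
            break
        if parent in dictionary:
            return dictionary[parent], f"hypernym:{parent}"
        current = parent
    # No related word has a translation: fall back to transliteration.
    return transliterate(word), "transliteration"
```

With the toy data, `translate_unknown_word("spaniel", TOY_HYPERNYMS, TOY_DICTIONARY, placeholder_transliterate)` resolves through the hypernym "dog", while a word with no entry in either table falls through to the transliteration placeholder.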
Start page: 1
End page: 82
Persistent ID (NDL): info:ndljp/pid/10120591
https://dl.ndl.go.jp/pid/10120591
Collection: Materials for persons with disabilities
Collection (Materials for persons with disabilities: 1): Text data
Collection (particular): National Diet Library Digital Collections > Digitized Materials > Doctoral Dissertations
https://dl.ndl.go.jp/collections/A00014
Acquisition Basis: Doctoral dissertation (automatic collection)
Date Accepted (W3CDTF): 2016-07-07T04:28:02+09:00
Format (IMT): application/pdf
Access Restrictions: Available only within the National Diet Library
Service for the Digitized Contents Transmission Service: Not eligible for transmission to libraries or individuals
Availability of remote photoduplication service: Available
Periodical Title (URI): http://hdl.handle.net/10480/9000000716
Data Provider (Database): National Diet Library : National Diet Library Digital Collections
https://dl.ndl.go.jp