Doctoral thesis

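Development of Optimization Method with the Use of Genetic Algorithms for Natural Language and Related Models (遺伝的アルゴリズムを用いた自然言語とその関連モデルの最適化手法の開発)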

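National Diet Library persistent identifier
info:ndljp/pid/11164204
Material type
Doctoral thesis
Author
PAWEL, CEZARY LEMPA
Publisher
-
Date of conferral
2018-09-10
Format
Digital
Pages, size, etc.
-
Degree-granting institution and degree
Kitami Institute of Technology, Doctor of Engineering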

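Notes on use at the National Diet Library

The full text of this item may be freely viewable on the degree-granting institution's website or on CiNii Research, linked from fields such as the source (URI).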

Source: National Diet Library Digital Collections
  • Re-harvested 2021-12-07

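Holdings in libraries across Japan

Shows holdings in libraries across Japan other than the National Diet Library.

Please ask your local library how the item can be used, including whether it can be requested from a holding library.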

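Other

  • Kitami Institute of Technology Institutional Repository KIT-R

    Digital
    On the partner site, you can check holdings at the institutions and databases linked with the Institutional Repositories DataBase (IRDB) (institutional repositories).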

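Bibliographic information

You can check this item's details and authority data (keywords and author names that identify materials on the same subject).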

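Digital

Material type
Doctoral thesis
Author/Editor
PAWEL, CEZARY LEMPA
Publication date
2018-09
Publication year (W3CDTF)
2018-09
Parallel title
Development of Optimization Method with the Use of Genetic Algorithms for Natural Language and Related Models
Pages
1-
Degree-granting institution
Kitami Institute of Technology
Date of conferral
2018-09-10
Date of conferral (W3CDTF)
2018-09-10
Report number
10106甲第170号
Degree
Doctor of Engineering
Dissertation conferral number
10106甲第170号
Language code of text
eng
Target audience
General
National Diet Library persistent identifier
info:ndljp/pid/11164204
Collection (common)
Collection (accessible materials: level 1)
Collection (individual)
National Diet Library Digital Collections > Digitized materials > Doctoral theses
Basis of acquisition
Doctoral thesis (automatic harvesting)
Date received (W3CDTF)
2018-10-03T17:18:55+09:00
Recording format (IMT)
application/pdf
Online access scope
Available only within the National Diet Library
Digitized material transmission
Not eligible for transmission to libraries or individuals
Remote photocopy availability (NDL)
Partner institution/database
National Diet Library : National Diet Library Digital Collections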

Digital

Abstract
Language models are an indispensable element of Natural Language Processing (NLP) research. They are used in machine translation, speech recognition, part-of-speech tagging, handwriting recognition, syntactic parsing, information retrieval, and other tasks. In short, language models are probability distributions over sequences of words. Countless NLP solutions, algorithms, and programs apply language models to specific tasks. Unfortunately, these are often not optimized and instead rely on default, most commonly used parameter sets. For example, many of them use objective functions with many variables but without proper weights applied to those variables. Users usually set these variables themselves, which keeps the results from exceeding a mediocre level. When the number of variables is small, users can adjust them manually, but optimizing objective functions with a massive number of variables, especially multi-objective functions, is difficult and time-consuming. This motivated the proposal to apply Genetic Algorithms (GAs) to optimize the weighting process. GAs are a subset of Evolutionary Algorithms (EAs), inspired by the process of natural selection. They use bio-inspired operators such as selection, crossover, and mutation to generate solutions to optimization and search problems. GAs are thus randomized heuristic search strategies that simulate natural selection over a population of candidate solutions. They focus on evolving a population from which strong and diverse candidates can emerge via mutation and crossover (mating). There are different types of GAs; moreover, the same type of GA can yield solutions of different quality depending on multiple variables, including the starting population, the number of generations, and the fitness function.
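The selection, crossover, and mutation loop described above can be sketched as follows. This is a minimal illustration in Python, not the thesis's C++ library: the names `run_ga` and `toy_fitness`, the choice of tournament selection, single-point crossover, Gaussian mutation, and elitism, and the toy target weights are all assumptions made for brevity.

```python
import random


def run_ga(fitness, n_genes, pop_size=30, generations=60,
           mutation_rate=0.1, seed=0):
    """Minimal generational GA over weight vectors in [0, 1] (illustrative)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation

    def tournament():
        # Selection: pick two distinct individuals, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        children = [elite[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_genes)  # single-point crossover (n_genes >= 2)
            child = p1[:cut] + p2[cut:]
            # Mutation: perturb each gene with a small probability, clamp to [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical objective: recover a hidden weight vector (assumed for the demo).
TARGET = [0.2, 0.5, 0.3]


def toy_fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best, history = run_ga(toy_fitness, n_genes=3)
print(best, history[-1])
```

Because the elite is copied into each new generation, the best fitness never decreases from one generation to the next. The starting population, number of generations, and fitness function are the parameters the abstract names as affecting solution quality.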
Finding the best starting parameters and the type of GA most appropriate for a given optimization problem is the next challenge. For that reason, I created a library that automatically applies multiple types of GAs for optimization purposes. The library was written in C++ using the .NET environment. Its main goal is to be usable with different secondary programs and applications without significantly interfering with the original structure of the solution. The basic functions of the library allow the use of several kinds of GAs, such as Simple GA, Uniform Crossover GA, n-point Crossover GA, GA with sexual selection, GA with chromosome aging, and so forth. The user can freely define the starting parameters of the GA, including population size, starting population, number of generations, and type of mutation and crossover. Advanced functions of the library allow multithreaded processing for running several GAs at the same time. The basic multithreading option runs the same type of GA with different starting parameters; the advanced version allows information to be exchanged between threads every set number of generations. When there is a large number of variables to compute, it is also possible to split mutation and crossover across several threads running at the same time. The most important feature of the library is how easily it can be adapted to optimize different kinds of applications. In every generation of the GA, the library runs the original program with new weights for its variables generated through natural selection. The running time is closely tied to the processing time of the original program: it depends on the type of the original solution, and processing one generation takes about as long as one run of the optimized program. During the creation and testing of the library, numerous experiments were carried out. In preliminary experiments, the library was used to optimize the construction of mechanical elements.
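The crossover variants and the thread-based exchange of information every set number of generations might look roughly like the sketch below. It is written in Python for brevity and does not reproduce the library's actual C++ interface: every name here (`uniform_crossover`, `n_point_crossover`, `evolve`, `island_ga`), the ring-shaped migration, the truncation selection, and the toy fitness function are assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def uniform_crossover(p1, p2, rng):
    # Each gene is taken from either parent with equal probability.
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def n_point_crossover(p1, p2, rng, n=2):
    # Alternate between parents at n distinct cut points (needs len(p1) > n).
    cuts = sorted(rng.sample(range(1, len(p1)), n))
    child, take_first, prev = [], True, 0
    for cut in cuts + [len(p1)]:
        child.extend((p1 if take_first else p2)[prev:cut])
        take_first, prev = not take_first, cut
    return child


def evolve(island, fitness, crossover, generations, rng):
    """Run one island for a fixed number of generations; return the new population."""
    for _ in range(generations):
        ranked = sorted(island, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]  # truncation selection
        children = [ranked[0][:]]             # elitism
        while len(children) < len(island):
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            i = rng.randrange(len(child))     # mutate one randomly chosen gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        island = children
    return island


def island_ga(fitness, n_genes, operators, pop_size=20,
              epochs=5, interval=10, seed=0):
    """One thread per island; the best individuals migrate every `interval` generations."""
    rngs = [random.Random(seed + k) for k in range(len(operators))]
    islands = [[[r.random() for _ in range(n_genes)] for _ in range(pop_size)]
               for r in rngs]
    with ThreadPoolExecutor(max_workers=len(operators)) as pool:
        for _ in range(epochs):
            futures = [pool.submit(evolve, isl, fitness, op, interval, r)
                       for isl, op, r in zip(islands, operators, rngs)]
            islands = [f.result() for f in futures]  # wait for every island
            # Migration in a ring: island k receives the best of island k - 1,
            # which replaces its own worst individual.
            bests = [max(isl, key=fitness)[:] for isl in islands]
            for k, isl in enumerate(islands):
                worst = min(range(len(isl)), key=lambda i: fitness(isl[i]))
                isl[worst] = bests[k - 1]
    return max((ind for isl in islands for ind in isl), key=fitness)


# Hypothetical objective used only for this demonstration.
def toy_fitness(weights):
    return -sum((w - 0.5) ** 2 for w in weights)


best = island_ga(toy_fitness, n_genes=4,
                 operators=[uniform_crossover, n_point_crossover])
print(best, toy_fitness(best))
```

In CPython, threads do not speed up a pure-Python fitness function, so this sketch shows only the control flow: evolve independently, synchronize, exchange individuals, repeat. Where evaluating a candidate means running an external program, as the abstract describes, the threads would spend most of their time waiting on that program.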
Later, the application was tested on natural language processing and related solutions. One part of the research was optimizing the Quantitative Learner's Motivation Model. The goal of this experiment was to optimize the formula for predicting learning motivation by applying different weights to three values: interest, usefulness in the future, and satisfaction. For this optimization, an application was created in C# using the GA library. Data sets for the experiments were acquired from questionnaires asking about these three elements in actual university classes. The results showed an improvement in the estimation of students' learning motivation of up to over 17 percentage points of F-score. The final experiment aimed to optimize an implementation of Support Vector Machines (SVMs) for the problem of pattern recognition in natural language data. SVMs are machine learning algorithms based on statistical learning theory. They are applied in a large number of real-world applications, such as text categorization and handwritten character recognition. The original program was created in C++. For this application, numerous types of GAs were tested with different numbers of generations, weight ranges, and starting parameters. The optimization was successful, with the scale of improvement varying with the previously mentioned conditions; the highest improvement achieved was over 6 percentage points of recall compared to the baseline, reaching 78%. All experimental data are included in this work.
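For the motivation-model experiment, a GA needs a fitness function that scores a candidate set of weights. The abstract does not give the actual formula, so the sketch below assumes a normalized weighted sum of the three values compared against a threshold and scored by F-score. The function names, the threshold, and the four questionnaire rows are invented purely for illustration and are not data from the thesis.

```python
def f_score(predicted, actual):
    """F1 score for boolean predictions; 0.0 when there are no true positives."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_fitness(samples, labels, threshold=0.5):
    """Build a fitness function over (interest, usefulness, satisfaction) weights."""
    def fitness(weights):
        total = sum(weights) or 1.0  # avoid division by zero for all-zero weights
        predicted = [
            sum(w * x for w, x in zip(weights, sample)) / total >= threshold
            for sample in samples
        ]
        return f_score(predicted, labels)
    return fitness


# Invented toy rows: (interest, usefulness in the future, satisfaction) in [0, 1],
# with a made-up "motivated" label for each row.
SAMPLES = [(0.9, 0.2, 0.8), (0.1, 0.9, 0.2), (0.8, 0.3, 0.9), (0.2, 0.8, 0.1)]
LABELS = [True, False, True, False]

fitness = make_fitness(SAMPLES, LABELS)
print(fitness((1.0, 0.0, 1.0)), fitness((0.0, 1.0, 0.0)))
```

A GA would then search over the three weights to maximize this score, which matches the role the abstract assigns to the fitness function.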
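Recording format (IMT)
application/pdf
Link URL to primary resource
Doctoral_Thesis_2018_Pawel_Cezary_Lempa (fulltext)
Online access scope
Publicly available on the internet
Partner institution/database
National Institute of Informatics : Institutional Repositories DataBase (IRDB) (institutional repositories)
Source institution/database
Kitami Institute of Technology : Kitami Institute of Technology Institutional Repository KIT-R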