Doctoral dissertation
Research on Deep Learning for Artificially Controllable Image Synthesis
Available only within the National Diet Library
National Diet Library Digital Collections
Digital data available
- NDL persistent identifier
- info:ndljp/pid/13342479
- Material type
- Doctoral dissertation
- Author
- ZHANG, Zhiqiang
- Publisher
- -
- Publication year
- 2023-09-15
- Material form
- Digital
- Extent/size
- -
- Degree-granting university and degree
- 法政大学 (Hosei University), Doctor of Engineering
Note on use at the National Diet Library
The full text of this material may be freely viewable via the degree-granting institution's website or CiNii Dissertations, reached through the linked source (URI) or similar links.
Bibliographic information
Details of this material and its authorities (keywords identifying the same subject, author names) can be confirmed here.
Digital
- Material type
- Doctoral dissertation
- Author/editor
- ZHANG, Zhiqiang
- Author heading
- Publication date
- 2023-09-15
- Publication date (W3CDTF)
- 2023-09-15
- Degree-granting institution
- 法政大学 (Hosei University)
- Date of degree conferral
- 2023-09-15
- Date of degree conferral (W3CDTF)
- 2023-09-15
- Report number
- 甲第586号
- Degree
- Doctor of Engineering
- Language code of text
- eng
- Intended audience
- General
- General note
- type: Thesis
Image synthesis has long been one of the most important topics in computer vision research. In recent years, with the rise of artificial intelligence, the field has made many breakthroughs. The introduction of deep learning in particular has advanced it by leaps and bounds, most notably through generative adversarial networks (GANs). However, because a GAN takes randomly sampled Gaussian noise as input, the synthesis process cannot be controlled by the user, which limits the practicality of the whole approach. To address this problem, text-to-image synthesis (T2I) has been proposed in recent years and has attracted wide attention. T2I generates a corresponding image from simple, intuitive, easy-to-enter text. Because text matches people's input habits, this approach realizes artificially controllable image synthesis to a certain extent. Nevertheless, T2I still faces the following challenges. 1) The quality of image synthesis needs further improvement: quality is reflected in the realism of the synthesized content, and current T2I methods still produce images of poor realism. 2) The controllability of the synthesis process needs further improvement: controllability is reflected in the degree of control over the synthesized content, and with text alone current T2I methods can control only the basic content of the synthesized object, not its shape, size, or position. 3) The overall practicality of the method needs further improvement: practicality is reflected in how widely the method can be applied.
Current T2I methods can synthesize an image from the input text, but they cannot accept new text afterwards to modify the generated image, which makes them insufficiently practical. Facing these challenges, this research is committed to realizing artificially controllable image synthesis across the whole process, in three parts: 1) developing better T2I methods that achieve higher-quality images; 2) developing controllable synthesis methods that improve the controllability of the process; and 3) building on the first two parts, introducing image manipulation to achieve high-quality controllable synthesis and manipulation, thereby further improving practicality. In the first part, we propose three methods for higher-quality synthesis. The first synthesizes simple contour information, then the foreground content, and then the final image. The second first synthesizes the foreground content from the text, and then synthesizes the final image from the synthesized foreground and the input text. The third introduces additional image discrimination types into the GAN's discriminator to improve its discriminative ability; the improved discrimination signal is then fed back to the generator to raise the quality of the result. Extensive experiments show that all three methods achieve higher-quality image synthesis. In the second part, we propose a more controllable approach to image synthesis.
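As a rough illustration of the conditioning idea underlying T2I, here is a minimal NumPy sketch, not the thesis's actual architecture: the toy averaging text encoder, the embedding size, and the noise size are all assumptions. It shows how a generator's input changes from pure Gaussian noise to text-conditioned noise, which is what makes the synthesis content user-controllable.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(tokens, vocab, dim=16):
    # Toy text encoder: average of fixed random per-token vectors.
    table = {w: rng.standard_normal(dim) for w in vocab}
    return np.mean([table[t] for t in tokens], axis=0)

def generator_input(text_emb, noise_dim=32):
    # A plain GAN consumes only Gaussian noise; a T2I generator
    # conditions on text by concatenating the embedding with the noise.
    z = rng.standard_normal(noise_dim)
    return np.concatenate([text_emb, z])

vocab = ["a", "small", "red", "bird"]
cond = generator_input(embed_text(["a", "red", "bird"], vocab))
print(cond.shape)  # (48,): 16-dim text condition + 32-dim noise
```

The text half of the input vector is chosen by the user, while the noise half still supplies diversity; a real T2I generator would decode this vector into an image through learned upsampling layers.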
Specifically, a text description and simple contour information are used together to synthesize the image: the text controls the content, the contour controls the basic shape and position of the synthesized object, and both can be entered manually, so synthesis from text plus contour offers better controllability. Following this idea, we propose two network structures. The first simply combines the text and contour information and synthesizes the image through residual and upsampling operations; it achieves a preliminary controllable-synthesis effect, but the overall quality is mediocre. The second structure therefore introduces an attention mechanism that fine-tunes the result to improve synthesis quality. Experiments demonstrate that the second structure achieves more controllable and higher-quality synthesis. The core of the third part is to introduce image manipulation on top of the first two parts to form highly practical synthesis methods. We first propose a text-guided image manipulation (TGIM) method whose basic idea is a sentence-aware and word-aware network structure that achieves better manipulation effects. Then, by fusing TGIM with the synthesis methods of the first and second parts, we finally obtain text-guided image synthesis and manipulation and text-guided controllable image synthesis and manipulation. The former lets the user enter text to synthesize an image and then enter new text to modify the synthesized content.
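The second part's pipeline of fusing text with contour information, refining through residual and upsampling operations, and applying attention can be sketched conceptually. The following NumPy toy traces tensor shapes through such a pipeline; all dimensions, the nearest-neighbour upsampling, and the scaled dot-product word attention are illustrative assumptions, not the thesis's actual networks.

```python
import numpy as np

rng = np.random.default_rng(1)

def fuse(text_emb, contour):
    # Broadcast the text embedding over the contour's spatial grid and
    # concatenate along the channel axis (channels-first: C x H x W).
    h, w = contour.shape
    text_map = np.tile(text_emb[:, None, None], (1, h, w))
    return np.concatenate([contour[None], text_map], axis=0)

def residual_block(x, weight):
    # 1x1 "convolution" as a channel-mixing matmul, plus skip connection.
    return x + np.tensordot(weight, x, axes=([1], [0]))

def upsample2x(x):
    # Nearest-neighbour upsampling doubles the spatial resolution.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def word_attention(feat, word_embs):
    # Scaled dot-product attention: each spatial location attends over
    # word embeddings, refining the feature map (the second structure's
    # fine-tuning idea).
    c, h, w = feat.shape
    q = feat.reshape(c, -1).T                    # (HW, C)
    scores = q @ word_embs.T / np.sqrt(c)        # (HW, n_words)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    ctx = (attn @ word_embs).T.reshape(c, h, w)
    return feat + ctx

c = 9                                            # 1 contour + 8 text channels
text_emb = rng.standard_normal(8)
contour = (rng.random((8, 8)) > 0.5).astype(float)   # toy shape mask
x = fuse(text_emb, contour)                          # (9, 8, 8)
x = residual_block(x, rng.standard_normal((c, c)) * 0.1)
x = upsample2x(x)                                    # (9, 16, 16)
x = word_attention(x, rng.standard_normal((3, c)))   # 3 toy word vectors
print(x.shape)  # (9, 16, 16)
```

The contour mask fixes where the object may appear, while the attended word vectors inject content detail at each location; stacking more residual and upsampling stages would grow this toy map toward full image resolution.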
The latter lets the user enter text and simple contour information to synthesize an image and then enter new text to modify the previously synthesized content. The experimental results show that both methods achieve good practicality; the second offers better human controllability and practicality because, from the start, it can control both the basic content of the synthesis and the shape and position of the synthesized object.
- DOI
- 10.15002/00030035
- NDL persistent identifier
- info:ndljp/pid/13342479
- Collection (common)
- Collection (materials for persons with disabilities: Level 1)
- Collection (individual)
- National Diet Library Digital Collections > Digitized materials > Doctoral dissertations
- Basis for collection
- Doctoral dissertations (automatic collection)
- Date accepted (W3CDTF)
- 2024-03-01T22:05:19+09:00
- Format (IMT)
- application/pdf
- Online access scope
- Available only within the National Diet Library
- Digitized material transmission
- Not eligible for library/individual transmission
- Remote copying (NDL)
- Available
- Partner institution/database
- National Diet Library: National Diet Library Digital Collections