Doctoral Dissertation (博士論文)
National Diet Library Digital Collections (国立国会図書館デジタルコレクション)
Available in the National Diet Library; digital data available.
Research on Deep Learning for Artificially Controllable Image Synthesis
- Persistent ID (NDL)
- info:ndljp/pid/13342479
- Material type
- 博士論文 (Doctoral dissertation)
- Author
- ZHANG, Zhiqiang
- Publisher
- -
- Publication date
- 2023-09-15
- Material Format
- Digital
- Capacity, size, etc.
- -
- Name of awarding university/degree
- 法政大学 (Hosei University), 博士(工学) (Doctor of Engineering)
Notes on use at the National Diet Library
The full text of this material may be freely viewable via the degree-granting institution's website linked from the Periodical Title (URI) field, or via CiNii Dissertations.
Bibliographic Record
Details of this material and its authority data (subject keywords, author name headings, etc.) are listed below.
Digital
- Material Type
- 博士論文 (Doctoral dissertation)
- Author/Editor
- ZHANG, Zhiqiang
- Author Heading
- Publication Date
- 2023-09-15
- Publication Date (W3CDTF)
- 2023-09-15
- Degree grantor/type
- 法政大学 (Hosei University)
- Date Granted
- 2023-09-15
- Date Granted (W3CDTF)
- 2023-09-15
- Dissertation Number
- 甲第586号
- Degree Type
- 博士(工学) (Doctor of Engineering)
- Text Language Code
- eng
- Target Audience
- 一般 (General)
- Note (General)
- type: Thesis
Image synthesis has long been one of the most important topics in computer vision research. In recent years, with the rise of artificial intelligence, the field has made many breakthroughs; the introduction of deep learning in particular has advanced it by leaps and bounds, most notably through generative adversarial networks (GANs). However, because a GAN takes randomly sampled Gaussian noise as input, the synthesis process cannot be controlled by the user, which limits the practicality of the whole approach. To address this problem, text-to-image synthesis (T2I) has been proposed in recent years and has attracted wide attention. T2I generates the corresponding image from simple, intuitive, easy-to-enter text. Because text matches people's input habits, this approach makes image synthesis user-controllable to a certain extent.
Nevertheless, T2I still faces the following challenges. 1) The quality of image synthesis needs further improvement. Quality is reflected in the realism of the synthesized content, and current T2I methods still produce images of limited realism. 2) The controllability of the synthesis process needs further improvement. Controllability is reflected in the degree of control over the synthesized content: with text alone, current T2I methods can control only the basic content of the synthesized object, not its shape, size, or position. 3) The overall practicality of the synthesis method needs further improvement. Practicality is reflected in how the method can be applied: current T2I methods can synthesize an image from input text, but new text cannot then be entered to modify the content of the generated image.
Facing these challenges, this research aims to realize user-controllable image synthesis across the whole process, in three parts: 1) developing better T2I methods that achieve higher-quality image results; 2) developing controllable image synthesis methods that improve the controllability of the synthesis process; 3) building on the first two parts, introducing an image manipulation method to achieve high-quality controllable image synthesis and manipulation, thereby further improving practicality.
In the first part, we propose three methods for higher-quality image synthesis. The basic idea of the first method is to synthesize simple contour information first, then the foreground content, and then the final image. The second method first synthesizes the foreground content from the text, then synthesizes the final image from the synthesized foreground and the input text. The third method introduces additional image discrimination types into the GAN discriminator to improve its discriminative ability; the improved discrimination is fed back to the generator, improving the quality of the synthesized result. Extensive experiments show that all three proposed methods achieve higher-quality image synthesis results.
In the second part, we propose a more controllable approach to image synthesis. Specifically, a text description and simple contour information are used together to synthesize the image: the text controls the synthesized content, the contour controls the basic shape and position of the synthesized object, and both can be entered by hand, so the combination offers better controllability. For this idea we propose two network structures. The first simply combines the text and contour information and synthesizes the image through residual and upsampling operations; it achieves controllable synthesis, but the overall quality is mediocre. The second structure therefore introduces an attention mechanism that fine-tunes the synthesis result to improve its quality. Experimental results demonstrate that the second structure achieves more controllable and higher-quality image synthesis.
In the third part, the core is to introduce an image manipulation method on top of the first two parts to form highly practical image synthesis methods. We first propose a text-guided image manipulation (TGIM) method, whose basic idea is a sentence-aware and word-aware network structure that achieves better manipulation results. Then, by fusing TGIM with the synthesis methods of the first and second parts, we obtain text-guided image synthesis and manipulation, and text-guided controllable image synthesis and manipulation. The former lets the user enter text to synthesize an image and then enter new text to modify the synthesized content; the latter lets the user enter text and simple contour information to synthesize an image and then enter new text to modify it. Experimental results show that both methods achieve good practicality; the latter offers better controllability and practicality because it can control both the basic content of the synthesis and the shape and position of the synthesized object from the start.
- DOI
- 10.15002/00030035
- Persistent ID (NDL)
- info:ndljp/pid/13342479
- Collection
- Collection (Materials For Handicapped People:1)
- Collection (particular)
- 国立国会図書館デジタルコレクション > デジタル化資料 > 博士論文 (NDL Digital Collections > Digitized materials > Doctoral dissertations)
- Acquisition Basis
- 博士論文(自動収集) (Doctoral dissertations, automatically collected)
- Date Accepted (W3CDTF)
- 2024-03-01T22:05:19+09:00
- Format (IMT)
- application/pdf
- Access Restrictions
- 国立国会図書館内限定公開 (Available only within the National Diet Library)
- Service for the Digitized Contents Transmission Service
- 図書館・個人送信対象外 (Not covered by the library/individual transmission service)
- Availability of remote photoduplication service
- 可 (Available)
- Periodical Title (URI)
- Data Provider (Database)
- 国立国会図書館 (National Diet Library) : 国立国会図書館デジタルコレクション (NDL Digital Collections)
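The abstract above turns on one mechanism: conditioning a GAN-style generator on user-supplied text and contour information rather than on Gaussian noise alone, so that content, shape, and position become controllable. As a rough illustration of that conditioning idea only, and not of the dissertation's actual networks, the toy sketch below fuses a noise vector, a stand-in "text embedding", and a contour mask into a single conditioning vector before mapping it to an image-shaped output. All names, shapes, and the single linear layer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_generator(noise, text_emb, contour, w):
    """Toy conditional 'generator': fuses noise, a text embedding, and a
    flattened contour mask into one conditioning vector, then maps it to
    a flat 8x8 'image'. Purely illustrative; not the thesis architecture."""
    cond = np.concatenate([noise, text_emb, contour.ravel()])
    img = np.tanh(w @ cond)            # squash to [-1, 1], as GAN outputs usually are
    return img.reshape(8, 8)

z = rng.normal(size=16)                # Gaussian noise: the uncontrollable input
t = rng.normal(size=32)                # stand-in text embedding (controls *what*)
c = np.zeros((8, 8)); c[2:6, 2:6] = 1  # contour mask (controls *where* and shape)
W = rng.normal(size=(64, 16 + 32 + 64)) * 0.1

out = toy_generator(z, t, c, W)
print(out.shape)                       # (8, 8)
```

In the dissertation's framing, the text embedding would steer the synthesized content while the contour mask would fix the object's basic shape and position; since both are entered by the user, changing either input changes the output in a predictable way, which is the sense in which the process becomes user-controllable.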