The full text of this material may be freely available from the degree-granting institution's website or CiNii Dissertations linked from the source (URI).
Doctoral dissertation
Available only within the National Diet Library
National Diet Library Digital Collections
Digital data available
Efficiency-Centric Hardware Accelerator for Deep Neural Network Inference
Bibliographic information
- Material type
- Doctoral dissertation
- Author/editor
- 植吉, 晃大
- Author heading
- Date of publication
- 2020-03-25
- Publication date (W3CDTF)
- 2020-03-25
- Parallel title
- 深層ニューラルネットワーク向け高効率HWアクセラレータに関する研究
- Contributors
- 浅井, 哲也; 富田, 章久; 葛西, 誠也; 池辺, 将之
- Degree-granting institution
- 北海道大学 (Hokkaido University)
- Date of conferral
- 2020-03-25
- Date of conferral (W3CDTF)
- 2020-03-25
- Report number
- 甲第14129号
- Degree
- Doctor of Engineering
- Dissertation grant number
- 甲第14129号
- Language code of text
- eng
- NDC
- Intended audience
- General
- General note
- This study discusses efficiency-centric hardware architectures for deep neural network (DNN) inference. A DNN is a mathematical model inspired by the functionality of the cortex of the brain. Recently, DNNs have received growing attention in many fields of artificial intelligence, such as image and sound recognition and natural language processing, because they achieve high performance and accuracy in these fields. Improvements in processor technology have made it possible to train DNNs on large amounts of data; GPUs are the most widely used devices for DNN training, since recent GPUs provide highly parallel processing at low cost. On the other hand, trained DNNs must run on resource-constrained devices in the real world, so designing highly energy-efficient hardware is required for embedded devices. In this study, we explore optimal approaches from the perspectives of both algorithm and architecture for highly efficient DNN hardware, analyzing three points: compressed DNNs, architecture exploration, and optimal NN models.
  First, we analyze log-quantization and its benefits. Log-quantization is a multi-bit quantization method that uses a power-of-2 logarithmic format. Its most important feature is that multiplier hardware is no longer required, because all multiplications in the linear domain are represented simply as additions in the logarithmic domain. Therefore, LOGNET can potentially achieve a high level of energy efficiency. Another advantage of LOGNET is that the memory footprint and bandwidth requirements are much lower than with linear quantization at the same accuracy, because log-quantization can represent the same numeric range using fewer bits. A key insight is that most weight distributions form a Gaussian distribution, in which smaller values appear more frequently than larger values. Log-quantization can represent such non-uniform distributions with lower numerical error than linear quantization at the same bit width.
  Secondly, we propose a novel DNN architecture called QUEST, a programmable MIMD parallel accelerator for general-purpose state-of-the-art DNNs. It features die-to-die stacking of eight SRAMs (96 MB) with three-cycle latency and 28.8 GB/s bandwidth, using an inductive-coupling technology called the ThruChip Interface (TCI). By stacking SRAMs instead of DRAMs, lower memory access latency and simpler hardware are expected. This facilitates balancing the memory capacity, latency, and bandwidth demanded by cutting-edge DNNs at a high level. QUEST also introduces log-quantized, programmable bit-precision processing for achieving faster computation of larger DNNs in a 3D module; it sustains higher recognition accuracy in the lower bit-width region than linear quantization. The prototype QUEST chip is fabricated in 40-nm CMOS technology and achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.
  Lastly, we propose a prediction-based DNN model called Dead Neuron Prediction (DNP). In most DNN models, a large fraction of neurons ultimately output zero (dead neurons) due to activation functions. Computations for such dead neurons waste substantial energy on unnecessary multiply-and-accumulate (MAC) operations. To skip these unnecessary computations, DNP predicts the liveness of neurons in advance using a supportive lightweight neural network. By efficiently pipelining the computations of the main DNN and its predictor, computations for likely dead neurons are dynamically skipped. Experimental results indicate that a DNN accelerator with DNP achieves better energy efficiency than prior approaches at the same accuracy.
  (Chief examiner) Prof. 浅井 哲也; Prof. 富田 章久; Prof. 葛西 誠也 (Research Center for Integrated Quantum Electronics); Prof. 池辺 将之 (Research Center for Integrated Quantum Electronics). Graduate School of Information Science and Technology (Division of Information Electronics).
- DOI
- 10.14943/doctoral.k14129
- NDL persistent identifier
- info:ndljp/pid/11645614
- Collection (common)
- Collection (accessible materials: Level 1)
- Collection (individual)
- National Diet Library Digital Collections > Digitized materials > Doctoral dissertations
- Basis for collection
- Doctoral dissertations (automatic collection)
- Date accepted (W3CDTF)
- 2021-03-07T02:10:05+09:00
- Date created (W3CDTF)
- 2020-02
- Format (IMT)
- PDF
- Online access
- Available only within the National Diet Library
- Digitized material transmission
- Not eligible for transmission to libraries/individuals
- Remote copying (NDL)
- Available
- Source institution/database
- National Diet Library: National Diet Library Digital Collections
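The log-quantization idea summarized in the general note can be sketched in a few lines: rounding each weight to a signed power of two turns every multiplication into an exponent addition, i.e. a bit shift in hardware. This is a minimal illustrative sketch, assuming a round-to-nearest-exponent scheme and weights normalized into (-1, 1); the thesis's actual LOGNET encoding may differ, and the function names are this sketch's own.

```python
import math

def log_quantize(w, exp_bits=3):
    """Round w to the nearest signed power of two.

    Returns (quantized value, (sign, exponent)) with the exponent clamped
    to [-(2**exp_bits - 1), 0], assuming weights lie in (-1, 1).
    """
    if w == 0.0:
        return 0.0, None
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))              # nearest power-of-2 exponent
    exp = max(-(2 ** exp_bits - 1), min(0, exp))  # clamp to representable range
    return sign * 2.0 ** exp, (sign, exp)

def log_mac(acts, codes):
    """Dot product with log-quantized weights.

    Because every weight is +/-2**e, each multiply is just a scale by a
    power of two -- in hardware, a barrel shift instead of a multiplier.
    """
    acc = 0.0
    for a, (sign, exp) in zip(acts, codes):
        acc += sign * a * 2.0 ** exp  # shift-and-add, no multiplication unit
    return acc
```

With `exp_bits=3`, 0.45 rounds to 2⁻¹ = 0.5 and -0.13 to -2⁻³ = -0.125; the accumulation then needs only shifts and adds, which is where both the energy saving and the reduced bit width (3-bit exponent plus sign) come from.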
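The Dead Neuron Prediction scheme described in the general note can likewise be sketched. The thesis uses a supportive lightweight neural network as the predictor; as a stand-in, this sketch uses a cheap truncated-weight dot product to predict the sign of each ReLU pre-activation, and performs the full-precision MACs only for neurons predicted to be live. All names here are illustrative assumptions, not the thesis's API.

```python
def predict_live(acts, weights, shift=4):
    """Cheap liveness predictor: dot product with weights truncated to a
    few high-order fractional bits (a stand-in for the lightweight NN)."""
    approx = sum(a * (int(w * 2 ** shift) / 2 ** shift)
                 for a, w in zip(acts, weights))
    return approx > 0.0  # positive pre-activation => neuron likely live

def relu_layer_with_dnp(acts, weight_rows):
    """One ReLU layer with DNP-style skipping.

    Returns the output vector and the number of full-precision MACs
    actually performed (skipped neurons cost none).
    """
    outputs, macs_done = [], 0
    for weights in weight_rows:
        if predict_live(acts, weights):
            z = sum(a * w for a, w in zip(acts, weights))  # full MACs
            macs_done += len(weights)
            outputs.append(max(0.0, z))
        else:
            outputs.append(0.0)  # predicted dead: skip all MACs
    return outputs, macs_done
```

In the accelerator the predictor runs a pipeline stage ahead of the main datapath, so a skipped neuron costs neither MACs nor the associated memory fetches; here `macs_done` simply counts the work that was not skipped.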