Query-Aware Locality Sensitive Hashing for Similarity Search Problems

国立国会図書館永続的識別子: info:ndljp/pid/11679005

資料種別: 博士論文

著者: 陸, 可鏡

出版者: Hokkaido University

出版年: 2021-03-25

資料形態: デジタル

ページ数・大きさ等: -

授与大学名・学位: 北海道大学,博士(情報科学)

すべて見る

国立国会図書館での利用に関する注記

本資料は、掲載誌(URI)等のリンク先にある学位授与機関のWebサイトやCiNii Dissertationsから、本文を自由に閲覧できる場合があります。

資料に関する注記

一般注記：: Similarity search is an old but fundamental research topic which has various applications in database, data mining, information retrieval and machine ...

書店で探す

障害者向け資料で読む

障害者向け資料を見る（1種類）

全国の図書館の所蔵

国立国会図書館以外の全国の図書館の所蔵状況を表示します。

連携機関・データベースの一覧

所蔵のある図書館から取寄せることが可能かなど、資料の利用方法は、ご自身が利用されるお近くの図書館へご相談ください

その他

北海道大学学術成果コレクション
デジタル
連携先のサイトで、学術機関リポジトリデータベース（IRDB）（機関リポジトリ）が連携している機関・データベースの所蔵状況を確認できます。
北海道大学学術成果コレクションのサイトでこの本を確認

書店で探す

障害者向け資料で読む

他サービス
- テキストデータ国立国会図書館デジタルコレクションで確認する

書誌情報

この資料の詳細や典拠（同じ主題の資料を指すキーワード、著者名）等を確認できます。

デジタル

資料種別: 博士論文
タイトル: Query-Aware Locality Sensitive Hashing for Similarity Search Problems
著者・編者: 陸, 可鏡
著者標目: 陸, 可鏡
出版事項: Hokkaido University
出版年月日等: 2021-03-25
出版年（W3CDTF）: 2021-03-25
並列タイトル等: 類似検索問題に対するクエリ・アウェア局所鋭敏ハッシング
寄与者: 工藤, 峰一
今井, 英幸
田中, 章
授与機関名: 北海道大学
授与年月日: 2021-03-25
授与年月日（W3CDTF）: 2021-03-25
報告番号: 甲第14585号
学位: 博士(情報科学)
博論授与番号: 甲第14585号
本文の言語コード: eng
NDC: 500
対象利用者: 一般
一般注記: Similarity search is an old but fundamental research topic which has various applications in database, data mining, information retrieval and machine learning. Although the goal of similarity search, that is, finding most similar points of issued queries in the dataset given a specified measure, is quite straightforward, it is very challenging to solve this problem efficiently due to the following two reasons. (1) In recent years, as the data size increases rapidly, we need to deal with up to billion-scale datasets, on which many traditional techniques lose the effectiveness. (2) Due to the phenomenon called the curse of dimensionality, the distances among data points become much closer as the dimension increases, which makes most of tree structures perform even worse than the linear scan. In order to overcome these two obstacles, many researchers have devoted to find more efficient techniques in the past two decades and proposed various algorithms.Although these algorithms vary much, the following two performance metrics always attract particular attention in their designs: one is the accuracy, which is often indicated by the percentage of true points which are successfully found, also called, the recall rate, and the other one is the efficiency which is often indicated by the running time of searching. In order to make the accuracy more controllable, researchers have developed some techniques with theoretical guarantees to satisfy specified success probabilities. Among these techniques, Locality Sensitive Hashing (LSH) draws a particular interest due to its attractive query performance and robust probability guarantee.The basic idea of LSH is to build a group of projected vectors and select the most promising candidates only from the projection information of those vectors. Such an idea becomes the starting point of the research work introduced in this paper.Since the solutions of similarity search problems vary highly over different spaces and different measures, in this paper, we particularly focus on the most important two situations: the Euclidean space equipped with `2 metric and the inner product space. In practice, many applications are intimately related to either of these two situations. Accordingly, this thesis is divided into two major parts.In the first part, we focus on the `2 metric and aim to solve approximate nearest neighbor search problems. For this purpose, we propose two diskbased LSH variants called VHP and R2LSH, which are highly inspired by the concept of query-aware search window. By exploiting more accurate projection information of the generated one-dimensional projected vectors, these two methods build more effective query-centric search regions in the projected spaces. Specifically, R2LSH builds multiple query-centric balls in a group of two-dimensional projected subspaces, while VHP utilizes the information of one-dimensional distances between the query and data points on every projected vector. Both of these two methods can prune false points more accurately than the existing query-aware LSH variants.In the second part, we focus on the inner product space and solve the maximum inner product search problem. For this purpose, we propose a query-aware LSH variant called AdaLSH which is built on a multi-ring structure. The basic idea of AdaLSH is that, based on dfferent norms of data points, we can control adaptively the width of the query-aware search windows such that those points having larger norms can be examined more carefully to ensure the accuracy. In order to realize this idea, we design a multi-round search strategy such that query-aware search windows in different rings extend at different paces, which not only achieves the goal mentioned above, but also significantly decreases the total searching cost.Since all three proposed methods are based on LSH, all of the proposed algorithms possess probability guarantees in accuracy. Extensive experiments confirm the superiority of these methods over existing state-of-the-art methods. In particular, R2LSH and VHP scale up to billion-scale datasets,because they are disk-based.
(主査) 教授工藤峰一, 教授今井英幸, 教授田中章
情報科学研究科(情報理工学専攻)
DOI: 10.14943/doctoral.k14585
https://doi.org/10.14943/doctoral.k14585
国立国会図書館永続的識別子: info:ndljp/pid/11679005
https://dl.ndl.go.jp/pid/11679005
コレクション（共通）: 障害者向け資料
コレクション（障害者向け資料：レベル1）: テキストデータ
コレクション（個別）: 国立国会図書館デジタルコレクション > デジタル化資料 > 博士論文
https://dl.ndl.go.jp/collections/A00014
収集根拠: 博士論文（自動収集）
受理日（W3CDTF）: 2021-06-07T02:06:26+09:00
記録形式（IMT）: PDF
オンライン閲覧公開範囲: 国立国会図書館内限定公開
デジタル化資料送信: 図書館・個人送信対象外
遠隔複写可否（NDL）: 可
掲載誌（URI）: http://dx.doi.org/10.14943/doctoral.k14585
http://hdl.handle.net/2115/81266
参照（URI）: http://hdl.handle.net/2115/81261
連携機関・データベース: 国立国会図書館 : 国立国会図書館デジタルコレクション
https://dl.ndl.go.jp