2020 Fiscal Year Final Research Report
Construction speedup and deepening of partially transposed double array ngram language models
Project/Area Number | 18K11423 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Review Section | Basic Section 61030: Intelligent informatics-related |
Research Institution | University of Tsukuba |
Principal Investigator | YAMAMOTO Mikio, University of Tsukuba, Faculty of Engineering, Information and Systems, Professor (40210562) |
Project Period (FY) | 2018-04-01 – 2021-03-31 |
Keywords | ngram language model / double array / bidirectional placement / string matching / fine-grained parallelization |
Outline of Final Research Achievements |
Implementations of ngram language models based on the partially transposed double array are excellent in both access speed and model size, but they have the disadvantage that building the model (the data structure) takes a very long time. The essential difficulty is that hundreds of millions to billions of child-node arrays (each containing gaps) must be packed into a single array without colliding with one another (the placement step is sketched below). Because these placements are strongly interdependent, simple techniques such as naive parallelization do little to speed up construction. In this study, we examined the properties of the partially transposed double array in depth, achieved faster model construction through a combination of acceleration methods, and at the same time attained a higher compression rate.
|
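As background to the difficulty described above, the following is a minimal C++ sketch of the child-placement step of an ordinary double-array trie; it is not the project's partially transposed variant or its acceleration methods, and all names are illustrative. Each node's children must all land in free slots of one shared array, so every placement constrains every later one, which is the interdependence that makes naive parallelization ineffective.

    #include <cstdint>
    #include <vector>

    // Sketch of plain double-array placement: BASE[s] is the offset of node
    // s's children, CHECK[t] records which parent owns slot t (-1 = free).
    struct DoubleArraySketch {
        std::vector<int32_t> base;
        std::vector<int32_t> check;

        explicit DoubleArraySketch(size_t capacity)
            : base(capacity, 0), check(capacity, -1) {}

        // Find the smallest offset b such that every slot b + c is still free,
        // growing the arrays when the scan runs past the end. Real builders
        // track free slots instead of rescanning linearly.
        int32_t find_base(const std::vector<int32_t>& child_symbols) {
            for (int32_t b = 1; ; ++b) {
                bool fits = true;
                for (int32_t c : child_symbols) {
                    size_t t = static_cast<size_t>(b) + static_cast<size_t>(c);
                    if (t >= check.size()) {
                        check.resize(t + 1, -1);
                        base.resize(t + 1, 0);
                    }
                    if (check[t] != -1) { fits = false; break; }
                }
                if (fits) return b;
            }
        }

        // Place one node's children; the chosen slots become unavailable to
        // every node placed afterwards.
        void place_children(int32_t parent,
                            const std::vector<int32_t>& child_symbols) {
            int32_t b = find_base(child_symbols);
            base[static_cast<size_t>(parent)] = b;
            for (int32_t c : child_symbols) {
                check[static_cast<size_t>(b + c)] = parent;
            }
        }
    };

The sketch only shows why the placements cannot simply be computed independently; the contribution of this project lies in accelerating exactly this kind of packing at the scale of hundreds of millions to billions of child arrays.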
Free Research Field | Information engineering |
Academic Significance and Societal Importance of the Research Achievements |
Because ngram language models are a foundational technology for speech recognition and statistical machine translation, the significance of this research lies in making it possible to build fast and compact ngram language models in a short time. From a broader perspective, the double array is one way of realizing the trie, a widely used dictionary data structure, so this research, by realizing fast and compact tries over huge data, is also useful for the wide range of applications that require very large dictionaries.
|