A study on compact and fast translation and language models for statistical machine translation
Project/Area Number |
15H02744
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | University of Tsukuba |
Principal Investigator |
YAMAMOTO Mikio University of Tsukuba, Faculty of Engineering, Information and Systems, Professor (40210562)
|
Co-Investigator (Kenkyū-buntansha) |
INUI Takashi (乾 孝司) University of Tsukuba, Faculty of Engineering, Information and Systems, Associate Professor (60397031)
|
Research Collaborator |
NORIMATSU Jun-ya
TANIGUCHI Masanori
HAGA Shumpei
OSUMI Kenji
TAKENAKA Kousuke
ISHII Akihiko
|
Project Period (FY) |
2015-04-01 – 2018-03-31
|
Project Status |
Completed (Fiscal Year 2017)
|
Budget Amount |
¥16,120,000 (Direct Cost: ¥12,400,000, Indirect Cost: ¥3,720,000)
Fiscal Year 2017: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2016: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
Fiscal Year 2015: ¥6,760,000 (Direct Cost: ¥5,200,000, Indirect Cost: ¥1,560,000)
|
Keywords | language model / double array / partly transposed double array / random placement / statistical machine translation / trie / n-gram language model / n-gram model / single array |
Outline of Final Research Achievements |
DALM (Double-Array Language Model) is a fast and compact implementation of n-gram language models, but it cannot take full advantage of quantization of model parameter values, such as n-gram probabilities, because of a structural limitation: it stores values and indexes in the same array. In this study, we developed several DALM variants that keep values and indexes in separate arrays and can therefore benefit from quantization. We also investigated the basic characteristics of DALM empirically and proposed the "partly transposed double-array", a key technique for drawing out the full potential of DALMs with separate arrays.
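
To illustrate why a separate value array makes quantization worthwhile, the following is a minimal Python sketch, not the DALM implementation. It assumes uniform 8-bit quantization of log-probabilities; the function names `build_codebook`, `quantize`, and `dequantize` are hypothetical and chosen only for this example. Once values live in their own array, each entry can shrink to a one-byte code that points into a small shared codebook, which is not possible when a slot must also be wide enough to hold an index.

```python
from array import array


def build_codebook(values, bits=8):
    """Return (codebook, lo, step): 2**bits evenly spaced levels over [min, max].

    Uniform spacing is an assumption of this sketch, not the scheme used by DALM.
    """
    lo, hi = min(values), max(values)
    levels = 1 << bits
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codebook = [lo + i * step for i in range(levels)]
    return codebook, lo, step


def quantize(values, bits=8):
    """Map each value to the index of its nearest codebook level.

    Codes are stored in an unsigned-byte array, so bits must be at most 8.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8 for a byte array")
    codebook, lo, step = build_codebook(values, bits)
    top = len(codebook) - 1
    codes = array("B", (min(top, round((v - lo) / step)) for v in values))
    return codes, codebook


def dequantize(codes, codebook):
    """Recover approximate values by looking each code up in the codebook."""
    return [codebook[c] for c in codes]
```

In this sketch a node identifier found through the index arrays would be used to look up `codes[node_id]`, and the reconstruction error per value is bounded by half of the codebook step. A 32-bit float value array would need four bytes per entry, whereas the code array needs one byte per entry plus the fixed-size codebook.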
|
Report
(4 results)
Research Products
(5 results)