2016 Fiscal Year Final Research Report
Large-scale text data analysis using hashing techniques
Project/Area Number |
26730126
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
Intelligent informatics
|
Research Institution | NTT Communication Science Laboratories |
Principal Investigator |
Hayashi Katsuhiko, Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Innovative Communication Laboratory, Researcher (50725794)
|
Project Period (FY) |
2014-04-01 – 2017-03-31
|
Keywords | discourse structure analysis / ellipsis completion / matrix factorization / hashing methods |
Outline of Final Research Achievements |
I investigated hashing and matrix factorization techniques for efficiently analyzing large-scale text data in various domains. First, I proposed a fast and accurate parsing algorithm for discourse tree structure analysis of English newswire texts. I also presented a text summarization method based on discourse trees, which improved summarization accuracy. Second, I proposed a method to automatically detect and insert missing elements in English and Japanese speech and newswire texts. Finally, I proposed a knowledge (word thesaurus) embedding method for fast word similarity computation. In the future, I will apply these methods to more advanced NLP applications such as machine translation and question answering.
|
Free Research Field |
Natural language processing
|