Large-scale text data analysis using hashing techniques
Project/Area Number |
26730126
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
Intelligent informatics
|
Research Institution | NTT Communication Science Laboratories |
Principal Investigator |
Hayashi Katsuhiko Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Innovative Communication Laboratory, Researcher (50725794)
|
Project Period (FY) |
2014-04-01 – 2017-03-31
|
Project Status |
Completed (Fiscal Year 2016)
|
Budget Amount |
¥2,470,000 (Direct Cost: ¥1,900,000, Indirect Cost: ¥570,000)
Fiscal Year 2016: ¥390,000 (Direct Cost: ¥300,000, Indirect Cost: ¥90,000)
Fiscal Year 2015: ¥260,000 (Direct Cost: ¥200,000, Indirect Cost: ¥60,000)
Fiscal Year 2014: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
|
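Keywords | discourse structure analysis / ellipsis resolution / matrix factorization / hashing / low-rank approximation / branch and bound / linear-time language parsing / spoken language data analysis / natural language processing / discourse analysis / rhetorical structure analysis |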
Outline of Final Research Achievements |
I investigated hashing and matrix factorization techniques for efficiently analyzing large-scale text data across various domains. First, I proposed a fast and accurate parsing algorithm for discourse tree structure analysis of English newswire texts. I also presented a text summarization method that uses discourse trees and improved summarization accuracy. Second, I proposed a method that automatically detects and inserts missing elements in English and Japanese speech and newswire texts. Finally, I proposed a method for embedding knowledge (a word thesaurus) that enables fast word similarity computation. In the future, I will apply these methods to more advanced NLP applications such as machine translation and question answering.
|
Report
(4 results)
Research Products
(7 results)
[Presentation] K-best Iterative Viterbi Parsing (2017)
Author(s)
Katsuhiko Hayashi, Masaaki Nagata
Organizer
The 15th Conference of the European Chapter of the Association for Computational Linguistics
Place of Presentation
Valencia
Year and Date
2017-04-05
Related Report
Int'l Joint Research