2016 Fiscal Year Annual Research Report

Large-scale text data analysis using hashing techniques

Research Project

Project/Area Number	26730126
Research Institution	NTT Communication Science Laboratories
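Principal Investigator	Katsuhiko Hayashi, Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Innovative Communication Laboratory, Researcher (50725794)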
Project Period (FY)	2014-04-01 – 2017-03-31
Keywords	Hashing / Low-rank approximation / Branch and bound
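Outline of Annual Research Achievements	In the final fiscal year, we conducted research on three topics: document structure processing, fast K-best structured prediction algorithms, and low-dimensional embedding models. In document structure processing, we studied techniques for converting a document into a structure called a rhetorical structure tree, which is applied to document summarization, document data mining, and similar tasks. Because this task requires a mechanism for efficiently processing long sequences such as documents, we hashed the feature representation of the statistical model and developed a technique that uses hashing to remove redundant solutions generated during search. As a result, rhetorical structure trees for English newspaper articles can be parsed in 0.05 CPU seconds on average with no loss of accuracy, which contributed substantially to speeding up document summarization systems. In fast K-best structured prediction, we devised an algorithm that quickly finds the top K solutions for phrase-structure parsing of sentences and rhetorical structure parsing of documents. Finding K reliable solutions quickly makes it possible to improve the accuracy and speed of machine translation, document summarization, and similar applications. The speedup was achieved by applying branch and bound to the K-best structured prediction problem. In low-dimensional embedding models, we theoretically analyzed factorization methods for low-rank approximation of labeled directed graphs such as knowledge graphs. Knowledge graphs are expected to be applied to information extraction, the Semantic Web, question answering, and related areas. A low-rank approximation model factorizes a knowledge graph into matrices for its entities and relations, embeds them as low-dimensional vectors, and scores the relation between entities using vector operations such as inner products. In the simplest case this score is used for link prediction, and it can also be extended to information extraction and question answering. Here, we showed through theoretical analysis that an embedding method using complex numbers is lower-dimensional, faster, and more accurate than conventional methods.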
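
The scoring step described above can be illustrated with a minimal sketch. It assumes a ComplEx-style trilinear score, Re(sum_k s_k r_k conj(o_k)), which is one common form of complex-valued embedding; the report itself does not give the formula, and the function name `complex_score` and the toy vectors are hypothetical.

```python
def complex_score(subject, relation, obj):
    """Score a (subject, relation, object) triple with complex embeddings.

    Assumed form (not taken from the report): the real part of
    sum_k subject[k] * relation[k] * conj(obj[k]).
    """
    if not (len(subject) == len(relation) == len(obj)):
        raise ValueError("all embeddings must have the same dimension")
    total = sum(s * r * o.conjugate() for s, r, o in zip(subject, relation, obj))
    return total.real


# Toy one-dimensional embeddings (illustrative values only).
s = [1 + 2j]
o = [3 - 1j]

# A purely imaginary relation vector gives an antisymmetric score,
# so the direction of the edge matters.
forward = complex_score(s, [1j], o)
backward = complex_score(o, [1j], s)

# A purely real relation vector gives a symmetric score.
sym_forward = complex_score(s, [2 + 0j], o)
sym_backward = complex_score(o, [2 + 0j], s)
```

Under this assumed form, the imaginary part of the relation vector is what lets a single inner-product-style score represent directed (asymmetric) relations, which a purely real-valued score of the same shape cannot distinguish.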

Research Products
(5 results)

Presentation (4 results; of which Int'l Joint Research: 3, Invited: 1), Patent (Industrial Property Rights) (1 result)

[Presentation] On the Equivalence of Holographic and Complex Embeddings for Link Prediction (2017)
- Author(s)
  Katsuhiko Hayashi, Masashi Shimbo
- Organizer
  The 55th Annual Meeting of the Association for Computational Linguistics
- Place of Presentation
  Vancouver
- Year and Date
  2017-07-31 – 2017-08-02
- Int'l Joint Research
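[Presentation] Knowledge Graph Embeddings and Their Applications (2017)
- Author(s)
  Katsuhiko Hayashi
- Organizer
  Chiba Institute of Technology STAIR Lab Artificial Intelligence Seminar
- Place of Presentation
  Tokyo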
- Year and Date
  2017-06-23 – 2017-06-23
- Invited
[Presentation] K-best Iterative Viterbi Parsing (2017)
- Author(s)
  Katsuhiko Hayashi, Masaaki Nagata
- Organizer
  The 15th Conference of the European Chapter of the Association for Computational Linguistics
- Place of Presentation
  Valencia
- Year and Date
  2017-04-05 – 2017-04-07
- Int'l Joint Research
[Presentation] Empirical comparison of dependency conversions for RST discourse trees (2016)
- Author(s)
  Katsuhiko Hayashi, Tsutomu Hirao, Masaaki Nagata
- Organizer
  The 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue
- Place of Presentation
  Los Angeles
- Year and Date
  2016-09-13 – 2016-09-15
- Int'l Joint Research
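[Patent(Industrial Property Rights)] Word learning device, word learning method, and word learning program (2017)
- Inventor(s)
  Katsuhiko Hayashi, Masashi Shimbo, Masaaki Nagata
- Industrial Property Rights Holder
  Katsuhiko Hayashi, Masashi Shimbo, Masaaki Nagata
- Industrial Property Rights Type
  Patent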
- Industrial Property Number
  2017039543
- Filing Date
  2017-03-02
