2019 Fiscal Year Annual Research Report

Visual Question Answering System with a Knowledge Base

Research Project

Project/Area Number	18H03264
Research Institution	Osaka University
Principal Investigator	Yuta Nakashima (中島 悠太), Osaka University, Institute for Datability Science, Associate Professor (70633551)
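Co-Investigator(Kenkyū-buntansha)	金 進東, Research Organization of Information and Systems (Inter-University Research Institute Corporation; headquarters facilities, etc.), Joint Support-Center for Data Science Research, Project Associate Professor (40536893)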
Project Period (FY)	2018-04-01 – 2022-03-31
Keywords	Question answering / Knowledge base / Deep learning
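Outline of Annual Research Achievements	Toward knowledge-based visual question answering (VQA), we first constructed a dataset. Each sample consists of a short video clip extracted from a TV drama (with subtitles of the spoken dialogue); a question that relates to the content of the clip and cannot be answered without knowledge about the drama; four candidate answers and the correct answer; and the knowledge needed to answer the question, written as natural-language text. We built the dataset using a crowdsourcing service and collected 24,282 samples, which makes it the largest dataset that requires knowledge about video. Using this dataset, we also built a baseline method for a question answering system that uses a knowledge base. In this method, the knowledge needed for answering, as collected through crowdsourcing, is gathered into a knowledge base; given a video clip, subtitles, a question, and candidate answers, the method retrieves the necessary knowledge from the knowledge base and uses it to answer. The proposed method achieves an accuracy of 65%, whereas the best-performing existing model achieves 52%, which we consider to demonstrate the effectiveness of the proposed method. On the other hand, when we evaluated through crowdsourcing the accuracy of people who had never watched the drama and of people who had, the results were 75% and 90%, respectively, so the model still has room for improvement. In addition, for paraphrase detection, which is intended to absorb differences in wording between the question text and the knowledge base, we showed experimentally that an F1 score of 87% is achievable.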
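The retrieve-then-answer flow described above can be illustrated with a minimal Python sketch. This is not the project's model: it assumes a plain token-overlap score in place of learned representations, it ignores the video frames entirely, and all function names and the toy data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(a, b):
    """Count tokens shared by two token lists (with multiplicity)."""
    return sum((Counter(a) & Counter(b)).values())


def retrieve(knowledge_base, query, top_k=1):
    """Return the top_k knowledge sentences that best overlap with the query."""
    q = tokenize(query)
    ranked = sorted(knowledge_base,
                    key=lambda k: overlap(q, tokenize(k)),
                    reverse=True)
    return ranked[:top_k]


def answer(knowledge_base, subtitles, question, candidates):
    """Retrieve knowledge, then pick the candidate that best matches it."""
    # Assumption: the query is simply all textual inputs concatenated.
    query = " ".join([subtitles, question] + candidates)
    knowledge = " ".join(retrieve(knowledge_base, query))
    k_tokens = tokenize(knowledge)
    scores = [overlap(tokenize(c), k_tokens) for c in candidates]
    return scores.index(max(scores)), knowledge


# Toy, made-up data (not taken from the dataset).
kb = [
    "Alice works as a physicist at the university.",
    "Bob adopted a cat named Milo.",
]
subtitles = "Bob: I have to get home and feed him."
question = "What is the name of the pet Bob has to feed?"
candidates = ["Rex", "Milo", "Luna", "Coco"]

best_index, used_knowledge = answer(kb, subtitles, question, candidates)
print(candidates[best_index], "|", used_knowledge)
```

In this toy case the question cannot be answered from the subtitle alone; the retrieved knowledge sentence is what separates the candidates, which is the situation the dataset is designed to create.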
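Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: The foundational technology for knowledge-based question answering, which is the goal of this research, has already been developed, and high accuracy has been achieved for paraphrase detection. Although some items are being carried out in a different order from the original plan, such as the graph-based representation of knowledge and the construction of a system that uses DBpedia and similar resources as external knowledge, we consider the research to be progressing largely as planned.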
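Strategy for Future Research Activity	We plan the following.
(1) Video question answering using natural language as external knowledge: This fiscal year, the knowledge base consisted of the knowledge needed for answering (usually one sentence) that was collected together with the dataset. Because knowledge tied to an answer is not necessarily available in typical applications, next fiscal year we will consider using, for example, summaries of individual TV drama episodes published on the Internet as the knowledge base. In this case, the system needs to retrieve the corresponding summary from the input video clip, question, and candidate answers, and, where necessary, extract part of that summary. We regard this as an effort to bring knowledge-based video question answering closer to practical use.
(2) Question answering using knowledge graphs: Besides natural-language text, a knowledge base can also be represented in structured form as a knowledge graph, as in databases such as DBpedia, and such resources are already widely available. We will therefore build a system that uses a knowledge graph as the knowledge base for visual question answering. In addition to using existing knowledge bases such as DBpedia, for question answering about TV dramas in particular we will also examine acquiring knowledge from the video itself.
(3) Preliminary study on replacing external knowledge: Because both (1) and (2) work by retrieving from a knowledge base, the knowledge base could potentially be replaced at the time of use. For example, replacing the knowledge base for one drama series with that of another may allow the system to handle questions from a different domain. To verify whether such replacement is feasible, we will first try it on the system in (1) and identify the problems.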
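As a rough illustration of items (1) and (3), the following Python sketch retrieves an episode summary, extracts one sentence from it, and then runs unchanged on a second knowledge base. It assumes simple token-set overlap for both stages; the function names, the summaries, and the two series are all made up for illustration and do not describe the planned system.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(query, texts):
    """Return the text that shares the most distinct tokens with the query."""
    q = tokens(query)
    return max(texts, key=lambda t: len(q & tokens(t)))


def find_evidence(summaries, query):
    """Stage 1: pick the best episode summary. Stage 2: pick its best sentence."""
    summary = best_match(query, summaries)
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", summary)
                 if s.strip()]
    return best_match(query, sentences)


# Toy, made-up episode summaries for two hypothetical drama series.
series_a = [
    "Anna opens a bakery. Her brother Tom lends her the money.",
    "Tom loses his job. Anna hires him at the bakery.",
]
series_b = [
    "Detective Ray finds a key. The key opens a locker at the station.",
]

# The same code serves both series; only the knowledge base is swapped.
evidence_a = find_evidence(series_a, "Who lends Anna the money for the bakery?")
evidence_b = find_evidence(series_b, "What does the key open?")
print(evidence_a)
print(evidence_b)
```

Passing the knowledge base as an argument rather than building it into the model is what makes the swap in item (3) a one-line change in this sketch; whether a learned system transfers as easily is exactly the open question the plan sets out to test.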

Research Products
(15 results)


Int'l Joint Research (1 result), Journal Article (2 results; of which Peer Reviewed: 2, Open Access: 1), Presentation (10 results; of which Int'l Joint Research: 6), Remarks (2 results)

[Int'l Joint Research] University of Oulu/Tampere University (Finland)
- Country Name
  FINLAND
- Counterpart Institution
  University of Oulu/Tampere University
[Journal Article] Visually grounded paraphrase identification via gating and phrase localization (2020)
- Author(s)
  Mayu Otani, Chenhui Chu, and Yuta Nakashima
- Journal Title
  Neurocomputing
  Volume: - Pages: -
- Peer Reviewed
[Journal Article] ContextNet: Representation and exploration for painting classification and retrieval in context (2019)
- Author(s)
  Noa Garcia, Benjamin Renoust, and Yuta Nakashima
- Journal Title
  International Journal on Multimedia Information Retrieval
  Volume: 9 Pages: 17-30
- DOI
  https://doi.org/10.1007/s13735-019-00189-4
- Peer Reviewed / Open Access
[Presentation] BERT representations for video question answering (2020)
- Author(s)
  Zekun Yang, Noa Garcia, Chenhui Chu, Mayu Otani, Yuta Nakashima, and Haruo Takemura
- Organizer
  IEEE Winter Conference on Applications of Computer Vision
- Int'l Joint Research
[Presentation] KnowIT VQA: Answering knowledge-based questions about video (2020)
- Author(s)
  Noa Garcia, Chenhui Chu, Mayu Otani, and Yuta Nakashima
- Organizer
  AAAI Conference on Artificial Intelligence
- Int'l Joint Research
[Presentation] Adaptive gating mechanism for identifying visually grounded paraphrases (2019)
- Author(s)
  Mayu Otani, Chenhui Chu, and Yuta Nakashima
- Organizer
  Multi-Discipline Approach for Learning Concepts
- Int'l Joint Research
[Presentation] Rethinking the evaluation of video summaries (2019)
- Author(s)
  Mayu Otani, Yuta Nakashima, Esa Rahtu, and Janne Heikkila
- Organizer
  IEEE Conference on Computer Vision and Pattern Recognition
- Int'l Joint Research
[Presentation] Context-aware embeddings for automatic art analysis (2019)
- Author(s)
  Noa Garcia, Benjamin Renoust, and Yuta Nakashima
- Organizer
  ACM International Conference on Multimedia Retrieval
- Int'l Joint Research
[Presentation] Video meets knowledge in visual question answering (2019)
- Author(s)
  Noa Garcia, Chenhui Chu, Mayu Otani, and Yuta Nakashima
- Organizer
  The 22nd Meeting on Image Recognition and Understanding (MIRU 2019)
[Presentation] Collecting relation-aware video captions (2019)
- Author(s)
  Mayu Otani, Kazuhiro Ota, Yuta Nakashima, Esa Rahtu, Janne Heikkila, and Yoshitaka Ushiku
- Organizer
  The 22nd Meeting on Image Recognition and Understanding (MIRU 2019)
[Presentation] Video question answering with BERT (2019)
- Author(s)
  Zekun Yang, Noa Garcia, Chenhui Chu, Mayu Otani, Yuta Nakashima, and Haruo Takemura
- Organizer
  The 22nd Meeting on Image Recognition and Understanding (MIRU 2019)
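[Presentation] Laughter prediction in comedy dramas using subtitles and facial expressions (コメディドラマにおける字幕と表情を用いた笑い予測) (2019)
- Author(s)
  Yuta Kayatani (萓谷勇太), Mayu Otani, Chenhui Chu, Yuta Nakashima, and Haruo Takemura
- Organizer
  2019 Annual Conference of the Japanese Society for Artificial Intelligence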
[Presentation] Understanding art through multi-modal retrieval in paintings (2019)
- Author(s)
  Noa Garcia, Benjamin Renoust, and Yuta Nakashima
- Organizer
  Language and Vision Workshop
- Int'l Joint Research
[Remarks] KnowIT VQA Paper
- URL
  https://knowit-vqa.github.io
[Remarks] Knowledge VQA
- URL
  https://www.n-yuta.jp/project/knowledge-vqa/
