2021 Fiscal Year Annual Research Report

Building World Knowledge by Grounding Language and Multimedia

Research Project

Project/Area Number	19H04166
Research Institution	The University of Tokyo
Principal Investigator	Hideki Nakayama (中山英樹), The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (00643305)
Project Period (FY)	2019-04-01 – 2023-03-31
Keywords	Natural language processing / Image recognition / Grounding / Knowledge graph / Zero-shot recognition
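Outline of Annual Research Achievements	This research proposes a new approach that grounds linguistic concepts in multimedia such as images and video (symbol grounding), estimates the relations among diverse concepts from their spatio-temporal co-occurrence, and acquires the result as a graph-structured database (a knowledge graph). This fiscal year yielded three main results.
1. We proposed a method that aligns word concepts, whose features are extracted by BERT from dictionary data (Wiktionary), with image region features and learns a common embedding space. This yields a multimodal embedding space in which image concepts and linguistic concepts are grounded while exploiting the representational power of a pretrained language model. Importantly, the Transformer attention mechanism makes the embeddings in this space reflect not only the visual features of image regions but also the co-occurrence of multiple regions and their positional information, and a graph of concepts can be constructed using distance in this space as the criterion. As a concrete application, we showed that zero-shot image captioning can be achieved with high accuracy by retrieving, for an unknown object in an image, the nearest-neighbor word on the graph. This work was accepted to CVPR, the most competitive international conference in computer vision.
2. Taking the story generation task in natural language processing as a testbed, we developed a method that generates diverse text while maintaining logical coherence through search over a concept graph. The proposed graph search method is highly general and serves as an important foundational technique for exploiting knowledge graphs in a variety of downstream tasks beyond story generation.
3. To improve the performance of grounding itself, we continued to develop fundamental image recognition methods and developed several methods that make recognition more robust to input noise not anticipated at training time.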
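The retrieval step in result 1 (looking up the nearest word for an unknown image region, and linking concepts by distance in the shared space) can be illustrated with a minimal sketch. This is not the project's implementation: the 3-dimensional vectors, the vocabulary, the use of cosine similarity, and the function names `cosine`, `nearest_words`, and `knn_graph` are all assumptions made for illustration. In the actual method the embeddings would come from the learned alignment of BERT and image-region features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_words(query, word_vectors, k=1):
    """Return the k vocabulary words most similar to the query vector."""
    ranked = sorted(word_vectors,
                    key=lambda w: cosine(query, word_vectors[w]),
                    reverse=True)
    return ranked[:k]

def knn_graph(word_vectors, k=1):
    """Link every concept to its k nearest neighbours in the embedding space."""
    graph = {}
    for word, vec in word_vectors.items():
        others = {o: v for o, v in word_vectors.items() if o != word}
        graph[word] = nearest_words(vec, others, k)
    return graph

# Hypothetical toy embeddings; real ones would be high-dimensional and learned.
word_vectors = {
    "zebra": [0.9, 0.1, 0.0],
    "horse": [0.8, 0.3, 0.1],
    "bus":   [0.0, 0.2, 0.9],
}

# Hypothetical feature of an image region containing an unseen object.
region_feature = [0.85, 0.15, 0.05]

candidates = nearest_words(region_feature, word_vectors, k=2)
concept_graph = knn_graph(word_vectors, k=1)
print(candidates)
print(concept_graph)
```

With these toy values the region is closest to "zebra", followed by "horse", which is the kind of vocabulary lookup a captioner could use to name an object it never saw during training.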
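Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	Since this research plan was drawn up, the surrounding landscape has changed: pretrained language models such as BERT and GPT-3 have come to show remarkable performance as resources of external (commonsense) knowledge. How to incorporate knowledge obtained from multimedia into such models therefore became the focus of this fiscal year's research. In response, the grounding method described in Achievement 1 learns a concept embedding space that follows image semantics while exploiting the representational power of pretrained language models and dictionary information, and we consider it a good answer to this question. In terms of publication, the work has already been highly regarded, including acceptance to CVPR, the most competitive international conference in computer vision, and we consider that sufficient results have been obtained for the core technology of this research plan. The other key item, the construction and use of knowledge graphs, has also produced steady results, as described in Achievement 2. Thus, development of the technologies needed to realize the research plan is nearly complete. In addition, we have published many results on the foundational technologies that underpin the plan as a whole, such as image recognition and word embeddings, and we consider overall progress to be sufficient.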
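Strategy for Future Research Activity	The main remaining task is the integrated implementation and evaluation of the multimodal embedding method described in Achievement 1 and the graph construction and utilization method described in Achievement 2. Using zero-shot image captioning and image story generation, on which we are already working, as evaluation tasks, we aim to improve the accuracy and diversity of outputs by searching the graph constructed on the embedding space. In addition, the currently proposed method can use only still images as non-linguistic multimedia information, so we will extend it to video using a time-series Transformer so that it can handle the spatio-temporal co-occurrence of image concepts.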
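As a rough illustration of how searching a concept graph can produce several distinct yet connected outputs, the sketch below enumerates every cycle-free path between two events with a bounded depth-first search. It is a generic technique chosen for illustration, not the graph search method developed in this project; the event graph, the function name `simple_paths`, and the `max_len` parameter are hypothetical.

```python
def simple_paths(graph, start, goal, max_len):
    """Yield every cycle-free path from start to goal with at most max_len nodes."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            yield path
            continue
        if len(path) >= max_len:
            continue  # depth bound reached without hitting the goal
        # Push neighbours in reverse so they are explored in listed order.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in path:  # never revisit a node on the current path
                stack.append(path + [nxt])

# Hypothetical event graph; an edge means "can plausibly be followed by".
graph = {
    "wake up": ["eat breakfast", "go jogging"],
    "eat breakfast": ["go to work"],
    "go jogging": ["eat breakfast", "go to work"],
    "go to work": [],
}

paths = list(simple_paths(graph, "wake up", "go to work", max_len=4))
for p in paths:
    print(" -> ".join(p))
```

Each returned path follows only edges present in the graph, so every candidate storyline stays consistent with the graph, while the set of alternative paths supplies the variety.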

Research Products
(20 results)

Journal Article (9 results) (of which Peer Reviewed: 9 results, Open Access: 6 results)
Presentation (11 results) (of which Int'l Joint Research: 10 results, Invited: 2 results)

[Journal Article] Pixel to Binary Embedding Towards Robustness for CNNs (2022)
- Author(s)
  Ikki Kishida, Hideki Nakayama
- Journal Title
  Proceedings of the 26th International Conference on Pattern Recognition (ICPR)
  Volume: - Pages: 2279-2285
- DOI
  10.1109/ICPR56361.2022.9956572
- Peer Reviewed
[Journal Article] PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression (2022)
- Author(s)
  Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama
- Journal Title
  Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
  Volume: - Pages: 1422-1430
- DOI
  10.1109/WACV51458.2022.00149
- Peer Reviewed / Open Access
[Journal Article] Meta Approach to Data Augmentation Optimization (2022)
- Author(s)
  Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, Hideki Nakayama
- Journal Title
  Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
  Volume: - Pages: 3535-3544
- DOI
  10.1109/WACV51458.2022.00359
- Peer Reviewed / Open Access
[Journal Article] NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge (2022)
- Author(s)
  Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
- Journal Title
  Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  Volume: - Pages: -
- Peer Reviewed / Open Access
[Journal Article] OSSGAN: Open-Set Semi-Supervised Image Generation (2022)
- Author(s)
  Kai Katsumata, Duc Minh Vo, Hideki Nakayama
- Journal Title
  Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  Volume: - Pages: -
- Peer Reviewed / Open Access
[Journal Article] DCT-based Fast Spectral Convolution for Deep Convolutional Neural Networks (2021)
- Author(s)
  Yuhao Xu, Hideki Nakayama
- Journal Title
  Proceedings of the International Joint Conference on Neural Networks (IJCNN)
  Volume: - Pages: 1-8
- DOI
  10.1109/IJCNN52387.2021.9534135
- Peer Reviewed
[Journal Article] GraphPlan: Story Generation by Planning with Event Graph (2021)
- Author(s)
  Hong Chen, Raphael Shu, Hiroya Takamura, Hideki Nakayama
- Journal Title
  Proceedings of the 14th International Conference on Natural Language Generation (INLG)
  Volume: - Pages: 377-386
- Peer Reviewed / Open Access
[Journal Article] JokerGAN: Memory-Efficient Model for Handwritten Text Generation with Text Line Awareness (2021)
- Author(s)
  Jan Zdenek, Hideki Nakayama
- Journal Title
  Proceedings of the 29th ACM International Conference on Multimedia (ACMMM)
  Volume: - Pages: 5655-5663
- DOI
  10.1145/3474085.3475713
- Peer Reviewed
[Journal Article] SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation (2021)
- Author(s)
  Hong Chen, Hiroya Takamura, Hideki Nakayama
- Journal Title
  Findings of the Association for Computational Linguistics: EMNLP 2021
  Volume: - Pages: 1483-1492
- DOI
  10.18653/v1/2021.findings-emnlp.128
- Peer Reviewed / Open Access
[Presentation] PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression (2022)
- Author(s)
  Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama
- Organizer
  IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- Int'l Joint Research
[Presentation] Meta Approach to Data Augmentation Optimization (2022)
- Author(s)
  Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, Hideki Nakayama
- Organizer
  IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- Int'l Joint Research
[Presentation] NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge (2022)
- Author(s)
  Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
- Organizer
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Int'l Joint Research
[Presentation] OSSGAN: Open-Set Semi-Supervised Image Generation (2022)
- Author(s)
  Kai Katsumata, Duc Minh Vo, Hideki Nakayama
- Organizer
  IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Int'l Joint Research
[Presentation] Pixel to Binary Embedding Towards Robustness for CNNs (2022)
- Author(s)
  Ikki Kishida, Hideki Nakayama
- Organizer
  International Conference on Pattern Recognition (ICPR)
- Int'l Joint Research
[Presentation] Efficient Training of Neural Module Networks and Applications (2021)
- Author(s)
  Hideki Nakayama
- Organizer
  Fifth International Workshop on Symbolic-Neural Learning (SNL-2021)
- Int'l Joint Research / Invited
[Presentation] DCT-based Fast Spectral Convolution for Deep Convolutional Neural Networks (2021)
- Author(s)
  Yuhao Xu, Hideki Nakayama
- Organizer
  International Joint Conference on Neural Networks (IJCNN)
- Int'l Joint Research
[Presentation] GraphPlan: Story Generation by Planning with Event Graph (2021)
- Author(s)
  Hong Chen, Raphael Shu, Hiroya Takamura, Hideki Nakayama
- Organizer
  International Conference on Natural Language Generation (INLG)
- Int'l Joint Research
[Presentation] JokerGAN: Memory-Efficient Model for Handwritten Text Generation with Text Line Awareness (2021)
- Author(s)
  Jan Zdenek, Hideki Nakayama
- Organizer
  ACM International Conference on Multimedia (ACMMM)
- Int'l Joint Research
[Presentation] SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation (2021)
- Author(s)
  Hong Chen, Hiroya Takamura, Hideki Nakayama
- Organizer
  Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Int'l Joint Research
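[Presentation] Principles and Latest Trends of Data Augmentation in Deep Learning (2021)
- Author(s)
  Hideki Nakayama, Ryuichiro Hataya
- Organizer
  The 27th Symposium on Sensing via Image Information (SSII), OS2: Deep Learning from Limited Data, Continued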
- Invited
