2022 Fiscal Year Annual Research Report

Building World Knowledge by Grounding Language and Multimedia

Research Project

Project/Area Number	19H04166
Research Institution	The University of Tokyo
Principal Investigator	Hideki Nakayama (中山英樹), The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (00643305)
Project Period (FY)	2019-04-01 – 2023-03-31
Keywords	Natural language processing / Image recognition / Knowledge graph / Zero-shot recognition / Future prediction / Multimodal / Knowledge acquisition
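Outline of Annual Research Achievements	1. Extending [Chen+, AAAI'21], a result from FY2020, we developed a method for extracting knowledge graphs that capture temporal dynamics from multimedia. The method first extracts concepts such as objects and events from an image sequence and represents their spatiotemporal co-occurrence and transition relations as a graph. It then connects this graph to a large-scale external knowledge graph (ConceptNet), which adds top-down commonsense knowledge and yields a broader knowledge graph. As a concrete application, we proposed anticipation captioning, a task of predicting the future situation of a given image sequence and describing it in text, and developed a method to perform it. This work was accepted to CVPR 2023, the most competitive international conference in computer vision.
2. We proposed a method that aligns word concepts, whose features are extracted from dictionary text (Wiktionary), with image region features and learns a shared embedding space. As a concrete application, we showed that zero-shot image captioning can be achieved with high accuracy by retrieving, for an unknown object in an image, its nearest-neighbor words in the embedding space. This work was presented at CVPR 2022, the most competitive international conference in computer vision.
3. In the method of item 2, image features and text features are aligned using a small captioning dataset, and the external information resource for zero-shot recognition is represented by text features only. To obtain a more general information resource with higher affinity to images, we extended the method so that the external knowledge itself is represented in a multimodal space, by training on image information in addition to the dictionary text. The resulting embedding space not only improves zero-shot recognition but is also generally useful for constructing knowledge graphs over concepts, and is expected to form a foundation for a wide range of applications.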
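The nearest-neighbor retrieval step described in item 2 can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the three-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration. The sketch only assumes that image-region features and word features already live in one shared embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_words(region_vec, word_embeddings, k=1):
    """Return the k words whose embeddings are most similar to the region feature."""
    ranked = sorted(
        word_embeddings,
        key=lambda word: cosine(region_vec, word_embeddings[word]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical word embeddings (in the report these come from dictionary text).
word_embeddings = {
    "zebra": [0.9, 0.1, 0.0],
    "guitar": [0.0, 0.8, 0.6],
    "pizza": [0.1, 0.2, 0.95],
}

# Hypothetical feature of an image region showing an object unseen in training.
region = [0.85, 0.2, 0.05]

print(retrieve_words(region, word_embeddings, k=2))
```

The retrieved words would then be offered to the caption generator as candidate vocabulary. A real system would use learned high-dimensional embeddings and an approximate nearest-neighbor index rather than an exhaustive sort.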
Research Progress Status	Not entered, because FY2022 is the final year of the project.
Strategy for Future Research Activity	Not entered, because FY2022 is the final year of the project.

Research Products
(27 results)

Int'l Joint Research (2 results); Journal Article (12 results, of which Int'l Joint Research: 3, Peer Reviewed: 12, Open Access: 10); Presentation (12 results, of which Int'l Joint Research: 11, Invited: 1); Book (1 result)

[Int'l Joint Research] University of California, Los Angeles/Amazon (U.S.A.)
- Country Name
  U.S.A.
- Counterpart Institution
  University of California, Los Angeles/Amazon
[Int'l Joint Research] National Yang Ming Chiao Tung University/Academia Sinica/National Taiwan University (Other countries/regions)
- Country Name
  Other countries/regions
- Counterpart Institution
  National Yang Ming Chiao Tung University/Academia Sinica/National Taiwan University
[Journal Article] A-CAP: Anticipation Captioning with Commonsense Knowledge (2023)
- Author(s)
  Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama
- Journal Title
  
  Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  
  Volume: - Pages: -
- Peer Reviewed / Open Access
[Journal Article] LED: A Dataset for Life Event Extraction from Dialogs (2023)
- Author(s)
  Yi-Pei Chen, An-Zi Yen, Hen-Hsen Huang, Hideki Nakayama, Hsin-Hsi Chen
- Journal Title
  
  Findings of the Association for Computational Linguistics: EACL 2023
  
  Volume: - Pages: 384-398
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Indirect Adversarial Losses via an Intermediate Distribution for Training GANs (2023)
- Author(s)
  Rui Yang, Duc Minh Vo, Hideki Nakayama
- Journal Title
  
  Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
  
  Volume: - Pages: 4641-4650
- DOI
  10.1109/WACV56688.2023.00463
- Peer Reviewed / Open Access
[Journal Article] Character-Centric Story Visualization via Visual Planning and Token Alignment (2022)
- Author(s)
  Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng
- Journal Title
  
  Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
  
  Volume: - Pages: 8259-8272
- Peer Reviewed / Open Access
[Journal Article] StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning (2022)
- Author(s)
  Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama
- Journal Title
  
  Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
  
  Volume: - Pages: 1739-1753
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Weakly Supervised Formula Learner for Solving Mathematical Problems (2022)
- Author(s)
  Yuxuan Wu, Hideki Nakayama
- Journal Title
  
  Proceedings of the 29th International Conference on Computational Linguistics (COLING)
  
  Volume: - Pages: 1743-1752
- Peer Reviewed / Open Access
[Journal Article] Neural Networks in a Product of Hyperbolic Spaces (2022)
- Author(s)
  Jun Takeuchi, Noriki Nishida, Hideki Nakayama
- Journal Title
  
  Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
  
  Volume: - Pages: 211-221
- Peer Reviewed / Open Access
[Journal Article] Improving Noised Gradient Penalty with Synchronized Activation Function for Generative Adversarial Networks (2022)
- Author(s)
  Rui Yang, Raphael Shu, Hideki Nakayama
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E105-D Pages: 1537-1545
- DOI
  10.1587/transinf.2022EDP7019
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] DJMix: Unsupervised Task-agnostic Image Augmentation for Improving Robustness of Convolutional Neural Networks (2022)
- Author(s)
  Ryuichiro Hataya, Hideki Nakayama
- Journal Title
  
  Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN)
  
  Volume: - Pages: 1-8
- DOI
  10.1109/IJCNN55064.2022.9892068
- Peer Reviewed
[Journal Article] Pixel to Binary Embedding Towards Robustness for CNNs (2022)
- Author(s)
  Ikki Kishida, Hideki Nakayama
- Journal Title
  
  Proceedings of the 26th International Conference on Pattern Recognition (ICPR)
  
  Volume: - Pages: 2279-2285
- DOI
  10.1109/ICPR56361.2022.9956572
- Peer Reviewed
[Journal Article] NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge (2022)
- Author(s)
  Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
- Journal Title
  
  Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  
  Volume: - Pages: 17979-17987
- DOI
  10.1109/CVPR52688.2022.01747
- Peer Reviewed / Open Access
[Journal Article] OSSGAN: Open-Set Semi-Supervised Image Generation (2022)
- Author(s)
  Kai Katsumata, Duc Minh Vo, Hideki Nakayama
- Journal Title
  
  Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  
  Volume: - Pages: 11175-11183
- DOI
  10.1109/CVPR52688.2022.01090
- Peer Reviewed / Open Access
[Presentation] A-CAP: Anticipation Captioning with Commonsense Knowledge (2023)
- Author(s)
  Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama
- Organizer
  The 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Int'l Joint Research
[Presentation] Indirect Adversarial Losses via an Intermediate Distribution for Training GANs (2023)
- Author(s)
  Rui Yang, Duc Minh Vo, Hideki Nakayama
- Organizer
  The 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- Int'l Joint Research
[Presentation] Character-Centric Story Visualization via Visual Planning and Token Alignment (2022)
- Author(s)
  Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng
- Organizer
  The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Int'l Joint Research
[Presentation] StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning (2022)
- Author(s)
  Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama
- Organizer
  The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Int'l Joint Research
[Presentation] Weakly Supervised Formula Learner for Solving Mathematical Problems (2022)
- Author(s)
  Yuxuan Wu, Hideki Nakayama
- Organizer
  The 29th International Conference on Computational Linguistics (COLING)
- Int'l Joint Research
[Presentation] Neural Networks in a Product of Hyperbolic Spaces (2022)
- Author(s)
  Jun Takeuchi, Noriki Nishida, Hideki Nakayama
- Organizer
  The 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
- Int'l Joint Research
[Presentation] DJMix: Unsupervised Task-agnostic Image Augmentation for Improving Robustness of Convolutional Neural Networks (2022)
- Author(s)
  Ryuichiro Hataya, Hideki Nakayama
- Organizer
  The 2022 International Joint Conference on Neural Networks (IJCNN)
- Int'l Joint Research
[Presentation] Pixel to Binary Embedding Towards Robustness for CNNs (2022)
- Author(s)
  Ikki Kishida, Hideki Nakayama
- Organizer
  The 26th International Conference on Pattern Recognition (ICPR)
- Int'l Joint Research
[Presentation] NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge (2022)
- Author(s)
  Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
- Organizer
  The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Int'l Joint Research
[Presentation] OSSGAN: Open-Set Semi-Supervised Image Generation (2022)
- Author(s)
  Kai Katsumata, Duc Minh Vo, Hideki Nakayama
- Organizer
  The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Int'l Joint Research
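[Presentation] Open-Set Semi-Supervised Image Generation with Label Noise (ラベルノイズ付きオープンセット半教師あり画像生成) (2022)
- Author(s)
  Kai Katsumata, Duc Minh Vo, Tatsuya Harada, Hideki Nakayama
- Organizer
  The 25th Meeting on Image Recognition and Understanding (第25回画像の認識・理解シンポジウム)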
[Presentation] Incorporating External Knowledge for Vision and Language Systems (2022)
- Author(s)
  Hideki Nakayama
- Organizer
  2nd Workshop on Trends and Advances in Machine Learning and Automated Reasoning for Intelligent Robots and Systems (in conjunction with IROS 2022)
- Int'l Joint Research / Invited
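[Book] From Deep Learning to Multimodal Information Processing (深層学習からマルチモーダル情報処理へ) (2022)
- Author(s)
  Hideki Nakayama, Atsushi Nitanda, Akihiro Tamura, Nakamasa Inoue, Yoshitaka Ushiku
- Total Pages
  248
- Publisher
  Saiensu-sha (サイエンス社)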
- ISBN
  978-4-7819-1554-8
