Creation of a Real-World Question Answering Platform Using 3D Spatial Information

Research Project

Project/Area Number	22K12159
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Advanced Telecommunications Research Institute International
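Principal Investigator	Taiki Miyanishi, Advanced Telecommunications Research Institute International, Brain Information Communication Research Laboratory Group, Researcher (10737521)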
Project Period (FY)	2022-04-01 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2024: ¥130,000 (Direct Cost: ¥100,000, Indirect Cost: ¥30,000); Fiscal Year 2023: ¥520,000 (Direct Cost: ¥400,000, Indirect Cost: ¥120,000); Fiscal Year 2022: ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Keywords	3D point clouds / 3D and Language / Embodied AI / computer vision / natural language processing / 3D point cloud data
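Outline of Research at the Start	This research establishes 3D spatial question answering (3D-VQA), a technology for querying 3D spatial information in natural language, by integrating visual question answering methods for 2D video and images (2D-VQA) with 3D spatial recognition technology. We will build a question answering dataset of 3D spatial information from RGB-D scans of multiple indoor environments, compare it against conventional 2D-VQA, and demonstrate that 3D spatial data is useful for queries that require a semantic and three-dimensional understanding of real space. The technology is expected to extend to robots that understand the semantic content of real space and can take instructions through dialogue, and to search engines that give free access to 3D spatial information in the real world and in VR/AR.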
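Outline of Annual Research Achievements	This research establishes 3D spatial question answering, a technology for querying 3D spatial information in natural language, by integrating the visual question answering methods for 2D video and images that the applicant has worked on with 3D spatial recognition technology, which has advanced rapidly in recent years. We build a question answering dataset of 3D spatial information from RGB-D scans of multiple indoor environments, compare it against conventional 2D-VQA, and demonstrate that 3D spatial data is useful for queries that require a semantic and three-dimensional understanding of real space. The ability to understand and respond about real space is expected to find applications in a wide range of fields, such as robots that understand the semantic content of real space and can take instructions through dialogue, and search engines that give free access to 3D spatial information in the real world and in VR/AR. This fiscal year we worked on the following items. (1) Verifying superiority through comparison with 2D question answering models: to verify the engineering value of the proposed method, we applied methods used in existing visual question answering for 2D images (2D-VQA) to the 3D question answering dataset task created in the previous fiscal year and compared their performance with 3D-VQA. (2) Developing a 3D Visual Grounding method that fuses existing 2D images with 3D point cloud data: item (1) showed that 3D-VQA achieves higher accuracy than 2D-VQA on the 3D question answering task. However, 2D images have higher resolution than 3D point cloud data and can capture finer detail. We therefore developed a 3D Visual Grounding method that combines 2D images with 3D point cloud data.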
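Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: This fiscal year, (1) we annotated 3RScan, a 3D scan dataset of indoor environments whose real-world state changes, with text describing the objects in each environment, and created the 3D Visual Grounding dataset RIORefer. (2) We also developed a 3D Visual Grounding method that fuses the 2D video data captured during scanning with the 3D point cloud data, and verified the effectiveness of the proposed method on RIORefer. (3) Furthermore, we applied 3D Visual Grounding methods to city-scale data and verified their effectiveness. Papers summarizing these results were accepted at NeurIPS 2023 D&B, a top-tier international conference on artificial intelligence, and at 3DV 2024, an international conference on 3D vision, so we consider the rating "(1) Research has progressed more than originally planned" to be appropriate.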
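Strategy for Future Research Activity	We are currently building an Embodied QA dataset that further extends the 3D question answering dataset ScanQA, and we are using this new dataset to run evaluation experiments on embodied agents. In the next fiscal year, we plan to implement and test an Embodied QA agent on a physical robot. We will also develop a language-instruction-based navigation method that makes use of 3D Visual Grounding results.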

Report

(2 results)

2023 Research-status Report
2022 Research-status Report

Research Products
(5 results)

Journal Article (2 results) (of which Peer Reviewed: 2 results) / Presentation (3 results) (of which Int'l Joint Research: 1 result, Invited: 1 result)

[Journal Article] Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans (2024)
- Author(s)
  Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, and Motoaki Kawanabe
- Journal Title
  In Proceedings of the 11th International Conference on 3D Vision 2024 (3DV 2024)
  Volume: -
- Related Report
  2023 Research-status Report
- Peer Reviewed
[Journal Article] CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data (2023)
- Author(s)
  Taiki Miyanishi*, Fumiya Kitamori*, Shuhei Kurita, Jungdae Lee, Motoaki Kawanabe, and Nakamasa Inoue
- Journal Title
  In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS D&B 2023)
  Volume: -
- Related Report
  2023 Research-status Report
- Peer Reviewed
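[Presentation] Cross-Dataset 3D Language Grounding Using Different RGB-D Scans (2023)
- Author(s)
  Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Motoaki Kawanabe
- Organizer
  The 37th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2023)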
- Related Report
  2023 Research-status Report
[Presentation] ScanQA: 3D Question Answering for Spatial Scene Understanding (2022)
- Author(s)
  Azuma Daichi, Miyanishi Taiki, Kurita Shuhei, Kawanabe Motoaki
- Organizer
  The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] ScanQA: 3D Question Answering for Spatial Scene Understanding (2022)
- Author(s)
  Azuma Daichi, Miyanishi Taiki, Kurita Shuhei, Kawanabe Motoaki
- Organizer
  MIRU2022, the 25th Meeting on Image Recognition and Understanding
- Related Report
  2022 Research-status Report
- Invited
