Realization of a Conversational AI That Understands and Explains Images Based on Expert Knowledge

Research Project

Project/Area Number	23H00482
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Review Section	Medium-sized Section 61:Human informatics and related fields
Research Institution	Tohoku University
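Principal Investigator	Okatani Takayuki, Tohoku University, Graduate School of Information Sciences, Professor (00312637)
Co-Investigator(Kenkyū-buntansha)	Suganuma Masanori, Tohoku University, Graduate School of Information Sciences, Assistant Professor (00815813)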
Project Period (FY)	2023-04-01 – 2027-03-31
Project Status	Granted (Fiscal Year 2025)
Budget Amount	¥46,800,000 (Direct Cost: ¥36,000,000, Indirect Cost: ¥10,800,000); Fiscal Year 2025: ¥11,570,000 (Direct Cost: ¥8,900,000, Indirect Cost: ¥2,670,000); Fiscal Year 2024: ¥11,570,000 (Direct Cost: ¥8,900,000, Indirect Cost: ¥2,670,000); Fiscal Year 2023: ¥13,260,000 (Direct Cost: ¥10,200,000, Indirect Cost: ¥3,060,000)
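Keywords	Computer vision / Multimodal AI / Deep learning / Knowledge acquisition / Image understanding / In-vehicle camera images / Natural language / Artificial intelligence / Conversational AI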
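Outline of Research at the Start	This project aims to realize a conversational AI that understands the objects and events shown in images and explains them in natural language, and in particular one that can support people in judgments and decisions that rely on expert knowledge. Doing so requires acquiring expert knowledge, holding it as a multimodal representation usable for image understanding, and performing abductive reasoning on that basis. The project will develop the necessary methods, including a way to associate expert knowledge available as text data, such as technical books and papers, with visual concepts, and thereby achieve this goal.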
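Outline of Annual Research Achievements	Research proceeded according to the research plan and produced the following results. First, we designed a task of predicting and explaining driving hazards from in-vehicle camera images and built a dataset for it, DHPR (Driving Hazard Prediction and Reasoning), by annotating images from existing in-vehicle image datasets through crowdsourcing. A paper summarizing the results is currently under submission to a journal. Under the research item "Advancing image description representations," we studied image feature extraction methods for multimodal AI. In addition, taking image anomaly detection as the subject, we carried out several studies on feature extraction that lead to more advanced image description representations and presented them at international conferences including INDIN2023 and WACV2024. We also developed a new unsupervised domain adaptation method for image segmentation and published it in Computer Vision and Image Understanding. Under the research item "Knowledge representation and use," we built a multimodal AI model that integrates the above image feature extraction method with a large language model. Using DHPR, we trained several models (including in-context learning) and evaluated their inference performance, confirming that they can reason with a certain level of accuracy and that room for improvement remains before they reach a practical level. For the task of image-based bridge inspection, we built and evaluated a multimodal AI model that can recognize and explain bridge damage; the results were published in Computer-Aided Civil and Infrastructure Engineering. Furthermore, for the task of exploring an unknown indoor environment and building a map, we studied an AI model that can learn tacit knowledge and use it for inference, and published the work in the International Journal of Computer Vision.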
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: We built DHPR, a dataset for driving risk prediction from in-vehicle camera images, ahead of the rest of the world and evaluated models on it, and we also evaluated the performance of existing multimodal AI for medical image diagnosis; the research is thus progressing smoothly. Papers have been accepted by several top-level journals in related fields, including the International Journal of Computer Vision, Computer Vision and Image Understanding, and Computer-Aided Civil and Infrastructure Engineering.
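Strategy for Future Research Activity	Research and development of so-called generative AI is advancing remarkably, especially large language models (LLMs) and LLMs extended to handle multimodal input. It has become routine for major tech companies to train these models at large scale, spending on the order of ten billion yen on computing costs. To do excellent research on a limited budget in this environment, we place emphasis on keeping track of the latest developments and accurately identifying unsolved problems. Fortunately, we believe that current AI has clear limitations regardless of its scale, and that we have succeeded in identifying the remaining problems that should be studied.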

Report

(2 results)

2023 Comments on the Screening Results Annual Research Report

Research Products
(14 results)

Journal Article (10 results) (of which Peer Reviewed: 10 results, Open Access: 9 results) / Presentation (4 results) (of which Invited: 4 results)

[Journal Article] SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers (2024)
- Author(s)
  Lu Xiangyong, Suganuma Masanori, Okatani Takayuki
- Journal Title
  
  Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision
  
  Volume: - Pages: 1112-1122
- DOI
  10.1109/wacv57701.2024.00116
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] That’s BAD: blind anomaly detection by implicit local feature clustering (2024)
- Author(s)
  Zhang Jie, Suganuma Masanori, Okatani Takayuki
- Journal Title
  
  Machine Vision and Applications
  
  Volume: 35 Issue: 2
- DOI
  10.1007/s00138-024-01511-9
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Contextual Affinity Distillation for Image Anomaly Detection (2024)
- Author(s)
  Zhang Jie, Suganuma Masanori, Okatani Takayuki
- Journal Title
  
  Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision
  
  Volume: - Pages: 148-157
- DOI
  10.1109/wacv57701.2024.00022
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Improving visual question answering for bridge inspection by pre-training with external data of image-text pairs (2023)
- Author(s)
  Kunlamai Thannarot, Yamane Tatsuro, Suganuma Masanori, Chun Pang-Jo, Okatani Takayuki
- Journal Title
  
  Computer-Aided Civil and Infrastructure Engineering
  
  Volume: 39 Issue: 3 Pages: 345-361
- DOI
  10.1111/mice.13086
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Symmetry-aware Neural Architecture for Embodied Visual Navigation (2023)
- Author(s)
  Liu Shuang, Suganuma Masanori, Okatani Takayuki
- Journal Title
  
  International Journal of Computer Vision
  
  Volume: 132 Issue: 4 Pages: 1091-1107
- DOI
  10.1007/s11263-023-01909-4
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Unsupervised domain adaptation for semantic segmentation via cross-region alignment (2023)
- Author(s)
  Wang Zhijie, Liu Xing, Suganuma Masanori, Okatani Takayuki
- Journal Title
  
  Computer Vision and Image Understanding
  
  Volume: 234 Pages: 103743-103743
- DOI
  10.1016/j.cviu.2023.103743
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] How Do Label Errors Affect Thin Crack Detection by DNNs (2023)
- Author(s)
  Xu Liang, Zou Han, Okatani Takayuki
- Journal Title
  
  Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  
  Volume: - Pages: 4414-4423
- DOI
  10.1109/cvprw59228.2023.00464
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Geometry Enhanced Reference-based Image Super-resolution (2023)
- Author(s)
  Zou Han, Xu Liang, Okatani Takayuki
- Journal Title
  
  Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
  
  Volume: - Pages: 6124-6133
- DOI
  10.1109/cvprw59228.2023.00652
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Accurate Single-Image Defocus Deblurring Based on Improved Integration with Defocus Map Estimation (2023)
- Author(s)
  Ye Qian, Suganuma Masanori, Okatani Takayuki
- Journal Title
  
  Proceedings of International Conference on Image Processing
  
  Volume: - Pages: 750-754
- DOI
  10.1109/icip49359.2023.10223146
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Network Pruning and Fine-tuning for Few-shot Industrial Image Anomaly Detection (2023)
- Author(s)
  Zhang Jie, Suganuma Masanori, Okatani Takayuki
- Journal Title
  
  Proceedings of IEEE International Conference on Industrial Informatics
  
  Volume: - Pages: 1-6
- DOI
  10.1109/indin51400.2023.10218283
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Presentation] The Present and Future of AI Driven by Deep Learning (2023)
- Author(s)
  Okatani Takayuki
- Organizer
  Japan Society of Medical Physics
- Related Report
  2023 Annual Research Report
- Invited
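[Presentation] The Current State of Deep Learning Models in Computer Vision (2023)
- Author(s)
  Okatani Takayuki
- Organizer
  The 87th Annual Convention of the Japanese Psychological Association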
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] The Present and Future of AI (Deep Learning) Mainly for Images (2023)
- Author(s)
  Okatani Takayuki
- Organizer
  Automatic Control Federation (自動制御連合会)
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] Applying Vision and Language AI to Real-World Problems (2023)
- Author(s)
  Takayuki Okatani
- Organizer
  National Tsing Hua University and Tohoku University Joint Workshop
- Related Report
  2023 Annual Research Report
- Invited
