2023 Fiscal Year Annual Research Report

Development of Motion Generation Technology to Realize Robots that Perform Various Tasks according to Natural Language Instructions

Research Project

Project/Area Number	21H04910
Research Institution	OMRON SINIC X Corporation
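Principal Investigator	Atsushi Hashimoto, OMRON SINIC X Corporation, Research Administrative Division, Senior Researcher (80641753)
Co-Investigator (Kenkyū-buntansha)	Nakamasa Inoue, Tokyo Institute of Technology, School of Computing, Associate Professor (10733397); Yoshitaka Ushiku, OMRON SINIC X Corporation, Research Administrative Division, Principal Investigator (10784142); Masashi Hamaya, OMRON SINIC X Corporation, Research Administrative Division, Senior Researcher (10869176); Takamitsu Matsubara, Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (20508056); Shinsuke Mori, Kyoto University, Academic Center for Computing and Media Studies, Professor (90456773); Cristian Camilo Beltran-Hernandez, OMRON SINIC X Corporation, Research Administrative Division, Research Engineer (30984017)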
Project Period (FY)	2021-04-05 – 2024-03-31
Keywords	Natural language processing / Cross-modal processing / Robotics
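Outline of Annual Research Achievements	The original plan focused on the first four of the seven stages of action (goal setting, intention estimation, action specification, and action execution) in order to clarify a framework for general-purpose robots that act on language instructions, and set as its proof of concept a robot that can make a minimal salad. Goal setting is left to a human, who supplies it as natural-language input, so the actual technical challenges are three: (a) intention estimation, (b) action specification, and (c) action execution. In the end, each of (a)-(c) was completed. For (b), the original plan was to generate actions automatically, but simple motions such as moving objects and opening or closing drawers have since been realized by other groups, for example with RT-X. Approaches like RT-X nevertheless have many practical problems: advanced skills for which training data has not accumulated cannot be composed, robots not included in the training data cannot be operated, and the approach offers no explainability. To resolve these problems, we conceived and implemented a strategy that uses classical symbolic planning with PDDL for (b). On the premise that PDDL is used for (b), for (a) we realized a method called ViLaIn, which takes observation data and a language instruction and outputs, in PDDL format, an "intention" consisting of a pair of an initial state and a goal state. For (c), we developed a method for learning the skill of cutting food ingredients, which is essential for a salad and cannot be executed by RT-X and similar methods. As of this report, these results have been presented at ICRA 2024, a top robotics conference. We also built a system that makes a simple salad starting from PDDL; logically, connecting it to ViLaIn makes it possible to produce a very simple salad from a language instruction.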
Research Progress Status	Not entered because FY2023 (Reiwa 5) is the final year of the project.
Strategy for Future Research Activity	Not entered because FY2023 (Reiwa 5) is the final year of the project.
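
The outline above describes an "intention" as a pair of an initial state and a goal state written in PDDL format. The following is a minimal sketch of what such an output could look like when assembled programmatically. It is not ViLaIn's implementation: the report does not give the domain, predicates, or object names, so `Intention`, `to_pddl_problem`, the `cooking` domain, and predicates such as `on-table` and `sliced` are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Intention:
    """An initial state plus a goal state (hypothetical container type)."""
    objects: dict[str, str]        # object name -> type, e.g. "cucumber" -> "food"
    init: list[tuple[str, ...]]    # facts assumed true before the task starts
    goal: list[tuple[str, ...]]    # facts that must hold when the task is done


def fact(atom: tuple[str, ...]) -> str:
    """Render one ground fact, e.g. ("in", "cucumber", "bowl") -> "(in cucumber bowl)"."""
    return "(" + " ".join(atom) + ")"


def to_pddl_problem(name: str, domain: str, intention: Intention) -> str:
    """Assemble a PDDL problem definition from an intention."""
    objs = " ".join(f"{obj} - {typ}" for obj, typ in intention.objects.items())
    init = " ".join(fact(a) for a in intention.init)
    goal = " ".join(fact(a) for a in intention.goal)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {objs})\n"
        f"  (:init {init})\n"
        f"  (:goal (and {goal})))"
    )


# Illustrative instruction: "slice the cucumber and put it in the bowl".
# All names below are assumptions, not taken from the report.
salad = Intention(
    objects={"cucumber": "food", "knife": "tool", "bowl": "container"},
    init=[("on-table", "cucumber"), ("on-table", "knife")],
    goal=[("sliced", "cucumber"), ("in", "cucumber", "bowl")],
)
problem = to_pddl_problem("make-salad", "cooking", salad)
print(problem)
```

A symbolic planner would take a problem of this shape together with a separately written domain file that defines the available actions; that domain file is outside the scope of this sketch.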

Research Products
(14 results)

Journal Article (3 results) (of which Peer Reviewed: 3 results, Open Access: 2 results) Presentation (8 results) (of which Int'l Joint Research: 6 results, Invited: 1 result) Remarks (3 results)

[Journal Article] Recipe Generation from Unsegmented Cooking Videos (2024)
- Author(s)
  Nishimura Taichi, Hashimoto Atsushi, Ushiku Yoshitaka, Kameko Hirotaka, Mori Shinsuke
- Journal Title
  ACM Transactions on Multimedia Computing, Communications, and Applications
  Volume: - Pages: -
- DOI
  10.1145/3649137
- Peer Reviewed / Open Access
[Journal Article] State-aware video procedural captioning (2023)
- Author(s)
  Nishimura Taichi, Hashimoto Atsushi, Ushiku Yoshitaka, Kameko Hirotaka, Mori Shinsuke
- Journal Title
  Multimedia Tools and Applications
  Volume: 82 Pages: 37273-37301
- DOI
  10.1007/s11042-023-14774-7
- Peer Reviewed
[Journal Article] Construction and Evaluation of the Visual Recipe Flow Dataset for Predicting the Visual State of Objects after Cooking Actions (2023)
- Author(s)
  Shirai Keisuke, Hashimoto Atsushi, Nishimura Taichi, Kameko Hirotaka, Kurita Shuhei, Mori Shinsuke
- Journal Title
  Journal of Natural Language Processing
  Volume: 30 Pages: 1042-1060
- DOI
  10.5715/jnlp.30.1042
- Peer Reviewed / Open Access
[Presentation] Vision-Language Interpreter for Robot Task Planning (2024)
- Author(s)
  Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, and Shinsuke Mori
- Organizer
  International Conference on Robotics and Automation
- Int'l Joint Research
[Presentation] PolarDB: Formula-driven Dataset for Pre-training Trajectory Encoders (2024)
- Author(s)
  Sota Miyamoto, Takuma Yagi, Yuto Makimoto, Mahiro Ukai, Yoshitaka Ushiku, Atsushi Hashimoto, Nakamasa Inoue
- Organizer
  International Conference on Acoustics, Speech, and Signal Processing
- Int'l Joint Research
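[Presentation] "Social Implementation of Generative AI in the Harmony of Humans and Machines" (2024)
- Author(s)
  Atsushi Hashimoto
- Organizer
  The Iron and Steel Institute of Japan, Instrumentation, Control and System Engineering Division, Symposium "Expectations and Challenges in Industrial Applications of Generative AI"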
- Invited
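[Presentation] Construction of a Fixed-Viewpoint Video Dataset with Language Resources for Understanding Cooking Tasks (2024)
- Author(s)
  Atsushi Hashimoto, Koki Maeda, Tosho Hirasawa, Jun Harashima, Leszek Rybicki, Yusuke Fukasawa, Yoshitaka Ushiku
- Organizer
  Annual Conference of the Japanese Society for Artificial Intelligence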
[Presentation] SliceIt!--A Dual Simulator Framework for Learning Robot Food Slicing (2024)
- Author(s)
  Cristian C. Beltran-Hernandez, Nicolas Erbetti, and Masashi Hamaya
- Organizer
  International Conference on Robotics and Automation
- Int'l Joint Research
[Presentation] Integrated Task and Motion Planning for Real-World Cooking Tasks (2024)
- Author(s)
  Jeremy Siburian, Cristian Camilo Beltran-Hernandez, Masashi Hamaya
- Organizer
  International Conference on Robotics and Automation Workshop
- Int'l Joint Research
[Presentation] Learning Food Picking without Food: Fracture Anticipation by Breaking Reusable Fragile Objects (2023)
- Author(s)
  Rinto Yagawa, Rena Ishikawa, Masashi Hamaya, Kazutoshi Tanaka, Atsushi Hashimoto, Hideo Saito
- Organizer
  International Conference on Robotics and Automation
- Int'l Joint Research
[Presentation] Deep Segmented DMP Networks for Learning Discontinuous Motions (2023)
- Author(s)
  Edgar Anarossi, Hirotaka Tahara, Naoto Komeno, Takamitsu Matsubara
- Organizer
  IEEE International Conference on Automation Science and Engineering
- Int'l Joint Research
[Remarks] Vision-Language Interpreter
- URL
  https://kskshr.github.io/vilain/
[Remarks] SliceIt!
- URL
  https://omron-sinicx.github.io/sliceit/
[Remarks] Integrated TaMP for Real-World Cooking Tasks
- URL
  https://www.youtube.com/watch?v=PS0CYS2NgZY
