Multi-modal Deep Learning Model by Disentangling Shape and Style for Analysis of Deep 'SHITSUKAN' Analysis and Synthesis

Publicly Offered Research

Project Area	Analysis and synthesis of deep SHITSUKAN information in the real world
Project/Area Number	21H05812
Research Category	Grant-in-Aid for Transformative Research Areas (A)
Allocation Type	Single-year Grants
Review Section	Transformative Research Areas, Section (IV)
Research Institution	The University of Electro-Communications
Principal Investigator	Keiji Yanai (柳井啓司), The University of Electro-Communications, Graduate School of Informatics and Engineering, Professor (20301179)
Project Period (FY)	2021-09-10 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥7,800,000 (Direct Cost: ¥6,000,000, Indirect Cost: ¥1,800,000); Fiscal Year 2022: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000); Fiscal Year 2021: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
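Keywords	Deep learning / Image generation model / Foundation model / Vision-language model / SHITSUKAN (質感, material appearance) / Feature disentanglement / Image generation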
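Outline of Research at the Start	This research (1) automatically learns, from a large amount of paired image-text data, the correspondence between the SHITSUKAN (material appearance) regions of images and SHITSUKAN expressions in language, constructs a common SHITSUKAN embedding space shared by image SHITSUKAN features and language SHITSUKAN features, and realizes bidirectional retrieval (recognition) between images and language. (2) It further fuses SHITSUKAN embedding vectors with image shape features to generate images with new SHITSUKAN. The goal of this research is to propose a deep learning model that achieves both in a unified manner. The proposed model makes possible (A) "deep" SHITSUKAN analysis of images and linguistic expressions using large amounts of data, and (B) subtle manipulation of image SHITSUKAN through language.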
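Outline of Annual Research Achievements	The original aims of this research were (1) to automatically learn, from a large amount of paired image-text data, the correspondence between the SHITSUKAN (material appearance) regions of images and SHITSUKAN expressions in language, to construct a common SHITSUKAN embedding space shared by image SHITSUKAN features and language SHITSUKAN features, and to realize bidirectional retrieval (recognition) between images and language; and (2) to fuse SHITSUKAN embedding vectors with image shape features to generate images with new SHITSUKAN, with the goal of proposing a deep learning model that achieves both in a unified manner. Over the two-year research period, we obtained the following three results. (1) Using a cross-modal recipe dataset, we fused recipe vectors in a recipe information space, into which both text and images can be embedded, with meal shape features, and thereby realized generation of meal images of arbitrary shape based on recipe information. (2) Using CLIP, a large pre-trained cross-modal image-language model, we realized manipulation of image SHITSUKAN and proposed a method for freely controlling the degree of that manipulation. (3) We also proposed a method that applies CLIP to font generation with a differentiable renderer to generate font images whose style corresponds to arbitrary words.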
Research Progress Status	Not entered because fiscal year 2022 (Reiwa 4) is the final year.
Strategy for Future Research Activity	Not entered because fiscal year 2022 (Reiwa 4) is the final year.
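
The bidirectional image-language retrieval in a common embedding space described in the outline can be illustrated with a minimal sketch. This is not the project's model: the three-dimensional vectors, the keys, and the helper names `cosine` and `retrieve` are hypothetical, and the sketch only assumes that images and texts have already been embedded into one shared space and are ranked by cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query, candidates):
    """Return candidate keys ranked by cosine similarity to the query, best first."""
    return sorted(candidates, key=lambda k: cosine(query, candidates[k]), reverse=True)

# Toy, hand-made embeddings (hypothetical values, not produced by any trained model).
image_emb = {
    "img_glossy": [0.9, 0.1, 0.0],
    "img_rough": [0.1, 0.9, 0.1],
    "img_fluffy": [0.0, 0.2, 0.9],
}
text_emb = {
    "glossy": [1.0, 0.0, 0.1],
    "rough": [0.0, 1.0, 0.0],
    "fluffy": [0.1, 0.1, 1.0],
}

# Text -> image retrieval and image -> text retrieval use the same ranking function,
# because both modalities live in the same space.
text_to_image = retrieve(text_emb["glossy"], image_emb)
image_to_text = retrieve(image_emb["img_fluffy"], text_emb)
```

Because both directions reuse one similarity function, the quality of retrieval in such a setup depends entirely on how well the two encoders align the modalities; the sketch leaves that learning step out.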

Report

(2 results)

2022 Annual Research Report
2021 Annual Research Report

Research Products
(17 results)

Journal Article (2 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 2 results, Open Access: 2 results) / Presentation (15 results) (of which Int'l Joint Research: 12 results)

[Journal Article] Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval (2022)
- Author(s)
  Benitez-Garcia Gibran, Takahashi Hiroki, Yanai Keiji
- Journal Title
  Sensors
  Volume: 22 Issue: 19 Pages: 7317-7317
- DOI
  10.3390/s22197317
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] FASSD-Net: Fast and Accurate Real-Time Semantic Segmentation for Embedded Systems (2021)
- Author(s)
  Rosas-Arias Leonel, Benitez-Garcia Gibran, Portillo-Portillo Jose, Olivares-Mercado Jesus, Sanchez-Perez Gabriel, Yanai Keiji
- Journal Title
  IEEE Transactions on Intelligent Transportation Systems
  Volume: - Issue: 9 Pages: 1-12
- DOI
  10.1109/tits.2021.3127553
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Patent Image Retrieval Using Cross-entropy-based Metric Learning (2023)
- Author(s)
  Kotaro Higuchi, Yuma Honbu, Keiji Yanai
- Organizer
  Proc. of International Workshop on Frontiers of Computer Vision (IW-FCV)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Virtual Try-On Considering Temporal Consistency for Videoconferencing (2023)
- Author(s)
  Daiki Shimizu, Keiji Yanai
- Organizer
  Proc. of the International Multimedia Modeling Conference (MMM)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Transformer-Based Cross-Modal Recipe Embeddings with Large Batch Training (2023)
- Author(s)
  Jing Yang, Junwen Chen, Keiji Yanai
- Organizer
  Proc. of the International Multimedia Modeling Conference (MMM)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Zero-shot Font Style Transfer with a Differentiable Renderer (2022)
- Author(s)
  Kota Izumi, Keiji Yanai
- Organizer
  Proc. of ACM Multimedia Asia
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Parallel Queries for Human-Object Interaction Detection (2022)
- Author(s)
  Junwen Chen, Keiji Yanai
- Organizer
  Proc. of ACM Multimedia Asia
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] SetMealAsYouLike: Sketch-based Set Meal Image Synthesis with Plate Annotations (2022)
- Author(s)
  Yuma Honbu, Keiji Yanai
- Organizer
  Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] DepthGrillCam: A Mobile Application for Real-time Eating Action Recording Using RGB-D Images (2022)
- Author(s)
  Kento Adachi, Keiji Yanai
- Organizer
  Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Text-based Image Editing for Food Images with CLIP (2022)
- Author(s)
  Kohei Yamamoto, Keiji Yanai
- Organizer
  Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Real Scale 3D Reconstruction of a Dish and a Plate using Implicit Function and a Single RGB-D Image (2022)
- Author(s)
  Shu Naritomi, Keiji Yanai
- Organizer
  Proc. of ACMMM Workshop on Multimedia Assisted Dietary Management (MADIMA)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Continual Learning in Vision Transformer (2022)
- Author(s)
  Mana Takeda, Keiji Yanai
- Organizer
  Proc. of IEEE International Conference on Image Processing (ICIP)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] StyleGAN-based CLIP-guided Image Shape Manipulation (2022)
- Author(s)
  Yuchen Qian, Kohei Yamamoto, Keiji Yanai
- Organizer
  Proc. of International Conference on Content-based Multimedia Indexing (CBMI)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Unseen Food Segmentation (2022)
- Author(s)
  Yuma Honbu, Keiji Yanai
- Organizer
  Proc. of ACM International Conference on Multimedia Retrieval (ICMR)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
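[Presentation] Mask-based Meal Image Generation Using Cross-modal Recipe Embeddings (2022)
- Author(s)
  陳 仲涛, Yuma Honbu, Keiji Yanai
- Organizer
  IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)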
- Related Report
  2021 Annual Research Report
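[Presentation] Cross-modal Recipe Retrieval and Image Generation Using Transformers (2022)
- Author(s)
  Jing Yang, Keiji Yanai
- Organizer
  IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)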
- Related Report
  2021 Annual Research Report
[Presentation] CLIP-Guided Image Shape Feature Editing with StyleGAN (2022)
- Author(s)
  Yuchen Qian, Keiji Yanai
- Organizer
  IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)
- Related Report
  2021 Annual Research Report
