Analysis of the visual characteristics of language information and its application to multimedia integrated processing

Research Project

Project/Area Number	23K24868
Project/Area Number (Other)	22H03612 (2022-2023)
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Multi-year Fund (2024) / Single-year Grants (2022-2023)
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	Nagoya University
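Principal Investigator	Ichiro Ide, Nagoya University, Graduate School of Informatics, Professor (10332157)
Co-Investigator (Kenkyū-buntansha)	Takatsugu Hirayama, University of Human Environments, Faculty of Environmental Science, Professor (10423021); Takahiro Komamizu, Nagoya University, Mathematical and Data Science Center, Associate Professor (30756367); Yasutomo Kawanishi, RIKEN, Information R&D and Strategy Headquarters, Team Leader (50755147); Keisuke Doman, Chukyo University, School of Engineering, Associate Professor (90645748); Marc Aurel Kastner, Hiroshima City University, Graduate School of Information Sciences, Assistant Professor (30966700)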
Project Period (FY)	2022-04-01 – 2026-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥17,290,000 (Direct Cost: ¥13,300,000, Indirect Cost: ¥3,990,000)
  Fiscal Year 2025: ¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
  Fiscal Year 2024: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
  Fiscal Year 2023: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
  Fiscal Year 2022: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Keywords	Language information / Visual information / Multimedia / Integrated processing / Impression
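Outline of Research at the Start	This project proposes a methodology for relating language information and visual information across the so-called "semantic gap". Previous work has focused on extracting features that express language information from visual information, that is, on clarifying the "linguistic characteristics of visual information". This project takes the opposite direction and works on clarifying the "visual characteristics of language information". These characteristics have conventionally been quantified through costly subjective evaluation experiments; here they are quantified at low cost with a data-driven method that uses image generation technology. In addition, through application cases whose behavior changes according to the degree of an impression, the project narrows the semantic gap based on both the linguistic characteristics of visual information and the visual characteristics of language information, and then demonstrates the effectiveness of multimedia integrated processing.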
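Outline of Annual Research Achievements	This project analyzes the visual characteristics of language information by dividing them into static impressions inherent in an event and dynamic impressions concerning the motion of an event, and proposes methods to quantify how strongly given language information carries each of them. It further proposes application cases of multimedia integrated processing whose behavior changes according to the degree of these impressions. Specifically, to clarify the visual characteristics of language information, the project addresses two tasks: [Task 1] quantification of static impressions inherent in events, focusing on nouns, and [Task 2] quantification of dynamic impressions concerning the motion of events, focusing on verbs. It also empirically demonstrates the effectiveness of the proposed methodology in application cases of multimedia integrated processing whose behavior changes according to the degree of an impression. In FY2023, for [Task 1], continuing from FY2022, we studied an image generation method that reflects the impression of unknown words, as a first step toward a method for estimating the static impression of words. For [Task 2], we developed an image captioning method that parametrically controls the dynamic impression of the generated caption without directly building a model that estimates dynamic impressions. In the course of this research, we also came to focus on the influence of pronunciation on impressions, and carried out preliminary studies to verify its effect in several application cases. Furthermore, as an application of these captioning techniques, we studied a captioning method for multiple images.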
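Current Status of Research Progress	2: Research has progressed on the whole more than originally planned. Reason: Both [Task 1] and [Task 2] are progressing largely as originally planned. In the course of the research, however, we newly came to focus on the influence of pronunciation on impressions, and we are conducting preliminary studies to verify its effect.
Strategy for Future Research Activity	The influence of pronunciation on impressions has emerged as a focus that was not in the original plan, so we will verify its effect in application cases.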

Report

(2 results)

2023 Annual Research Report
2022 Annual Research Report

Research Products
(25 results)

Int'l Joint Research (2 results) / Journal Article (5 results, of which Int'l Joint Research: 2 results, Peer Reviewed: 5 results, Open Access: 4 results) / Presentation (18 results, of which Int'l Joint Research: 8 results, Invited: 3 results)

[Int'l Joint Research] University of Amsterdam (Netherlands)
- Related Report
  2023 Annual Research Report
[Int'l Joint Research] North Carolina State University (USA)
- Related Report
  2022 Annual Research Report
[Journal Article] Image-Collection Summarization Using Scene-Graph Generation With External Knowledge (2024)
- Author(s)
  Phueaksri Itthisak, Kastner Marc A., Kawanishi Yasutomo, Komamizu Takahiro, Ide Ichiro
- Journal Title
  IEEE Access
  Volume: 12 Pages: 17499-17512
- DOI
  10.1109/access.2024.3360113
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Computational measurement of perceived pointiness from pronunciation (2024)
- Author(s)
  Chihaya Matsuhira, Marc Aurel Kastner, Takahiro Komamizu, Ichiro Ide, Takatsugu Hirayama, Yasutomo Kawanishi, Keisuke Doman, Daisuke Deguchi
- Journal Title
  Multimedia Tools and Applications
  Volume: 83 Issue: 9 Pages: 26183-26210
- DOI
  10.1007/s11042-023-15732-z
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Interpolating the text-to-image correspondence based on phonetic and phonological similarities for nonword-to-image generation (2024)
- Author(s)
  Chihaya Matsuhira, Marc Aurel Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide
- Journal Title
  IEEE Access
  Volume: 12 Pages: 41299-41316
- DOI
  10.1109/access.2024.3378095
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] An Approach to Generate a Caption for an Image Collection Using Scene Graph Generation (2023)
- Author(s)
  Phueaksri Itthisak, Kastner Marc A., Kawanishi Yasutomo, Komamizu Takahiro, Ide Ichiro
- Journal Title
  IEEE Access
  Volume: 11 Pages: 128245-128260
- DOI
  10.1109/access.2023.3332098
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Towards Captioning an Image Collection from a Combined Scene Graph Representation Approach (2023)
- Author(s)
  Phueaksri Itthisak, Kastner Marc A., Kawanishi Yasutomo, Komamizu Takahiro, Ide Ichiro
- Journal Title
  Lecture Notes in Computer Science
  Volume: 13833 Pages: 178-190
- DOI
  10.1007/978-3-031-27077-2_14
- ISBN
  9783031270765, 9783031270772
- Related Report
  2022 Annual Research Report
- Peer Reviewed
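[Presentation] A study on pronunciation-aware selection of translated words for automatic lyrics translation (2024)
- Author(s)
  池田昂太郎, Chihaya Matsuhira, Hirotaka Kato, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide
- Organizer
  IEICE Technical Committee on Media Experience and Virtual Environment (MVE)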
- Related Report
  2023 Annual Research Report
[Presentation] Image impression estimation by clustering people with similar tastes (2023)
- Author(s)
  Banri Kojima, Takahiro Komamizu, Yasutomo Kawanishi, Keisuke Doman, Ichiro Ide
- Organizer
  Proc. 18th Int. Conf. on Machine Vision Applications (MVA2023)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Toward scene graph summarization enhancing relation predictor with external knowledge (2023)
- Author(s)
  Itthisak Phueaksri, Marc Aurel Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
- Organizer
  Meeting on Image Recognition and Understanding (MIRU) 2023
- Related Report
  2023 Annual Research Report
[Presentation] Leverage semantic alignment of object relations for image captioning (2023)
- Author(s)
  Da Huo, Marc Aurel Kastner, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide
- Organizer
  Meeting on Image Recognition and Understanding (MIRU) 2023
- Related Report
  2023 Annual Research Report
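[Presentation] Image generation for the pronunciation of unknown words considering the association of similar-sounding words (2023)
- Author(s)
  Chihaya Matsuhira, Marc Aurel Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Ichiro Ide
- Organizer
  Meeting on Image Recognition and Understanding (MIRU) 2023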
- Related Report
  2023 Annual Research Report
[Presentation] Nonword-to-image generation considering perceptual association of phonetically similar words (2023)
- Author(s)
  Chihaya Matsuhira, Marc Aurel Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Ichiro Ide
- Organizer
  1st Int. Workshop on Multimedia Content Generation and Practice (McGE'23) in conjunction with ACM MM2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Discovering phonesthemic clusters in readings of Kanji characters toward exploring phonestheme in Japanese (2023)
- Author(s)
  Akira Yoshida, Chihaya Matsuhira, Hirotaka Kato, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide
- Organizer
  37th Pacific Asia Conf. on Language, Information and Computation (PACLIC 37)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Towards captioning an image collection from a combined scene graph representation approach (2023)
- Author(s)
  Itthisak Phueaksri, Marc Aurel Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
- Organizer
  29th Int. Conf. on Multimedia Modeling (MMM2023)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
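[Presentation] A study on user emotion estimation in dialogue systems considering the exchange of utterances (2023)
- Author(s)
  宮川由衣, Hirotaka Kato, Chihaya Matsuhira, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide
- Organizer
  29th Annual Meeting of the Association for Natural Language Processing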
- Related Report
  2022 Annual Research Report
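[Presentation] An attempt at data-driven exploration of phonesthemes in the on'yomi readings of kanji (2023)
- Author(s)
  Akira Yoshida, Chihaya Matsuhira, Hirotaka Kato, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide
- Organizer
  29th Annual Meeting of the Association for Natural Language Processing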
- Related Report
  2022 Annual Research Report
[Presentation] Evoked emotion distribution learning through analysis of temporal user comments in social media videos (2023)
- Author(s)
  Yiming Wang, Marc A. Kastner, Da Huo, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide
- Organizer
  IEICE Technical Committee on Media Experience and Virtual Environment (MVE)
- Related Report
  2022 Annual Research Report
[Presentation] Tailoring applications to users through multi-modal understanding (2022)
- Author(s)
  Ichiro Ide
- Organizer
  1st Int. Workshop on Multimodal Understanding for the Web and Social Media (MUWS2022) in conjunction with ACM TheWebConf2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Challenges on bridging the gap between Vision and Language (V&L) information (2022)
- Author(s)
  Ichiro Ide
- Organizer
  28th Int. Conf. on MultiMedia Modeling (MMM2022)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Towards captioning an image collection using scene graph (2022)
- Author(s)
  Itthisak Phueaksri, Marc Aurel Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide
- Organizer
  25th Meeting on Image Recognition and Understanding (MIRU2022)
- Related Report
  2022 Annual Research Report
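[Presentation] A study on visualizing the impressions evoked by the feel of words via image generation (2022)
- Author(s)
  Chihaya Matsuhira, Marc Aurel Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide
- Organizer
  25th Meeting on Image Recognition and Understanding (MIRU2022)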
- Related Report
  2022 Annual Research Report
[Presentation] On estimating evoked emotions of social media videos through user comments analysis (2022)
- Author(s)
  Yiming Wang, Marc Aurel Kastner, Takahiro Komamizu, Yasutomo Kawanishi, Takatsugu Hirayama, Ichiro Ide
- Organizer
  25th Meeting on Image Recognition and Understanding (MIRU2022)
- Related Report
  2022 Annual Research Report
[Presentation] Action semantic alignment for image captioning (2022)
- Author(s)
  Da Huo, Marc A. Kastner, Takahiro Komamizu, Ichiro Ide
- Organizer
  5th IEEE Int. Conf. on Multimedia Information Processing and Retrieval (MIPR2022)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Intuitive gait modeling using mimetic-words for gait description and generation (2022)
- Author(s)
  Hirotaka Kato, Takatsugu Hirayama, Keisuke Doman, Ichiro Ide, Yasutomo Kawanishi, Takahiro Komamizu, Daisuke Deguchi, Hiroshi Murase
- Organizer
  5th IEEE Int. Conf. on Multimedia Information Processing and Retrieval (MIPR2022)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research / Invited
