Information Accessibility and Class Improvement Support Technology Based on Advanced Nonverbal Modality Sensing in Classes

Research Project

Project/Area Number	23K20729
Project/Area Number (Other)	21H00901 (2021-2023)
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Multi-year Fund (2024) Single-year Grants (2021-2023)
Section	General
Review Section	Basic Section 09070:Educational technology-related
Research Institution	University of Yamanashi
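Principal Investigator	西崎博光, University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Professor (40362082)
Co-Investigator(Kenkyū-buntansha)	豊浦正広, University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Professor (80550780); 北岡教英, Toyohashi University of Technology, Graduate School of Engineering, Professor (10333501); 小林彰夫, Yamato University, Faculty of Informatics, Professor (10741168); 宇津呂武仁, University of Tsukuba, Institute of Systems and Information Engineering, Professor (90263433)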
Project Period (FY)	2021-04-01 – 2025-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥15,860,000 (Direct Cost: ¥12,200,000, Indirect Cost: ¥3,660,000); Fiscal Year 2024: ¥2,470,000 (Direct Cost: ¥1,900,000, Indirect Cost: ¥570,000); Fiscal Year 2023: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000); Fiscal Year 2022: ¥4,940,000 (Direct Cost: ¥3,800,000, Indirect Cost: ¥1,140,000); Fiscal Year 2021: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
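Keywords	speech recognition / speech impression evaluation / machine translation / character recognition / elderly speech recognition / speech recognition error correction / OCR / speaking-style characteristics / speaking-style evaluation / information accessibility / nonverbal modality sensing / speech subtitles / subtitle creation / behavior analysis / speaking-style analysis / reformatting of nonverbal phenomena / impression rating / class improvement / classroom video analysis / nonverbal information / deep learning / class improvement support / video analysis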
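Outline of Research at the Start	This research aims to improve the quality of face-to-face and remote classes by advancing nonverbal modality sensing technology and developing information accessibility and class improvement support technologies. We will build an "information accessibility and class improvement support system" that combines subtitling and translation of lecture speech by speech recognition, evaluation of and feedback on the lecturer's speaking style, and assessment of participation and comprehension by video and acoustic sensing, and we will use it to improve education.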
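Outline of Annual Research Achievements	The goal of this research is to advance the fundamental technologies for sensing nonverbal modalities in classes held in various formats, using the lecturer's speech, video of the students, and information obtained from cameras and microphones installed in the classroom. Building on these technologies, we develop information accessibility and class improvement support technologies, improve the quality of the lecture input that students receive, and demonstrate that educational improvement can be achieved. The achievements in FY2023 are as follows.
[Speech recognition and machine translation considering nonverbal phenomena] Factors that hinder speech recognition include unclear articulation, noisy and reverberant environments, and nonverbal phenomena such as self-repairs and fillers. In FY2023, we focused on recognition of elderly speech, which is particularly unclear. Specifically, we investigated speaking rate in elderly speech and found promise in hybrid HMM-DNN speech recognition, which allows easier temporal control than end-to-end speech recognition. We also studied speech enhancement for speech in noisy and reverberant environments, investigating a diffusion-model-based enhancement method. In addition, we devised a post-processing method that corrects misrecognized characters in speech recognition results and showed that using a large language model can substantially improve speech recognition. For subtitling of lecture speech, translation of speech recognition results can now be achieved with high accuracy. We also worked on character recognition for text on slides and blackboards and achieved high OCR accuracy.
[Features characterizing speaking style] To investigate the features that characterize speaking style, we conducted a listening experiment (questionnaire) using a large-scale Japanese speech corpus. Fifty participants each listened to about 100 utterances (about 30 seconds each) and rated about 30 impression items such as ease of listening and ease of understanding. We collected the results as a dataset.
[Summary of results] The component technologies we developed were presented at domestic conferences. They have also been written up as journal papers and international conference papers, which have been submitted.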
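Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	The reasons why the research is progressing largely smoothly are as follows.
First, regarding [Speech recognition and machine translation considering nonverbal phenomena], the plan at the beginning of FY2023 was to focus on the factors that hinder speech recognition (unclear articulation, noisy and reverberant environments, and nonverbal phenomena such as self-repairs, hesitations, and fillers), to study recognition improvement methods that take nonverbal phenomena into account, and to improve translation accuracy by developing text reformatting techniques such as resolving word fragments in utterances and rewriting sentences into forms that are easier to translate. We developed a recognition improvement method that takes nonverbal phenomena into account and established a method for presenting machine-translated subtitles based on it. For elderly speech recognition, we examined both end-to-end and HMM-DNN hybrid methods. For noisy and reverberant environments, we developed a diffusion-model-based speech enhancement algorithm. For improving speech recognition accuracy, we devised a post-processing method that corrects misrecognized characters and showed that using a large language model can substantially improve speech recognition. Progress on this topic is therefore largely on schedule.
Next, regarding [Features characterizing speaking style], the plan at the beginning of FY2023 was to conduct a listening experiment for developing features that determine speaking style by adding linguistic features to acoustic and prosodic features. Using the system developed in FY2022, which lets listeners enter impression ratings while listening to speech, we collected large-scale impression data on speech from about 50 participants.
Regarding [Classroom video and acoustic sensing], the research is largely complete and has been submitted as a journal paper, so it proceeded as planned.
For these reasons, we judge that this research is progressing smoothly.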
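Strategy for Future Research Activity	In FY2024, we aim to build a framework for class accessibility support, particularly for international students, by making use of the results obtained so far.
[Subtitling and translation by speech recognition technology considering nonverbal phenomena] We will continue research on speech recognition that is robust to the factors hindering it: speaker diversity (age, gender, etc.), unclear speech, and noisy and reverberant environments. Specifically, we will examine automatic correction of speech recognition errors using large language models, as well as methods that use conversational AI (such as ChatGPT) to automatically reformat speech recognition results into easy-to-understand translated text. This is intended to improve both recognition accuracy and translation quality. For subtitling of lecture speech, challenges remain in translating the text on slides and blackboards used as lecture materials. We will therefore improve the accuracy of character recognition for slides and blackboards and develop techniques for translating the recognition results effectively.
[Features characterizing speaking style] Continuing from FY2022, we will collect large-scale impression rating data on speech from about 50 participants. We will analyze the data collected over the two years and develop technology for evaluating speaking style. The collected data will be organized as a dataset and is planned to be released.
[Building a framework for information accessibility in classes] We will integrate the above component technologies and build a system that automatically generates videos with English subtitles for classes given in Japanese. Using this system, we will present subtitled videos to international students attending university classes who do not understand Japanese, and survey their comprehension of and satisfaction with the class content.
[Summary of results] We will actively present at domestic and international conferences related to artificial intelligence, speech, language processing, and educational technology to contribute to the research community. We will also write up the results as papers and submit them to academic journals. The aim of this research is to develop an information accessibility system based on speech recognition and translation technology and speaking-style evaluation technology, thereby contributing to better educational quality and student understanding. Through the FY2024 research activities, we will work toward achieving this aim.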

Report

(3 results)

Research Products
(27 results)


Journal Article (14 results; of which Peer Reviewed: 14, Open Access: 6); Presentation (13 results; of which Invited: 1)

[Journal Article] Single-Line Text Detection in Multi-Line Text with Narrow Spacing for Line-Based Character Recognition (2023)
- Author(s)
  LEOW Chee Siang, YAJIMA Hideaki, KITAGAWA Tomoki, NISHIZAKI Hiromitsu
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E106.D Issue: 12 Pages: 2097-2106
- DOI
  10.1587/transinf.2023EDP7070
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2023-12-01
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] A Lightweight End-to-End Speech Recognition System on Embedded Devices (2023)
- Author(s)
  WANG Yu, NISHIZAKI Hiromitsu
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E106.D Issue: 7 Pages: 1230-1239
- DOI
  10.1587/transinf.2022EDP7221
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2023-07-01
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Comparative Evaluation of Diverse Features in Fluency Evaluation of Spontaneous Speech (2023)
- Author(s)
  DENG Huaijin, UTSURO Takehito, KOBAYASHI Akio, NISHIZAKI Hiromitsu
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E106.D Issue: 1 Pages: 36-45
- DOI
  10.1587/transinf.2022EDP7047
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2023-01-01
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Metric Learning Approach for End-to-End Multilingual Automatic Speech Recognition Model (2023)
- Author(s)
  Dobashi Akihiro, Leow Chee Siang, Nishizaki Hiromitsu
- Journal Title
  Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE 2023)
  Volume: - Pages: 845-849
- DOI
  10.1109/gcce59613.2023.10315608
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Data Augmentation with Automatically Generated Images for Character Classifier Model Training (2023)
- Author(s)
  Leow Chee Siang, Kitagawa Tomoki, Yajima Hideaki, Nishizaki Hiromitsu
- Journal Title
  Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics
  Volume: - Pages: 845-849
- DOI
  10.1109/gcce59613.2023.10315447
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] A new speech corpus of super-elderly Japanese for acoustic modeling (2023)
- Author(s)
  Fukuda Meiko, Nishimura Ryota, Nishizaki Hiromitsu, Horii Koharu, Iribe Yurie, Yamamoto Kazumasa, Kitaoka Norihide
- Journal Title
  Computer Speech & Language
  Volume: 77 Pages: 101424-101424
- DOI
  10.1016/j.csl.2022.101424
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Automatic Selection of Appropriate Data Augmentation Operation for Acoustic Scene Classification Model Training (2022)
- Author(s)
  Sugiura Toki, Kobayashi Akio, Utsuro Takehito, Nishizaki Hiromitsu
- Journal Title
  Proceedings of the 2022 IEEE 11th Global Conference on Consumer Electronics
  Volume: - Pages: 355-358
- DOI
  10.1109/gcce56475.2022.10014333
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] Implicit language information replacing method in Japanese encoder-decoder ASR model (2022)
- Author(s)
  Mori Daiki, Ohta Kengo, Nishimura Ryota, Kitaoka Norihide
- Journal Title
  Proceedings of the 2022 9th International Conference on Advanced Informatics: Concepts, Theory and Applications
  Volume: - Pages: 1-6
- DOI
  10.1109/icaicta56449.2022.9932915
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] Comparison of Static and Time-Sequential Features in Automatic Fluency Detection of Spontaneous Speech (2021)
- Author(s)
  Deng Huaijin, Utsuro Takehito, Kobayashi Akio, Nishizaki Hiromitsu
- Journal Title
  Proceedings of the 24th Conference of the Oriental COCOSDA
  Volume: - Pages: 158-163
- DOI
  10.1109/o-cocosda202152914.2021.9660601
- Related Report
  2021 Annual Research Report
- Peer Reviewed
[Journal Article] ExKaldi-RT: A Real-Time Automatic Speech Recognition Extension Toolkit of Kaldi (2021)
- Author(s)
  Wang Yu, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, Hiromitsu Nishizaki
- Journal Title
  Proceedings of 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE)
  Volume: - Pages: 346-350
- DOI
  10.1109/gcce53005.2021.9621992
- Related Report
  2021 Annual Research Report
- Peer Reviewed
[Journal Article] Audio Synthesis-based Data Augmentation Considering Audio Event Class (2021)
- Author(s)
  Sugiura Toki, Kobayashi Akio, Utsuro Takehito, Nishizaki Hiromitsu
- Journal Title
  Proceedings of the 2021 IEEE 10th Global Conference on Consumer Electronics
  Pages: 72-76
- DOI
  10.1109/gcce53005.2021.9621828
- Related Report
  2021 Annual Research Report
- Peer Reviewed
[Journal Article] Corpus Design and Automatic Speech Recognition for Deaf and Hard-of-Hearing People (2021)
- Author(s)
  Kobayashi Akio, Yasu Keiichi, Nishizaki Hiromitsu, Kitaoka Norihide
- Journal Title
  2021 IEEE 10th Global Conference on Consumer Electronics (GCCE)
  Volume: N.A. Pages: 17-18
- DOI
  10.1109/gcce53005.2021.9621959
- Related Report
  2021 Annual Research Report
- Peer Reviewed
[Journal Article] Language and Speaker-Independent Feature Transformation for End-to-End Multilingual Speech Recognition (2021)
- Author(s)
  Tomoaki Hayakawa, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, and Hiromitsu Nishizaki
- Journal Title
  Proceedings of INTERSPEECH2021
  Volume: - Pages: 2431-2435
- DOI
  10.21437/interspeech.2021-390
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Voice Activity Detection for Live Speech of Baseball Game Based on Tandem Connection with Speech/Noise Separation Model (2021)
- Author(s)
  Nonaka Yuto, Leow Chee Siang, Kobayashi Akio, Utsuro Takehito, Nishizaki Hiromitsu
- Journal Title
  Proceedings of INTERSPEECH2021
  Pages: 351-355
- DOI
  10.21437/interspeech.2021-792
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
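[Presentation] A Study on Improving Character Recognition Accuracy for Single- and Multi-Line Text Using Generated Character Images (2023)
- Author(s)
  レオチーシャン, 北川智樹, 矢島英明, 西崎博光
- Organizer
  Proceedings of the 86th National Convention of the Information Processing Society of Japan (IPSJ)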
- Related Report
  2023 Annual Research Report
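[Presentation] A Study on Text Region Detection in Images Using a Transformer Decoder (2023)
- Author(s)
  矢島英明, レオチーシャン, 北川智樹, 西崎博光
- Organizer
  Proceedings of the 86th National Convention of the Information Processing Society of Japan (IPSJ)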
- Related Report
  2023 Annual Research Report
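[Presentation] Effect of Metric Learning in Training End-to-End Multilingual Speech Recognition Models (2023)
- Author(s)
  土橋晃弘, レオチーシャン, 西崎博光
- Organizer
  Proceedings of the 2023 Autumn Meeting of the Acoustical Society of Japan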
- Related Report
  2023 Annual Research Report
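[Presentation] Generating Easy-to-Understand English Subtitles Based on Reformatting of Japanese Speech Recognition Results (2023)
- Author(s)
  堀田慎, 堀井こはる, 北岡教英, 西崎博光
- Organizer
  The 85th National Convention of the Information Processing Society of Japan (IPSJ)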
- Related Report
  2022 Annual Research Report
[Presentation] A Study on a Speech Recognition Model That Assigns Linguistic and Nonverbal Information Tags (2023)
- Author(s)
  塩根凪人, 若林佑幸, 北岡教英
- Organizer
  SPEASIP Workshop
- Related Report
  2022 Annual Research Report
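[Presentation] End-to-End Speech Recognition of Speech by Deaf and Hard-of-Hearing Speakers Using Phonological Features (2023)
- Author(s)
  小林彰夫, 安啓一
- Organizer
  The 85th National Convention of the Information Processing Society of Japan (IPSJ)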
- Related Report
  2022 Annual Research Report
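[Presentation] A Study on Speech Enhancement Methods for Quality-Degraded Radio Speech (2023)
- Author(s)
  小林彰夫, 安啓一
- Organizer
  The 85th National Convention of the Information Processing Society of Japan (IPSJ)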
- Related Report
  2022 Annual Research Report
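[Presentation] A Method for Integrating Multiple Encoder-Decoder Speech Recognition Models Based on the Density Ratio Approach (2022)
- Author(s)
  北條圭悟, 森大輝, 若林佑幸, 小川厚徳, 北岡教英
- Organizer
  The 24th Spoken Language Symposium and the 9th Natural Language Processing Symposium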
- Related Report
  2022 Annual Research Report
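[Presentation] Design of an Encoder-Decoder Speech Recognition Model Augmented with Out-of-Domain Acoustic Information (2022)
- Author(s)
  森大輝, 太田健吾, 西村良太, 北岡教英
- Organizer
  2022 Autumn Meeting of the Acoustical Society of Japan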
- Related Report
  2022 Annual Research Report
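[Presentation] Multilingual Speech Recognition Based on a Feature Transformation Model Using a Frequency-Axis Attention Mechanism (2022)
- Author(s)
  土橋晃弘, レオチーシャン, 西崎博光
- Organizer
  2022 Spring Meeting of the Acoustical Society of Japan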
- Related Report
  2021 Annual Research Report
[Presentation] Sound Event Detection Using Peer Collaborative Learning (2022)
- Author(s)
  遠藤颯人, 西崎博光
- Organizer
  2022 Spring Meeting of the Acoustical Society of Japan
- Related Report
  2021 Annual Research Report
[Presentation] Development of a Kaldi-Based Streaming Speech Recognition System (2021)
- Author(s)
  レオチーシャン, 王宇, 小林彰夫, 宇津呂武仁, 西崎博光
- Organizer
  2021 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2021 Annual Research Report
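[Presentation] Speech Recognition Research Advancing Alongside the Development of Deep Learning Technology (2021)
- Author(s)
  西崎博光
- Organizer
  IEICE (Institute of Electronics, Information and Communication Engineers), IEICE Technical Report, Technical Committee on Network Systems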
- Related Report
  2021 Annual Research Report
- Invited
