Classification of regional dialects in Japanese

Research Project

Project/Area Number	20K20702
Research Category	Grant-in-Aid for Challenging Research (Exploratory)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 2: Literature, linguistics, and related fields
Research Institution	The University of Tokushima
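Principal Investigator	Kota Hattori (服部恒太), The University of Tokushima (徳島大学), 大学院社会産業理工学研究部(社会総合科学域), Lecturer (10758387)
Co-Investigator (Kenkyū-buntansha)	Shinsuke Kishie (岸江信介), Nara University (奈良大学), Faculty of Letters, Professor (90271460)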
Project Period (FY)	2020-07-30 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥5,200,000 (Direct Cost: ¥4,000,000, Indirect Cost: ¥1,200,000); Fiscal Year 2022: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2021: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000); Fiscal Year 2020: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
Keywords	Dialect classification / Random forest / DBSCAN / Machine learning / Pilot study / COVID-19 / Interruption / Japanese dialects / Classification / Data science / Speech science
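Outline of Research at the Start	Until now, Japanese dialects have been classified by researchers using descriptive methods. However, how native speakers of Japanese actually classify dialects when they hear them has not yet been tested scientifically. Nor has it been tested scientifically how their perception of dialects changes across generations. In this study, native speakers of Japanese across a wide age range (young and middle-aged) listen to dialects spoken by elderly speakers from each prefecture and classify them. By analyzing these classifications statistically, the study aims to show scientifically, for the first time, how Japanese people themselves classify their dialects and how far dialect loss has progressed among younger generations.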
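Outline of Annual Research Achievements	In 2023, we further investigated how random forest classifies Japanese dialects. Using this algorithm, we built models to predict the prefecture of residence of elderly Japanese speakers in the Kansai and Chugoku regions. The best model predicted participants' prefecture of residence with high accuracy. However, the prefecture of residence could not be predicted accurately for some participants living in central Kansai (northern Osaka Prefecture, northern Nara Prefecture, and eastern Hyogo Prefecture) or near prefectural borders in the Chugoku region. We presented these results at the 184th Meeting of the Acoustical Society of America. We then tested whether random forest could classify elderly Japanese speakers living in the Chubu region. In that model, speakers with a predicted probability of 50% or higher (n = 415) were generally distributed within their prefecture of residence, suggesting that each prefecture has its own dialect. Speakers with a predicted probability below 50% lived in their prefecture of residence and its surrounding areas, particularly Aichi, Gifu, Shizuoka, Nagano, Gunma, and Niigata prefectures, suggesting that speakers in these areas share features of each prefecture's dialect. This may have led to the low accuracy in predicting speakers' places of origin. In addition, the spread of these prefectural dialects appears to be limited by the Japanese Alps: dialects spoken east of the mountain range and dialects spoken west of it are each generally confined to their own side. We presented these results at the 185th Meeting of the Acoustical Society of America. What is becoming clear from the fiscal year 2023 results is that conventional dialect classifications do not necessarily capture the actual distribution of dialects. Which machine learning algorithms capture the distribution of dialects better needs further examination.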
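The 50% split described above can be illustrated with a minimal sketch. It assumes each speaker has a vector of per-prefecture probabilities of the kind a random forest classifier typically produces. The speaker IDs, the probability values, and the function name `split_by_confidence` are all hypothetical and are not taken from the project's data or code.

```python
# Hypothetical per-speaker class probabilities (made-up values for illustration;
# not the project's data). Each inner dict sums to 1.0.
PREDICTIONS = {
    "speaker_01": {"Aichi": 0.72, "Gifu": 0.18, "Shizuoka": 0.10},
    "speaker_02": {"Aichi": 0.41, "Gifu": 0.38, "Nagano": 0.21},
    "speaker_03": {"Niigata": 0.55, "Gunma": 0.30, "Nagano": 0.15},
}


def split_by_confidence(predictions, threshold=0.5):
    """Split speakers by whether their top predicted probability reaches the threshold.

    Returns (confident, ambiguous):
      confident maps speaker -> single predicted prefecture (top probability >= threshold)
      ambiguous maps speaker -> candidate prefectures, most probable first
    """
    confident = {}
    ambiguous = {}
    for speaker, probs in predictions.items():
        best = max(probs, key=probs.get)
        if probs[best] >= threshold:
            confident[speaker] = best
        else:
            # Below the threshold, keep the full ranking so that shared
            # features across neighbouring prefectures stay visible.
            ambiguous[speaker] = sorted(probs, key=probs.get, reverse=True)
    return confident, ambiguous


confident, ambiguous = split_by_confidence(PREDICTIONS)
print(confident)
print(ambiguous)
```

Keeping the ranked candidates for low-probability speakers, rather than discarding them, is one way to inspect which neighbouring prefectures a speaker is being confused with.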
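Current Status of Research Progress	3: Progress in research has been slightly delayed. Reason: In 2023, we further investigated how the Japanese dialects spoken by older Japanese speakers are classified. What we have found so far is that random forest can classify dialect speakers to some extent, but its accuracy is not high. This seems to suggest that classifying dialects along prefectural boundaries is not a good approach. In other words, supervised models are not necessarily suited to capturing how dialects are distributed. Rather, our work so far indicates that unsupervised learning, particularly affinity propagation, captures the distribution of dialects well. In addition, as reported previously, we are collecting speech samples from young Japanese speakers in western Japan. This collection is still in progress.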
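Strategy for Future Research Activity	Given that affinity propagation is working well, we will build the models again using all the data used so far. We will also apply multivariate imputation and test whether better models can be built. In addition, we plan to use random forest and affinity propagation to examine how the Kansai dialects of elderly speakers are classified, and to write up the results as a paper. We will also work on developing an online recording system. At a conference we attended in 2023, we met an American scholar who is willing to help build such a system. We hope to launch this system within the current fiscal year and collect speech samples from various generations across Japan. Doing so will allow us to complete the recording work we have been engaged in for two years and to obtain new data.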

Report

(4 results)

Research Products
(5 results)

All Presentation (5 results) (of which Int'l Joint Research: 4 results)

[Presentation] Classification of Kansai and Chugoku dialects spoken by old Japanese speakers (2023)
- Author(s)
  Kota Hattori and Shinsuke Kishie
- Organizer
  The 184th Meeting of the Acoustical Society of America
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Classification of Chubu-region dialects using random forest (2023)
- Author(s)
  Kota Hattori and Shinsuke Kishie
- Organizer
  The 185th Meeting of the Acoustical Society of America
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Classification of Kansai dialects (2022)
- Author(s)
  Kota Hattori and Shinsuke Kishie
- Organizer
  The Seventeenth International Conference on Methods in Dialectology
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Classification of Kansai and Chugoku dialects (2022)
- Author(s)
  Kota Hattori
- Organizer
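  The Institute of Statistical Mathematics cooperative research meeting "Development and Use of the Data Analysis Environment R" (統計数理研究所共同利用研究集会「データ解析環境Rの整備と利用」)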
- Related Report
  2022 Research-status Report
[Presentation] Classification of Japanese dialects using phonetic distance (2022)
- Author(s)
  Kota Hattori and Shinsuke Kishie
- Organizer
  The Seventeenth International Conference on Methods in Dialectology
- Related Report
  2021 Research-status Report
- Int'l Joint Research
