2013 Fiscal Year Annual Research Report

A Knowledge Integration Framework for Multilingual Speech Recognition of Minority Languages

Research Project

Project/Area Number	24700172
Research Institution	Nara Institute of Science and Technology
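Principal Investigator	Sakriani Sakti, Nara Institute of Science and Technology, Graduate School of Information Science, Assistant Professor (00395005)
Keywords	International researcher exchange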
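Research Abstract	1. Data collection for minority languages: We began collecting data for four major minority languages still spoken in Indonesia (Javanese: central Java; Sundanese: western Java; Balinese: Bali; Batak: northern Sumatra). Text corpora for these languages were gathered from local newspapers and magazines, yielding 2,000 to 3,000 sentences for each of Javanese, Sundanese, Balinese, and Batak. Of these, 1,000 sentences were selected and proofread by native speakers. A greedy algorithm was then applied to this material to derive 225 phoneme-balanced sentences per language. (a) Phoneme-balanced speech corpus: Using the 225 phoneme-balanced sentences above, speech data were recorded from 10 speakers (5 male, 5 female) for each minority language. Recording took place in Indonesia. This gave a total of 2,250 phoneme-balanced utterances per language. (b) Parallel speech corpus: In addition to the phoneme-balanced corpus, we collected a parallel speech corpus of 50 sentences translated from Indonesian into Javanese, Sundanese, Balinese, and Batak. For this data as well, utterances were recorded from 10 speakers (5 male, 5 female) per language. 2. Speech recognition: The Indonesian speech recognition baseline was trained on existing Indonesian speech data. That corpus contains speech from 400 speakers (200 male, 200 female) and covers standard Indonesian as well as Batak, Javanese, and Sundanese accents. Each speaker uttered 210 sentences, for a total of 84,000 utterances and 80 hours of speech. Based on these speech resources and the baseline, we analyzed the characteristics of the Indonesian minority languages. We also applied the minority-language speech corpora to the baseline to develop speech recognition systems for the minority languages.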
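The greedy selection step mentioned in the abstract can be illustrated with a minimal sketch. The report does not state the scoring criterion that was used, so the one below is an assumption: at each step, pick the sentence whose distinct units are least represented in the selection so far. Individual characters stand in for phonemes, and the names `greedy_balanced_selection` and `char_units` are hypothetical, introduced only for this illustration.

```python
from collections import Counter


def char_units(sentence):
    """Assumption: treat each alphabetic character as one unit (a stand-in for a phoneme)."""
    return [c for c in sentence.lower() if c.isalpha()]


def greedy_balanced_selection(sentences, n_select, tokenize=char_units):
    """Greedily select up to n_select sentences that balance unit coverage.

    Scoring (an assumed criterion, not the one from the report): each distinct
    unit in a candidate contributes 1 / (1 + times that unit was already selected),
    so rare or unseen units are rewarded more than frequent ones.
    """
    units = [tokenize(s) for s in sentences]
    remaining = list(range(len(sentences)))
    counts = Counter()   # how often each unit occurs in the selected sentences
    chosen = []

    def score(i):
        return sum(1.0 / (1 + counts[u]) for u in set(units[i]))

    while remaining and len(chosen) < n_select:
        best = max(remaining, key=score)   # ties go to the earliest candidate
        chosen.append(best)
        counts.update(units[best])
        remaining.remove(best)

    return [sentences[i] for i in chosen], counts


if __name__ == "__main__":
    pool = ["aaa", "abc", "aab", "ccc"]
    selected, unit_counts = greedy_balanced_selection(pool, n_select=2)
    print(selected)
    print(dict(unit_counts))
```

In the setting described in the abstract, the pool would be the proofread sentences for one language and `n_select` would be 225; a real setup would replace `char_units` with a phoneme or grapheme tokenizer suited to that language.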

Research Products
(4 results)

All Journal Article (2 results) (of which Peer Reviewed: 2 results) Presentation (2 results)

[Journal Article] Recent Progress in Developing Grapheme-based Speech Recognition for Indonesian Ethnic Languages: Javanese, Sundanese, Balinese, and Bataks (2014)
- Author(s)
  Sakriani Sakti
- Journal Title
  Proc. of SLTU 2014
  Volume: CD-ROM Pages: TBD
- Peer Reviewed
[Journal Article] Towards Language Preservation: Design and Collection of Graphemically Balanced and Parallel Speech Corpora of Indonesian Ethnic Languages (2013)
- Author(s)
  Sakriani Sakti
- Journal Title
  Proc. of Oriental COCOSDA 2013
  Volume: CD-ROM Pages: 60.1-60.5
- Peer Reviewed
[Presentation] Recent Progress in Developing Grapheme-based Speech Recognition for Indonesian Ethnic Languages: Javanese, Sundanese, Balinese, and Bataks (2014)
- Author(s)
  Sakriani Sakti
- Organizer
  International Workshop on Spoken Language Technologies for Under-resourced Languages 2014
- Place of Presentation
  University ITMO (St. Petersburg, Russia)
- Year and Date
  2014-05-14 to 2014-05-16
[Presentation] Towards Language Preservation: Design and Collection of Graphemically Balanced and Parallel Speech Corpora of Indonesian Ethnic Languages (2013)
- Author(s)
  Sakriani Sakti
- Organizer
  Oriental COCOSDA 2013
- Place of Presentation
  KIIT Campus (Delhi, India)
- Year and Date
  2013-11-25 to 2013-11-27
