2023 Fiscal Year Final Research Report
Improving Speaker Recognition by Using Linguistic Information Inherent in Speech
Project/Area Number |
21K11967
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 61010:Perceptual information processing-related
|
Research Institution | Yokohama City University |
Principal Investigator |
|
Project Period (FY) |
2021-04-01 – 2024-03-31
|
Keywords | Author recognition / Speaker recognition / Biometric authentication / Deep learning / Large language models / Generative AI / Deepfake detection |
Outline of Final Research Achievements |
To improve speaker recognition, which has seen less practical deployment than face or vein recognition, we investigated methods for exploiting the linguistic information contained in speech and demonstrated the effectiveness of recent deep learning models. We also examined how well generative AI, including large language models, which have made remarkable progress in recent years, can be distinguished from humans. Furthermore, to clarify the capabilities and behavior of generative AI, we compared the quality of text produced by large language models and demonstrated that state-of-the-art models such as GPT-4 have text generation capabilities equal to or exceeding those of humans. We also examined image captioning models, quantitatively measuring the capabilities of the latest models on image classification tasks.
|
Free Research Field |
Intelligent informatics, perceptual information processing
|
Academic Significance and Societal Importance of the Research Achievements |
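As the digital society advances, there is a growing need for technology that verifies a user's identity safely and easily. This research improves the accuracy of voice authentication, which is expected to spread alongside face and vein authentication, and thereby contributes to a safer and more convenient society. In addition, by clarifying the properties of so-called generative AI, such as large language models, which has developed rapidly in recent years and attracts considerable public attention, the research promotes the adoption of AI in society and contributes to the sound development of AI technology.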
|