Project/Area Number |
23K16954
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 61030: Intelligent informatics-related
|
Research Institution | The Institute of Statistical Mathematics |
Principal Investigator |
Tran Duc・Vu, The Institute of Statistical Mathematics, Risk Analysis Research Center, Project Assistant Professor (90910240)
|
Project Period (FY) |
2023-04-01 – 2026-03-31
|
Project Status |
Granted (Fiscal Year 2023)
|
Budget Amount |
¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2025: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2024: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2023: ¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
|
Keywords | large language models / effective prompting / sentiment analysis / social media data / public sentiments / social media / NLP / deep learning / statistical modeling |
Outline of Research at the Start |
This research aims to predict the sentiments of random social media users. It contributes to the research direction of assessing public sentiment in a manner similar to questionnaire surveys, but in a timely, progressive, and low-cost way, with applications in economics, public health, and politics.
|
Outline of Annual Research Achievements |
In the first year of the research, promising results were achieved for sentiment analysis of social media data using advanced natural language processing techniques, helped in particular by the emergence of more powerful large language models that are publicly available for use and fine-tuning. Experiments were conducted with several high-end large language models on both Twitter ("X") and Reddit data to analyze users. The results showed that large language models can perform well at analyzing social media texts when combined with optimally designed prompting techniques, that is, ways of constructing effective inputs to large language models. The results have been published at international conferences and workshops.
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
From the large social media dataset collected for the research, several subsets have been analyzed to obtain topic-based sentiment using large language models with optimally designed prompting techniques. Several large language models are available for internal use at the research institute, including Llama, Mixtral, and StableLM. ChatGPT-4 is also used for sentiment analysis.
After Elon Musk's acquisition of Twitter Inc. (now called "X"), Twitter discontinued its API for scientific research in 2023. As a result, large-scale Twitter data is no longer available at low cost through its API service. Nevertheless, abundant Twitter data was collected beforehand and will continue to be used.
|
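As a rough illustration of what a topic-based sentiment prompt could look like, the following minimal sketch builds a zero-shot prompt and maps a model reply to a label. It is not the prompt design used in the project: the label set, the prompt wording, and the helper names `build_prompt` and `parse_label` are all assumptions made for illustration, and no model is actually called.

```python
# Hypothetical label set; the project's actual labels are not stated in the report.
LABELS = ("positive", "negative", "neutral")


def build_prompt(post: str, topic: str) -> str:
    """Assemble a zero-shot instruction prompt for topic-based sentiment."""
    return (
        "You are annotating social media posts.\n"
        f"Topic: {topic}\n"
        f"Post: {post}\n"
        f"Answer with exactly one word from: {', '.join(LABELS)}.\n"
        "Sentiment:"
    )


def parse_label(reply: str) -> str:
    """Map a free-form model reply to one label.

    Uses a naive substring match and falls back to 'neutral'
    when no known label appears in the reply.
    """
    lowered = reply.lower()
    for label in LABELS:
        if label in lowered:
            return label
    return "neutral"
```

The prompt string would be sent to whichever model is available, and the raw reply passed through `parse_label`. Constraining the answer to a fixed word list keeps the parsing step simple, at the cost of ignoring any nuance in the reply.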
Strategy for Future Research Activity |
In the second year of the research, techniques for modeling social relationships among users and topics will be investigated. The current experimental results indicate that an ensemble of multiple sentiment analysis tools is an effective way to obtain more accurate analysis outputs, which serve as the inputs to the social relationship models.
|
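One simple way to combine several sentiment analysis tools is majority voting over their predicted labels. The sketch below is only an assumption about how such an ensemble might work: the report does not specify the combination rule, and the function name `ensemble_vote` and the tie-breaking choice are hypothetical.

```python
from collections import Counter


def ensemble_vote(labels, tie_break="neutral"):
    """Return the most frequent label among the tools' predictions.

    `labels` holds one predicted label per tool. If the input is empty
    or the top two labels are tied, `tie_break` is returned
    (an assumed policy, not taken from the report).
    """
    ranked = Counter(labels).most_common()
    if not ranked:
        return tie_break
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]
```

For example, `ensemble_vote(["positive", "positive", "negative"])` yields `"positive"`, while an even split between two labels falls back to the tie-break value. A weighted vote based on each tool's validation accuracy would be a natural refinement.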