Development of a Novel Artificial Intelligence-Based Guidance System for Preventing Technology Leakage
Project/Area Number | 23K11757
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Review Section | Basic Section 90020: Library and information science, humanistic and social informatics-related
Research Institution | Hokkaido University |
Principal Investigator | OBAYASHI Akihiko, Hokkaido University, Institute for the Promotion of Business-Regional Collaboration, Professor (80798124)
Co-Investigator (Kenkyū-buntansha) | RZEPKA Rafal, Hokkaido University, Faculty of Information Science and Technology, Assistant Professor (80396316)
Project Period (FY) | 2023-04-01 – 2026-03-31
Project Status | Granted (Fiscal Year 2023)
Budget Amount | ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2025: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2024: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2023: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Keywords | export control / dialog system / question answering / trade security education / export control classification / expert system / text classification
Outline of Research at the Start
In this research, experts in export control and artificial intelligence collaborate to provide a user-friendly guidance system with which researchers themselves can easily check the sensitivity of their research area. The system converts export control statutes into a computationally processable form and automatically infers the degree of sensitivity from information supplied by the researcher. Missing information is supplemented through dialogue with a chatbot enhanced by machine learning with external knowledge. The system will enable even those without specialized knowledge to operate export controls correctly and to prevent the leakage of sensitive technology. The guidance system developed in this research will be provided free of charge and will be freely extensible as open source.
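The "computationally processable form" mentioned above can be illustrated with the following minimal Python sketch. It is not the system under development: the clause identifiers, descriptions, and trigger keywords are hypothetical placeholders, and real sensitivity inference would go well beyond keyword matching.

# Minimal sketch of a machine-readable regulation clause and a naive sensitivity check.
# All clause contents and keywords are hypothetical placeholders for illustration only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class RegulationClause:
    clause_id: str                     # identifier of a control-list entry
    description: str                   # human-readable summary of the controlled item
    keywords: List[str] = field(default_factory=list)  # terms that trigger a sensitivity flag

# Hypothetical entries; a real system would load these from the converted statutes.
CLAUSES = [
    RegulationClause("list-row-A", "High-strength fibrous materials", ["carbon fiber", "prepreg"]),
    RegulationClause("list-row-B", "Cryptographic equipment and software", ["encryption", "cryptography"]),
]

def infer_sensitivity(research_summary: str) -> List[str]:
    """Return IDs of clauses whose keywords appear in the researcher's description."""
    text = research_summary.lower()
    return [c.clause_id for c in CLAUSES if any(k in text for k in c.keywords)]

print(infer_sensitivity("We develop encryption modules for IoT sensors."))  # -> ['list-row-B']
# Missing details would then be requested through the chatbot dialogue.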
Outline of Annual Research Achievements
For the first year of the grant we planned to concentrate on developing an ontology for teaching our dialog system the rules underlying trade security, in order to add explanation abilities to the existing system. However, while we were experimenting with masked language models such as RoBERTa for matching, to help with retrieving graph nodes, OpenAI released ChatGPT, a large language model surpassing previous applications of natural language processing. This forced us to start broad tests to see whether this closed model, as well as other open-source ones, could replace the modules we had developed so far. We performed experiments with GPT-3.5 and GPT-4 and observed that they were trained on a vast amount of export control-related text, which helps them answer questions about regulations in Japanese. However, the expert evaluation revealed several problems: for example, even the top commercial model at the time (GPT-4) hallucinated the names of regulations, cited regulations that exist but not in Japan, and so on. We shared our experimental findings in an international conference publication. We also tested models trained directly on Japanese, but their performance was poor.
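The kind of check involved in these experiments can be illustrated with the hedged sketch below. It assumes the openai v1 Python client and a hand-made (here abbreviated) list of genuine Japanese regulation titles, and merely flags answers that cite none of them; the actual prompts and expert evaluation procedure of the published experiments are not reproduced here.

# Illustrative sketch only: ask a chat model an export-control question in Japanese
# and flag answers that cite no regulation from an abbreviated reference list.
from openai import OpenAI

KNOWN_REGULATIONS = [
    "外国為替及び外国貿易法",   # Foreign Exchange and Foreign Trade Act
    "輸出貿易管理令",           # Export Trade Control Order
    "外国為替令",               # Foreign Exchange Order
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, model: str = "gpt-4") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def needs_expert_review(answer: str) -> bool:
    # Flag answers that cite none of the known regulation titles.
    return not any(name in answer for name in KNOWN_REGULATIONS)

answer = ask("技術提供の該非判定で参照すべき法令を日本語で挙げてください。")
print(answer)
print("needs expert review:", needs_expert_review(answer))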
Current Status of Research Progress
3: Progress in research has been slightly delayed.
Reason
On the one hand, the sudden technological jump in natural language processing caused an unexpected turn in our plans, as we had to assimilate the new trend and perform experiments that had not been planned. Because the grant topic is rare, waiting for somebody else to test the newest GPT models on export control-related topics was not a promising option, so we performed the experiments ourselves. On the other hand, we have learned the latest trends in prompting and few-shot learning, and employed a basic RAG (Retrieval-Augmented Generation) pipeline. This delay spent on learning will likely pay off in the next year, as we are now able to extend our ideas for creating an export control ontology in a more automatic manner. We also now have a strong baseline model for experimenting with explanations and with teaching in a more user-friendly manner, because large language models excel at generating quizzes and assessing users' knowledge. Nevertheless, we must carry out very careful tests to avoid hallucinations and utilize the newest and safest RAG variants equipped with fact-checking capabilities.
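The "basic RAG" mentioned above can be illustrated by the following minimal sketch. The TF-IDF retriever, the stand-in corpus, and the prompt are assumptions made for illustration only and do not reflect the project's actual retriever or data.

# Minimal RAG sketch: retrieve regulation paragraphs and let a chat model answer
# using only the retrieved passages. Corpus, retriever, and prompt are stand-ins.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical English stand-in corpus; the real statutes are Japanese and would
# need a Japanese tokenizer or multilingual embeddings instead of plain TF-IDF.
PARAGRAPHS = [
    "Appendix Table 1 of the Export Trade Control Order lists controlled goods.",
    "Providing controlled technology to non-residents may require a license.",
    "Catch-all controls apply when there is concern about weapons-of-mass-destruction use.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(PARAGRAPHS)
client = OpenAI()

def retrieve(query: str, k: int = 2) -> list:
    # Rank paragraphs by cosine similarity to the query and keep the top k.
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [PARAGRAPHS[i] for i in scores.argsort()[::-1][:k]]

def answer(query: str) -> str:
    # Ground the model's answer in the retrieved passages only.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only the passages below.\n{context}\n\nQuestion: {query}"
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("Do I need a license to share my encryption research with a foreign partner?"))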
Strategy for Future Research Activity
As export control-related data are scarce and methods such as fine-tuning are therefore relatively difficult to apply, we have started trials with RAG (Retrieval-Augmented Generation). The retrieval module seems to improve the finding of related regulations, but the generation step causes errors and does not help with explanation. Our basic hypothesis, that an ontological knowledge graph can be useful for explaining dangers and for understanding what users do not know, remains unchanged. However, new opportunities have appeared: short semantic relations can now be populated with large language models, and such relations can also be used as examples in few-shot learning approaches and in fine-tuning, which requires much more data. Combining trustworthy rule-based methods with masked language models and the latest GPT models can bring new opportunities to develop our system faster than planned. The educational side of the chatbot can also be extended sooner. If the RAG's generation keeps showing hallucinations, we will keep the retrieval module and concentrate on refining results through interaction with the user.
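Populating short semantic relations with a large language model could, for example, look like the sketch below; the relation labels, few-shot examples, and prompt wording are hypothetical, and any generated triple would still require expert verification before being added to the ontology.

# Hedged sketch: few-shot prompting to propose (subject | relation | object) triples
# for the export control ontology. Labels and examples are illustrative only.
from openai import OpenAI

client = OpenAI()

FEW_SHOT = """term: carbon fiber
triples:
carbon fiber | is_a | controlled material
carbon fiber | used_in | aerospace structures

term: encryption software
triples:
encryption software | is_a | controlled technology
encryption software | regulated_by | export control regulations
"""

def propose_triples(term: str) -> list:
    # Ask the model to continue the pattern, then parse "subject | relation | object" lines.
    prompt = (
        "Propose ontology triples in the same format as the examples.\n\n"
        + FEW_SHOT
        + f"\nterm: {term}\ntriples:\n"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    triples = []
    for line in resp.choices[0].message.content.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

# Each proposed triple still needs expert review before entering the ontology.
print(propose_triples("machine tool"))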
Report (1 result)
Research Products (1 result)