Malicious entity detection using fine-grained DNA-inspired behavioural modelling

Research Project

Project/Area Number	21F20785
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Foreign
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	National Institute of Informatics
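Principal Investigator	高須淳宏 (2021) National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (90216648)
Co-Investigator(Kenkyū-buntansha)	ANDRIOTIS PANAGIOTIS National Institute of Informatics, Digital Content and Media Sciences Research Division, JSPS International Research Fellow
Host Researcher	高須淳宏 (2022) National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (90216648)
Foreign Research Fellow	ANDRIOTIS PANAGIOTIS National Institute of Informatics, Digital Content and Media Sciences Research Division, JSPS International Research Fellow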
Project Period (FY)	2021-11-18 – 2023-03-31
Project Status	Discontinued (Fiscal Year 2022)
Budget Amount	¥1,100,000 (Direct Cost: ¥1,100,000); Fiscal Year 2022: ¥600,000 (Direct Cost: ¥600,000); Fiscal Year 2021: ¥500,000 (Direct Cost: ¥500,000)
Keywords	cyber security / language model / information security
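Outline of Research at the Start	This research aims to build techniques for detecting accounts that engage in various kinds of abusive behaviour on social media, with bots as a typical example. Under the assumption that such accounts exhibit characteristic patterns in their behavioural history, we will develop a new representation for describing behavioural-history patterns, an algorithm for acquiring those patterns, and a method for identifying abusive accounts using the pattern representation, with the goal of building a safer environment for using social media.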
Outline of Annual Research Achievements	We initially explored how concepts from Bioinformatics can be applied to the problem of quantifying the complexity (and security) of graphical passwords. To that end, we implemented a graphical password scheme and tested its usability on mobile devices running the Android operating system. We then combined Lothaire’s combinatorics on words (the definition of a “finite word”, followed by the simple metric “complexity of a word”) with basic Bioinformatics (k-mers) and introduced a complexity metric to quantify the password space of the proposed scheme, aiming to identify how it compares with other graphical password schemes. In addition, we are collecting data from Twitter and comparing our method with the state of the art (SOTA) in bot detection. At a later stage we will also use another dataset as a baseline to investigate how our approach of combining graph neural networks (GNNs) with DNA-inspired behavioural modelling performs against well-known GNN-based solutions. The fundamental idea of our approach is to use GNNs to gather neighbourhood information around the account in question (2 hops away from it) and to combine this information with the account’s behavioural patterns, namely its digital DNA. A series of different GNN architectures is therefore needed to learn graph representations focused on the account(s) in question; their digital DNA is used as part of their node features. Additionally, instead of using RoBERTa to contextually analyse the account’s timeline, we use more targeted models (BERTweet, TwitterRoberta), which seem to work better with content derived from Twitter.
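The ingredients named in the achievements outline (a digital DNA string, a k-mer based complexity count, and a 2-hop neighbourhood) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the project's actual method: the action-to-letter alphabet in ACTION_TO_BASE, the function names, and the choice to count distinct k-mers (the factor complexity of a finite word) are hypothetical, and the neighbourhood is taken as all accounts within two hops.

```python
from collections import deque

# Hypothetical alphabet: the record does not say how actions are encoded.
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_timeline(actions):
    """Map a chronological list of action names to a digital-DNA string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


def kmer_complexity(word, k):
    """Count the distinct length-k factors (k-mers) occurring in `word`."""
    if k <= 0 or k > len(word):
        return 0
    return len({word[i:i + k] for i in range(len(word) - k + 1)})


def two_hop_neighbourhood(adjacency, source):
    """Return every node within 2 hops of `source` (excluding it), via BFS."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == 2:
            continue  # do not expand beyond the second hop
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {node for node in distance if node != source}


def node_features(actions, k=2):
    """Assumed node features: the DNA string and its k-mer complexity."""
    dna = encode_timeline(actions)
    return {"dna": dna, "kmer_complexity": kmer_complexity(dna, k)}
```

For example, node_features(["tweet", "tweet", "reply", "retweet"]) encodes the timeline as "AACT", which contains three distinct 2-mers (AA, AC, CT). In a full pipeline such features would be attached to the nodes returned by two_hop_neighbourhood before being passed to a GNN; that step is omitted here because the record does not specify the architectures used.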
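Research Progress Status	Not entered, because fiscal year 2022 (Reiwa 4) is the final year of the project.
Strategy for Future Research Activity	Not entered, because fiscal year 2022 (Reiwa 4) is the final year of the project.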