Research on Detecting Malicious Entities Using Detailed Behavioural Modelling

Research Project

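Project/Area Number	21F20785
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Application Section	Foreign
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	National Institute of Informatics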
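Principal Investigator	高須淳宏 (2021) National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (90216648)
Co-Investigator	ANDRIOTIS PANAGIOTIS National Institute of Informatics, Digital Content and Media Sciences Research Division, Foreign Research Fellow
Host Researcher	高須淳宏 (2022) National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (90216648)
Foreign Research Fellow	ANDRIOTIS PANAGIOTIS National Institute of Informatics, Digital Content and Media Sciences Research Division, Foreign Research Fellow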
Project Period (FY)	2021-11-18 – 2023-03-31
Project Status	Discontinued (FY2022)
Budget Amount *Note	¥1,100,000 (Direct Cost: ¥1,100,000) FY2022: ¥600,000 (Direct Cost: ¥600,000) FY2021: ¥500,000 (Direct Cost: ¥500,000)
Keywords	cyber security / language model / information security
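Outline of Research at the Start	This research aims to build techniques for detecting accounts that engage in various kinds of abusive behaviour on social media, of which bots are a representative example. Under the assumption that such accounts exhibit characteristic patterns in their behavioural history, we aim to build a safer environment for social media use by developing a new representation for describing behavioural-history patterns, an algorithm for acquiring those patterns, and a method that uses the pattern representation to identify abusive accounts.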
Outline of Research Achievements	We first explored how concepts from bioinformatics can be applied to the problem of quantifying the complexity (and security) of graphical passwords. To this end, we implemented a graphical password scheme and tested its usability on mobile devices running the Android operating system. We then combined Lothaire’s combinatorics on words (the definition of a “finite word”, followed by the simple metric “complexity of a word”) with basic bioinformatics (k-mers) and introduced a complexity metric that quantifies the space of the proposed graphical password scheme, with the aim of comparing it with other graphical password schemes. In addition, we are collecting data from Twitter and comparing our method with the state of the art (SOTA) in bot detection. At a later stage we will also use another dataset as a baseline to investigate how our approach of combining graph neural networks (GNNs) with DNA-inspired behavioural modelling performs against well-known GNN-based solutions. The fundamental idea of our approach is to use GNNs to extract neighbourhood information around the account in question (up to 2 hops away from it) and to combine this information with the account’s behavioural patterns, namely its digital DNA. We therefore need a series of different GNN architectures to learn graph representations centred on the account(s) in question, with their digital DNA used as part of their node features. Furthermore, instead of using RoBERTa to analyse the account’s timeline contextually, we use more targeted models (BERTweet, TwitterRoberta), which appear to work better with content from Twitter.
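Current Status of Research Progress (Paragraph)	Not entered, because FY2022 (Reiwa 4) is the final fiscal year.
Strategy for Future Research Activity	Not entered, because FY2022 (Reiwa 4) is the final fiscal year.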
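
Illustrative sketch (not part of the original record). The research achievements summary mentions three building blocks: encoding an account's behaviour as a "digital DNA" string, k-mer based complexity of a finite word, and the 2-hop neighbourhood of an account. The Python below is a minimal sketch of these ideas under stated assumptions and is not the project's implementation. The action alphabet (`ACTION_TO_BASE`), all function names, and the toy graph are hypothetical. The complexity measure shown is one common definition from combinatorics on words (the number of distinct length-k factors), and the record does not confirm that this exact metric was used. The GNN itself is omitted; only the neighbourhood it would aggregate over is shown.

```python
from collections import Counter, deque

# Hypothetical alphabet: the record does not specify how actions map to bases.
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def digital_dna(actions):
    """Encode a chronological list of action types as a DNA-like string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


def kmer_counts(sequence, k):
    """Count every length-k substring (k-mer) of the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def factor_complexity(sequence, k):
    """Number of distinct length-k factors of a finite word (assumed metric)."""
    return len(kmer_counts(sequence, k))


def two_hop_neighbourhood(adjacency, start):
    """Return all nodes at distance 1 or 2 from start, using breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == 2:
            continue  # do not expand beyond two hops
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {node for node, d in distance.items() if d > 0}


if __name__ == "__main__":
    timeline = ["tweet", "retweet", "retweet", "reply", "tweet", "retweet", "retweet"]
    dna = digital_dna(timeline)
    print(dna)                        # ATTCATT
    print(factor_complexity(dna, 2))  # 4 distinct 2-mers: AT, TT, TC, CA

    # Toy follower graph: "d" is three hops from "u" and is therefore excluded.
    graph = {"u": ["a", "b"], "a": ["c"], "c": ["d"], "b": []}
    print(sorted(two_hop_neighbourhood(graph, "u")))  # ['a', 'b', 'c']
```

In a fuller pipeline, the k-mer counts of an account's digital DNA could serve as part of that account's node features, while the 2-hop neighbourhood would define the subgraph a GNN aggregates over. Both choices are assumptions made here for illustration.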