Understanding malware semantics by AI-supported formal methods

Research Project

Project/Area Number	20K20625
Research Category	Grant-in-Aid for Challenging Research (Pioneering)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 60: Information science, computer engineering, and related fields
Research Institution	Japan Advanced Institute of Science and Technology
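Principal Investigator	Mizuhito Ogawa, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor (40362024)
Co-Investigator(Kenkyū-buntansha)	NGUYEN MinhLe, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor (30509401)
	Tachio Terauchi, Waseda University, Faculty of Science and Engineering, Professor (70447150)
	Hiroyuki Seki, Nagoya University, Graduate School of Informatics, Professor (80196948)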
Project Period (FY)	2020-07-30 – 2026-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥25,870,000 (Direct Cost: ¥19,900,000, Indirect Cost: ¥5,970,000)
	Fiscal Year 2025: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
	Fiscal Year 2024: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
	Fiscal Year 2023: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
	Fiscal Year 2022: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
	Fiscal Year 2021: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
	Fiscal Year 2020: ¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
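Keywords	dynamic symbolic execution / instruction set / binary code / malware / natural language processing / ReDoS attack / privacy / malware analysis / symbolic execution / API / native code / malware semantic analysis / automatic extraction of formal semantics / automatic extraction of formal specifications / deep learning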
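Outline of Research at the Start	Binary code is hard for humans to interpret, but three observations make it tractable: (1) its operational semantics can be defined as a state transition system over registers, flags, memory, and the stack; (2) each instruction specification is written in rigid natural language; and (3) test environments such as emulators are fully available. Building on these observations, we have automatically extracted the operational semantics of instructions from English manuals and released tools such as BE-PUM (x86), Corana (ARM), and SiMIPS (MIPS) on GitHub. This project aims to automatically generate dynamic symbolic executors for many MPUs/MCUs and, in addition, to automatically restore and extract payloads as they were before structural obfuscation. Through feature extraction by unsupervised learning and semantic interpretation by natural language processing, it further aims to detect new infection techniques and to generate phylogenetic trees of malware.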
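Observation (1), that operational semantics can be written as a state transition system over registers, flags, memory, and the stack, can be illustrated with a minimal sketch. This is a concrete (not symbolic) toy interpreter and is not taken from BE-PUM, Corana, or SiMIPS; the `State` class, the `step_*` function names, the register set, and the 32-bit word size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF  # assumed 32-bit machine word


@dataclass
class State:
    """Hypothetical machine state: registers, flags, and word-addressed memory."""
    regs: dict = field(default_factory=lambda: {"eax": 0, "ebx": 0, "esp": 0x1000})
    flags: dict = field(default_factory=lambda: {"ZF": 0, "CF": 0})
    memory: dict = field(default_factory=dict)  # address -> 32-bit word


def step_add(s, dst, src):
    """ADD dst, src: dst <- (dst + src) mod 2^32, then update ZF and CF."""
    total = s.regs[dst] + s.regs[src]
    s.regs[dst] = total & MASK32
    s.flags["CF"] = 1 if total > MASK32 else 0    # carry out of bit 31
    s.flags["ZF"] = 1 if s.regs[dst] == 0 else 0  # result is zero
    return s


def step_push(s, src):
    """PUSH src: decrement the stack pointer, then store src at the new top."""
    s.regs["esp"] = (s.regs["esp"] - 4) & MASK32
    s.memory[s.regs["esp"]] = s.regs[src]
    return s


def step_pop(s, dst):
    """POP dst: load the top of the stack into dst, then increment the stack pointer."""
    s.regs[dst] = s.memory[s.regs["esp"]]
    s.regs["esp"] = (s.regs["esp"] + 4) & MASK32
    return s


def run_demo():
    """Apply three transitions to an initial state and return the final state."""
    s = State()
    s.regs["eax"] = 0xFFFFFFFF
    s.regs["ebx"] = 1
    step_add(s, "eax", "ebx")  # wraps around to 0, so ZF and CF are both set
    step_push(s, "ebx")        # stack top now holds 1
    step_pop(s, "eax")         # eax <- 1, stack pointer restored
    return s
```

Each `step_*` function is one transition rule. A symbolic executor would keep the same shape but store symbolic expressions instead of integers and fork the state at conditional jumps.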
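Outline of Annual Research Achievements	In FY2023 (Reiwa 5), the principal investigator and co-investigator Nguyen worked on developing, implementing, and experimenting with formal-methods tools, while co-investigators Terauchi and Seki mainly studied the theoretical foundations of security. The principal investigator, together with the supervised doctoral student Nguyen Van Anh, advanced the research and development of HybridSE (https://github.com/vananhnt/corana), a symbolic executor for Android/apk, in collaboration with the University of Lorraine. In addition, using BE-PUM, the previously developed symbolic executor for x86/Windows, they attempted packer identification and entry point detection for malware whose control structure is obfuscated by packers, using a similarity measure based on graph kernels, and obtained good results. Co-investigator Nguyen worked on extracting logical entailment structure from legal documents, which, like instruction manuals, have a clear logical structure, by applying deep learning and generative AI. Applying similar methods to instruction manuals for formal semantics extraction is now under consideration. Co-investigator Terauchi studied the expressive power of regular expressions with extensions such as backreferences and lookaheads, as a theoretical foundation for ReDoS attacks on scripting languages, which have drawn attention in recent years. He also studied the synthesis and repair of extended regular expressions for string extraction, as well as type-based static analysis of programs with algebraic effects. Co-investigator Seki treated a player's winning objective in a game as that player's (user's) private information. He defined strategies that make it as hard as possible for an adversary observing the game to infer the player's winning objective (indistinguishable strategies), and tuples of strategies in which no player can unilaterally increase the difficulty of inference (objective-indistinguishability equilibria), and analyzed the decidability and computational complexity of the associated decision problems.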
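The graph-kernel similarity mentioned for packer identification can be sketched as follows. The report does not say which kernel was used, so this example assumes a Weisfeiler-Lehman subtree kernel with cosine normalization; the function names, the graph encoding (successor lists plus one label per node), and the instruction-mnemonic labels are all hypothetical.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree labels of a directed graph.

    adj:    dict mapping each node to a list of its successor nodes
    labels: dict mapping each node to its initial label (a string)
    """
    current = dict(labels)
    feats = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node in adj:
            # New label = own label plus the sorted multiset of successor labels.
            neigh = sorted(current[m] for m in adj[node])
            updated[node] = current[node] + "(" + ",".join(neigh) + ")"
        current = updated
        feats.update(current.values())
    return feats


def wl_similarity(g1, g2, iterations=2):
    """Cosine-normalized WL kernel between two graphs given as (adj, labels) pairs."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    dot = sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())
    norm = (sum(v * v for v in f1.values()) * sum(v * v for v in f2.values())) ** 0.5
    return dot / norm if norm else 0.0


# Two toy three-node control flow graphs that differ in one node label.
g_a = ({0: [1], 1: [2], 2: []}, {0: "push", 1: "call", 2: "jmp"})
g_b = ({0: [1], 1: [2], 2: []}, {0: "push", 1: "xor", 2: "jmp"})
```

With such a measure, an unknown sample's graph could be compared against reference graphs of known packers and assigned to the closest one; how the project actually does this is not stated in the report.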
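Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	In addition to the BE-PUM implementation for x86/Windows that this project has been developing (originally a 32-bit version; an extension to 64-bit is now being implemented by the group of Dr. Phan Viet Anh at Le Quy Don Technical University, a research collaborator in Vietnam), a prototype implementation of HybridSE, the principal investigator's symbolic executor for Android/apk, has been completed. The apk format consists of manifest.xml, which describes the entry points, dex bytecode, and native code contained in .so files, and it also calls OS library APIs, so these different environments must be connected seamlessly. Passing the execution environment across them is not easy, however, and we are trying approaches such as (semi-)automatic interface extraction. In addition to experiments on x86/Windows malware, we have started experiments on a set of apk files that includes Android malware. We observe that the obfuscation techniques differ: x86/Windows malware typically has its binary code encrypted and decrypted by well-known packers, whereas Android/apk malware hides entry points by obscuring the manifest.xml description, compiles dex bytecode into native code, and loads native code dynamically. Paper submission has been delayed, but two international conference submissions are in preparation. The results of co-investigator Nguyen are expected to be applied to extending the instruction sets supported by the symbolic executors. The results of co-investigators Terauchi and Seki are not yet directly related to the development and implementation of the symbolic executors, but we plan to use them as target properties for statistical analysis methods (including AI methods) applied to the control flow graphs obtained by the symbolic executors.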
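Strategy for Future Research Activity	Symbolic executors, as a formal method for binary code including malware, have so far been developed and implemented for x86-32/Windows (BE-PUM), ARM-32/Android (CORANA, HybridSE), and MIPS-32/Linux (SIMIPS). The mainstream instruction sets today are x86 for PCs and ARM for smartphones and tablets, but a newer trend is that MPUs in control devices are sometimes implemented on FPGAs with the RISC-V instruction set. Going forward, in addition to extending the x86-32 and ARM-32 tools to 64-bit, we are exploring the possibility of developing and implementing a symbolic executor for RISC-V in cooperation with Vietnam National University, Hanoi. For implementing the formal semantics of each instruction from instruction manuals, we have previously examined automation by rule-based classical natural language processing and, in addition, by deep learning and LLMs, with a certain degree of success. This approach will be applied jointly with co-investigator Nguyen. Once a symbolic executor is implemented, the most important information it yields is the control flow graph (CFG). Based on formalizations of target security properties, such as similarity between CFGs, characteristic control sequences that carry vulnerabilities, and reachability from information sources to leaks, we aim to automate malware classification and the detection of vulnerabilities and exploits, possible information leakage, and possible malicious intrusion.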
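The property "reachability from an information source to a leak" on a control flow graph can be sketched as a plain graph search. This is a minimal illustration, not the project's analysis: the CFG encoding (a dict of successor lists), the function name `reachable_path`, and the node names in the example are all hypothetical, and a real analysis would also have to track data flow rather than control flow alone.

```python
from collections import deque


def reachable_path(cfg, source, sink):
    """Return one shortest path from source to sink in cfg, or None if unreachable.

    cfg: dict mapping each node to a list of its successor nodes
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in cfg.get(node, []):
            if succ not in parent:
                parent[succ] = node
                queue.append(succ)
    return None


# Toy CFG: a sensitive read that can flow on to a network send.
toy_cfg = {
    "entry": ["read_contacts", "exit"],
    "read_contacts": ["encode"],
    "encode": ["send_http"],
    "send_http": ["exit"],
}
```

Calling `reachable_path(toy_cfg, "read_contacts", "send_http")` returns a witness path when the leak is reachable, which can then be inspected or used as evidence; `None` means no control-flow path exists in the given graph.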

Report

(4 results)

Research Products
(24 results)

All Int'l Joint Research (2 results) Journal Article (7 results) (of which Int'l Joint Research: 3 results, Peer Reviewed: 7 results, Open Access: 5 results) Presentation (13 results) (of which Int'l Joint Research: 10 results) Remarks (2 results)

[Int'l Joint Research] University of Lorraine (France)
- Related Report
  2023 Research-status Report
[Int'l Joint Research] Le Quy Don Technical University (Vietnam)
- Related Report
  2023 Research-status Report
[Journal Article] Attentive deep neural networks for legal document retrieval (2024)
- Author(s)
  Ha-Thanh Nguyen, Manh-Kien Phi, Xuan-Bach Ngo, Vu Tran, Le-Minh Nguyen, Minh-Phuong Tu
- Journal Title
  Artificial Intelligence and Law
  Volume: 32 Issue: 1 Pages: 57-86
- DOI
  10.1007/s10506-022-09341-8
- Related Report
  2023 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] On Lookaheads in Regular Expressions with Backreferences (2023)
- Author(s)
  Nariyoshi Chida, Tachio Terauchi
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E106.D Issue: 5 Pages: 959-975
- DOI
  10.1587/transinf.2022EDP7098
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2023-05-01
- Related Report
  2023 Research-status Report
- Peer Reviewed
[Journal Article] Trace Effects for a Language with Algebraic Effect Handlers (2023)
- Author(s)
  Fuga Kawamata, Tachio Terauchi
- Journal Title
  Computer Software
  Volume: 40 Issue: 2 Pages: 2_19-2_48
- DOI
  10.11309/jssst.40.2_19
- ISSN
  0289-6540
- Year and Date
  2023-04-21
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Reduction of Register Pushdown Systems with Freshness Property to Pushdown Systems in LTL Model Checking (2022)
- Author(s)
  TAKATA Yoshiaki, SENDA Ryoma, SEKI Hiroyuki
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E105.D Issue: 9 Pages: 1620-1623
- DOI
  10.1587/transinf.2022EDL8030
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2022-09-01
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] SM-BERT-CR: a deep learning approach for case law retrieval with supporting model (2022)
- Author(s)
  Vuong Yen Thi-Hai, Bui Quan Minh, Nguyen Ha-Thanh, Nguyen Thi-Thu-Trang, Tran Vu, Phan Xuan-Hieu, Satoh Ken, Nguyen Le-Minh
- Journal Title
  Artificial Intelligence and Law
  Volume: 30 Issue: 3 Pages: 1-28
- DOI
  10.1007/s10506-022-09319-6
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Complexity results on register context-free grammars and related formalisms (2022)
- Author(s)
  Senda Ryoma, Takata Yoshiaki, Seki Hiroyuki
- Journal Title
  Theoretical Computer Science
  Volume: 923 Pages: 99-125
- DOI
  10.1016/j.tcs.2022.04.055
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Reachability of Patterned Conditional Pushdown Systems (2020)
- Author(s)
  Li Xin, Gardy Patrick, Deng Yu-Xin, Seki Hiroyuki
- Journal Title
  Journal of Computer Science and Technology
  Volume: 35 Issue: 6 Pages: 1295-1311
- DOI
  10.1007/s11390-020-0541-z
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Answer Refinement Modification: Refinement Type System for Algebraic Effects and Handlers (2024)
- Author(s)
  Fuga Kawamata, Taro Sekiyama, Hiroshi Unno, Tachio Terauchi
- Organizer
  51st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2024)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Original Entry Point detection based on graph similarity (2023)
- Author(s)
  Pham Thanh Hung, Mizuhito Ogawa
- Organizer
  16th International Symposium on Foundations & Practice of Security (FPS 2023)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Repairing Regular Expressions for Extraction (2023)
- Author(s)
  Nariyoshi Chida, Tachio Terauchi
- Organizer
  44th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2023)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] On the Expressive Power of Regular Expressions with Backreferences (2023)
- Author(s)
  Taisei Nogami, Tachio Terauchi
- Organizer
  48th Mathematical Foundations of Computer Science (MFCS 2023)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] A Game-theoretic Approach to Indistinguishability of Winning Objectives as User Privacy (2023)
- Author(s)
  Rindo Nakanishi, Yoshiaki Takata, Hiroyuki Seki
- Organizer
  20th International Colloquium on Theoretical Aspects of Computing (ICTAC 2023)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Modular Primal-Dual Fixpoint Logic Solving for Temporal Verification (2023)
- Author(s)
  Hiroshi Unno, Tachio Terauchi, Yu Gu, Eric Koskinen
- Organizer
  50th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2023), pp.2111-2140
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] On the Expressive Power of Regular Expressions with Backreferences (2023)
- Author(s)
  Taisei Nogami, Tachio Terauchi
- Organizer
  25th JSSST Workshop on Programming and Programming Languages (PPL 2023)
- Related Report
  2022 Research-status Report
[Presentation] Automatic Stub Generation for Dynamic Symbolic Execution of ARM binary (2022)
- Author(s)
  Nguyen Van Anh, Mizuhito Ogawa
- Organizer
  11th International Symposium on Information and Communication Technology (SoICT 2022), pp.352-359
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Repairing DoS Vulnerability of Real-World Regexes (2022)
- Author(s)
  Nariyoshi Chida, Tachio Terauchi
- Organizer
  43rd IEEE Symposium on Security and Privacy (S&P 2022), pp.2060-2077
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] On Lookaheads in Regular Expressions with Backreferences (2022)
- Author(s)
  Nariyoshi Chida, Tachio Terauchi
- Organizer
  7th International Conference on Formal Structures for Computation and Deduction (FSCD 2022), LIPICS Vol. 228, pp.15:1-15:18
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Active Learning for Deterministic Bottom-up Nominal Tree Automata (2022)
- Author(s)
  Rindo Nakanishi, Yoshiaki Takata, Hiroyuki Seki
- Organizer
  19th International Colloquium on Theoretical Aspects of Computing (ICTAC 2022), LNCS 13572, pp.342-359
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Constraint-based Relational Verification (2021)
- Author(s)
  Hiroshi Unno, Tachio Terauchi, Eric Koskinen
- Organizer
  33rd International Conference on Computer-Aided Verification (CAV 2021), Springer LNCS 12759, pp.742-766
- Related Report
  2021 Research-status Report
[Presentation] Reactive Synthesis from Visibly Register Pushdown Automata (2021)
- Author(s)
  Ryoma Senda, Yoshiaki Takata, Hiroyuki Seki
- Organizer
  18th International Colloquium on Theoretical Aspects of Computing (ICTAC 2021), Springer LNCS 12819, pp.334-353
- Related Report
  2021 Research-status Report
[Remarks] Corana (public implementation)
- URL
  https://github.com/anhvvcs/corana
- Related Report
  2021 Research-status Report
[Remarks] Corana/API (public implementation)
- URL
  https://github.com/vananhnt/corana
- Related Report
  2021 Research-status Report
