Understanding malware semantics by AI-supported formal methods

Research Project

Project/Area Number	20K20625
Research Category	Grant-in-Aid for Challenging Research (Pioneering)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 60: Information science, computer engineering, and related fields
Research Institution	Japan Advanced Institute of Science and Technology
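Principal Investigator	Mizuhito Ogawa (小川瑞史), Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor (40362024)
Co-Investigator(Kenkyū-buntansha)	NGUYEN MinhLe, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor (30509401); Tachio Terauchi (寺内多智弘), Waseda University, Faculty of Science and Engineering, Professor (70447150); Hiroyuki Seki (関浩之), Nagoya University, Graduate School of Informatics, Professor (80196948)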
Project Period (FY)	2020-07-30 – 2026-03-31
Project Status	Granted (Fiscal Year 2022)
Budget Amount	¥25,870,000 (Direct Cost: ¥19,900,000, Indirect Cost: ¥5,970,000); Fiscal Year 2025: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2024: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2023: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000); Fiscal Year 2022: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2021: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2020: ¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
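Keywords	dynamic symbolic execution / instruction set / binary code / malware / natural language processing / malware analysis / symbolic execution / API / native code / malware semantic analysis / automatic extraction of formal semantics / automatic extraction of formal specifications / deep learning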
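Outline of Research at the Start	Binary code is hard for humans to interpret, but this work builds on three observations: (1) its operational semantics can be defined as a state transition system over registers, flags, memory, and the stack; (2) the specification of each instruction is written in rigid natural language; and (3) test environments such as emulators are fully available. On this basis, we have automatically extracted the operational semantics of instructions from English-language manuals and released tools such as BE-PUM (x86), Corana (ARM), and SiMIPS (MIPS) on GitHub. In addition to automatically generating dynamic symbolic executors for many MPUs/MCUs, this project aims to automatically restore and extract payloads as they were before structural obfuscation, and then to detect new infection techniques and generate phylogenetic trees through feature extraction by unsupervised learning and semantic interpretation using natural language processing.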
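Outline of Annual Research Achievements	In FY2022 (Reiwa 4), we mainly worked on implementing and developing Corana-X, an extension of Corana, the dynamic symbolic executor targeting ARM/Android. Android malware is mostly distributed in the APK file format and, in addition to Java-like code, contains native code (ARM, x86, etc.) and calls to Linux library functions, so its execution spans multiple heterogeneous environments. In this project, symbolic execution of APK files combines Symbolic PathFinder (developed by NASA) for Java, Corana for ARM, and API stubs that run under the OS environment for Linux function calls (the latter two implemented by our group). Coordinating symbolic execution across heterogeneous environments requires transferring the environment, just as in actual code execution. However, a datum may be simply a 32-bit or 64-bit value, or it may point to a memory cell or a buffer; these cases must be distinguished and pointers followed where necessary. This requires type information for the arguments, which is extracted automatically from code and manual descriptions. Symbolic execution of real-world Android malware is now possible, and experiments are under way on the Drebin dataset (5,560 samples). As other research items, we are working on deriving interpretation rules for semantics extraction by natural language processing from instruction set manuals (jointly with co-investigator Nguyen Minh Le), and are examining machine-learning-based detection of the OEP (original entry point) from the symbolic execution results of PC malware (x86/Windows) that uses obfuscation techniques. As research items pursued by co-investigators, a new method using regular expressions was proposed for generating malicious code (DoS attacks) (co-investigator Tachio Terauchi). In addition, as a language learning theory with a view to applying machine learning to malware analysis, an active learning algorithm for deterministic bottom-up nominal tree automata was proposed (co-investigator Hiroyuki Seki).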
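Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Implementation of the dynamic symbolic executors, the central software engineering task, is progressing smoothly. In particular, in addition to the earlier BE-PUM for x86 (32-bit)/Windows, a prototype implementation of Corana-X targeting ARM (32-bit)/Android (Linux) has been completed. This means that the environment for experiments with real-world malware is essentially in place, and it is now possible to advance the understanding of the targeted malware techniques (obfuscation, infection, and attack techniques) in combination with statistical methods such as machine learning. For the current OEP (original entry point) detection, we are re-examining our earlier packer identification based on usage-frequency information of obfuscation techniques (published in 2017), and are considering applying the language learning theory based on the active learning algorithm for deterministic bottom-up nominal tree automata being developed by co-investigator Hiroyuki Seki. The new regular-expression-based method for DoS attacks developed by co-investigator Tachio Terauchi is a new development in attack techniques and will serve as a reference in advancing the understanding of infection techniques.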
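Strategy for Future Research Activity	Implementation of the dynamic symbolic executors, the central software engineering task, is progressing smoothly. Going forward, we will generalize the methods established for x86 (32-bit) and ARM (32-bit) and establish a method for automatically generating dynamic symbolic executors for other instruction sets (including 64-bit) based on natural language processing of instruction set manuals. Regarding the understanding of malware techniques (obfuscation, infection, and attack techniques): for obfuscation techniques, a certain foundation has already been established, and we will further clarify packer techniques through OEP detection. For infection techniques, we aim at payload analysis of malware starting from the OEP, and at extracting the relationship between vulnerabilities and their attack code through natural language processing of vulnerability reports (cve.mitre.org, etc.). Attack techniques will be pursued mainly by the co-investigators.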

Report

(3 results)

Research Products

(13 results)

Journal Article (3 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 3 results, Open Access: 3 results); Presentation (8 results) (of which Int'l Joint Research: 5 results); Remarks (2 results)

[Journal Article] Reduction of Register Pushdown Systems with Freshness Property to Pushdown Systems in LTL Model Checking (2022)
- Author(s)
  TAKATA Yoshiaki, SENDA Ryoma, SEKI Hiroyuki
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E105.D Issue: 9 Pages: 1620-1623
- DOI
  10.1587/transinf.2022EDL8030
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2022-09-01
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Complexity results on register context-free grammars and related formalisms (2022)
- Author(s)
  Senda Ryoma, Takata Yoshiaki, Seki Hiroyuki
- Journal Title
  Theoretical Computer Science
  Volume: 923 Pages: 99-125
- DOI
  10.1016/j.tcs.2022.04.055
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Reachability of Patterned Conditional Pushdown Systems (2020)
- Author(s)
  Li Xin, Gardy Patrick, Deng Yu-Xin, Seki Hiroyuki
- Journal Title
  Journal of Computer Science and Technology
  Volume: 35 Issue: 6 Pages: 1295-1311
- DOI
  10.1007/s11390-020-0541-z
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Modular Primal-Dual Fixpoint Logic Solving for Temporal Verification (2023)
- Author(s)
  Hiroshi Unno, Tachio Terauchi, Yu Gu, Eric Koskinen
- Organizer
  50th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2023), pp.2111-2140
- Related Report
  2022 Research-status Report
- Int'l Joint Research
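[Presentation] On the Expressive Power of Regular Expressions with Backreferences (後方参照付正規表現の表現力について) (2023)
- Author(s)
  野上大成, Tachio Terauchi (寺内多智弘)
- Organizer
  25th JSSST Workshop on Programming and Programming Languages (PPL 2023)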
- Related Report
  2022 Research-status Report
[Presentation] Automatic Stub Generation for Dynamic Symbolic Execution of ARM binary (2022)
- Author(s)
  Nguyen Van Anh, Mizuhito Ogawa
- Organizer
  11th International Symposium on Information and Communication Technology (SoICT 2022), pp.352-359
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Repairing DoS Vulnerability of Real-World Regexes (2022)
- Author(s)
  Nariyoshi Chida, Tachio Terauchi
- Organizer
  43rd IEEE Symposium on Security and Privacy (S&P 2022), pp.2060-2077
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] On Lookaheads in Regular Expressions with Backreferences (2022)
- Author(s)
  Nariyoshi Chida, Tachio Terauchi
- Organizer
  7th International Conference on Formal Structures for Computation and Deduction (FSCD 2022), LIPIcs Vol. 228, pp.15:1-15:18
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Active Learning for Deterministic Bottom-up Nominal Tree Automata (2022)
- Author(s)
  Rindo Nakanishi, Yoshiaki Takata, Hiroyuki Seki
- Organizer
  19th International Colloquium on Theoretical Aspects of Computing (ICTAC 2022), LNCS 13572, pp.342-359
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Constraint-based Relational Verification (2021)
- Author(s)
  Hiroshi Unno, Tachio Terauchi, Eric Koskinen
- Organizer
  33rd International Conference on Computer-Aided Verification (CAV 2021), Springer LNCS 12759, pp.742-766
- Related Report
  2021 Research-status Report
[Presentation] Reactive Synthesis from Visibly Register Pushdown Automata (2021)
- Author(s)
  Ryoma Senda, Yoshiaki Takata, Hiroyuki Seki
- Organizer
  18th International Colloquium on Theoretical Aspects of Computing (ICTAC 2021), Springer LNCS 12819, pp.334-353
- Related Report
  2021 Research-status Report
[Remarks] Corana (public release of the implementation)
- URL
  https://github.com/anhvvcs/corana
- Related Report
  2021 Research-status Report
[Remarks] Corana/API (public release of the implementation)
- URL
  https://github.com/vananhnt/corana
- Related Report
  2021 Research-status Report
