
FY2020 Research Progress Report

Machine learning driven system level heterogeneous memory management for high-performance computing

Research Project

Project/Area Number: 19K11993
Research Institution: RIKEN (National Research and Development Agency)

Principal Investigator

GEROFI BALAZS, RIKEN, Center for Computational Science, Senior Scientist (70633501)

Project Period (FY): 2019-04-01 to 2022-03-31
Keywords: Memory access tracking / Neural network training / I/O of deep learning
Outline of Research Achievements

We have completed the extension of the gem5 simulator to support heterogeneous memory systems: it can now define an arbitrary number of distinct memory devices, each with its own performance characteristics.
We completed the Python interface for real-time communication of memory accesses between gem5 and PyTorch, and developed test code that runs simple analyses on the captured data.
Because gem5 has a high runtime overhead, we also started working on a simplified simulator based on the leading-loads model that uses gem5 results; this runtime estimator is better suited for plugging into a reinforcement learning framework.
As a side topic, we started exploring the I/O implications of large-scale training, which is required for the distributed training of large neural networks in supercomputing environments.
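The report does not describe the leading-loads estimator itself. The following is only a minimal sketch of the general leading-loads idea, assuming that (a) a baseline gem5 run yields the total runtime and the number of leading loads (the first miss in each group of overlapping misses, the only one whose latency is exposed), and (b) each memory device can be summarized by a single average latency. The names `MemoryDevice`, `compute_time_ns`, and `estimate_runtime_ns`, as well as the latency values, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MemoryDevice:
    """A memory tier summarized by one (hypothetical) average access latency."""
    name: str
    latency_ns: float


def compute_time_ns(measured_ns: float, leading_loads: int,
                    baseline_latency_ns: float) -> float:
    """Time not stalled on memory, derived from one baseline run.

    Under the leading-loads model, only the leading load of a group of
    overlapping misses exposes memory latency, so the memory stall time
    is approximated as leading_loads * latency.
    """
    return measured_ns - leading_loads * baseline_latency_ns


def estimate_runtime_ns(compute_ns: float,
                        leading_loads_per_device: Dict[str, int],
                        devices: Dict[str, MemoryDevice]) -> float:
    """Predict runtime for one data placement across memory devices.

    leading_loads_per_device maps a device name to the number of leading
    loads that this placement would send to that device.
    """
    stall_ns = sum(count * devices[name].latency_ns
                   for name, count in leading_loads_per_device.items())
    return compute_ns + stall_ns


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this project.
    devices = {
        "DRAM": MemoryDevice("DRAM", 80),
        "Optane": MemoryDevice("Optane", 300),
    }
    # Hypothetical baseline: all data in DRAM, 1.48 ms total, 6000 leading loads.
    compute = compute_time_ns(1_480_000, 6000, devices["DRAM"].latency_ns)
    # Candidate placement: 1000 of the leading loads are served by Optane.
    print(estimate_runtime_ns(compute, {"DRAM": 5000, "Optane": 1000}, devices))
```

Evaluating a candidate placement here is a few multiplications rather than a full simulation, which is the property that makes such an estimator attractive inside a reinforcement learning loop that must score many placements.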

Current Status of Research Progress (Category)

4: Delayed

Reason

The postdoctoral researcher who was scheduled to work on this project could not come to Japan due to COVID-19 and resigned from his RIKEN position. We currently lack the manpower to advance the agenda as originally planned.

Strategy for Future Research Activity

Continue implementing the leading-loads-based runtime estimator.
Continue exploring memory-sensitive applications.
Start investigating an alternative runtime estimator based on precise event-based sampling (PEBS) on heterogeneous memory platforms, with Intel Optane + DRAM and DRAM + MCDRAM configurations as the primary targets.
Continue developing I/O improvements for large-scale training.

Reason for Incurring Amount to be Used Next Fiscal Year

Most of the funds will be used to rent compute capacity for running experiments.
Depending on the COVID-19 situation, some of the funds may be used for international travel.

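  • Research Products

    (2 results)

All 2021

All: Journal Article (1 result) (of which International Joint Research: 1, Peer Reviewed: 1); Presentation (1 result) (of which International Conference: 1, Invited Talk: 1)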

  • [Journal Article] Why Globally Re-shuffle? An I/O Perspective on Data Shuffling in Large Scale Deep Learning (2021)

    • Author(s)/Presenter(s)
      TruongThao Nguyen, Balazs Gerofi, Liao Jianwei, Francois Trahay, Mohamed Wahib
    • Journal Title

      International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) [submitted]

      Volume: 1, Pages: 10

    • Peer Reviewed / International Joint Research
  • [Presentation] Directions for Operating Systems Research (2021)

    • Author(s)/Presenter(s)
      Balazs Gerofi
    • Conference Name
      DOE ASCR OS Research Roundtable'21
    • International Conference / Invited Talk


Published: 2021-12-27
