Machine learning driven system level heterogeneous memory management for high-performance computing
Project/Area Number | 19K11993 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Review Section | Basic Section 60090: High performance computing-related |
Research Institution | Institute of Physical and Chemical Research |
Principal Investigator | GEROFI BALAZS, RIKEN, Center for Computational Science, Senior Researcher (70633501) |
Project Period (FY) | 2019-04-01 – 2023-03-31 |
Project Status | Discontinued (Fiscal Year 2022) |
Budget Amount |
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2020: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Fiscal Year 2019: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
|
Keywords | Memory access tracing / Runtime approximation / Distributed ML / Neural network training / I/O of deep learning / Distributed learning / Memory access tracking / heterogeneous memory / gem5 / architectural simulator / non-uniform memory / machine learning / reinforcement learning / long short-term memory / transformer attention / Memory management / Machine learning / HPC |
Outline of Research at the Start |
This research studies the combination of system software level mechanisms with machine learning driven policies for heterogeneous memory management in high-performance computing. It involves automatic discovery and characterization of memory devices, online application profiling based on hardware performance counters, machine learning driven decision processes for data management, and transparent, operating system level data movement.
|
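As a rough illustration of how the pieces listed in the outline could fit together, the sketch below couples a toy cost model with an epsilon-greedy learner that searches for a good buffer placement across two memory tiers. Everything in it (tier costs, buffer access counts, the capacity limit, the learner itself) is an assumption made for illustration; it stands in for hardware-counter profiling and operating system level data movement rather than reproducing the project's actual design.

```python
# Hypothetical sketch of an ML-driven placement loop: enumerate feasible
# placements, score them with a toy cost model, and let an epsilon-greedy
# learner converge on a good one. Tier costs, buffer access counts, and the
# capacity limit are made-up stand-ins for real profiling data.
import random
from itertools import product

TIERS = {"DRAM": 1.0, "HBM": 0.4}          # assumed relative access cost
HBM_CAPACITY = 2                            # at most 2 buffers fit in HBM
BUFFERS = {"A": 900, "B": 500, "C": 300}    # assumed access counts per step

def estimated_runtime(placement):
    """Toy cost model: accesses weighted by the access cost of their tier."""
    return sum(BUFFERS[b] * TIERS[t] for b, t in placement.items())

def candidate_placements():
    """Enumerate placements that respect the (toy) HBM capacity limit."""
    for tiers in product(TIERS, repeat=len(BUFFERS)):
        placement = dict(zip(BUFFERS, tiers))
        if sum(t == "HBM" for t in placement.values()) <= HBM_CAPACITY:
            yield placement

def run(episodes=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    candidates = list(candidate_placements())
    value = {i: 0.0 for i in range(len(candidates))}   # learned value per action
    counts = {i: 0 for i in range(len(candidates))}
    for _ in range(episodes):
        if rng.random() < epsilon:                      # explore
            action = rng.randrange(len(candidates))
        else:                                           # exploit
            action = max(value, key=value.get)
        reward = -estimated_runtime(candidates[action]) # lower runtime is better
        counts[action] += 1
        value[action] += (reward - value[action]) / counts[action]
    best = max(value, key=value.get)
    return candidates[best]

if __name__ == "__main__":
    print("learned placement:", run())
```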
Outline of Annual Research Achievements |
Results have been achieved in two parallel efforts of the project. We found that system-software-level heterogeneous memory management solutions driven by machine learning, in particular non-supervised learning-based methods such as reinforcement learning, require rapid estimation of execution runtime as a function of the data layout across memory devices in order to explore different data placement strategies, which renders architecture-level simulators impractical for this purpose. We proposed a differential tracing-based approach that uses memory access traces obtained by high-frequency sampling methods (e.g., Intel's PEBS) on real hardware equipped with different memory devices. We developed a runtime estimator based on such traces that provides an execution time estimate orders of magnitude faster than full-system simulators. On a number of HPC mini-applications we showed that the estimator predicts runtime with an average error of 4.4% compared to measurements on real hardware.

For the deep learning data shuffling subtopic, we investigated the viability of partitioning the dataset among DL workers and performing only a partial distributed exchange of samples in each training epoch. Through extensive experiments on up to 2048 GPUs of ABCI and 4096 compute nodes of Fugaku, we demonstrated that in practice the validation accuracy of global shuffling can be maintained when the partial distributed exchange is carefully tuned. We provided an implementation in PyTorch that enables users to control the proposed data exchange scheme.
|
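To illustrate the trace-based estimation idea, the following is a minimal sketch that assumes a sampled trace already mapped to named memory regions and uses made-up per-device latencies. It is not the project's differential estimator, only the general accounting such an estimator builds on.

```python
# Hypothetical illustration of scoring a data placement from a sampled memory
# access trace; the region names, latencies, and trace format are assumptions.
from collections import Counter

# Assumed per-device average access latencies in nanoseconds.
DEVICE_LATENCY_NS = {"DRAM": 90.0, "HBM": 60.0, "PMEM": 300.0}

def estimate_runtime_ns(trace, placement, base_compute_ns=0.0):
    """Estimate runtime as compute time plus sampled accesses weighted by the
    latency of the device each memory region is placed on.

    trace     -- iterable of region identifiers, one per sampled access
                 (e.g., produced by a PEBS-like sampler and mapped to regions)
    placement -- dict mapping region identifier -> device name
    """
    hits = Counter(trace)
    memory_ns = sum(count * DEVICE_LATENCY_NS[placement[region]]
                    for region, count in hits.items())
    return base_compute_ns + memory_ns

# Toy usage: the same trace scored under two placements.
trace = ["matrix"] * 1000 + ["index"] * 200 + ["matrix"] * 500
fast = {"matrix": "HBM", "index": "DRAM"}
slow = {"matrix": "PMEM", "index": "DRAM"}
print(estimate_runtime_ns(trace, fast), estimate_runtime_ns(trace, slow))
```

For the partial distributed exchange, the plain-Python sketch below simulates several workers trading a tunable fraction of sample indices each epoch instead of performing a global shuffle. The pairing scheme, exchange fraction, and data structures are assumptions for illustration and do not reproduce the PyTorch implementation mentioned above.

```python
# Hypothetical sketch of partial distributed sample exchange: each worker keeps
# its own partition and, every epoch, trades only a fraction of randomly chosen
# sample indices with a randomly chosen peer.
import random

def partial_exchange(partitions, fraction, rng):
    """One epoch of exchange over per-worker lists of sample indices."""
    workers = list(range(len(partitions)))
    rng.shuffle(workers)
    # Pair workers up; each pair swaps `fraction` of its samples.
    for a, b in zip(workers[0::2], workers[1::2]):
        k = int(len(partitions[a]) * fraction)
        rng.shuffle(partitions[a])
        rng.shuffle(partitions[b])
        partitions[a][:k], partitions[b][:k] = partitions[b][:k], partitions[a][:k]
    return partitions

rng = random.Random(0)
num_workers, samples_per_worker = 4, 8
partitions = [list(range(w * samples_per_worker, (w + 1) * samples_per_worker))
              for w in range(num_workers)]
for epoch in range(3):
    partitions = partial_exchange(partitions, fraction=0.25, rng=rng)
    print(f"epoch {epoch}:", partitions)
```

In a real distributed run the same index bookkeeping would drive which samples each worker loads, with the exchange fraction exposed as the tuning knob the report refers to.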
Report
(4 results)
Research Products
(8 results)