Development of accurate and reproducible matrix computation library for massively parallel environments

Research Project

Project/Area Number 19K20286
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 60100: Computational science-related
Research Institution Institute of Physical and Chemical Research

Principal Investigator

Mukunoki Daichi  RIKEN, Center for Computational Science, Researcher (90742289)

Project Period (FY) 2019-04-01 – 2023-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
Fiscal Year 2021: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2020: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2019: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Keywords high accuracy / reproducibility / matrix computation / sparse iterative methods / BLAS / massively parallel / floating-point arithmetic
Outline of Research at the Start

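Floating-point arithmetic, the mainstay of scientific and engineering computation, has finite precision, so computed results can carry rounding errors relative to the true values. In addition, floating-point operations are not associative: when the order of operations changes with the computing environment, results can differ at the rounding-error level, and the same result cannot always be reproduced. In large-scale, complex numerical computations such as those run on supercomputers, these properties can undermine reliability and hinder software development and maintenance. This project develops software for matrix computation, the basic building block of scientific computing, that delivers high accuracy and reproducibility while achieving high performance on the latest supercomputers.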
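
The order dependence described above can be seen in a few lines. The following is a minimal Python sketch, not taken from the project; the helper name `sum_left_to_right` and the input values are chosen only for illustration. It sums the same four double-precision numbers in two orders and compares both with `math.fsum`, which returns the correctly rounded sum regardless of order.

```python
import math


def sum_left_to_right(values):
    """Accumulate in the given order using ordinary double-precision adds."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative input: the exact sum is 2.0, but 1.0 is far below the
# rounding unit of 1e100, so it is lost whenever it meets 1e100.
values = [1.0, 1e100, 1.0, -1e100]

forward = sum_left_to_right(values)         # both 1.0 terms are absorbed
backward = sum_left_to_right(values[::-1])  # only one 1.0 term survives
exact = math.fsum(values)                   # order-independent, correctly rounded

print(forward, backward, exact)  # prints: 0.0 1.0 2.0
```

On a parallel machine the summation order typically depends on the number of threads or processes, so the same kind of discrepancy can appear between runs or between a CPU and a GPU.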

Outline of Final Research Achievements

In this study, we developed Basic Linear Algebra Subprograms (BLAS) routines for massively parallel architectures that are accurate and ensure reproducible results across different environments. Focusing mainly on the Ozaki scheme, we developed high-performance implementations of accurate and reproducible BLAS routines and demonstrated their application to sparse iterative solvers on CPUs and GPUs. As further applications, we proposed implementations of single- and double-precision matrix multiplication using low-precision arithmetic units (Tensor Cores), and of binary128 matrix multiplication using single- and double-precision matrix multiplication.
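
As a rough illustration of the idea behind the Ozaki scheme mentioned above, the sketch below computes a dot product in Python. It is not the project's implementation: the function names `split` and `ozaki_dot` are hypothetical, and it assumes binary64 inputs of moderate magnitude (no overflow or underflow) and round-to-nearest arithmetic. Each vector is split into slices with so few significant bits that every slice-by-slice dot product is exact in ordinary floating-point arithmetic; the exact partial results are then combined with a correctly rounded sum. A production implementation would instead form the slice products with optimized matrix multiplication and may truncate the number of slices to trade accuracy for speed.

```python
import math
from fractions import Fraction


def split(vec, rho):
    """Split vec into slices whose element-wise sum equals vec exactly.

    All elements of one slice are integer multiples of a common power of
    two and carry at most (53 - rho) significant bits.
    """
    slices = []
    rest = list(vec)
    while any(v != 0.0 for v in rest):
        mu = max(abs(v) for v in rest)
        _, tau = math.frexp(mu)              # guarantees mu < 2**tau
        sigma = math.ldexp(1.0, rho + tau)   # sigma = 2**(rho + tau)
        # Adding and subtracting sigma rounds away the low-order bits.
        head = [(v + sigma) - sigma for v in rest]
        # The remainder is the rounding error of (v + sigma), which is
        # itself representable, so this subtraction is exact.
        rest = [v - h for v, h in zip(rest, head)]
        slices.append(head)
    return slices


def plain_dot(a, b):
    """Ordinary floating-point dot product (exact for the slices above)."""
    total = 0.0
    for p, q in zip(a, b):
        total += p * q
    return total


def ozaki_dot(x, y):
    """Dot product via error-free splitting (illustrative sketch only)."""
    n = len(x)
    # Choose rho with 2*rho >= 53 + ceil(log2(n)) so that sums of n slice
    # products never need more than 53 bits.
    rho = (53 + (n - 1).bit_length() + 1) // 2
    xs, ys = split(x, rho), split(y, rho)
    partials = [plain_dot(xp, yq) for xp in xs for yq in ys]
    return math.fsum(partials)               # correctly rounded final sum


# Illustrative data with heavy cancellation.
x = [1e16, 1.0, -1e16, 0.1]
y = [1.0, 3.0, 1.0, 0.3]
reference = float(sum(Fraction(a) * Fraction(b) for a, b in zip(x, y)))
print(ozaki_dot(x, y) == reference)
print(ozaki_dot(x[::-1], y[::-1]) == ozaki_dot(x, y))
```

Because every partial product is exact and the final sum is correctly rounded, the result in this sketch does not depend on the order in which elements or slices are processed, which is the property that makes bit-wise reproducibility across environments possible.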

Academic Significance and Societal Importance of the Research Achievements

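We realized BLAS routines for CPUs and GPUs that are accurate and produce reproducible results, and demonstrated their application to sparse matrix solvers. Compared with existing methods, the approach offers advantages in performance and ease of implementation, and applications in applied mathematics are also expected. We also showed that low-precision arithmetic units designed for AI can be applied to single- and double-precision matrix computation, which may influence future hardware design.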

Report

(5 results)
  • 2022 Annual Research Report, Final Research Report (PDF)
  • 2021 Research-status Report
  • 2020 Research-status Report
  • 2019 Research-status Report

Research Products

(38 results)


Int'l Joint Research (3 results) / Journal Article (11 results, of which Int'l Joint Research: 5 results, Peer Reviewed: 8 results, Open Access: 4 results) / Presentation (24 results, of which Int'l Joint Research: 19 results)

  • [Int'l Joint Research] Sorbonne University (France)

    • Related Report
      2021 Research-status Report
  • [Int'l Joint Research] Sorbonne University (France)

    • Related Report
      2020 Research-status Report
  • [Int'l Joint Research] Sorbonne University (France)

    • Related Report
      2019 Research-status Report
  • [Journal Article] Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors (2023)

    • Author(s)
      Mukunoki Daichi, Ozaki Katsuhisa, Ogita Takeshi, Imamura Toshiyuki
    • Journal Title

      Proc. Parallel Processing and Applied Mathematics (PPAM 2022), Part of the Lecture Notes in Computer Science book series

      Volume: 13826 Pages: 40-54

    • DOI

      10.1007/978-3-031-30442-2_4

    • ISBN
      9783031304415, 9783031304422
    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
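  • [Journal Article] Infinite-Precision Inner Product Using the Ozaki Scheme and Its Application to Reproducible Sparse Iterative Solvers (2022)

    • Author(s)
      Mukunoki Daichi, Ozaki Katsuhisa, Ogita Takeshi, Imamura Toshiyuki
    • Journal Title

      Proceedings of the 2022 Annual Meeting of the Japan Society for Industrial and Applied Mathematics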

      Volume: -

    • Related Report
      2022 Annual Research Report
  • [Journal Article] Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme (2021)

    • Author(s)
      Mukunoki Daichi, Ozaki Katsuhisa, Ogita Takeshi, Imamura Toshiyuki
    • Journal Title

      Proc. The 50th International Conference on Parallel Processing (ICPP-2021)

      Volume: -- Pages: 1-11

    • DOI

      10.1145/3472456.3472493

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme (2021)

    • Author(s)
      Mukunoki Daichi, Ozaki Katsuhisa, Ogita Takeshi, Iakymchuk Roman
    • Journal Title

      Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2021)

      Volume: - Pages: 100-109

    • DOI

      10.1145/3432261.3432270

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? (2021)

    • Author(s)
      Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka
    • Journal Title

      Proc. 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)

      Volume: -

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions (2020)

    • Author(s)
      Mukunoki Daichi, Ozaki Katsuhisa, Ogita Takeshi, Imamura Toshiyuki
    • Journal Title

      Lecture Notes in Computer Science

      Volume: 12151 Pages: 230-248

    • DOI

      10.1007/978-3-030-50743-5_12

    • ISBN
      9783030507428, 9783030507435
    • Related Report
      2020 Research-status Report
    • Peer Reviewed
  • [Journal Article] Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results? (2020)

    • Author(s)
      Jezequel Fabienne, Graillat Stef, Mukunoki Daichi, Imamura Toshiyuki, Iakymchuk Roman
    • Journal Title

      Lecture Notes in Computer Science

      Volume: 12549 Pages: 163-177

    • DOI

      10.1007/978-3-030-63618-0_10

    • ISBN
      9783030636173, 9783030636180
    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] DGEMM using Tensor Cores, and Its Accurate and Reproducible Versions (2020)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
    • Journal Title

      ISC High Performance 2020

      Volume: -

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-core Architectures (2020)

    • Author(s)
      Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki
    • Journal Title

      13th International Conference on Parallel Processing and Applied Mathematics (PPAM2019), Lecture Notes in Computer Science

      Volume: 12043 Pages: 516-527

    • DOI

      10.1007/978-3-030-43229-4_44

    • ISBN
      9783030432287, 9783030432294
    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] Can we avoid rounding-error estimation in HPC codes and still get trustful results? (2020)

    • Author(s)
      Fabienne Jezequel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, and Roman Iakymchuk
    • Journal Title

      Hyper Articles en Ligne

      Volume: hal-02486753

    • Related Report
      2019 Research-status Report
    • Open Access / Int'l Joint Research
  • [Journal Article] Numerical Reproducibility based on Minimal-Precision Validation (2019)

    • Author(s)
      Toshiyuki Imamura, Daichi Mukunoki, Fabienne Jezequel, Stef Graillat, Roman Iakymchuk
    • Journal Title

      Computational Reproducibility at Exascale Workshop (CRE2019)

      Volume: -

    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers (2022)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
    • Organizer
      ISC High Performance (ISC 2022), research poster session
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Accurate Matrix Computations using Ozaki Scheme on CPUs and GPUs (2022)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
    • Organizer
      The 30th Anniversary Symposium of the Center for Computational Sciences at the University of Tsukuba
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Impact and Contribution of Ozaki scheme in High Performance Computing (2022)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk
    • Organizer
      International Workshop on Reliable Computing and Computer-Assisted Proofs (ReCAP 2022)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Accurate Matrix Multiplication on Binary128 using Ozaki Scheme (2021)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
    • Organizer
      ISC High Performance (ISC 2021), research poster session
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Accurate and Reproducible Conjugate Gradient in Hybrid Parallel Environments (2021)

    • Author(s)
      Roman Iakymchuk, Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Stef Graillat
    • Organizer
      ISC High Performance (ISC 2021), research poster session
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] High-Precision, Accurate, and Reproducible Linear Algebra Operations using Ozaki Scheme (2021)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk
    • Organizer
      3rd R-CCS International Symposium
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
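  • [Presentation] A Study of Foundational Technologies Toward Automatic Precision Tuning (2021)

    • Author(s)
      Mukunoki Daichi
    • Organizer
      13th Symposium on the Current State and Applications of Automatic Tuning Technology (ATTA2021)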
    • Related Report
      2021 Research-status Report
  • [Presentation] DGEMM using Tensor Cores (2021)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
    • Organizer
      SIAM Conference on Computational Science and Engineering (CSE21)
    • Related Report
      2021 Research-status Report, 2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] High-Precision, Accurate, and Reproducible Linear Algebra Operations using Ozaki Scheme (2021)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk
    • Organizer
      The 3rd R-CCS International Symposium
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] DGEMM using Tensor Cores and OzBLAS (2020)

    • Author(s)
      Daichi Mukunoki
    • Organizer
      11th Joint Laboratory for Extreme Scale Computing (JLESC) Workshop
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
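  • [Presentation] Matrix Multiplication with the Ozaki Scheme for binary128 (2020)

    • Author(s)
      Mukunoki Daichi, Ozaki Katsuhisa, Ogita Takeshi
    • Organizer
      4th Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2020)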
    • Related Report
      2020 Research-status Report
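  • [Presentation] Quadruple-Precision Matrix Multiplication in binary128 Using the Ozaki Scheme (2020)

    • Author(s)
      Mukunoki Daichi, Ozaki Katsuhisa, Ogita Takeshi
    • Organizer
      2020 Annual Meeting of the Japan Society for Industrial and Applied Mathematics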
    • Related Report
      2020 Research-status Report
  • [Presentation] Accurate DGEMM using Tensor Cores (2020)

    • Author(s)
      Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
    • Organizer
      HPC Asia 2020 (poster session)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations (2020)

    • Author(s)
      Roman Iakymchuk, Fabienne Jezequel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Norihisa Fujita, Taisuke Boku
    • Organizer
      HPC Asia 2020 (poster session)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (2020)

    • Author(s)
      Daichi Mukunoki
    • Organizer
      SIAM Conference on Parallel Processing for Scientific Computing (PP20)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Accurate BLAS implementations: OzBLAS and BLAS-DOT2 (2020)

    • Author(s)
      Daichi Mukunoki
    • Organizer
      Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2020 January)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (2020)

    • Author(s)
      Daichi Mukunoki
    • Organizer
      Sapporo Winter HPC Seminar 2020
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (2019)

    • Author(s)
      Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jezequel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku
    • Organizer
      SC19 (research poster session)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (2019)

    • Author(s)
      Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jezequel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku
    • Organizer
      France-Japan-Germany trilateral workshop: Convergence of HPC and Data Science for Future Extreme Scale Intelligent Applications (poster presentation)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Accurate and Reproducible Linear Algebra Operations for Many-core Architectures (2019)

    • Author(s)
      Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki
    • Organizer
      Russian Supercomputing Days 2019 (RuSCDays 2019) (poster session)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
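  • [Presentation] OzBLAS: An Accurate BLAS Implementation Based on the Ozaki Scheme, and Its Applications (2019)

    • Author(s)
      Mukunoki Daichi, Ogita Takeshi, Ozaki Katsuhisa
    • Organizer
      3rd Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2019)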
    • Related Report
      2019 Research-status Report
  • [Presentation] Accurate and Reproducible CG Method on GPUs (2019)

    • Author(s)
      Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki
    • Organizer
      European Numerical Mathematics and Advanced Applications Conference 2019 (ENUMATH2019)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs (2019)

    • Author(s)
      Daichi Mukunoki
    • Organizer
      Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2019 June)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
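  • [Presentation] Implementation of Accurate and Reproducible BLAS Routines Based on the Ozaki Scheme and Application of Automatic Tuning (2019)

    • Author(s)
      Mukunoki Daichi
    • Organizer
      22nd Open Academic Session of the Auto-Tuning Research Group (ATOS22)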
    • Related Report
      2019 Research-status Report


Published: 2019-04-18   Modified: 2024-01-30  
