Project/Area Number |
15K00166
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
High performance computing
|
Research Institution | The University of Tokyo |
Principal Investigator |
|
Project Period (FY) |
2015-04-01 – 2019-03-31
|
Project Status |
Completed (Fiscal Year 2018)
|
Budget Amount |
¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2017: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2016: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2015: ¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
|
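Keywords | FPGA / accelerator / PCI Express / OpenCL / OpenACC / fusion of computation and communication / GPU cluster / high-performance interconnect / tightly coupled acceleration architecture / Tightly Coupled Accelerators architecture |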
Outline of Final Research Achievements |
The Tightly Coupled Accelerators (TCA) architecture, which enables direct communication among accelerators such as GPUs, improves strong-scaling performance thanks to its low latency. This study examined the feasibility of highly efficient computation that fuses fast TCA communication with FPGA computation. Several kernels, including numerical algorithms, were written for the accelerator in OpenCL, and restructuring them into a deeply pipelined form yielded higher performance. Automatic conversion from OpenACC to OpenCL was also investigated. However, because FPGA-oriented code must depart drastically from the conventional coding style, automatic optimization was judged too complicated.
|
Academic Significance and Societal Importance of the Research Achievements |
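The study showed that FPGA implementation using OpenCL, a programming language for accelerators, is feasible. However, a data-parallel description of the kind used for GPUs does not deliver performance; the code must be written with the FPGA architecture in mind. For matrix multiplication, high performance was obtained by splitting the OpenCL kernel and switching to pipelined control. In addition, redundant descriptions and branches inside loops, which run counter to conventional software optimization techniques, are effective on FPGAs. As the latest interfaces for future work, performance was checked for FPGA connections that are cache-coherent with the CPU and for 3D-stacked memory; bandwidths of roughly 5 times and 30 times the current levels, respectively, can be expected.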
|
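The kernel-splitting idea described above for matrix multiplication can be illustrated with a small software analogy. The sketch below is not the project's OpenCL code: it is a minimal Python model, assuming a three-stage split (read, multiply-accumulate, write) in which generators stand in for the channels that would connect separate FPGA kernels. All function names (`read_stage`, `mac_stage`, `write_stage`, `pipelined_matmul`) are hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Tuple

Matrix = List[List[float]]
Operands = Tuple[int, int, List[float], List[float]]
Result = Tuple[int, int, float]


def read_stage(a: Matrix, b: Matrix) -> Iterator[Operands]:
    """Stage 1 (hypothetical 'read' kernel): stream operands for each output element."""
    cols_b = [list(col) for col in zip(*b)]  # columns of B, built once
    for i, row in enumerate(a):
        for j, col in enumerate(cols_b):
            yield i, j, row, col


def mac_stage(stream: Iterator[Operands]) -> Iterator[Result]:
    """Stage 2 (hypothetical 'compute' kernel): multiply-accumulate one dot product."""
    for i, j, row, col in stream:
        acc = 0.0
        for x, y in zip(row, col):
            acc += x * y
        yield i, j, acc


def write_stage(stream: Iterator[Result], n_rows: int, n_cols: int) -> Matrix:
    """Stage 3 (hypothetical 'write' kernel): store each result as it arrives."""
    c = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, value in stream:
        c[i][j] = value
    return c


def pipelined_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Chain the three stages; each element flows through them one at a time."""
    return write_stage(mac_stage(read_stage(a, b)), len(a), len(b[0]))


if __name__ == "__main__":
    print(pipelined_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In Python the stages run interleaved on one thread, so this only models the data flow. On an FPGA the corresponding kernels would operate concurrently, which is where the performance gain reported in the record would come from.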