2014 Fiscal Year Annual Research Report

Construction of a Parallel Computing Theory on Memory Machine Models and Proposal of a Next-Generation GPGPU Architecture

Research Project

Project/Area Number	26280002
Research Institution	Hiroshima University
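Principal Investigator	Koji Nakano, Hiroshima University, Graduate School (Institute) of Engineering, Professor (30281075)
Co-Investigator(Kenkyū-buntansha)	Daisuke Takafuji, Hiroshima University, Graduate School (Institute) of Engineering, Assistant Professor (00314732); Yasuaki Ito, Hiroshima University, Graduate School (Institute) of Engineering, Associate Professor (40397964)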
Project Period (FY)	2014-04-01 – 2019-03-31
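Keywords	Parallel processing / Theoretical computational models / Parallel algorithms
Outline of Annual Research Achievements	The GPU is originally an auxiliary LSI for graphics processing, but GPGPU, the technique of using it for general-purpose computation beyond graphics, has attracted attention and is the subject of active research and development. However, the GPU architecture is complex, and parallel algorithms optimized for the PRAM, which assumes a simple shared memory, do not achieve sufficient performance when implemented on a GPU as they are. The purpose of this research is to construct theoretical models of parallel computation that capture the essence of the GPU and to evaluate performance theoretically and analytically. To this end, we proposed three theoretical computational models: the DMM (Discrete Memory Machine), the UMM (Unified Memory Machine), and the HMM (Hierarchical Memory Machine). These are theoretical models of parallel computation that focus on GPU memory access. The DMM models access to the GPU's shared memory and the UMM models access to its global memory, while the HMM connects them hierarchically to reflect the GPU architecture. On these models, we presented parallel algorithms for basic matrix computations and proved their optimality. We also implemented them on a GPU and showed that the theoretical analysis and the actual performance nearly agree. Furthermore, we presented algorithms for computing the Summed Area Table and for dynamic programming, analyzed them on the theoretical models, and implemented them on a GPU to compare the two.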
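The difference between the DMM and the UMM can be illustrated with a small sketch. The report itself does not spell out the cost rules, so the following rests on assumptions taken from how these models are usually described: memory has width w, a warp of w threads issues its requests together, the DMM assigns address a to bank a mod w and serializes distinct addresses that hit the same bank, and the UMM assigns address a to address group floor(a / w) and spends one pipeline stage per distinct group. The function names dmm_stages and umm_stages are hypothetical.

```python
from collections import defaultdict


def dmm_stages(addresses, w):
    """Pipeline stages one warp occupies on the DMM (assumed rule).

    Address a lives in bank a % w. Distinct addresses in the same bank
    are served one after another, so the cost is the largest number of
    distinct addresses that fall into a single bank.
    """
    banks = defaultdict(set)
    for a in addresses:
        banks[a % w].add(a)
    return max((len(s) for s in banks.values()), default=0)


def umm_stages(addresses, w):
    """Pipeline stages one warp occupies on the UMM (assumed rule).

    Address a lives in address group a // w. Each distinct group
    touched by the warp costs one stage.
    """
    return len({a // w for a in addresses})


w = 4
contiguous = [0, 1, 2, 3]    # one row of a 4 x 4 array
stride = [0, 4, 8, 12]       # one column of a 4 x 4 array
diagonal = [0, 5, 10, 15]    # main diagonal of a 4 x 4 array

for name, pattern in [("contiguous", contiguous),
                      ("stride", stride),
                      ("diagonal", diagonal)]:
    print(name, dmm_stages(pattern, w), umm_stages(pattern, w))
```

Under these assumptions, contiguous access is cheap on both models, stride access is expensive on both, and diagonal access is conflict-free on the DMM but touches w address groups on the UMM. This is the kind of gap that makes a PRAM-style algorithm behave differently on shared and global memory.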
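Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: We have presented theoretical computational models of the GPU and succeeded in theoretically analyzing basic algorithms on them and evaluating them through implementation, so the research can be said to be progressing smoothly.
Strategy for Future Research Activity	We will validate the theoretical models on more complex algorithms. We will also consider theoretical models that incorporate factors not yet accounted for in the current models, such as occupancy.
Causes of Carryover	The GPUs we had planned to purchase turned out to be cheaper than expected.
Expenditure Plan for Carryover Budget	We intend to apply the carryover toward part of the GPUs to be purchased this fiscal year.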

Research Products
(11 results)

All Journal Article (3 results) (of which Peer Reviewed: 3 results) Presentation (8 results)

[Journal Article] An Optimal Implementation of the Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation on the GPU (2014)
- Author(s)
  Duhu MAN, Koji NAKANO, Yasuaki ITO
- Journal Title
  
  IEICE TRANSACTIONS on Information and Systems
  
  Volume: E97-D Pages: 3063-3071
- DOI
  http://doi.org/10.1587/transinf.2014PAP0011
- Peer Reviewed
[Journal Article] Offline Permutation on the CUDA-enabled GPU (2014)
- Author(s)
  Akihiko KASAGI, Koji NAKANO, Yasuaki ITO
- Journal Title
  
  IEICE TRANSACTIONS on Information and Systems
  
  Volume: E97-D Pages: 3052-3062
- DOI
  http://doi.org/10.1587/transinf.2014PAP0010
- Peer Reviewed
[Journal Article] Accelerating ant colony optimisation for the travelling salesman problem on the GPU (2014)
- Author(s)
  Akihiro Uchida, Yasuaki Ito, Koji Nakano
- Journal Title
  
  International Journal of Parallel, Emergent and Distributed Systems
  
  Volume: 29 Pages: 401-420
- DOI
  http://doi.org/10.1080/17445760.2013.842568
- Peer Reviewed
[Presentation] Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU implementation (2015)
- Author(s)
  Koji Nakano and Yasuaki Ito
- Organizer
  International Conference on Parallel, Distributed and Network-Based Processing
- Place of Presentation
  Turku, Finland
- Year and Date
  2015-03-04 – 2015-03-06
[Presentation] A Time Optimal Parallel Algorithm for the Dynamic Programming on the Hierarchical Memory Machine (2014)
- Author(s)
  Koji Nakano
- Organizer
  International Symposium on Computing and Networking
- Place of Presentation
  Shizuoka, Japan
- Year and Date
  2014-12-10 – 2014-12-12
[Presentation] Thorough Evaluation of GPU Shared Memory Load and Store Instructions (2014)
- Author(s)
  Satoshi Okamoto, Yasuaki Ito, Koji Nakano, Jacir L. Bordim
- Organizer
  International Symposium on Computing and Networking
- Place of Presentation
  Shizuoka, Japan
- Year and Date
  2014-12-10 – 2014-12-12
[Presentation] Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations (2014)
- Author(s)
  Akihiko Kasagi, Koji Nakano, Yasuaki Ito
- Organizer
  International Symposium on Computing and Networking
- Place of Presentation
  Minneapolis, USA
- Year and Date
  2014-09-09 – 2014-09-12
[Presentation] Random Address Permute Shift Technique for the Shared Memory on GPUs (2014)
- Author(s)
  Koji Nakano, Susumu Matsumae, Yasuaki Ito
- Organizer
  International Conference on Parallel Processing
- Place of Presentation
  Minneapolis, USA
- Year and Date
  2014-09-09 – 2014-09-12
[Presentation] A GPU Implementation of Clipping-Free Halftoning using the Direct Binary Search (2014)
- Author(s)
  Hiroaki Kouge, Yasuaki Ito and Koji Nakano
- Organizer
  International Conference on Algorithms and Architectures for Parallel Processing
- Place of Presentation
  Dalian, China
- Year and Date
  2014-08-24 – 2014-08-27
[Presentation] A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm (2014)
- Author(s)
  Daisuke Takafuji, Koji Nakano and Yasuaki Ito
- Organizer
  International Conference on Algorithms and Architectures for Parallel Processing
- Place of Presentation
  Dalian, China
- Year and Date
  2014-08-24 – 2014-08-27
[Presentation] Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation (2014)
- Author(s)
  Kazuya Tani, Daisuke Takafuji, Koji Nakano, Yasuaki Ito
- Organizer
  International Parallel and Distributed Processing Symposium Workshops
- Place of Presentation
  Phoenix, USA
- Year and Date
  2014-05-19 – 2014-05-23
