FY2014 Annual Research Report

Building a Theory of Parallel Computation on Memory Machine Models and Proposing a Next-Generation GPGPU Architecture

Research Project

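Project/Area Number	26280002
Research Institution	Hiroshima University
Principal Investigator	Koji Nakano, Hiroshima University, Graduate School of Engineering, Professor (30281075)
Co-Investigators	Daisuke Takafuji, Hiroshima University, Graduate School of Engineering, Assistant Professor (00314732); Yasuaki Ito, Hiroshima University, Graduate School of Engineering, Associate Professor (40397964)
Project Period (FY)	2014-04-01 – 2019-03-31
Keywords	Parallel processing / Theoretical computation models / Parallel algorithms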
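Outline of Research Achievements	GPUs are LSIs originally designed as auxiliary processors for graphics, but GPGPU, the technique of using them for general-purpose computation beyond graphics, has attracted attention and is the subject of extensive research and development. The GPU architecture is complex, however, and a parallel algorithm optimized for the PRAM, which assumes a simple shared memory, does not achieve sufficient performance when implemented on a GPU as is. The goal of this research is to build theoretical models of parallel computation that capture the essence of the GPU and to evaluate performance theoretically and analytically. To that end, we proposed three theoretical computation models: the DMM (Discrete Memory Machine), the UMM (Unified Memory Machine), and the HMM (Hierarchical Memory Machine). These are theoretical models of parallel computation that focus on GPU memory access. The DMM models access to the GPU's shared memory and the UMM models access to the GPU's global memory; the HMM connects them hierarchically to reflect the GPU architecture. On these models we presented parallel algorithms for basic matrix computations and proved their optimality. We also implemented them on GPUs and showed that the theoretical analysis and the measured performance nearly agree. In addition, we presented algorithms for computing the Summed Area Table and for dynamic programming, analyzed them on the theoretical models, and implemented them on GPUs to compare the two.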
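Current Status of Research Progress (Category)	2: Progressing largely as planned. Reason: We have presented theoretical computation models of the GPU and succeeded in theoretically analyzing basic algorithms on them and evaluating them through implementation, so the research can be said to be progressing smoothly.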
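Strategy for Future Research	We will validate the theoretical models on more complex algorithms. We will also study theoretical models that incorporate factors not accounted for in the current models, such as occupancy.
Reason for Carryover to Next Fiscal Year	The GPUs we planned to purchase turned out to be cheaper than budgeted.
Plan for Using the Carryover	We intend to put it toward part of the cost of GPUs to be purchased this fiscal year.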
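
The difference between the DMM and the UMM described in the outline can be illustrated with a small sketch. It is not taken from the project's code. It assumes the cost rules commonly stated for these models: on the DMM, address a belongs to bank a mod w, and a warp's access is slowed by the largest number of distinct addresses falling in one bank; on the UMM, address a belongs to address group a div w, and the access is slowed by the number of distinct address groups touched. The function names, the width w = 4, and the merging of same-address requests are assumptions made for illustration.

```python
from collections import defaultdict


def dmm_congestion(addresses, width):
    """DMM-style cost (assumed rule): bank = address mod width.

    Returns the largest number of distinct addresses that one warp
    sends to the same bank. Requests to the same address are assumed
    to be merged into one.
    """
    banks = defaultdict(set)
    for a in addresses:
        banks[a % width].add(a)
    return max((len(s) for s in banks.values()), default=0)


def umm_address_groups(addresses, width):
    """UMM-style cost (assumed rule): address group = address div width.

    Returns the number of distinct address groups one warp touches.
    """
    return len({a // width for a in addresses})


if __name__ == "__main__":
    w = 4  # hypothetical width: number of banks and threads per warp
    patterns = {
        "contiguous": [0, 1, 2, 3],
        "stride": [0, 4, 8, 12],
        "diagonal": [0, 5, 10, 15],
    }
    for name, addrs in patterns.items():
        print(name, dmm_congestion(addrs, w), umm_address_groups(addrs, w))
```

Under these assumptions, contiguous access is cheap on both models and stride access is expensive on both, while diagonal access is conflict-free on the DMM but touches w separate address groups on the UMM. This is the kind of gap between shared-memory and global-memory behavior that the models are meant to capture.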

Research Products
(11 results)

Journal articles: 3 (peer reviewed: 3) / Conference presentations: 8

[Journal Article] An Optimal Implementation of the Approximate String Matching on the Hierarchical Memory Machine, with Performance Evaluation on the GPU (2014)
- Authors
  Duhu MAN, Koji NAKANO, Yasuaki ITO
- Journal
  IEICE TRANSACTIONS on Information and Systems
  Volume: E97-D, Pages: 3063-3071
- DOI
  http://doi.org/10.1587/transinf.2014PAP0011
- Peer reviewed
[Journal Article] Offline Permutation on the CUDA-enabled GPU (2014)
- Authors
  Akihiko KASAGI, Koji NAKANO, Yasuaki ITO
- Journal
  IEICE TRANSACTIONS on Information and Systems
  Volume: E97-D, Pages: 3052-3062
- DOI
  http://doi.org/10.1587/transinf.2014PAP0010
- Peer reviewed
[Journal Article] Accelerating ant colony optimisation for the travelling salesman problem on the GPU (2014)
- Authors
  Akihiro Uchida, Yasuaki Ito, Koji Nakano
- Journal
  International Journal of Parallel, Emergent and Distributed Systems
  Volume: 29, Pages: 401-420
- DOI
  http://doi.org/10.1080/17445760.2013.842568
- Peer reviewed
[Presentation] Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU implementation (2015)
- Authors
  Koji Nakano and Yasuaki Ito
- Conference
  International Conference on Parallel, Distributed and Network-Based Processing
- Place
  Turku, Finland
- Dates
  2015-03-04 – 2015-03-06
[Presentation] A Time Optimal Parallel Algorithm for the Dynamic Programming on the Hierarchical Memory Machine (2014)
- Authors
  Koji Nakano
- Conference
  International Symposium on Computing and Networking
- Place
  Shizuoka
- Dates
  2014-12-10 – 2014-12-12
[Presentation] Thorough Evaluation of GPU Shared Memory Load and Store Instructions (2014)
- Authors
  Satoshi Okamoto, Yasuaki Ito, Koji Nakano, Jacir L. Bordim
- Conference
  International Symposium on Computing and Networking
- Place
  Shizuoka
- Dates
  2014-12-10 – 2014-12-12
[Presentation] Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations (2014)
- Authors
  Akihiko Kasagi, Koji Nakano, Yasuaki Ito
- Conference
  International Symposium on Computing and Networking
- Place
  Minneapolis, USA
- Dates
  2014-09-09 – 2014-09-12
[Presentation] Random Address Permute Shift Technique for the Shared Memory on GPUs (2014)
- Authors
  Koji Nakano, Susumu Matsumae, Yasuaki Ito
- Conference
  International Conference on Parallel Processing
- Place
  Minneapolis, USA
- Dates
  2014-09-09 – 2014-09-12
[Presentation] A GPU Implementation of Clipping-Free Halftoning using the Direct Binary Search (2014)
- Authors
  Hiroaki Kouge, Yasuaki Ito and Koji Nakano
- Conference
  International Conference on Algorithms and Architectures for Parallel Processing
- Place
  Dalian, China
- Dates
  2014-08-24 – 2014-08-27
[Presentation] A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm (2014)
- Authors
  Daisuke Takafuji, Koji Nakano and Yasuaki Ito
- Conference
  International Conference on Algorithms and Architectures for Parallel Processing
- Place
  Dalian, China
- Dates
  2014-08-24 – 2014-08-27
[Presentation] Bulk Execution of Oblivious Algorithms on the Unified Memory Machine, with GPU Implementation (2014)
- Authors
  Kazuya Tani, Daisuke Takafuji, Koji Nakano, Yasuaki Ito
- Conference
  International Parallel and Distributed Processing Symposium Workshops
- Place
  Phoenix, USA
- Dates
  2014-05-19 – 2014-05-23
