Thread Partitioning and Speculative Execution for On-Chip Multiprocessor
Project/Area Number |
15500036
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Computer system/Network
|
Research Institution | Nagoya University |
Principal Investigator |
SHIMADA Toshio Nagoya University, Graduate School of Engineering, Professor (60252251)
|
Project Period (FY) |
2003 – 2005
|
Project Status |
Completed (Fiscal Year 2005)
|
Budget Amount |
¥3,700,000 (Direct Cost: ¥3,700,000)
Fiscal Year 2005: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2004: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2003: ¥2,100,000 (Direct Cost: ¥2,100,000)
|
Keywords | Chip Multiprocessor / Thread Partitioning / Value Prediction / Compiler / Multi-Threading / Speculative Execution / Hardware Constraints |
Research Abstract |
On-chip multiprocessors can reduce inter-thread communication overhead and exploit thread-level parallelism (TLP) in addition to instruction-level parallelism (ILP). This study builds on these advantages to explore techniques that improve the performance of non-numerical programs, where only fine-grain threads are available. We propose thread partitioning techniques and speculative execution techniques for on-chip multiprocessors. Our contributions are as follows. 1. We introduce value prediction to relax the constraints imposed by control and data dependences between threads, and evaluate it. Our evaluation results show that speculative thread execution improves performance by 12.7% on SPECint2000 over conventional non-speculative thread execution. 2. We propose a scheme that reduces the required number of physical registers by sharing physical registers among threads, and that enables non-blocking communication through registers. Our evaluation results show that multithreaded execution with our scheme achieves higher performance than single-threaded execution with 130 physical registers, and reduces the number of physical registers by 50%. 3. We propose a two-step physical register deallocation scheme that exploits potential ILP within a single thread by suppressing stalls caused by physical register shortage. Our evaluation results show that this scheme achieves a speedup of 32% on average in the typical case of 64 physical registers. 4. We explore which techniques are required to extract large amounts of TLP from programs. Our evaluation results show that the combination of speculative thread execution, speculative register communication, and basic-block-level partitioning achieves a tenfold speedup over single-threaded execution.
|
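As a rough illustration of the value prediction idea in contribution 1, the sketch below models a generic last-value predictor with a saturating confidence counter. It is not the scheme evaluated in this project: the class name `LastValuePredictor`, the `run` helper, the confidence threshold, and the register key `"r1"` are all assumptions made for this example. A consumer thread speculates on an inter-thread value only when the predictor is confident, and the predictor is trained once the actual value arrives.

```python
class LastValuePredictor:
    """Toy last-value predictor (hypothetical; not the project's design)."""

    def __init__(self, threshold=2, max_confidence=3):
        self.table = {}                  # key -> [last_value, confidence]
        self.threshold = threshold       # minimum confidence to speculate
        self.max_confidence = max_confidence

    def predict(self, key):
        """Return a predicted value, or None when confidence is too low."""
        entry = self.table.get(key)
        if entry is None or entry[1] < self.threshold:
            return None
        return entry[0]

    def update(self, key, actual):
        """Train on the actual value once the producer delivers it."""
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = [actual, 0]
        elif entry[0] == actual:
            # Same value seen again: raise confidence, saturating at the cap.
            entry[1] = min(entry[1] + 1, self.max_confidence)
        else:
            # Value changed: remember the new value and reset confidence.
            entry[0] = actual
            entry[1] = 0


def run(values, key="r1"):
    """Return (speculated, correct) counts over a stream of values."""
    predictor = LastValuePredictor()
    speculated = correct = 0
    for actual in values:
        guess = predictor.predict(key)
        if guess is not None:
            speculated += 1
            correct += int(guess == actual)
        predictor.update(key, actual)
    return speculated, correct
```

In a real speculative multithreaded processor, a misprediction would additionally require squashing and re-executing the dependent thread; the sketch only counts how often speculation would have been attempted and how often it would have been correct.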
Report
(4 results)
Research Products
(26 results)