Co-Investigator (Kenkyū-buntansha) |
KATAHIRA Masayuki Akita University, School of Medicine, Associate Professor (90250860)
KITAJIMA Hiroyuki Tohoku University, Graduate School of Information Science, Research Associate (70311553)
NAKAMURA Tadao Tohoku University, Graduate School of Information Science, Professor (80005454)
SUZUKI Ken-ichi Miyagi National College of Technology, Department of Design and Computer Applications, Lecturer (50300520)
|
Research Abstract |
In this research project, we carried out the basic design of a graphics hardware architecture for photo-realistic image synthesis. The design is based on the object-space parallel processing model proposed by the principal investigator of the project. A prototype, named Thunder, was developed as a printed circuit board with a PCI interface, two FPGAs, each of which can implement a logic circuit of up to 200K gates, and four 256-MB SDRAMs (1 GB in total). We implemented the basic function units of Thunder, namely a 3D-DDA unit, an intersection calculation unit (ICU), and a secondary ray generator, on the FPGAs, and an object memory on the SDRAMs. The maximum bandwidth between the object memory and the function units is 512 MB/s. In designing Thunder, we focused especially on optimizing the ICU. We employed fixed-point rather than floating-point calculations to achieve low latency and high throughput in the ICU. To avoid the image quality degradation that fixed-point calculations can cause, we developed a novel fixed-point intersection calculation algorithm that keeps calculation accuracy as high as possible. Through experiments, we confirmed that the image quality obtained by our algorithm with fixed-point calculations is comparable to that obtained with 64-bit floating-point calculations. In addition, we examined performance scalability with respect to the number of ICUs. The experimental results show speedups of 6.4 with 8 ICUs and 11 with 16 ICUs. In particular, we estimated that an accelerator with 16 ICUs running at 400 MHz is 20 times faster than Pentium II-based image synthesis running at the same clock frequency. Such an accelerator also needs a memory bandwidth of around 100 GB/s. We believe that such a large bandwidth can become available as CMOS technology advances, for example through merged memory-logic technology.
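
As a rough illustration of what replacing floating-point with fixed-point arithmetic in an intersection test can look like, the following minimal sketch computes a ray-sphere intersection using integers only. It is not the algorithm developed in this project, which the abstract does not describe. The Q16.16 format, the choice of a sphere primitive, the unit-length ray direction, and all function names are assumptions made purely for illustration.

```python
import math

# Hypothetical Q16.16 fixed-point format (an assumption, not the Thunder format).
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a float to Q16.16 (round to nearest)."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a Q16.16 value back to a float."""
    return q / ONE


def fx_mul(a, b):
    """Multiply two Q16.16 values; the shift drops the extra fraction bits."""
    return (a * b) >> FRAC_BITS


def hit_sphere_fixed(origin, direction, center, radius):
    """Nearest non-negative ray parameter t in Q16.16, or None on a miss.

    All arguments are Q16.16 integers; `direction` is assumed to be unit length,
    so the quadratic reduces to t = -b - sqrt(b*b - c).
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(fx_mul(d, v) for d, v in zip(direction, oc))       # d . (o - c)
    c = sum(fx_mul(v, v) for v in oc) - fx_mul(radius, radius)  # |o - c|^2 - r^2
    disc = fx_mul(b, b) - c
    if disc < 0:
        return None  # ray misses the sphere
    # Integer square root: shifting first keeps the result in Q16.16.
    root = math.isqrt(disc << FRAC_BITS)
    t = -b - root
    return t if t >= 0 else None


def hit_sphere_float(origin, direction, center, radius):
    """Reference version using Python floats (64-bit floating point)."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t >= 0 else None


def fixed_vec(v):
    """Convert a sequence of floats to Q16.16."""
    return [to_fixed(x) for x in v]
```

Comparing `to_float(hit_sphere_fixed(...))` with `hit_sphere_float(...)` for the same ray gives a feel for the accuracy lost to quantization in this naive formulation, which is the kind of loss a dedicated fixed-point algorithm would aim to minimize.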
|