In this research project, we developed data localization schemes for multigrain parallel processing, multiprocessor scheduling algorithms that maximize the overlap of inter-processor data transfer and task processing to hide data transfer overhead, and precise machine code scheduling schemes for a "compiler back-end" that eliminate all synchronization instructions from parallelized machine code without deteriorating calculation accuracy. We also demonstrated the effectiveness of the developed schemes on a multiprocessor architecture simulator and a real supercomputer.
Performance evaluation on a multiprocessor architecture simulator showed that the data localization schemes, which use the proposed aligned data decomposition and partial static assignment techniques, shorten the average execution time of multigrain parallel processing by 20%. Here, multigrain parallel processing hierarchically exploits coarse grain parallelism, loop parallelism, and fine grain parallelism.
It was also confirmed that the overlapping scheduling algorithms, which hide data transfer overhead, reduce execution time on a Fujitsu VPP500 with 4 processors by 15% on average.
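To illustrate why overlapping data transfer with task processing can shorten execution time, the following is a minimal sketch under stated assumptions, not the scheduling algorithm developed in this project. It assumes a single processor running a fixed sequence of tasks, a single link that transfers the input data of one task at a time, and enough buffer space to prefetch ahead. The function names and the task times are hypothetical and chosen only for illustration.

```python
def serial_time(tasks):
    """Total time when each transfer must finish before its task starts
    and no transfer runs while a task is being processed.

    tasks: list of (transfer_time, compute_time) pairs (hypothetical values).
    """
    return sum(transfer + compute for transfer, compute in tasks)


def overlapped_time(tasks):
    """Total time when the transfer for a later task proceeds while
    earlier tasks are being processed (assumes unlimited prefetch buffers).
    """
    transfer_done = 0  # time at which the link finishes the current transfer
    compute_done = 0   # time at which the processor finishes the current task
    for transfer, compute in tasks:
        # Transfers are issued back to back on the single link.
        transfer_done += transfer
        # A task starts once its data has arrived and the processor is free.
        compute_done = max(compute_done, transfer_done) + compute
    return compute_done


if __name__ == "__main__":
    # Three hypothetical tasks: 2 time units of transfer, 5 of computation each.
    tasks = [(2, 5), (2, 5), (2, 5)]
    print("without overlap:", serial_time(tasks))
    print("with overlap:   ", overlapped_time(tasks))
```

In this toy model only the first transfer stays exposed when computation dominates; the remaining transfers are hidden behind processing. The figures it produces are illustrative and unrelated to the measured results reported here.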
Furthermore, the development and evaluation of near fine grain parallel processing schemes clarified the architectural support desirable for advanced parallel machine code scheduling.
The above compilation schemes and their evaluation using the architecture simulator and the supercomputer clarified the architectural support necessary for next-generation multiprocessor supercomputers and a future single-chip multiprocessor.
These research accomplishments were published as 15 journal or international conference papers, 1 reviewed symposium paper, 5 technical reports, and 12 short papers for domestic annual conventions.