1985 Fiscal Year Final Research Report Summary

Development of Parallel Relational Database Machine Using Hash and Sort

Research Project

Project/Area Number	59880004
Research Category	Grant-in-Aid for Developmental Scientific Research
Allocation Type	Single-year Grants
Research Field	Informatics
Research Institution	The University of Tokyo
Principal Investigator	TANAKA Hidehiko, The University of Tokyo, Faculty of Engineering, Associate Professor (60011102)
Project Period (FY)	1984 – 1985
Keywords	Database System / Database Machine / Hash / Sort / Multidimensional Clustering / Hardware Sorter / Multiprocessor
Research Abstract	The objective of this research was to develop a parallel relational database machine that maintains high performance even under heavy concurrent access to a large database. The design is based on the data stream model, in which operations are executed along the stream of data as it flows through the machine. On the basis of the results below, we believe the objective was fully achieved. The results are summarized from three viewpoints: data stream processing, generation, and control.

1. (Data stream processing) Implementing the data stream model requires a linear-time algorithm for every relational algebra operation. Our approach combines the clustering effect of hashing with linear-time sorting, the latter realized by a hardware sorter. The second-version prototype sorter sorts data at 3 MB/sec and imposes no limit on record length, the number of records, or the number of data streams to be sorted.

2. (Data stream generation) In our design, performance is dominated by the initial generation of data streams from disk. We developed an adaptive multidimensional clustering technique and evaluated its performance in detail. The evaluation showed that the technique substantially reduces the average number of page accesses and can therefore generate the initial data streams efficiently.

3. (Data stream control) In our machine, the data streams are autonomous enough to flow through the machine without any centralized control. As a result, the machine's controller carries a much lighter load than in other designs: it is invoked only when a data stream starts or ends. The order in which operations execute is determined by data flow control. Performance evaluation shows that the control overhead is very small and that the scheme is easy to implement.
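To make point 1 more concrete, the following is a minimal software sketch of how hash clustering and sorting can be combined for a join, assuming a simple list-of-dicts record format. Both relations are hash-partitioned on the join key so that matching tuples land in the same bucket; each bucket is then sorted and merge-joined in a single pass. Python's built-in `sort` is only a stand-in for the hardware sorter described above, and the function names (`hash_partition`, `merge_join`, `hash_sort_join`) and the bucket count are hypothetical choices for illustration, not the machine's actual design.

```python
def hash_partition(rows, key, n_buckets):
    """Cluster rows into n_buckets lists by hashing the join key."""
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        buckets[hash(row[key]) % n_buckets].append(row)
    return buckets


def merge_join(left, right, key):
    """Equi-join two lists that are already sorted on key, in one merge pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def hash_sort_join(r, s, key, n_buckets=4):
    """Hash-partition both inputs, then sort and merge each bucket pair."""
    result = []
    for rb, sb in zip(hash_partition(r, key, n_buckets),
                      hash_partition(s, key, n_buckets)):
        # Software sort used here as a stand-in for a hardware sorter.
        rb.sort(key=lambda row: row[key])
        sb.sort(key=lambda row: row[key])
        result.extend(merge_join(rb, sb, key))
    return result


r = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 2, "a": "z"}]
s = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
print(sorted((row["a"], row["b"]) for row in hash_sort_join(r, s, "id")))
```

Hashing keeps each bucket small and self-contained, so buckets can be streamed through a sorter independently; with a sorter that runs in linear time, the whole join stays linear in the input size.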
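Point 2 concerns reducing page accesses when generating the initial data streams. The report does not describe its adaptive technique in detail, so the sketch below shows only a plain, static multidimensional hashing scheme as an illustration of the general idea: each attribute contributes a few hash bits, the bits are concatenated into a page number, and a partial-match query reads only the pages consistent with the attributes it specifies. The attribute names, bit allocation, and helper names are hypothetical, and the adaptive part of the authors' method is not modeled.

```python
from itertools import product


def attr_bits(value, nbits):
    # Hypothetical per-attribute hash: the low nbits of Python's hash().
    return hash(value) & ((1 << nbits) - 1)


def page_of(record, bits_per_attr):
    """Concatenate per-attribute hash bits into a single page number."""
    page = 0
    for attr, nbits in bits_per_attr.items():
        page = (page << nbits) | attr_bits(record[attr], nbits)
    return page


def pages_for_query(query, bits_per_attr):
    """Pages that may hold records matching a partial-match query.

    query maps attribute -> value; unspecified attributes are wildcards,
    so every bit pattern for them must be enumerated.
    """
    choices = []
    for attr, nbits in bits_per_attr.items():
        if attr in query:
            choices.append([(attr_bits(query[attr], nbits), nbits)])
        else:
            choices.append([(b, nbits) for b in range(1 << nbits)])
    pages = []
    for combo in product(*choices):
        page = 0
        for bits, nbits in combo:
            page = (page << nbits) | bits
        pages.append(page)
    return pages


bits = {"dept": 2, "year": 2}                          # 16 pages in total
print(len(pages_for_query({}, bits)))                  # full scan
print(len(pages_for_query({"dept": 3}, bits)))         # one attribute fixed
print(len(pages_for_query({"dept": 3, "year": 1985}, bits)))
```

With 2 bits per attribute, fixing one attribute cuts the candidate pages from 16 to 4, and fixing both leaves a single page; an adaptive scheme would additionally tune the bit allocation to the observed query mix.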
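Point 3 states that data flow control determines the order in which operations execute. As a rough illustration only, the sketch below simulates the data-driven firing rule sequentially: an operation becomes ready once every operation whose output it consumes has produced it, and no central scheduler decides the order beyond this readiness check. This amounts to Kahn's topological-sort algorithm; the query graph, operation names, and the function `dataflow_order` are hypothetical and are not taken from the machine's implementation.

```python
from collections import deque


def dataflow_order(ops):
    """Return an execution order under a data-driven firing rule.

    ops maps each operation name to the names of its inputs. Inputs that
    are not themselves operations (e.g. base relations) are treated as
    already available.
    """
    # Operation inputs each op is still waiting for.
    waiting = {name: {i for i in inputs if i in ops} for name, inputs in ops.items()}
    # For each op, the ops that consume its output stream.
    consumers = {name: [] for name in ops}
    for name, inputs in ops.items():
        for src in set(inputs):
            if src in ops:
                consumers[src].append(name)
    ready = deque(sorted(n for n, w in waiting.items() if not w))
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)  # op "fires" and emits its output stream
        for c in consumers[op]:
            waiting[c].discard(op)
            if not waiting[c]:
                ready.append(c)
    if len(order) != len(ops):
        raise ValueError("operation graph contains a cycle")
    return order


query = {
    "select_r": ["R"],
    "select_s": ["S"],
    "join": ["select_r", "select_s"],
    "project": ["join"],
}
print(dataflow_order(query))
```

Because each operation's readiness depends only on its own inputs, a controller in this style only needs to start the source streams and observe when the final stream ends, which is consistent with the light controller load described above.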