2017 Fiscal Year Final Research Report

Preliminary study of an inexpensive implementation methodology for an all-sky oriented astronomical data archive system powered by Hadoop for huge observational multi-wavelength data set

Research Project

Project/Area Number	15K17501
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Computational science
Research Institution	Fukuoka University
Principal Investigator	Eguchi Satoshi Fukuoka University, Faculty of Science, Assistant Professor (40647202)
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	Virtual Observatory / Big Data / Distributed Computing / Cloud Computing / Hadoop / Hive
Outline of Final Research Achievements	The volume of astronomical observational data has been growing exponentially as modern telescopes and their instruments become larger and more complex. For astronomers without sufficient financial support, public cloud computing would be an attractive answer to this data explosion because it is flexible and inexpensive. To this end, I investigated its feasibility by implementing a simple astronomical database on Hadoop, a software framework for distributed computing, and Hive, Hadoop-based database software with an SQL-like query language. I found that we should (1) choose cloud services that let users obtain arbitrary amounts of computing resources according to the complexity of the problem at hand, rather than virtual private servers that provide limited, fixed resources at a lower annual cost; (2) adopt the Tez engine and the ORC file format for the Hive database; and (3) partition the Hive table according to level 6 HEALPix labeling.
Free Research Field	Database astronomy
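
Findings (2) and (3) in the outline can be illustrated with a minimal sketch. It assumes that "level 6" means HEALPix order 6, so that nside = 2^6, and it builds an illustrative Hive table definition that uses Tez, ORC, and a HEALPix partition column. The table name, the column names, and the partition column name `healpix6` are hypothetical and do not come from the report; the actual schema used in the project is not stated.

```python
import math

HEALPIX_LEVEL = 6  # level named in the report; assumed to be the HEALPix order


def healpix_grid(level):
    """Return (nside, npix, pixel area in square degrees) for a HEALPix level."""
    nside = 2 ** level
    npix = 12 * nside ** 2  # HEALPix always has 12 * nside^2 equal-area pixels
    full_sky_deg2 = 4 * math.pi * (180 / math.pi) ** 2
    return nside, npix, full_sky_deg2 / npix


def hive_ddl(table="sources", partition_col="healpix6"):
    """Build an illustrative HiveQL script (hypothetical table and columns)."""
    return "\n".join([
        "SET hive.execution.engine=tez;",          # finding (2): Tez engine
        f"CREATE TABLE {table} (",
        "  source_id BIGINT,",
        "  ra_deg DOUBLE,",
        "  dec_deg DOUBLE,",
        "  flux DOUBLE",
        ")",
        f"PARTITIONED BY ({partition_col} INT)",   # finding (3): HEALPix partition
        "STORED AS ORC;",                          # finding (2): ORC file format
    ])


nside, npix, area_deg2 = healpix_grid(HEALPIX_LEVEL)
print(f"nside={nside}, npix={npix}, pixel area ~ {area_deg2:.2f} deg^2")
print(hive_ddl())
```

Under this assumption, level 6 gives nside = 64 and 49,152 equal-area pixels of roughly 0.84 square degrees each, so the partition column takes at most 49,152 distinct values. The pixel index for each source would have to come from a HEALPix library, which this sketch does not include.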