2015 Fiscal Year Final Research Report

A Study on Data-Space Generating Operations on Massive Data Platforms

Research Project

Project/Area Number 24500109
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Research Field Media informatics/Database
Research Institution The University of Electro-Communications

Principal Investigator

OHMORI Tadashi  The University of Electro-Communications, Other Graduate School, Professor (30233274)

Project Period (FY) 2012-04-01 – 2016-03-31
Keywords large-scale data processing / MapReduce / similarity join / many-to-many relationship / edit distance / hash join
Outline of Final Research Achievements

Similarity joins on massive datasets are useful for detecting many-to-many relationships in the target datasets. However, many join algorithms for various similarity functions are known to perform unstably on MapReduce systems. The objective of this research is to clarify the causes of this instability and to resolve it. To that end, the research proposes two new algorithmic frameworks. The first, named HSJ+BR, is a hybrid-hash join enhanced with bucket-regrouping techniques; it resolves unexpected load imbalance between reducers without intermediate MapReduce jobs. The second is a two-stage hash-partitioning strategy, which can greatly reduce the shuffle overhead caused by the excessive record replication associated with many similarity join algorithms. Using these two frameworks, the research shows that similarity joins on MapReduce systems achieve stable and efficient performance, with m-to-n equi-join and edit-distance join used as typical cases.
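
As a rough illustration of why reducer load can become imbalanced in an m-to-n equi-join, and how regrouping hash buckets might even it out, the following minimal sketch may help. It is not the HSJ+BR algorithm from the report, whose details are not given in this summary. It swaps in a generic greedy heuristic (costliest bucket first, to the least-loaded reducer), and the names `bucket_of`, `bucket_costs`, and `regroup_buckets`, the CRC32 hash, and the |R_k| * |S_k| cost estimate are all assumptions made for this example.

```python
import heapq
import zlib
from collections import Counter


def bucket_of(key, num_buckets):
    """Deterministic hash partitioning (CRC32 is an arbitrary choice here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucket_costs(r_keys, s_keys, num_buckets):
    """Estimate join work per bucket as the sum of |R_k| * |S_k| over its keys."""
    r_count, s_count = Counter(r_keys), Counter(s_keys)
    costs = [0] * num_buckets
    for key, r_n in r_count.items():
        costs[bucket_of(key, num_buckets)] += r_n * s_count.get(key, 0)
    return costs


def regroup_buckets(costs, num_reducers):
    """Greedy regrouping: costliest remaining bucket goes to the least-loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    assignment = [None] * len(costs)
    for b in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, r = heapq.heappop(heap)
        assignment[b] = r
        heapq.heappush(heap, (load + costs[b], r))
    # Recompute the final load per reducer from the assignment.
    loads = [0] * num_reducers
    for b, r in enumerate(assignment):
        loads[r] += costs[b]
    return assignment, loads
```

For example, with hypothetical bucket costs `[100, 1, 1, 1, 50, 50]` and two reducers, assigning bucket `b` to reducer `b % 2` yields loads of 151 and 52, whereas `regroup_buckets` yields 102 and 101. Because the regrouping is computed from bucket statistics alone, a scheme of this kind needs no extra pass over the records themselves.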

Free Research Field

Databases and data engineering


Published: 2017-05-10  
