A Study on Data-Space Generating Operations on Massive Data Platforms

Research Project

Project/Area Number	24500109
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Media informatics/Database
Research Institution	The University of Electro-Communications
Principal Investigator	OHMORI Tadashi, The University of Electro-Communications, Other Graduate School, Professor (30233274)
Project Period (FY)	2012-04-01 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥2,990,000 in total (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000); Fiscal Year 2014: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2013: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000); Fiscal Year 2012: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
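Keywords	large-scale data processing / MapReduce / similarity join / many-to-many relationship / edit distance / hash join / massive data processing / two-stage hash partitioning / edit-distance join / mapreduce / join operation / map/reduce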
Outline of Final Research Achievements	Similarity joins on massive datasets are useful for detecting many-to-many relationships hidden in the target data. However, many join algorithms for various similarity functions are known to perform unstably on map/reduce systems. The objective of this research is to clarify the causes of this instability and to resolve them. To this end, the research proposes two new algorithmic frameworks. The first is a hybrid-hash join enhanced with bucket-regrouping techniques, named HSJ+BR, which corrects unexpected load imbalance among reducers without running intermediate map/reduce jobs. The second is a two-stage hash-partitioning strategy, which greatly reduces the shuffle overhead caused by the excessive record replication that many similarity join algorithms incur. Taking the m-to-n equi-join and the edit-distance join as typical cases, the research shows that these two frameworks achieve stable and efficient similarity-join performance on map/reduce systems.
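The outline does not spell out how HSJ+BR regroups buckets. As a rough illustration of the general idea only, the Python sketch below assumes that join keys are hashed into many more fine-grained buckets than there are reducers, and that the size of each bucket can be estimated (for example, from a sample of the input). It then packs buckets onto reducers with the greedy longest-processing-time (LPT) heuristic, so that a few heavy buckets no longer pile up on one reducer. The function names, the CRC32 bucketing, and the LPT heuristic are assumptions made for this illustration; they are not the method published by this project.

```python
import heapq
import zlib
from collections import Counter


def bucket_of(key, num_buckets):
    """Map a join key to one of num_buckets fine-grained hash buckets.

    CRC32 is used instead of Python's built-in hash(), which is salted per
    process for strings and would give different buckets on different workers.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucket_loads(keys, num_buckets):
    """Count how many (sampled) records fall into each bucket."""
    counts = Counter(bucket_of(k, num_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


def regroup_buckets(loads, num_reducers):
    """Assign buckets to reducers greedily, largest bucket first (LPT heuristic).

    Returns a list in which entry b is the reducer that receives bucket b.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = [0] * len(loads)
    for b in sorted(range(len(loads)), key=lambda i: loads[i], reverse=True):
        load, r = heapq.heappop(heap)  # currently least-loaded reducer
        assignment[b] = r
        heapq.heappush(heap, (load + loads[b], r))
    return assignment


def reducer_loads(loads, assignment, num_reducers):
    """Total load per reducer under a bucket-to-reducer assignment."""
    totals = [0] * num_reducers
    for b, r in enumerate(assignment):
        totals[r] += loads[b]
    return totals


if __name__ == "__main__":
    loads = [9, 1, 1, 1, 8, 1, 1, 1]  # two heavy buckets, six light ones
    naive = [b % 2 for b in range(len(loads))]  # plain "bucket mod reducers"
    print(reducer_loads(loads, naive, 2))  # [19, 4]
    print(reducer_loads(loads, regroup_buckets(loads, 2), 2))  # [12, 11]
```

With the plain modulo mapping both heavy buckets land on reducer 0 (loads 19 vs. 4), while the greedy regrouping yields 12 vs. 11. For an m-to-n equi-join, a bucket's cost can grow roughly with |R_b| × |S_b| rather than with its record count, so `loads` could hold such products instead; the assignment step itself is unchanged.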

Report (5 results)

2015 Annual Research Report
Final Research Report (PDF)
2014 Research-status Report
2013 Research-status Report
2012 Research-status Report

Research Products (7 results)

Journal Articles: 1 (peer reviewed: 1); Presentations: 6

[Journal Article] A Hybrid Hash Join Algorithm Using Bucket Regrouping in Map/Reduce (2014)
- Original Title
  Map/Reduceにおけるバケット再グループ化を用いたハイブリッドハッシュ結合アルゴリズム
- Author(s)
  廣瀬繁雄, 大森匡, 新谷隆彦
- Journal Title
  DBSJ Journal (Database Society of Japan)
  Volume: Vol. 12, No. 1, Pages: 61-66
- NAID
  40019725764
- Related Report
  2013 Research-status Report
- Peer Reviewed
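[Presentation] Effectiveness of the Two-Stage Hash Partitioning Technique for Edit-Distance Joins on MapReduce (2015)
- Original Title
  MapReduce上の編集距離結合における2段階ハッシュ分割技法の効果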
- Author(s)
  大森匡, 今野篤人, 新谷隆彦
- Organizer
  FIT 2015 (Forum on Information Technology), D-045
- Place of Presentation
  Ehime University
- Year and Date
  2015-09-15
- Related Report
  2015 Annual Research Report
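[Presentation] A Load-Balancing and Efficiency-Improvement Scheme for Edit-Distance Joins in MapReduce (2015)
- Original Title
  MapReduceにおける編集距離結合の負荷分散・効率化方式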
- Author(s)
  今野篤人, 大森匡, 新谷隆彦
- Organizer
  DEIM 2015 (Forum on Data Engineering and Information Management), E7-6
- Place of Presentation
  Koriyama City (Fukushima Prefecture)
- Year and Date
  2015-03-02 – 2015-03-04
- Related Report
  2014 Research-status Report
[Presentation] A Method for m-Closest Keyword Search in Spatial Databases (2014)
- Original Title
  空間データベースにおけるm-最近接キーワード検索の一方式
- Author(s)
  邱原, 大森匡, 新谷隆彦, 藤田秀之
- Organizer
  The 76th National Convention of the Information Processing Society of Japan (5M-1)
- Place of Presentation
  Tokyo Denki University
- Related Report
  2013 Research-status Report
[Presentation] An evaluation of m-Closest Keyword Search on spatial data using Flickr data (2014)
- Author(s)
  DANG H. Anh, Qiu Yuan, OHMORI Tadashi, FUJITA Hideyuki
- Organizer
  The 76th National Convention of the Information Processing Society of Japan (5M-2)
- Place of Presentation
  Tokyo Denki University
- Related Report
  2013 Research-status Report
[Presentation] A Hybrid Hash Join Algorithm Using Bucket Regrouping in Map/Reduce (2013)
- Original Title
  Map/Reduceにおけるバケット再グループ化を用いたハイブリッドハッシュ結合アルゴリズム
- Author(s)
  廣瀬繁雄, 大森匡, 新谷隆彦
- Organizer
  DEIM 2013 (Forum on Data Engineering and Information Management), F2-4 (7 pages)
- Place of Presentation
  Koriyama
- Related Report
  2012 Research-status Report
[Presentation] m-Nearest Keyword Search on Spatial Data Using 2^n-Partition Trees (2013)
- Original Title
  空間データにおける2^n分割木を用いたm-最近傍キーワード検索
- Author(s)
  邱原, 大森匡, 新谷隆彦
- Organizer
  DEIM 2013, A9-5 (7 pages)
- Place of Presentation
  Koriyama
- Related Report
  2012 Research-status Report
