
2020 Fiscal Year Research-status Report

Efficient Query Processing for Learning-based Data Management

Research Project

Project/Area Number 19K11979
Research Institution Osaka University

Principal Investigator

Chuan Xiao (肖 川)  Osaka University, Graduate School of Information Science and Technology, Associate Professor (full-time) (10643900)

Project Period (FY) 2019-04-01 – 2022-03-31
Keywords Query processing / Machine learning / Databases / Data science
Outline of Annual Research Achievements

There were two major achievements in FY2020. First, we studied efficient blocking techniques for queries with learning-based predicates. A blocking rule is a conjunction of similarity predicates over high-dimensional data. To apply the blocking rules efficiently, we modeled the task as a query optimization problem. We developed a learning-based method that accurately estimates the cardinality of each similarity predicate and chooses the processing order with the smallest cost. Experiments demonstrated the effectiveness and efficiency of our approach. This study was accepted as a full research paper at the ACM SIGMOD International Conference on Management of Data (SIGMOD) 2021.

Second, we studied the problem of joinable table discovery in data lakes, which is an important task for data enrichment. We proposed embedding textual values as high-dimensional vectors and joining columns on similarity predicates over those vectors, thereby addressing the limitations of traditional equi-join approaches and identifying more meaningful results. We devised a series of techniques to speed up the discovery process. Our solution identifies substantially more useful results than equi-joins and outperforms other similarity-based options. Its efficiency was also demonstrated through experimental evaluation. This work appeared as a full research paper at the IEEE International Conference on Data Engineering (ICDE) 2021.
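To make the idea of similarity-based joinability concrete, the following is a minimal brute-force sketch and not the method published at ICDE 2021. It assumes that values have already been embedded as vectors, that cosine similarity is the similarity function, and that a column's joinability is the fraction of query vectors with at least one match above a threshold. The function names, the threshold `tau`, and the toy data are all hypothetical, and none of the speed-up techniques mentioned above are reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def joinability(query_col, candidate_col, tau):
    """Assumed definition: fraction of query vectors that have at least one
    candidate vector with cosine similarity >= tau."""
    if not query_col:
        return 0.0
    matched = sum(
        1 for q in query_col
        if any(cosine(q, c) >= tau for c in candidate_col)
    )
    return matched / len(query_col)

def joinable_columns(query_col, lake, tau, min_joinability):
    """Scan every column in the (hypothetical) lake and keep those whose
    joinability reaches min_joinability, best first."""
    results = []
    for name, col in lake.items():
        score = joinability(query_col, col, tau)
        if score >= min_joinability:
            results.append((name, score))
    return sorted(results, key=lambda item: -item[1])

# Toy 2-dimensional "embeddings"; real embeddings would come from a text model.
query = [[1.0, 0.0], [0.0, 1.0]]
lake = {
    "a": [[0.9, 0.1], [0.1, 0.9]],
    "b": [[1.0, 0.0], [-1.0, 0.0]],
    "c": [[-1.0, -1.0]],
}
ranked = joinable_columns(query, lake, tau=0.9, min_joinability=0.5)
print(ranked)
```

In this toy run, column `a` matches both query vectors even though no value is exactly equal, which is the kind of result an equi-join would miss. The exhaustive scan is quadratic in the column sizes, which is why dedicated speed-up techniques matter at data-lake scale.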

Current Status of Research Progress

1: Research has progressed more than originally planned.

Reason

Our plan for FY2020 was to finish Task 2 and develop generic blocking techniques for queries with learning-based predicates. We achieved this goal and developed a solution that works for high-dimensional vectors and a variety of similarity functions over high-dimensional data. We published our findings at the ACM SIGMOD International Conference on Management of Data (SIGMOD) 2021, a top-tier conference in the database area. We also explored the problem of joinable table discovery in data lakes. We targeted the case in which textual values are embedded as high-dimensional vectors and columns are joined on similarity predicates over those vectors. This study was published at the IEEE International Conference on Data Engineering (ICDE) 2021, also a top-tier conference in the database area. Based on these achievements in FY2020, we believe that the project has been progressing more smoothly than initially planned. In addition, we started initial work on implementing a prototype system that integrates all the methods proposed in this research period.
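As an illustration of why cardinality estimates matter for blocking rules, the sketch below picks the cheapest order in which to evaluate a conjunction of similarity predicates. It is not the SIGMOD 2021 method: the selectivities are hard-coded stand-ins for what a learned estimator would produce, the predicates are assumed independent, and the predicate names, unit costs, and cost model are hypothetical.

```python
from itertools import permutations

def expected_cost(order, selectivity, unit_cost):
    """Expected per-record cost of evaluating predicates in the given order.
    A predicate is only evaluated on records that passed all earlier ones
    (independence between predicates is an assumption of this sketch)."""
    surviving = 1.0  # fraction of records still alive
    total = 0.0
    for pred in order:
        total += surviving * unit_cost[pred]
        surviving *= selectivity[pred]
    return total

def best_order(selectivity, unit_cost):
    """Exhaustively try every order; fine for the handful of predicates in a rule."""
    return min(
        permutations(selectivity),
        key=lambda order: expected_cost(order, selectivity, unit_cost),
    )

# Hypothetical estimates: fraction of records passing each predicate,
# and the relative cost of evaluating it on one record.
selectivity = {"name_sim": 0.5, "addr_sim": 0.1, "desc_sim": 0.9}
unit_cost = {"name_sim": 1.0, "addr_sim": 4.0, "desc_sim": 2.0}

order = best_order(selectivity, unit_cost)
cost = expected_cost(order, selectivity, unit_cost)
print(order, cost)
```

Note that the most selective predicate (`addr_sim`) is not evaluated first here, because it is also the most expensive. The chosen order depends on both the estimated selectivity and the cost, so an inaccurate cardinality estimate can lead to a more expensive plan.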

Strategy for Future Research Activity

In FY2021, our ongoing work is to further study the problem of joinable table discovery in data lakes. We will explore the direction of column embedding. The new approach is expected to be significantly more efficient than the one we proposed at ICDE 2021 while retaining its accuracy. In addition, the new approach can be extended to solve other related problems in data lake management. Another academic goal for FY2021 is to complete Task 3 and work on system prototyping and evaluation. We have already started initial work on the prototype implementation. The implemented system will integrate all the methods proposed in this research period, and we will seek an opportunity to release it.

Causes of Carryover

Due to the COVID-19 outbreak, the PI was unable to attend onsite conferences, which resulted in the aforementioned unused amount that would otherwise have been spent on travel expenses. The PI requests that this amount be carried forward to FY2021, during which conference registration, journal publication, and equipment purchases may occur.

  • Research Products

    (14 results)


All Int'l Joint Research (2 results) Journal Article (3 results) (of which Peer Reviewed: 3 results,  Open Access: 2 results) Presentation (6 results) (of which Int'l Joint Research: 2 results) Remarks (3 results)

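  • [Int'l Joint Research] University of New South Wales / University of Melbourne / University of Technology Sydney (Australia)

    • Country Name
      AUSTRALIA
    • Counterpart Institution
      University of New South Wales / University of Melbourne / University of Technology Sydney
  • [Int'l Joint Research] Shenzhen University (China)

    • Country Name
      CHINA
    • Counterpart Institution
      Shenzhen University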
  • [Journal Article] Continuous Top-k Spatial-Keyword Search on Dynamic Objects (2021)

    • Author(s)
      Yuyang Dong, Chuan Xiao, Hanxiong Chen, Jeffrey Xu Yu, Kunihiro Takeoka, Masafumi Oyamada, and Hiroyuki Kitagawa
    • Journal Title

      The VLDB Journal

      Volume: 30 Pages: 141-161

    • DOI

      10.1007/s00778-020-00627-4

    • Peer Reviewed
  • [Journal Article] Similarity Query Processing for High-Dimensional Data (2020)

    • Author(s)
      Jianbin Qin, Wei Wang, Chuan Xiao, and Ying Zhang
    • Journal Title

      Proceedings of the VLDB Endowment

      Volume: 13 Pages: 3437-3440

    • DOI

      10.14778/3415478.3415564

    • Peer Reviewed / Open Access
  • [Journal Article] Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints (2020)

    • Author(s)
      Satoshi Koide, Chuan Xiao, and Yoshiharu Ishikawa
    • Journal Title

      Proceedings of the VLDB Endowment

      Volume: 13 Pages: 2188-2201

    • DOI

      10.14778/3407790.3407818

    • Peer Reviewed / Open Access
  • [Presentation] Consistent and Flexible Selectivity Estimation for High-Dimensional Data (2021)

    • Author(s)
      Yaoshu Wang, Chuan Xiao, Jianbin Qin, Rui Mao, Makoto Onizuka, Wei Wang, Rui Zhang, and Yoshiharu Ishikawa
    • Organizer
      ACM SIGMOD International Conference on Management of Data (SIGMOD 2021)
    • Int'l Joint Research
  • [Presentation] Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach (2021)

    • Author(s)
      Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, and Masafumi Oyamada
    • Organizer
      IEEE International Conference on Data Engineering (ICDE 2021)
    • Int'l Joint Research
  • [Presentation] Fast and Stable Cardinality Estimation with Non-Autoregressive Models (2021)

    • Author(s)
      伊藤竜一, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      13th Forum on Data Engineering and Information Management (DEIM 2021)
  • [Presentation] FedMe: A Federated Learning Method Based on Model Exchange (2021)

    • Author(s)
      松田光司, 堀敬三, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      13th Forum on Data Engineering and Information Management (DEIM 2021)
  • [Presentation] A Fast Search Algorithm for Computational Notebook Similarity Search (2021)

    • Author(s)
      堀内美聡, 山崎翔平, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      13th Forum on Data Engineering and Information Management (DEIM 2021)
  • [Presentation] Automatic Molecular Graph Completion Using Deep Generative Models (2021)

    • Author(s)
      胡晟, 瀧川一学, 肖川
    • Organizer
      13th Forum on Data Engineering and Information Management (DEIM 2021)
  • [Remarks] Osaka University, Big Data Engineering Laboratory, Onizuka Lab

    • URL

      http://www-bigdata.ist.osaka-u.ac.jp/ja/paper/

  • [Remarks] Nagoya University, Graduate School of Informatics, Database Laboratory (Ishikawa Lab)

    • URL

      https://www.db.is.i.nagoya-u.ac.jp/ja/research/publications/

  • [Remarks] Chuan Xiao's homepage

    • URL

      https://sites.google.com/site/chuanxiao1983/publication


Published: 2021-12-27  
