
High Performance Data Processing System for Ad-hoc Data

Research Project

Project/Area Number 16H01715
Research Category

Grant-in-Aid for Scientific Research (A)

Allocation Type Single-year Grants
Section General
Research Field Software
Research Institution The University of Tokyo

Principal Investigator

Taura Kenjiro  The University of Tokyo, Graduate School of Information Science and Technology, Professor (90282714)

Project Period (FY) 2016-04-01 – 2021-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥38,870,000 (Direct Cost: ¥29,900,000, Indirect Cost: ¥8,970,000)
Fiscal Year 2020: ¥9,360,000 (Direct Cost: ¥7,200,000, Indirect Cost: ¥2,160,000)
Fiscal Year 2019: ¥8,970,000 (Direct Cost: ¥6,900,000, Indirect Cost: ¥2,070,000)
Fiscal Year 2018: ¥7,020,000 (Direct Cost: ¥5,400,000, Indirect Cost: ¥1,620,000)
Fiscal Year 2017: ¥7,410,000 (Direct Cost: ¥5,700,000, Indirect Cost: ¥1,710,000)
Fiscal Year 2016: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000)
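Keywords large-scale data processing / ad-hoc data processing / lexical analysis / parsing / data extraction / operator-precedence grammar / parser generator / lexer generator / ad-hoc data / parallel parsing / scannerless parsing / LL(*) grammar / LL(*) / regular expressions / high-performance computing / large-scale data / stream processing / parsing expression grammar / neural machine translation / recurrent neural network / natural language processing / string processing / SIMD instructions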
Outline of Final Research Achievements

Toward the goal of high-performance text processing using parallelization and vectorization, we studied lexer and parser generators that produce parallelized or vectorized lexers and parsers from regular expressions or context-free grammars. We investigated two approaches: one that vectorizes a scannerless parser, and one that parallelizes both lexers and locally parsable (and thus relatively simple to parallelize) parsers.
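To illustrate why lexing from regular expressions can be parallelized at all, here is a minimal sketch of one well-known technique: each chunk of the input is summarized as a state-to-state mapping of a DFA, and the summaries are then composed in order. It is not taken from the project's publications and is not claimed to be the method of any paper listed here. The toy DFA and the names `DELTA`, `chunk_summary`, `compose`, and `run_parallel` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical toy DFA over {'a', 'b'} accepting strings that contain "ab".
# States: 0 = start, 1 = just saw 'a', 2 = saw "ab" (accepting, absorbing).
STATES = (0, 1, 2)
START = 0
ACCEPTING = {2}
DELTA = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 2, (2, 'b'): 2,
}

def run_sequential(text):
    """Reference implementation: feed the characters one by one."""
    state = START
    for ch in text:
        state = DELTA[(state, ch)]
    return state

def chunk_summary(chunk):
    """Return a tuple m where m[s] is the state reached from s after reading chunk."""
    mapping = list(STATES)
    for ch in chunk:
        mapping = [DELTA[(s, ch)] for s in mapping]
    return tuple(mapping)

def compose(first, second):
    """Summary of reading the chunk behind `first`, then the chunk behind `second`."""
    return tuple(second[s] for s in first)

def run_parallel(text, n_chunks=4):
    """Summarize chunks independently, then compose the summaries left to right."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(chunk_summary, chunks))
    total = reduce(compose, summaries, tuple(STATES))  # identity mapping as seed
    return total[START]

if __name__ == "__main__":
    text = "bbbaabba"
    print(run_parallel(text) in ACCEPTING, run_sequential(text) in ACCEPTING)
```

Because composing mappings is associative, the chunks can be processed in any order and combined afterwards. The thread pool here only shows the structure; in CPython it would not be expected to give a real speedup, and the per-chunk work grows with the number of DFA states.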

Academic Significance and Societal Importance of the Research Achievements

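Data utilization is the linchpin of Society 5.0. Much of the available data is stored as text (some in standard formats such as XML and JSON, some with no fixed format). The very first stage of data processing on strings is lexical analysis or parsing, a kind of pattern matching. This research aims to make that stage both easy and fast, and it can make a useful contribution as the volume of big data available to society grows.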

Report

(6 results)
  • 2021 Final Research Report (PDF)
  • 2020 Annual Research Report
  • 2019 Annual Research Report
  • 2018 Annual Research Report
  • 2017 Annual Research Report
  • 2016 Annual Research Report
Research Products

(19 results)


Journal Article (16 results) (of which Int'l Joint Research: 3 results, Peer Reviewed: 15 results, Open Access: 1 result, Acknowledgement Compliant: 6 results)
Presentation (3 results) (of which Int'l Joint Research: 1 result, Invited: 1 result)

  • [Journal Article] Plex: Scaling Parallel Lexing with Backtrack-Free Prescanning (2021)

    • Author(s)
      Le Li, Shigeyuki Sato, Qiheng Liu, Kenjiro Taura
    • Journal Title

      2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

      Volume: none (conference proceedings) Pages: 693-702

    • DOI

      10.1109/ipdps49936.2021.00079

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed
  • [Journal Article] LL(*) 文法に基づくスキャナレス構文解析器の提案 [A Proposal for a Scannerless Parser Based on LL(*) Grammars] (2018)

    • Author(s)
      井原 央翔, 佐藤 重幸, 田浦 健次朗
    • Journal Title

      Cross-disciplinary workshop on computing Systems, Infrastructures, and programming (xSIG) 2018

      Volume: -

    • Related Report
      2018 Annual Research Report
    • Peer Reviewed
  • [Journal Article] OPG を利用したアドホックな並列データ処理系 [An Ad-hoc Parallel Data Processing System Using OPG] (2018)

    • Author(s)
      リュウ ケイコウ, 井原 央翔, 田浦 健次朗
    • Journal Title

      IPSJ Transactions on Programming (PRO)

      Volume: 12 Pages: 1-8

    • NAID

      170000150059

    • Related Report
      2018 Annual Research Report
  • [Journal Article] LL(*) 文法に基づくスキャナレス構文解析器の提案 [A Proposal for a Scannerless Parser Based on LL(*) Grammars] (2018)

    • Author(s)
      井原 央翔, 佐藤 重幸, 田浦 健次朗
    • Journal Title

      xSIG 2018 workshop

      Volume: -

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Parallelized software offloading of low-level communication with user-level threads (2018)

    • Author(s)
      Wataru Endo and Kenjiro Taura
    • Journal Title

      Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region

      Volume: - Pages: 289-298

    • DOI

      10.1145/3149457.3149475

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed
  • [Journal Article] 低レイテンシ SSD をメモリ拡張として利用したときの性能評価 [Performance Evaluation of Low-Latency SSDs Used as Memory Extension] (2018)

    • Author(s)
      中澤 弘樹, 田浦 健次朗
    • Journal Title

      xSIG 2018 workshop

      Volume: -

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Cache friendly parallelization of neural encoder-decoder models without padding on multi-core architecture (2017)

    • Author(s)
      Yuchen Qiao, Kazuma Hashimoto, Akiko Eriguchi, Haixia Wang, Dongsheng Wang, Yoshimasa Tsuruoka, and Kenjiro Taura.
    • Journal Title

      The 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics

      Volume: - Pages: 437-440

    • DOI

      10.1109/ipdpsw.2017.165

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Neural Machine Translation with Source-Side Latent Graph Parsing (2017)

    • Author(s)
      Kazuma Hashimoto and Yoshimasa Tsuruoka
    • Journal Title

      Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)

      Volume: - Pages: 125-135

    • DOI

      10.18653/v1/d17-1012

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Learning to Parse and Translate Improves Neural Machine Translation (2017)

    • Author(s)
      Akiko Eriguchi, Yoshimasa Tsuruoka, and Kyunghyun Cho
    • Journal Title

      Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL2017)

      Volume: - Pages: 72-78

    • DOI

      10.18653/v1/p17-2012

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Cache Friendly Parallel Encoder-Decoder Model without Padding on Multi-core Architecture (2017)

    • Author(s)
      Yuchen Qiao, Kenjiro Taura, Kazuma Hashimoto, Yoshimasa Tsuruoka and Akiko Eriguchi
    • Journal Title

      Proceedings of The 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics

      Volume: -

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Int'l Joint Research / Acknowledgement Compliant
  • [Journal Article] Low latency and resource-aware program composition for large-scale data analysis (2016)

    • Author(s)
      Masahiro Tanaka, Kenjiro Taura, and Kentaro Torisawa
    • Journal Title

      16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

      Volume: - Pages: 325-330

    • DOI

      10.1109/ccgrid.2016.88

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed
  • [Journal Article] A static cut-off for task parallel programs (2016)

    • Author(s)
      Shintaro Iwasaki, Kenjiro Taura
    • Journal Title

      Proceedings of the 2016 International Conference on Parallel Architectures and Compilation

      Volume: - Pages: 139-150

    • DOI

      10.1145/2967938.2967968

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] Fragmented BWT: An Extended BWT for Full-Text Indexing (2016)

    • Author(s)
      Masaru Ito, Hiroshi Inoue, and Kenjiro Taura
    • Journal Title

      International Symposium on String Processing and Information Retrieval

      Volume: - Pages: 97-109

    • DOI

      10.1007/978-3-319-46049-9_10

    • ISBN
      9783319460482, 9783319460499
    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] Autotuning of a Cut-Off for Task Parallel Programs (2016)

    • Author(s)
      Shintaro Iwasaki, Kenjiro Taura
    • Journal Title

      IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)

      Volume: - Pages: 353-360

    • DOI

      10.1109/mcsoc.2016.51

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation (2016)

    • Author(s)
      Kazuma Hashimoto, Akiko Eriguchi, and Yoshimasa Tsuruoka
    • Journal Title

      the 3rd Workshop on Asian Translation (WAT2016)

      Volume: - Pages: 75-83

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] Character-based Decoding in Tree-to-Sequence Attention-based Neural Machine Translation (2016)

    • Author(s)
      Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka
    • Journal Title

      the 3rd Workshop on Asian Translation (WAT2016)

      Volume: - Pages: 175-183

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Presentation] Simultaneous Finite Automaton の部分構成による並列正規表現マッチ [Parallel Regular Expression Matching via Partial Construction of Simultaneous Finite Automata] (2022)

    • Author(s)
      高品 剛大, 佐藤 重幸, 田浦 健次朗
    • Organizer
      The 24th Workshop on Programming and Programming Languages (PPL 2022)
    • Related Report
      2020 Annual Research Report
  • [Presentation] OPGを使ったアドホックな大規模な文字列データ解析のための並列処理系 [A Parallel Processing System for Ad-hoc Large-Scale String Data Analysis Using OPG] (2019)

    • Author(s)
      リュウ ケイコウ
    • Organizer
      IPSJ Transactions on Programming (PRO)
    • Related Report
      2019 Annual Research Report
  • [Presentation] A Quest for Unified, Global View Parallel Programming Models for Our Future (2016)

    • Author(s)
      Kenjiro Taura
    • Organizer
      A Quest for Unified, Global View Parallel Programming Models for Our Future
    • Place of Presentation
      Kyoto
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research / Invited


Published: 2016-04-21   Modified: 2023-03-16  
