
Research on Integrated Structural Parsing from Scientific Literature

Research Project

Project/Area Number 18K18109
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 61030: Intelligent informatics-related
Research Institution Nara Institute of Science and Technology

Principal Investigator

Hiroyuki Shindo  Nara Institute of Science and Technology, Data Science Center, Specially Appointed Associate Professor (20734784)

Project Period (FY) 2018-04-01 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2020: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2019: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2018: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
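Keywords Paper analysis / Natural language processing / Syntactic parsing / Object detection / Structural analysis / PDF / XML / Knowledge acquisition / Information extraction / Scientific and technical papers / Semantic analysis / Relation extraction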
Outline of Final Research Achievements

The number of published scientific papers is growing at an accelerating rate, making it difficult for any individual to find and read every relevant paper.
In this study, we developed a model that automatically detects objects such as figures and tables in a paper, analyzes the structure of its text and tables, and converts them into structured formats such as XML. Our integrated parser mainly targets materials science literature; it uses image processing to detect figure and table regions and natural language processing to analyze the structure of text and tables. We also built resources for training and evaluating the model, including datasets annotated with figure and table regions and with the structure of text and tables.
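
As a rough illustration of the final conversion step described above, the following minimal sketch serializes already-detected page objects into XML. It is not the project's implementation: the `PageObject` class, the element names (`paper`, `header`, `row`, `cell`), the `bbox` attribute, and the sample values are all hypothetical placeholders chosen for this example, and the detection and structure-analysis stages are assumed to have run beforehand.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class PageObject:
    """One detected region on a page (hypothetical schema for illustration)."""
    kind: str                                    # e.g. "paragraph", "figure", "table"
    bbox: tuple                                  # (x0, y0, x1, y1) in page coordinates
    text: str = ""                               # text content, if any
    header: list = field(default_factory=list)   # table header cells
    rows: list = field(default_factory=list)     # table body rows (lists of cells)


def to_xml(objects):
    """Serialize detected objects into a simple XML tree rooted at <paper>."""
    root = ET.Element("paper")
    for obj in objects:
        bbox = ",".join(str(v) for v in obj.bbox)
        node = ET.SubElement(root, obj.kind, bbox=bbox)
        if obj.kind == "table":
            # Keep the header separate from the body rows.
            head = ET.SubElement(node, "header")
            for cell in obj.header:
                ET.SubElement(head, "cell").text = cell
            for row in obj.rows:
                row_node = ET.SubElement(node, "row")
                for cell in row:
                    ET.SubElement(row_node, "cell").text = cell
        else:
            node.text = obj.text
    return root


# Placeholder input standing in for the output of the detection stage.
objects = [
    PageObject("paragraph", (72, 90, 540, 130), text="Placeholder paragraph."),
    PageObject("table", (72, 150, 540, 300),
               header=["Material", "Value"],
               rows=[["Sample A", "1.0"]]),
]
xml_string = ET.tostring(to_xml(objects), encoding="unicode")
print(xml_string)
```

Keeping header cells apart from body rows in the output is what would let a downstream tool collect values by column name, which is the kind of use the achievements summary points to.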

Academic Significance and Societal Importance of the Research Achievements

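This research makes it possible to take a paper in PDF format as input and extract objects such as figures, tables, equations, and paragraphs, as well as to obtain the internal structure of tables (headers, rows, and columns). This is expected to enable comprehensive collection of experimental data from the papers in a given field, as well as fine-grained analysis and search of the information presented in figures and tables. The technology could also be used to structure papers from various fields into a knowledge database and to realize a service that lets users browse it.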

Report

(5 results)
  • 2021 Annual Research Report   Final Research Report ( PDF )
  • 2020 Research-status Report
  • 2019 Research-status Report
  • 2018 Research-status Report
  • Research Products

    (21 results)


All Journal Article (5 results) (of which Peer Reviewed: 5 results,  Open Access: 2 results) Presentation (16 results) (of which Int'l Joint Research: 14 results)

  • [Journal Article] Machine extraction of polymer data from tables using XML versions of scientific articles (2021)

    • Author(s)
      Hiroyuki Oka, Atsushi Yoshizawa, Hiroyuki Shindo, Yuji Matsumoto, Masashi Ishii
    • Journal Title

      SCIENCE AND TECHNOLOGY OF ADVANCED MATERIALS: METHODS

      Volume: 1 Issue: 1 Pages: 12-23

    • DOI

      10.1080/27660400.2021.1899456

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Construction and Analysis of Multiword Expression-aware Dependency Corpus (2019)

    • Author(s)
      Kato Akihiko, Shindo Hiroyuki, Matsumoto Yuji
    • Journal Title

      Journal of Natural Language Processing

      Volume: 26 Issue: 4 Pages: 663-688

    • DOI

      10.5715/jnlp.26.663

    • NAID

      130007808657

    • ISSN
      1340-7619, 2185-8314
    • Year and Date
      2019-12-15
    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] Development of a computer-assisted Japanese functional expression learning system for Chinese-speaking learners (2019)

    • Author(s)
      Liu, J., Shindo, H. and Matsumoto, Y.
    • Journal Title

      Educational Technology Research and Development

      Volume: 67 Issue: 5 Pages: 1307-1331

    • DOI

      10.1007/s11423-019-09669-0

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] Autoencoder for Semisupervised Multiple Emotion Detection of Conversation Transcripts (2018)

    • Author(s)
      Phan Duc-Anh, Matsumoto Yuji, Shindo Hiroyuki
    • Journal Title

      IEEE Transactions on Affective Computing

      Volume: 1 Issue: 3 Pages: 1-11

    • DOI

      10.1109/taffc.2018.2885304

    • Related Report
      2018 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Similarity and Replaceability Feature Representations of Word Sequences for Identifying Coordination Boundaries (2018)

    • Author(s)
      Teranishi Hiroki, Shindo Hiroyuki, Matsumoto Yuji
    • Journal Title

      Journal of Natural Language Processing

      Volume: 25 Issue: 4 Pages: 441-462

    • DOI

      10.5715/jnlp.25.441

    • NAID

      130007531010

    • Related Report
      2018 Research-status Report
    • Peer Reviewed
  • [Presentation] A Generative Approach for End-to-End Relation Extraction (2021)

    • Author(s)
      Shanshan Liu, Tatsuya Ishigaki, Yui Uehara, Hiroya Takamura, Chowdhury Mohammad Mahir Asef, Mutsunori Uenuma, Hiroyuki Shindo, Yuji Matsumoto
    • Organizer
      Fifth International Workshop on Scientific Document Analysis
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Automatic extraction of text data of synthesis process for papers on bulk thermoelectric materials (2021)

    • Author(s)
      Mohammad Mahir Asef Chowdhury, Mutsunori Uenuma, Shanshan Liu, Hiroyuki Shindo, Yuji Matsumoto, Yukiharu Uraoka
    • Organizer
      Virtual Conference on Thermoelectrics
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Nested Named Entity Recognition via Explicitly Excluding the Influence of the Best Path (2021)

    • Author(s)
      Yiran Wang, Hiroyuki Shindo, Yuji Matsumoto, Taro Watanabe
    • Organizer
      The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Structured Refinement for Sequential Labeling (2021)

    • Author(s)
      Yiran Wang, Hiroyuki Shindo, Yuji Matsumoto, Taro Watanabe
    • Organizer
      The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Findings)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Training End-to-End Information Extraction Models under Label Imbalance (ラベルの不均衡を考慮したEnd-to-End情報抽出モデルの学習) (2021)

    • Author(s)
      山口泰弘, 進藤裕之, 渡辺太郎
    • Organizer
      27th Annual Meeting of the Association for Natural Language Processing (NLP2021)
    • Related Report
      2020 Research-status Report
  • [Presentation] A Method for Retrieving Relevant Papers for Gene Double-Deletion Research (遺伝子二重欠失研究のための関連論文検索手法) (2021)

    • Author(s)
      平野颯, 野村航, 進藤裕之, 渡辺太郎
    • Organizer
      27th Annual Meeting of the Association for Natural Language Processing (NLP2021)
    • Related Report
      2020 Research-status Report
  • [Presentation] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention (2020)

    • Author(s)
      Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto
    • Organizer
      In Proceedings of EMNLP
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia (2020)

    • Author(s)
      Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto
    • Organizer
      In Proceedings of EMNLP (demo)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Decomposed Local Models for Coordinate Structure Parsing (2019)

    • Author(s)
      Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto
    • Organizer
      In Proceedings of NAACL
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Stochastic Tokenization with a Language Model for Neural Text Classification (2019)

    • Author(s)
      Tatsuya Hiraoka, Hiroyuki Shindo, Yuji Matsumoto
    • Organizer
      In Proceedings of ACL, 2019
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Relation Classification Using Segment-Level Attention-based CNN and Dependency-based RNN (2019)

    • Author(s)
      Van-Hien Tran, Hiroyuki Shindo, Yuji Matsumoto
    • Organizer
      In Proceedings of NAACL, 2019
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Automatic extraction of polymer data from tables in xml (2018)

    • Author(s)
      Hiroyuki Oka, Hiroyuki Shindo, Keisuke Goto, Yuji Matsumoto, Atsushi Yoshizawa, Isao Kuwajima and Masashi Ishii
    • Organizer
      In Proceedings of SCIDOCA
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Line Detection Considering Spatial Context for Reading Line Charts (2018)

    • Author(s)
      Keisuke Goto, Hiroyuki Shindo and Yuji Matsumoto
    • Organizer
      In Proceedings of SCIDOCA
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Translating Chemical Substance Names using Attentional Encoder-Decoder (2018)

    • Author(s)
      Shuhei Kondo, Yuji Matsumoto and Hiroyuki Shindo
    • Organizer
      In Proceedings of SCIDOCA
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] A Span Selection Model for Semantic Role Labeling (2018)

    • Author(s)
      Hiroki Ouchi, Hiroyuki Shindo and Yuji Matsumoto
    • Organizer
      In Proceedings of EMNLP, 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Representation Learning of Entities and Documents from Knowledge Base Descriptions (2018)

    • Author(s)
      Ikuya Yamada and Hiroyuki Shindo
    • Organizer
      In Proceedings of COLING, 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research


Published: 2018-04-23   Modified: 2023-01-30  
