Research on Integrated Structural Parsing from Scientific Literature

Research Project

Project/Area Number	18K18109
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Nara Institute of Science and Technology
Principal Investigator	Hiroyuki Shindo, Nara Institute of Science and Technology, Data Science Center, Specially Appointed Associate Professor (20734784)
Project Period (FY)	2018-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2020: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000); Fiscal Year 2019: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2018: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
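Keywords	paper analysis / natural language processing / syntactic parsing / object detection / structural analysis / PDF / XML / knowledge acquisition / information extraction / scientific and technical papers / semantic analysis / relation extraction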
Outline of Final Research Achievements	The number of published scientific papers is growing at an accelerating rate, making it difficult for individuals to find and read every relevant paper. In this study, we developed a model that automatically detects objects such as figures and tables and analyzes the structure of the text and tables in a paper, converting them into structured formats such as XML. Our integrated parser mainly targets materials science literature; it uses image processing to detect figure and table regions and natural language processing to analyze the structure of text and tables. We also developed resources for training and evaluating the model, including datasets of figure and table regions and of the structure of text and tables in papers.
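Academic Significance and Societal Importance of the Research Achievements	This research makes it possible to take papers in PDF format as input, extract objects such as figures, tables, equations, and paragraphs, and obtain the internal structure of tables (headers, rows, and columns). This is expected to enable comprehensive collection of experimental data from the papers in a given field, as well as fine-grained analysis and search of the information presented in figures and tables. The technology could also be used to structure papers from a variety of fields into a knowledge database and to realize a service that lets users browse it.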

Report

(5 results)

2021 Annual Research Report
Final Research Report (PDF)
2020 Research-status Report
2019 Research-status Report
2018 Research-status Report

Research Products
(21 results)

Journal Article (5 results) (of which Peer Reviewed: 5 results, Open Access: 2 results)
Presentation (16 results) (of which Int'l Joint Research: 14 results)

[Journal Article] Machine extraction of polymer data from tables using XML versions of scientific articles (2021)
- Author(s)
  Hiroyuki Oka, Atsushi Yoshizawa, Hiroyuki Shindo, Yuji Matsumoto, Masashi Ishii
- Journal Title
  Science and Technology of Advanced Materials: Methods
  Volume: 1 Issue: 1 Pages: 12-23
- DOI
  10.1080/27660400.2021.1899456
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Construction and Analysis of Multiword Expression-aware Dependency Corpus (2019)
- Author(s)
  Kato Akihiko, Shindo Hiroyuki, Matsumoto Yuji
- Journal Title
  Journal of Natural Language Processing
  Volume: 26 Issue: 4 Pages: 663-688
- DOI
  10.5715/jnlp.26.663
- NAID
  130007808657
- ISSN
  1340-7619, 2185-8314
- Year and Date
  2019-12-15
- Related Report
  2019 Research-status Report
- Peer Reviewed
[Journal Article] Development of a computer-assisted Japanese functional expression learning system for Chinese-speaking learners (2019)
- Author(s)
  Liu, J., Shindo, H. and Matsumoto, Y.
- Journal Title
  Educational Technology Research and Development
  Volume: 67 Issue: 5 Pages: 1307-1331
- DOI
  10.1007/s11423-019-09669-0
- Related Report
  2019 Research-status Report
- Peer Reviewed
[Journal Article] Autoencoder for Semisupervised Multiple Emotion Detection of Conversation Transcripts (2018)
- Author(s)
  Phan Duc-Anh, Matsumoto Yuji, Shindo Hiroyuki
- Journal Title
  IEEE Transactions on Affective Computing
  Volume: 1 Issue: 3 Pages: 1-11
- DOI
  10.1109/taffc.2018.2885304
- Related Report
  2018 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Similarity and Replaceability Feature Representations of Word Sequences for Identifying Coordination Boundaries (2018)
- Author(s)
  Teranishi Hiroki, Shindo Hiroyuki, Matsumoto Yuji
- Journal Title
  Journal of Natural Language Processing
  Volume: 25 Issue: 4 Pages: 441-462
- DOI
  10.5715/jnlp.25.441
- NAID
  130007531010
- Related Report
  2018 Research-status Report
- Peer Reviewed
[Presentation] A Generative Approach for End-to-End Relation Extraction (2021)
- Author(s)
  Shanshan Liu, Tatsuya Ishigaki, Yui Uehara, Hiroya Takamura, Chowdhury Mohammad Mahir Asef, Mutsunori Uenuma, Hiroyuki Shindo, Yuji Matsumoto
- Organizer
  Fifth International Workshop on Scientific Document Analysis
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Automatic extraction of text data of synthesis process for papers on bulk thermoelectric materials (2021)
- Author(s)
  Mohammad Mahir Asef Chowdhury, Mutsunori Uenuma, Shanshan Liu, Hiroyuki Shindo, Yuji Matsumoto, Yukiharu Uraoka
- Organizer
  Virtual Conference on Thermoelectrics
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Nested Named Entity Recognition via Explicitly Excluding the Influence of the Best Path (2021)
- Author(s)
  Yiran Wang, Hiroyuki Shindo, Yuji Matsumoto, Taro Watanabe
- Organizer
  The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Structured Refinement for Sequential Labeling (2021)
- Author(s)
  Yiran Wang, Hiroyuki Shindo, Yuji Matsumoto, Taro Watanabe
- Organizer
  The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Findings)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Training End-to-End Information Extraction Models Considering Label Imbalance (2021)
- Author(s)
  山口泰弘, 進藤裕之, 渡辺太郎
- Organizer
  The 27th Annual Meeting of the Association for Natural Language Processing (NLP2021)
- Related Report
  2020 Research-status Report
[Presentation] A Method for Retrieving Related Papers for Gene Double-Deletion Research (2021)
- Author(s)
  平野颯, 野村航, 進藤裕之, 渡辺太郎
- Organizer
  The 27th Annual Meeting of the Association for Natural Language Processing (NLP2021)
- Related Report
  2020 Research-status Report
[Presentation] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention (2020)
- Author(s)
  Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto
- Organizer
  In Proceedings of EMNLP
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia (2020)
- Author(s)
  Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto
- Organizer
  In Proceedings of EMNLP (demo)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Decomposed Local Models for Coordinate Structure Parsing (2019)
- Author(s)
  Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto
- Organizer
  In Proceedings of NAACL
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Stochastic Tokenization with a Language Model for Neural Text Classification (2019)
- Author(s)
  Tatsuya Hiraoka, Hiroyuki Shindo, Yuji Matsumoto
- Organizer
  In Proceedings of ACL, 2019
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Relation Classification Using Segment-Level Attention-based CNN and Dependency-based RNN (2019)
- Author(s)
  Van-Hien Tran, Hiroyuki Shindo, Yuji Matsumoto
- Organizer
  In Proceedings of NAACL, 2019
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Automatic extraction of polymer data from tables in XML (2018)
- Author(s)
  Hiroyuki Oka, Hiroyuki Shindo, Keisuke Goto, Yuji Matsumoto, Atsushi Yoshizawa, Isao Kuwajima and Masashi Ishii
- Organizer
  In Proceedings of SCIDOCA
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Line Detection Considering Spatial Context for Reading Line Charts (2018)
- Author(s)
  Keisuke Goto, Hiroyuki Shindo and Yuji Matsumoto
- Organizer
  In Proceedings of SCIDOCA
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Translating Chemical Substance Names using Attentional Encoder-Decoder (2018)
- Author(s)
  Shuhei Kondo, Yuji Matsumoto and Hiroyuki Shindo
- Organizer
  In Proceedings of SCIDOCA
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] A Span Selection Model for Semantic Role Labeling (2018)
- Author(s)
  Hiroki Ouchi, Hiroyuki Shindo and Yuji Matsumoto
- Organizer
  In Proceedings of EMNLP, 2018
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Representation Learning of Entities and Documents from Knowledge Base Descriptions (2018)
- Author(s)
  Ikuya Yamada and Hiroyuki Shindo
- Organizer
  In Proceedings of COLING, 2018
- Related Report
  2018 Research-status Report
- Int'l Joint Research
