Project/Area Number |
18K18109
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 61030:Intelligent informatics-related
|
Research Institution | Nara Institute of Science and Technology |
Principal Investigator |
Hiroyuki Shindo, Nara Institute of Science and Technology, Data Science Center, Specially Appointed Associate Professor (20734784)
|
Project Period (FY) |
2018-04-01 – 2022-03-31
|
Project Status |
Completed (Fiscal Year 2021)
|
Budget Amount |
¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2020: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2019: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2018: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
|
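Keywords | paper analysis / natural language processing / syntactic parsing / object detection / structure analysis / PDF / XML / knowledge acquisition / information extraction / scientific and technical papers / semantic analysis / relation extraction |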
Outline of Final Research Achievements |
The number of published scientific papers is growing at an accelerating rate, making it difficult for any individual to find and read every relevant paper. In this study, we developed a model that automatically detects objects such as figures and tables and analyzes the structure of the text and tables in a paper, converting them into structured formats such as XML. Our integrated parser mainly targets materials science literature; it uses image processing to detect figure and table regions and natural language processing to analyze the structure of text and tables. We also built resources for training and evaluating the model, including datasets that annotate table and figure regions as well as the structure of text and tables in papers.
|
Academic Significance and Societal Importance of the Research Achievements |
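This research makes it possible to take a paper in PDF format as input, extract objects such as figures, tables, equations, and paragraphs, and obtain the internal structure of tables (headers, rows, and columns). As a result, we expect it to become possible to comprehensively collect experimental data from the papers in a given field and to perform fine-grained analysis and search of the information presented in figures and tables. The technology could also be used to structure papers from a variety of fields into a knowledge database and to offer a service through which users can browse it.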
|
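The structured output described above (a detected table region plus its header and row structure, written out as XML) can be pictured with a minimal sketch. This is only an illustration under stated assumptions: the `Table` class, the element and attribute names (`table`, `header`, `row`, `cell`, `page`, `bbox`), and the sample values are hypothetical placeholders, not the project's actual schema or data.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Table:
    """Hypothetical container for one detected table (not the project's schema)."""
    page: int
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) region on the page
    header: list[str]
    rows: list[list[str]] = field(default_factory=list)


def table_to_xml(table: Table) -> ET.Element:
    """Serialize a table's region and its header/row structure as an XML element."""
    elem = ET.Element(
        "table",
        page=str(table.page),
        bbox=",".join(f"{v:g}" for v in table.bbox),
    )
    header = ET.SubElement(elem, "header")
    for name in table.header:
        ET.SubElement(header, "cell").text = name
    for row in table.rows:
        row_elem = ET.SubElement(elem, "row")
        for value in row:
            ET.SubElement(row_elem, "cell").text = value
    return elem


# Placeholder values for illustration only; they are not taken from any paper.
example = Table(
    page=3,
    bbox=(72.0, 120.5, 540.0, 310.0),
    header=["Sample", "Value"],
    rows=[["A", "1"], ["B", "2"]],
)
xml_text = ET.tostring(table_to_xml(example), encoding="unicode")
print(xml_text)
```

Keeping the region (`bbox`) on the same element as the header and rows is one possible design choice: it lets a downstream search or database service link an extracted value back to its location in the PDF.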