Design and implementation of an evolutional data collecting system for the atomic and molecular databases
Project/Area Number | 16540364 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Atomic/Molecular/Quantum electronics/Plasma |
Research Institution | Advanced Photon Research Center, Quantum Beam Science Directorate, Japan Atomic Energy Agency |
Principal Investigator | SASAKI Akira Japan Atomic Energy Agency, Quantum Beam Science Directorate, Senior Scientist (10215709) |
Co-Investigator(Kenkyū-buntansha) |
KATO Takako National Institutes of Natural Sciences, National Institute for Fusion Science, Professor (20115546)
JOE Kazuki Nara Women's University, Science Department, Professor (90283928)
PICHL Lukas International Christian University, Division of Natural Science, Associate Professor (10343394)
OHISHI Masatoshi National Institutes of Natural Sciences, National Astronomical Observatory, Associate Professor (00183757)
MURATA Masaki National Institute of Information and Communications Technology, Information and Communications Research Department, Principal Scientist (50358884)
|
Project Period (FY) | 2004 – 2005 |
Project Status | Completed (Fiscal Year 2005) |
Budget Amount |
¥3,200,000 (Direct Cost: ¥3,200,000)
Fiscal Year 2005: ¥1,500,000 (Direct Cost: ¥1,500,000)
Fiscal Year 2004: ¥1,700,000 (Direct Cost: ¥1,700,000)
|
Keywords | Atomic and Molecular Database / Atomic and Molecular Processes / Data mining / Natural Language Processing / Information extraction / Machine Learning / Data extraction / Atomic and Molecular Data / Database / Learning Vector Quantization |
Research Abstract |
Atomic and molecular databases are useful in a wide variety of fields of basic science and industrial applications. Under the present procedure of database development, atomic and molecular data are collected mainly from scientific papers through the laborious work of staff scientists at data centers. In this study, we investigated a new application of information technology for the evolutional development of atomic and molecular databases, based on the automatic collection of scientific papers through the internet. We developed machine learning software that can recognize the presence of atomic and molecular data in previously unseen articles, using the bibliographic database of electron excitation and ionization cross sections as training examples. To characterize the articles, we first use the frequency of terms in the training and test examples. Second, we use information on atomic and molecular states and on technical terms specific to atomic and molecular physics, which is found to improve the accuracy of the recognition. We studied methods to recognize expressions for atomic and ionic species, charge states, electron configurations, and spectral terms in the electronic form of scientific papers. We investigated the frequency of appearance of each expression in papers from different fields of physics published in the Phys. Rev. A to E journals. Expressions for nuclear species and molecules are also recognized separately, and are found to be useful for distinguishing papers on atomic physics from those on nuclear physics and condensed matter physics. Combined with internet search engines, the present results will make it possible to collect not only atomic and molecular data from articles but also broader scientific information over a wide range of research fields.
|
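The two kinds of article features described in the abstract (term frequencies, and counts of expressions for ionic species, charge states, electron configurations, and spectral terms) can be illustrated with a minimal Python sketch. This is not the project's software: the element subset, the regular expressions, and the function names `term_frequencies` and `expression_counts` are all hypothetical and chosen only for illustration, and a real system would need far more robust patterns and a trained classifier on top of these features.

```python
import re
from collections import Counter

# Hypothetical, deliberately small subset of element symbols (illustration only).
# Two-letter symbols come first so that "He" is tried before "H".
ELEMENTS = r"(?:He|Li|Be|Ne|Na|Mg|Al|Si|Ar|Fe|Xe|H|C|N|O)"

# Assumed surface forms of the expressions named in the abstract.
PATTERNS = {
    # Charge state written inline, e.g. "Fe16+" or "He+"
    "charge_state": re.compile(rf"\b{ELEMENTS}\s?\d{{0,2}}\+"),
    # Spectroscopic ion notation, e.g. "Fe XVII" or "C IV"
    "ion_roman": re.compile(rf"\b{ELEMENTS}\s[IVX]+\b"),
    # Electron configuration, e.g. "1s2 2s2 2p6" (at least two subshells)
    "configuration": re.compile(r"\b(?:\d[spdf]\d{0,2}\s?){2,}"),
    # Spectral term with J value, e.g. "1S0" or "2P1/2"
    "term": re.compile(r"\b[1-9][SPDFGH](?:\d/2|\d)\b"),
}


def term_frequencies(text):
    """Count lowercase alphabetic tokens (a plain bag-of-words feature)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def expression_counts(text):
    """Count how often each assumed atomic-physics expression appears."""
    return {
        name: sum(1 for _ in pattern.finditer(text))
        for name, pattern in PATTERNS.items()
    }


sample = (
    "Electron-impact excitation of Fe XVII and Fe16+ ions: "
    "1s2 2s2 2p6 1S0 ground state and 2P1/2 levels."
)
tf = term_frequencies(sample)
counts = expression_counts(sample)
print(tf["excitation"], counts)
```

In a sketch like this, the term frequencies and the expression counts would be concatenated into one feature vector per article and passed to whatever classifier is trained on the bibliographic examples.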
Report
(3 results)
Research Products
(18 results)