
Development of a classification system for data analysis methods based on natural language processing and collective intelligence

Research Project

Project/Area Number 21700315
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Single-year Grants
Research Field Statistical science
Research Institution National Institute of Genetics

Principal Investigator

OGASAWARA Osamu  National Institute of Genetics, Center for Information Biology and DDBJ, Assistant Professor (00435512)

Project Period (FY) 2009 – 2010
Project Status Completed (Fiscal Year 2010)
Budget Amount
¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Fiscal Year 2010: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2009: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Keywords database / natural language processing / statistical processing system / named entity recognition
Research Abstract

As data measurement technology has advanced, increasing attention has been paid to data-intensive approaches, especially in the field of biology. In addition, as the performance of digital computers has increased, so has the sophistication of statistical analysis and other data analysis methods. The fusion of data measurement and data analysis technologies is expected to have a profound impact on future biological research. In practice, however, experimental scientists who devote themselves to the measurements that generate massive amounts of data, but who are not specialists in statistics, find it difficult to make full use of cutting-edge statistical analysis methods.
To address this problem, I have been publishing a database of statistical analysis procedures (the R Graphical Manual) since 2006. The database lets users browse the functionality of procedures in the R statistical system through images generated by running all the examples in the R statistical system, and it supports full-text search of all documents in the R statistical system. It has been highly acclaimed by users worldwide: in 2008 it received 100,000 to 500,000 page views per month from 8,000 to 10,000 unique IPs per month. However, sufficient hardware and software resources had not been allocated to the database, despite the high computational demand of preparing its data.
Thanks to the improved hardware and software environment provided by this project, the number of unique IPs rose notably, to about 50,000 per month (about 200,000 page views per month) in May 2011. For comparison, DDBJ (maintained by the National Institute of Genetics) receives about 17,000 unique IPs per month and KEGG (at Kyoto University) about 200,000, so the R Graphical Manual has grown in Japan into a database with popularity comparable to those well-known databases.
In this project, I developed a classification system of statistical procedures taken from statistical dictionaries, textbooks, and manuals contained in the R Graphical Manual. To map the functions in the R Graphical Manual to the categories of this classification system, I developed a novel algorithm to improve the performance of named entity recognition (NER). I applied this algorithm to every manual entry in the R Graphical Manual to extract technical statistical terms, and then mapped each procedure entry to the classification categories.
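The mapping step described above can be illustrated with a minimal sketch. It is not the project's NER algorithm, which this record does not describe; it swaps in plain dictionary-based term matching. The term list, the category names, and the sample manual entries are all hypothetical.

```python
import re

# Hypothetical term-to-category dictionary (keys must be lowercase).
# The project's actual vocabulary and categories are not given in this record.
TERM_TO_CATEGORY = {
    "linear regression": "Regression",
    "logistic regression": "Regression",
    "principal component analysis": "Multivariate analysis",
    "hierarchical clustering": "Clustering",
    "k-means": "Clustering",
}


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any known term."""
    # Longest terms first, so multi-word terms win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Lookarounds keep a term from matching inside a longer word.
    return re.compile(r"(?<!\w)(" + alternation + r")(?!\w)", re.IGNORECASE)


PATTERN = build_pattern(TERM_TO_CATEGORY)


def extract_terms(text):
    """Return the known statistical terms found in a manual entry, lowercased."""
    return [match.group(1).lower() for match in PATTERN.finditer(text)]


def map_entries_to_categories(entries):
    """Map each entry name to the sorted list of categories its text mentions."""
    return {
        name: sorted({TERM_TO_CATEGORY[term] for term in extract_terms(text)})
        for name, text in entries.items()
    }


# Hypothetical manual entries, for illustration only.
sample_entries = {
    "lm": "Fits a linear regression model.",
    "kmeans": "Performs k-means clustering on a data matrix.",
    "plot": "Generic plotting.",
}
print(map_entries_to_categories(sample_entries))
```

An entry with no recognized term maps to an empty list, which makes unclassified procedures easy to spot. A real NER system would also need to handle spelling variants and terms missing from the dictionary.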

Report

(3 results)
  • 2010 Annual Research Report
  • Final Research Report (PDF)
  • 2009 Annual Research Report
  • Research Products

    (6 results)

All Remarks (6 results)

  • [Remarks] Homepage, etc.

    • URL

      http://rgm2.lab.nig.ac.jp/RGM2/index.php

    • Related Report
      2010 Final Research Report
  • [Remarks] Mirror site

    • URL

      http://www.oga-lab.net/RGM2/index.php

    • Related Report
      2010 Final Research Report
  • [Remarks] Access log analysis results

    • URL

      http://rgm2.lab.nig.ac.jp/cgi-bin/awstats.pl

    • Related Report
      2010 Final Research Report
  • [Remarks] Mirror site access log analysis results

    • URL

      http://www.oga-lab.net/cgi-bin/awstats.pl

    • Related Report
      2010 Final Research Report
  • [Remarks]

    • URL

      http://rgm2.lab.nig.ac.jp/RGM2/index.php

    • Related Report
      2010 Annual Research Report
  • [Remarks]

    • URL

      http://bm2.genes.nig.ac.jp/RGM2/index.php

    • Related Report
      2009 Annual Research Report

Published: 2009-04-01   Modified: 2016-04-21  
