
2010 Fiscal Year Final Research Report

Research on Advanced Natural Language Processing and Text Mining

Project/Area Number 18002007
Research Category

Grant-in-Aid for Specially Promoted Research

Allocation Type Single-year Grants
Review Section Science and Engineering
Engineering
Research Institution The University of Tokyo

Principal Investigator

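TSUJII Junichi  The University of Tokyo, Graduate School of Information Science and Technology, Professor (20026313)

Co-Investigator(Kenkyū-buntansha) YONEZAWA Akinori  The University of Tokyo, Graduate School of Information Science and Technology, Professor (00133116)
TAURA Kenjiro  The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (90282714)
MIYAO Yusuke  The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (00343096)
MATSUZAKI Takuya  The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (40463872)
Research Collaborator KANO Yoshinobu  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
OHTA Tomoko  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
SAETRE Rune  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
SHIBATA Takeshi  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
MIWA Makoto  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
PYYSALO Sampo Mikael  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
KIM Jin-Dong  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Lecturer
SAGAE Kenji  The University of Tokyo, Graduate School of Information Science and Technology, Project Researcher
SAGAE T. Alicia  The University of Tokyo, Graduate School of Information Science and Technology, Research Assistant
WANG Xiangli  The University of Tokyo, Graduate School of Information Science and Technology, Project Researcher
TSUNAKAWA Takashi  The University of Tokyo, Graduate School of Information Science and Technology, Project Researcher
HARA Tadayoshi  The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher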
Project Period (FY) 2006 – 2010
Keywords Language understanding / Semantic processing / Text mining / Context processing / Intelligent retrieval
Research Abstract

The objective of the project was to apply a methodology that combines statistical modeling with structure-based symbolic processing, which had proven successful in sentence parsing, to more challenging tasks such as deep semantic processing, knowledge-based information extraction, and contextual processing. We achieved significant results in (1) efficient and robust deep parsing based on a linguistically sound formalism, (2) a large-scale semantically annotated corpus for the biology domain (the GENIA corpus), (3) information extraction programs (named entity recognizers and event recognizers) for the biology domain that combine the deep parsing in (1) with structural machine learning algorithms, and (4) workflow software for data-centered parallel processing.
The GENIA corpus in (2) has been recognized as the gold-standard corpus for text mining research in biology and has been used by many groups around the world. It was adopted twice as the training and test corpus for international shared task competitions (BioNLP 09 and BioNLP 11). The extraction programs developed in (3) achieved state-of-the-art performance in these competitions. The system based on (1) and (4) showed that the technology developed in this project is practical for processing real-world text: we processed the whole of MEDLINE (20 million abstracts, more than 2 billion sentences) and indexed it semantically in less than a week. The processing results for MEDLINE have been made publicly available through an intelligent document retrieval system (MEDIE).

  • Research Products

    (42 results)

All Journal Article (13 results) Presentation (24 results) Book (3 results) Remarks (2 results)

  • [Journal Article] Building a High Quality Sense Inventory for Improved Abbreviation Disambiguation.2010

    • Author(s)
      Okazaki, Naoaki, Sophia Ananiadou, Jun'ichi Tsujii.
    • Journal Title

      Bioinformatics.(Oxford University Press)

  • [Journal Article] Bio-Molecular Event Extraction with Markov Logic.2010

    • Author(s)
      Riedel, Sebastian, Rune Saetre, Hong-Woo Chun, Toshihisa Takagi, Jun'ichi Tsujii.
    • Journal Title

      Computational Intelligence. Special Issue (Jin-Dong Kim (Eds.))

  • [Journal Article] Event Extraction with Complex Event Classification Using Rich Features.2010

    • Author(s)
      Miwa, Makoto, Rune Saetre, Jin-Dong Kim, Jun'ichi Tsujii.
    • Journal Title

      Journal of Bioinformatics and Computational Biology (JBCB). 8(1)

      Pages: 131-146

  • [Journal Article] Comparison of Chinese Treebanks for Corpus-oriented HPSG Grammar Development.2010

    • Author(s)
      Yu, Kun, Yusuke Miyao, Takuya Matsuzaki, Xiangli Wang, Yaozhong Zhang, Kiyotaka Uchimoto, Jun'ichi Tsujii.
    • Journal Title

      Journal of Natural Language Processing (Special Issue on Empirical Methods for Asian Language Processing). April

  • [Journal Article] Improve Syntax-based Translation Using Deep Syntactic Structures.2010

    • Author(s)
      Wu, Xianchao, Takuya Matsuzaki, Jun'ichi Tsujii.
    • Journal Title

      Journal of Machine Translation (Special Issue : Pushing the frontiers of SMT). 24(2). Springer

      Pages: 141-157

  • [Journal Article] Extracting Protein-Interactions from Text with the Unified AkaneRE Event Extraction System.2010

    • Author(s)
      Saetre, Rune, Kazuhiro Yoshida, Makoto Miwa, Takuya Matsuzaki, Yoshinobu Kano, Jun'ichi Tsujii.
    • Journal Title

      IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), BioCreative II.5 Special Issue. 7

      Pages: 46

  • [Journal Article] A Chinese-Japanese Lexical Machine Translation through a Pivot Language.2009

    • Author(s)
      Tsunakawa, Takashi, Naoaki Okazaki, Xiao Liu, Jun'ichi Tsujii.
    • Journal Title

      ACM Transactions on Asian Language Information Processing. 8(2)

      Pages: 9:1-9:21(ISSN: 1530-0226)

  • [Journal Article] Evaluating Contributions of Natural Language Parsers to Protein-Protein Interaction Extraction.2009

    • Author(s)
      Miyao, Yusuke, Kenji Sagae, Rune Saetre, Takuya Matsuzaki, Jun'ichi Tsujii.
    • Journal Title

      Bioinformatics.(Oxford University Press) 25(3)

      Pages: 394-400

  • [Journal Article] Protein-Protein Interaction Extraction by Leveraging Multiple Kernels and Parsers.2009

    • Author(s)
      Miwa, Makoto, Rune Saetre, Yusuke Miyao, Jun'ichi Tsujii.
    • Journal Title

      International Journal of Medical Informatics.(Mining of Clinical and Biomedical Text and Data Special Issue.) 78(12)

      Pages: e39-e46

  • [Journal Article] Corpus annotation for mining biomedical events from literature.2008

    • Author(s)
      Kim, Jin-Dong, Tomoko Ohta, Jun'ichi Tsujii.
    • Journal Title

      BMC Bioinformatics.(BioMed Central) 9(1)

      Pages: 10(ISSN 1471-2105)

  • [Journal Article] New challenges for text mining : Mapping between text and manually curated pathways.2008

    • Author(s)
      Oda, Kanae, Jin-Dong Kim, Tomoko Ohta, Daisuke Okanohara, Takuya Matsuzaki, Yuka Tateisi, Jun'ichi Tsujii.
    • Journal Title

      BMC Bioinformatics.(BioMed Central) 9(Suppl 3)

      Pages: S5(ISSN 1471-2105)

  • [Journal Article] FACTA : a text search engine for finding associated biomedical concepts.2008

    • Author(s)
      Tsuruoka, Yoshimasa, Jun'ichi Tsujii, Sophia Ananiadou.
    • Journal Title

      Bioinformatics. 24(21)

      Pages: 2259-2260

  • [Journal Article] Feature Forest Models for Probabilistic HPSG Parsing.2008

    • Author(s)
      Miyao, Yusuke, Jun'ichi Tsujii.
    • Journal Title

      Computational Linguistics. 34(1) MIT Press

      Pages: 35-80

  • [Presentation] Evaluating Dependency Representation for Event Extraction.2010

    • Author(s)
      Miwa, Makoto, Sampo Pyysalo, Tadayoshi Hara, Jun'ichi Tsujii.
    • Organizer
      23rd COLING. pp.779-787
    • Year and Date
      20100800
  • [Presentation] Entity-Focused Sentence Simplification for Relation Extraction.2010

    • Author(s)
      Miwa, Makoto, Yusuke Miyao, Rune Saetre, Jun'ichi Tsujii.
    • Organizer
      23rd COLING. pp.788-796
    • Year and Date
      20100800
  • [Presentation] Fine-Grained Tree-to-String Translation Rule Extraction.2010

    • Author(s)
      Wu, Xianchao, Takuya Matsuzaki, Jun'ichi Tsujii.
    • Organizer
      48th ACL. pp.325-334
    • Year and Date
      20100700
  • [Presentation] A Simple Approach for HPSG Supertagging Using Dependency Information.2010

    • Author(s)
      Yao-zhong Zhang, Takuya Matsuzaki, Jun'ichi Tsujii.
    • Organizer
      11th NAACL-HLT'10. pp.645-648
    • Year and Date
      20100600
  • [Presentation] ParaTrac : A Fine-Grained Profiler for Data-Intensive Workflows.2010

    • Author(s)
      Dun, Nan, Kenjiro Taura, Akinori Yonezawa.
    • Organizer
      19th ACM HPDC 2010, pp.37-48
    • Year and Date
      20100600
  • [Presentation] File-Access Patterns of Data-Intensive Workflow Applications and their Implications to Distributed Filesystems.2010

    • Author(s)
      Shibata, Takeshi, SungJun Choi, Kenjiro Taura.
    • Organizer
      3rd DIDC 2010, pp.746-755
    • Year and Date
      20100600
  • [Presentation] A Japanese Particle Corpus Built by Example-Based Annotation.2010

    • Author(s)
      Hanaoka, Hiroki, Hideki Mima, Jun'ichi Tsujii.
    • Organizer
      LREC2010. pp.1876-1880
    • Year and Date
      20100500
  • [Presentation] Design and Implementation of GXP make: A Workflow System Based on Make.2010

    • Author(s)
      Taura, Kenjiro, Takuya Matsuzaki, Makoto Miwa, Yoshikazu Kamoshida, Daisaku Yokoyama, Nan Dun, Takeshi Shibata, Choi Sung Jun, Jun'ichi Tsujii.
    • Organizer
      2010 IEEE 6th International Conference on e-Science. pp.214-221
    • Year and Date
      20100000
  • [Presentation] Forest-guided Supertagger Training.2010

    • Author(s)
      Yao-zhong Zhang, Takuya Matsuzaki, Jun'ichi Tsujii.
    • Organizer
      23rd COLING. pp.1281-1289
    • Year and Date
      20100000
  • [Presentation] The UOT System : Improve String-to-Tree Translation Using Head-Driven Phrase Structure Grammar and Predicate-Argument Structures.2009

    • Author(s)
      Wu, Xianchao, Takuya Matsuzaki, Naoaki Okazaki, Yusuke Miyao, Jun'ichi Tsujii.
    • Organizer
      IWSLT 2009. pp.99-106
    • Year and Date
      20091200
  • [Presentation] Event Extraction with Complex Event Classification using Rich Features.2009

    • Author(s)
      Miwa, Makoto, Rune Saetre, Jin-Dong Kim, Jun'ichi Tsujii.
    • Organizer
      3rd International Symposium on Languages in Biology and Medicine (LBM 2009). pp.11-19 (Honorable Mention Award)
    • Year and Date
      20091100
  • [Presentation] Effective Analysis of Causes and Inter-dependencies of Parsing Errors.2009

    • Author(s)
      Hara, Tadayoshi, Yusuke Miyao, Jun'ichi Tsujii.
    • Organizer
      IWPT-09 Paris, France, pp.180-191
    • Year and Date
      20091000
  • [Presentation] A Comparative Study on Generalization of Semantic Roles in FrameNet.2009

    • Author(s)
      Matsubayashi, Yuichiroh, Naoaki Okazaki, Jun'ichi Tsujii.
    • Organizer
      ACL-IJCNLP2009. pp.19-27
    • Year and Date
      20090800
  • [Presentation] Supervised Learning of a Probabilistic Lexicon of Verb Semantic Classes.2009

    • Author(s)
      Miyao, Yusuke, Jun'ichi Tsujii.
    • Organizer
      EMNLP 2009. Singapore, pp.1328-1337
    • Year and Date
      20090800
  • [Presentation] Fast Full Parsing by Linear-Chain Conditional Random Fields.2009

    • Author(s)
      Tsuruoka, Yoshimasa, Jun'ichi Tsujii, Sophia Ananiadou.
    • Organizer
      EACL. pp.790-798
    • Year and Date
      20090400
  • [Presentation] Bilingual Dictionary Extraction from Wikipedia.2009

    • Author(s)
      Yu, Kun, Jun'ichi Tsujii.
    • Organizer
      Proceedings of Machine Translation Summit XII.
    • Year and Date
      20090000
  • [Presentation] Extracting Bilingual Dictionary from Comparable Corpora with Dependency Heterogeneity.2009

    • Author(s)
      Yu, Kun, Jun'ichi Tsujii.
    • Organizer
      NAACL HLT 2009. pp.121-124
    • Year and Date
      20090000
  • [Presentation] A Discriminative Latent Variable Chinese Segmenter with Hybrid Word/Character Information.2009

    • Author(s)
      Sun, Xu, Yaozhong Zhang, Takuya Matsuzaki, Yoshimasa Tsuruoka, Jun'ichi Tsujii.
    • Organizer
      NAACL-HLT'09. Boulder, Colorado, pp.56-64
    • Year and Date
      20090000
  • [Presentation] HPSG Supertagging : A Sequence Labeling View.2009

    • Author(s)
      Yao-zhong Zhang, Takuya Matsuzaki, Jun'ichi Tsujii.
    • Organizer
      11th IWPT'09. pp.210-213
    • Year and Date
      20090000
  • [Presentation] Robust Approach to Abbreviating Terms : A Discriminative Latent Variable Model with Global Information.2009

    • Author(s)
      Sun, Xu, Naoaki Okazaki, Jun'ichi Tsujii.
    • Organizer
      ACL. Singapore, pp.905-913
    • Year and Date
      20090000
  • [Presentation] Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages.2009

    • Author(s)
      Wu, Xianchao, Okazaki, Naoaki, Tsujii, Jun'ichi.
    • Organizer
      Human Language Technologies : NAACL. Boulder, Colorado, pp.424-432
    • Year and Date
      20090000
  • [Presentation] Latent Variable Perceptron Algorithm for Structured Classification.2009

    • Author(s)
      Sun, Xu, Takuya Matsuzaki, Daisuke Okanohara, Jun'ichi Tsujii.
    • Organizer
      IJCAI. Los Angeles, pp.1236-1242
    • Year and Date
      20090000
  • [Presentation] Evaluating Contribution of Deep Syntactic Information to Shallow Semantic Analysis.2009

    • Author(s)
      Uematsu, Sumire, Jun'ichi Tsujii.
    • Organizer
      IWPT'09. pp.85-88
    • Year and Date
      20090000
  • [Presentation] Sequential Labeling with Latent Variables : An Exact Inference Algorithm and An Efficient Approximation.2009

    • Author(s)
      Sun, Xu, Jun'ichi Tsujii.
    • Organizer
      12th EACL 2009. Athens, Greece, pp.772-780
    • Year and Date
      20090000
  • [Book] Evaluating the Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser.(Trends in Parsing Technology : Dependency Parsing, Domain Adaptation, and Deep Parsing. Text, Speech and Language Technology.)(Harry Bunt, Paola Merlo, Joakim Nivre (Eds.))2010

    • Author(s)
      Hara, Tadayoshi, Yusuke Miyao, Jun'ichi Tsujii.
    • Total Pages
      253-272
    • Publisher
      Springer
  • [Book] Probabilistic Context-Free Grammars with Latent Annotations.(Supertagging : Using Complex Lexical Descriptions in Natural Language Processing.)(Srinivas Bangalore, Aravind K. Joshi (Eds.))2010

    • Author(s)
      Matsuzaki, Takuya, Yusuke Miyao, Jun'ichi Tsujii.
    • Total Pages
      337-354
    • Publisher
      MIT Press
  • [Book] Corpora and their Annotation.(Text Mining for Biology and Biomedicine.)(Sophia Ananiadou, John McNaught (Eds.))2006

    • Author(s)
      Kim, Jin-Dong, Jun'ichi Tsujii.
    • ISBN
      1-58053-984-X
    • Publisher
      Artech House (46 Gillingham Street, London SW1V 1AH, UK)
  • [Remarks] Project

    • URL

      http://www-tsujii.is.s.u-tokyo.ac.jp/aNT/

  • [Remarks] National Centre for Text Mining, University of Manchester, UK, which provides services reflecting the research results

    • URL

      http://www.nactem.ac.uk/pathtext/

Published: 2012-02-13   Modified: 2016-04-21  
