2010 Fiscal Year Annual Research Report

Research on Foundational Technologies for Semantic and Knowledge Processing for Advanced Language Understanding

Research Project

Project/Area Number	18002007
Research Institution	The University of Tokyo
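Principal Investigator	辻井潤一 (Jun'ichi Tsujii), The University of Tokyo, Graduate School of Information Science and Technology, Professor (20026313)
Keywords	Language understanding / Semantic processing / Text mining / Context processing / Intelligent search
Research Abstract	In the final year of this project, which aims to establish foundational technologies that take semantic context into account and process large text collections, we produced new research results, carried out the final large-scale experiments, and prepared software and data so that the results can be released to the broader research community. The following work was carried out.
(1) Improving supertagging accuracy and introducing semantics: By integrating shallow dependency-structure processing into conventional supertagging, which refers only to local information, we improved accuracy without degrading processing speed. The current accuracy exceeds that of the integrated model. We also built a model that introduces semantics into tagging, an early stage of language processing.
(2) Integrating the event recognition system with pathway models: We further improved the event recognition model developed in the previous fiscal year, making it currently the best-performing system in the world. This system (EventMine) was released to the public. We also built a system that maps recognized events onto broader networks of biological events (pathways).
(3) Expanding and releasing the GENIA corpus: In collaboration with the University of Manchester (UK) and Virginia Tech (USA), we added semantic annotation to literature on infectious diseases. This greatly broadens the scope of the GENIA corpus, which had been specialized for protein-related text. We released worldwide the 32 event annotations completed in the previous fiscal year together with the infectious-disease annotations, and organized an international competition (BioNLP 2011) based on them. The competition will be held in Portland, USA, in June 2011, after the project ends.
(4) Machine translation using sentence parsing: We extended our deep parsing method for English to Chinese and completed a deep parsing system for Chinese. Using the two parsers, we extended the existing Tree2String statistical translation system into a Tree2Tree system. We also released the English and Chinese parsers and the Tree2Tree machine translation software.
(5) Experiments with the large-scale language processing system: In collaboration with the University of Manchester, we ran a large-scale processing experiment that applied the workflow built in the previous fiscal year to full-text papers rather than abstracts, demonstrating that GXP can handle practical large-scale processing. The experiment confirmed that a degree of parallelism of 8,000 or more (number of CPU cores) is achievable.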

Research Products
(31 results)

Journal Article (19 results, of which Peer Reviewed: 19 results), Presentation (10 results), Book (1 result), Remarks (1 result)

[Journal Article] Effective use of Dependency Structure for Bilingual Lexicon Creation. (2011)
- Author(s)
  Andrade D, Matsuzaki T, Tsujii J
- Journal Title
  
  Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2011), Lecture Notes in Computer Science. 6609.
  
  Pages: 80-92
- Peer Reviewed
[Journal Article] Multi-Topical Discussion Summarization using Structured Lexical Chains and Cue Words. (2011)
- Author(s)
  Hatori J, Murakami A, Tsujii J
- Journal Title
  
  Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2011), Lecture Notes in Computer Science. 6609.
  
  Pages: 313-327
- Peer Reviewed
[Journal Article] Named Entity Recognition for Bacterial Type IV Secretion Systems. (2011)
- Author(s)
  Ananiadou S, Sullivan D, Black W, Levow G-A, Gillespie JJ, Mao CH, Pyysalo S, Kolluru BK, Tsujii J, Sobral B
- Journal Title
  
  PLoS ONE
  
  Volume: 6(3) Pages: e14780
- Peer Reviewed
[Journal Article] Robust Measurement and Comparison of Context Similarity for Finding Translation Pairs. (2010)
- Author(s)
  Andrade D, Nasukawa T, Tsujii J
- Journal Title
  
  Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
  
  Pages: 19-27
- Peer Reviewed
[Journal Article] Evaluating Dependency Representation for Event Extraction. (2010)
- Author(s)
  Miwa M, Pyysalo S, Hara T, Tsujii J
- Journal Title
  
  Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
  
  Pages: 779-787
- Peer Reviewed
[Journal Article] Entity-Focused Sentence Simplification for Relation Extraction. (2010)
- Author(s)
  Miwa M, Miyao Y, Saetre R, Tsujii J
- Journal Title
  
  Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
  
  Pages: 788-796
- Peer Reviewed
[Journal Article] Semi-automatically Developing Chinese HPSG Grammar from the Penn Chinese Treebank for Deep Parsing. (2010)
- Author(s)
  Yu K, Miyao Y, Wang XLi, Matsuzaki T, Tsujii J
- Journal Title
  
  Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
  
  Pages: 1417-1425
- Peer Reviewed
[Journal Article] Simple and Efficient Algorithm for Approximate Dictionary Matching. (2010)
- Author(s)
  Okazaki N, Tsujii J
- Journal Title
  
  Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
  
  Pages: 851-859
- Peer Reviewed
[Journal Article] Forest-guided Supertagger Training. (2010)
- Author(s)
  Zhang Y-Z, Matsuzaki T, Tsujii J
- Journal Title
  
  Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)
  
  Pages: 1281-1289
- Peer Reviewed
[Journal Article] The Gene Normalization and Interactive Systems of the University of Tokyo in the BioCreative III Challenge. (2010)
- Author(s)
  Okazaki N, Cho H-C, Saetre R, Pyysalo S, Ohta T, Tsujii J
- Journal Title
  
  Proceedings of BioCreative III
  
  Pages: 125-130
- Peer Reviewed
[Journal Article] Entities, Relations, Events: Representing Biomolecular Semantics. (2010)
- Author(s)
  Pyysalo, Sampo
- Journal Title
  
  BMC Bioinformatics
  
  Volume: 11(Suppl 5) Pages: 06
- Peer Reviewed
[Journal Article] MEDIE and Info-PubMed: 2010 Update. (2010)
- Author(s)
  Ohta T, Matsuzaki T, Okazaki N, Miwa M, Saetre R, Pyysalo S, Tsujii J
- Journal Title
  
  BMC Bioinformatics
  
  Volume: 11(Suppl 5) Pages: 7
- Peer Reviewed
[Journal Article] Text Mining Meets Workflow: Linking U-Compare with Taverna. (2010)
- Author(s)
  Kano Y, Dobson P, Nakanishi M, Tsujii J, Ananiadou S
- Journal Title
  
  Bioinformatics.
  
  Volume: 26(19) Pages: 2486-2487
- Peer Reviewed
[Journal Article] Improving the Inter-corpora Compatibility for Protein Annotations. (2010)
- Author(s)
  Wang Y, Kim J-D, Saetre R, Pyysalo S, Ohta T, Tsujii J
- Journal Title
  
  Journal of Bioinformatics and Computational Biology (JBCB)
  
  Volume: 8(5) Pages: 901-916
- Peer Reviewed
[Journal Article] A Re-Evaluation of Biomedical Named Entity-Term Relations. (2010)
- Author(s)
  Ohta T, Pyysalo S, Kim J-D, Tsujii J
- Journal Title
  
  Journal of Bioinformatics and Computational Biology (JBCB)
  
  Volume: 8(5) Pages: 917-928
- Peer Reviewed
[Journal Article] Event Extraction for DNA Methylation. (2010)
- Author(s)
  Ohta T, Pyysalo S, Miwa M, Tsujii J
- Journal Title
  
  Proceedings of the fourth International Symposium for Semantic Mining in Biomedicine (SMBM 2010)
  
  Pages: 48-56
- Peer Reviewed
[Journal Article] An Analysis of Gene/Protein Associations at PubMed Scale. (2010)
- Author(s)
  Pyysalo S, Ohta T, Tsujii J
- Journal Title
  
  Proceedings of the fourth International Symposium for Semantic Mining in Biomedicine (SMBM 2010)
  
  Pages: 57-65
- Peer Reviewed
[Journal Article] Easy and Instantaneous Processing for Data-Intensive Workflows. (2010)
- Author(s)
  Dun N, Taura K
- Journal Title
  
  Proceedings of the 3rd IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS 2010)
  
  Pages: 1-10
- Peer Reviewed
[Journal Article] Design and Implementation of GXP make---a Workflow System Based on Make. (2010)
- Author(s)
  Taura K, Matsuzaki T, Miwa M, Kamoshida Y, Yokoyama D, Dun N, Shibata T, Choi S-J, Tsujii J
- Journal Title
  
  Proceedings of the 2010 IEEE 6th International Conference on e-Science
  
  Pages: 214-221
- Peer Reviewed
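[Presentation] From Expression to Meaning: Language Processing Technology and the Science of Language (2011)
- Author(s)
  辻井潤一
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi (Special Lecture)
- Year and Date
  2011-03-08 to 2011-03-10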
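[Presentation] Integrated Interoperability of Japanese Language Resources (2011)
- Author(s)
  狩野芳伸, 橋田浩一
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi
- Year and Date
  2011-03-08 to 2011-03-10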
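[Presentation] A Study of Grammar Frameworks for Treebanking (2011)
- Author(s)
  王向莉, 松崎拓也, 宮尾祐介, Kun Yu, 李元, 辻井潤一
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi
- Year and Date
  2011-03-08 to 2011-03-10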
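[Presentation] A Term Translation System Using Hierarchical Phrases and Morphemes. (2011)
- Author(s)
  Wu XC, Tsujii J
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi
- Year and Date
  2011-03-08 to 2011-03-10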
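[Presentation] Token Boundaries or Named Entity Boundaries. (2011)
- Author(s)
  Cho H-C, Okazaki N, Tsujii J
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi
- Year and Date
  2011-03-08 to 2011-03-10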
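[Presentation] A Quantitative Analysis of Phenomena That May Be Problematic in Japanese Case Analysis (2011)
- Author(s)
  花岡洋輝, 松崎拓也, 宮尾祐介, 辻井潤一
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi
- Year and Date
  2011-03-08 to 2011-03-10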
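[Presentation] Getting the Deep Parse of Chinese. (2011)
- Author(s)
  Yu K, Miyao Y, Matsuzaki T, Wang XLi, Tsujii J
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi
- Year and Date
  2011-03-08 to 2011-03-10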
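[Presentation] Named Entity Extraction Using Automatically Constructed Large-Scale Training Data (2011)
- Author(s)
  宇佐美佑, Cho H-C, 岡崎直観, 辻井潤一
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology, Toyohashi, Aichi
- Year and Date
  2011-03-08 to 2011-03-10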
[Presentation] Computational Linguistics and Natural Language Processing (2011)
- Author(s)
  Tsujii J
- Organizer
  The 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2011), Keynote
- Place of Presentation
  Waseda University, Tokyo
- Year and Date
  2011-02-20 to 2011-02-26
[Presentation] The Perspectives of BioNLP Shared Tasks and GENIA (2010)
- Author(s)
  Tsujii J
- Organizer
  BioCreative III Workshop, Keynote
- Place of Presentation
  Double Tree Hotel, Bethesda, Maryland, USA
- Year and Date
  2010-09-13 to 2010-09-15
[Book] Probabilistic Context-Free Grammars with Latent Annotations. In Srinivas Bangalore and Aravind K. Joshi (Eds.), "Supertagging: Using Complex Lexical Descriptions in Natural Language Processing." (2010)
- Author(s)
  Matsuzaki, Takuya, Yusuke Miyao, Jun'ichi Tsujii.
- Total Pages
  337-354
- Publisher
  MIT Press
[Remarks]
- URL
  http://www-tsujii.is.s.u-tokyo.ac.jp/index-j.html
