
Building Named Entity Recognizers by combining a large-scale lexicon and corpora

Research Project

Project/Area Number 23700159
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Multi-year Fund
Research Field Intelligent informatics
Research Institution Tohoku University

Principal Investigator

OKAZAKI Naoaki  Tohoku University, Graduate School of Information Sciences, Associate Professor (50601118)

Project Period (FY) 2011 – 2012
Project Status Completed (Fiscal Year 2013)
Budget Amount
¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2012: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Fiscal Year 2011: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Keywords natural language processing / named entity recognition / information extraction / machine learning / knowledge acquisition
Research Abstract

This research builds Named Entity Recognizers at low cost. A Named Entity Recognizer extracts text mentions of entities or concepts belonging to specific semantic classes (e.g., product names and disease names). To achieve this goal, the project addresses three challenges: (1) automatically acquiring training data in which mentions are annotated with semantic classes; (2) building Named Entity Recognizers from the automatically acquired training data; and (3) evaluating the Named Entity Recognizers. We proposed a method that improves the quality of automatically acquired training data by using reference information in the dictionary, and demonstrated its effectiveness experimentally. We also proposed a method for mining context gazetteers, which are dependency paths appearing around expressions of the target semantic classes, and confirmed that they improve the accuracy of Named Entity Recognizers.
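
As a rough illustration of challenge (1), the sketch below labels tokens by greedy longest match against a lexicon and emits BIO tags that a sequence labeler could train on. It is a minimal sketch under stated assumptions, not the project's method: the function name `annotate_with_lexicon`, the toy `LEXICON`, and the `max_len` parameter are all hypothetical, and the quality-improvement step based on dictionary reference information is not shown.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[Tuple[str, ...], str]


def annotate_with_lexicon(tokens: List[str], lexicon: Lexicon, max_len: int = 5) -> List[str]:
    """Assign BIO labels by greedy longest match against a lexicon (illustrative only)."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so "breast cancer" beats "cancer".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                semantic_class = lexicon[key]
                labels[i] = "B-" + semantic_class
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + semantic_class
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Hypothetical toy lexicon: lowercased token tuples mapped to semantic classes.
LEXICON: Lexicon = {
    ("breast", "cancer"): "DISEASE",
    ("cancer",): "DISEASE",
    ("aspirin",): "PRODUCT",
}

tokens = "Aspirin may lower breast cancer risk".split()
labels = annotate_with_lexicon(tokens, LEXICON)
for token, label in zip(tokens, labels):
    print(token, label)
```

Labels produced this way are noisy, because ambiguous lexicon entries match regardless of context, which is the kind of error the abstract's quality-improvement method is meant to reduce.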

Report

(4 results)
  • 2013 Annual Research Report, Final Research Report (PDF)
  • 2012 Research-status Report
  • 2011 Research-status Report
  • Research Products

    (39 results)

Journal Article (11 results, of which Peer Reviewed: 9 results) / Presentation (28 results)

  • [Journal Article] Named entity recognition with multiple segment representations (2013)

    • Author(s)
      Han-Cheol Cho, Naoaki Okazaki, Makoto Miwa, Jun'ichi Tsujii
    • Journal Title

      Information Processing & Management

      Volume: 49 Issue: 4 Pages: 954-965

    • DOI

      10.1016/j.ipm.2013.03.002

    • Related Report
      2013 Annual Research Report 2013 Final Research Report
    • Peer Reviewed
  • [Journal Article] Learning Abbreviations from Chinese and English Terms by Modeling Non-local Information (2013)

    • Author(s)
      Xu Sun, Naoaki Okazaki, Junichi Tsujii, Houfeng Wang
    • Journal Title

      ACM Transactions on Asian Language Information Processing

      Volume: 12 Issue: 2 Pages: 1-17

    • DOI

      10.1145/2461316.2461317

    • Related Report
      2013 Final Research Report 2012 Research-status Report
    • Peer Reviewed
  • [Journal Article] Extracting False Information on Twitter and Analyzing its Diffusion Processes by using Linguistic Patterns for Correction (2013)

    • Author(s)
      鍋島啓太, 渡邉研斗, 水野淳太, 岡崎直観, 乾健太郎
    • Journal Title

      Journal of Natural Language Processing

      Volume: 20 Issue: 3 Pages: 461-484

    • DOI

      10.5715/jnlp.20.461

    • NAID

      10031174541

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2013 Final Research Report
    • Peer Reviewed
  • [Journal Article] Set Expansion Using Sibling Relationships Between Semantic Categories (2013)

    • Author(s)
      高瀬翔, 岡崎直観, 乾健太郎
    • Journal Title

      Journal of Natural Language Processing

      Volume: 20 Issue: 2 Pages: 273-296

    • DOI

      10.5715/jnlp.20.273

    • NAID

      10031174534

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2013 Annual Research Report 2013 Final Research Report
    • Peer Reviewed
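  • [Journal Article] Analysis by Language Processing: Analysis of the Japan Dietetic Association Activity Reports (2012)

    • Author(s)
      岡崎直観, 鍋島啓太, 乾健太郎
    • Journal Title

      Journal of the Japan Dietetic Association

      Volume: 55 Issue: 12 Pages: 6-8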

    • Related Report
      2013 Final Research Report
  • [Journal Article] A preference learning approach to sentence ordering for multi-document summarization (2012)

    • Author(s)
      Danushka Bollegala, Naoaki Okazaki, Mitsuru Ishizuka
    • Journal Title

      Information Sciences

      Volume: 217 Pages: 78-95

    • DOI

      10.1016/j.ins.2012.06.015

    • Related Report
      2012 Research-status Report
    • Peer Reviewed
  • [Journal Article] Leveraging Diverse Lexical Resources for Textual Entailment Recognition (2012)

    • Author(s)
      Yotaro Watanabe, Junta Mizuno, Eric Nichols, Katsuma Narisawa, Keita Nabeshima, Naoaki Okazaki, Kentaro Inui
    • Journal Title

      ACM Transactions on Asian Language Information Processing (TALIP)

      Volume: 11 Issue: 4 Pages: 1-22

    • DOI

      10.1145/2382593.2382600

    • URL

      http://dl.acm.org/citation.cfm?id=2382600

    • Related Report
      2012 Research-status Report
    • Peer Reviewed
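  • [Journal Article] Analysis by Language Processing: Analysis of the Japan Dietetic Association Activity Reports (2012)

    • Author(s)
      岡崎直観, 鍋島啓太, 乾健太郎
    • Journal Title

      Journal of the Japan Dietetic Association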

      Volume: 55 Pages: 6-8

    • Related Report
      2012 Research-status Report
  • [Journal Article] A Simple and Fast Algorithm for Approximate String Matching with Set Similarity (2011)

    • Author(s)
      岡崎直観, 辻井潤一
    • Journal Title

      Journal of Natural Language Processing

      Volume: 18 Issue: 2 Pages: 89-117

    • DOI

      10.5715/jnlp.18.89

    • NAID

      10029062875

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2013 Final Research Report 2011 Research-status Report
    • Peer Reviewed
  • [Journal Article] The gene normalization task in BioCreative III (2011)

    • Author(s)
      Zhiyong Lu, ..., Naoaki Okazaki, ..., John W. Wilbur
    • Journal Title

      BMC Bioinformatics

      Volume: 12 Issue: S8

    • DOI

      10.1186/1471-2105-12-s8-s9

    • Related Report
      2011 Research-status Report
    • Peer Reviewed
  • [Journal Article] BioCreative III interactive task: an overview (2011)

    • Author(s)
      Cecilia N Arighi, ..., Naoaki Okazaki, ..., Cathy H Wu
    • Journal Title

      BMC Bioinformatics

      Volume: 12 Issue: S8

    • DOI

      10.1186/1471-2105-12-s8-s8

    • Related Report
      2011 Research-status Report
    • Peer Reviewed
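  • [Presentation] Acquiring Place Name and Address Pairs Using the Structure of Web Documents (2013)

    • Author(s)
      佐藤貴大, 岡崎直観, 乾健太郎
    • Organizer
      27th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2013)
    • Place of Presentation
      Toyama International Conference Center (Toyama Prefecture)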
    • Related Report
      2013 Final Research Report
  • [Presentation] Inducing Context Gazetteers from Encyclopedic Database for Named Entity Recognition (2013)

    • Author(s)
      Han-Cheol Cho, Naoaki Okazaki, Kentaro Inui
    • Organizer
      Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2013), pp.378-389
    • Place of Presentation
      Gold Coast, Australia
    • Related Report
      2013 Final Research Report
  • [Presentation] Exploiting Dependency Context Gazetteers for Named Entity Recognition (2013)

    • Author(s)
      Han-Cheol Cho, Naoaki Okazaki, Kentaro Inui
    • Organizer
      19th Annual Meeting of the Association for Natural Language Processing (NLP2013), pp. 220-223
    • Place of Presentation
      Nagoya University (Aichi Prefecture)
    • Related Report
      2013 Final Research Report
  • [Presentation] Inducing Context Gazetteers from Encyclopedic Database for Named Entity Recognition (2013)

    • Author(s)
      Han-Cheol Cho, Naoaki Okazaki, Kentaro Inui
    • Organizer
      17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2013)
    • Place of Presentation
      Surfers Paradise Marriott, Gold Coast, Australia
    • Related Report
      2013 Annual Research Report
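  • [Presentation] Acquiring Place Name and Address Pairs Using the Structure of Web Documents (2013)

    • Author(s)
      佐藤貴大, 岡崎直観, 乾健太郎
    • Organizer
      27th Annual Conference of the Japanese Society for Artificial Intelligence
    • Place of Presentation
      Toyama International Conference Center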
    • Related Report
      2013 Annual Research Report
  • [Presentation] Set Expansion using Sibling Relations between Semantic Categories (2012)

    • Author(s)
      Sho Takase, Naoaki Okazaki, Kentaro Inui
    • Organizer
      Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 26), pp. 567-576
    • Place of Presentation
      Bali, Indonesia
    • Year and Date
      2012-11-09
    • Related Report
      2013 Final Research Report
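  • [Presentation] Toward Acquiring Relational Knowledge from Noun Categories (2012)

    • Author(s)
      高瀬翔, 岡崎直観, 乾健太郎
    • Organizer
      NLP Young Researchers' Association (NLP Wakate no Kai), 7th Symposium
    • Place of Presentation
      Tohoku University (Miyagi Prefecture)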
    • Related Report
      2013 Final Research Report
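  • [Presentation] Set Expansion Exploiting Hierarchical Relations between Semantic Categories (2012)

    • Author(s)
      高瀬翔, 岡崎直観, 乾健太郎
    • Organizer
      18th Annual Meeting of the Association for Natural Language Processing (NLP2012), pp. 475-478
    • Place of Presentation
      Hiroshima City University (Hiroshima Prefecture)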
    • Related Report
      2013 Final Research Report
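  • [Presentation] Analysis of Issues in Recognizing Textual Entailment for Sentences with Numerical Expressions (2012)

    • Author(s)
      成澤克麻, 渡邉陽太郎, 水野淳太, 岡崎直観, 乾健太郎
    • Organizer
      18th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Hiroshima City University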
    • Related Report
      2011 Research-status Report
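  • [Presentation] Extracting Information on Human Safety and Danger from Web Documents (2012)

    • Author(s)
      岡崎直観, 成澤克麻, 乾健太郎
    • Organizer
      18th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Hiroshima City University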
    • Related Report
      2011 Research-status Report
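  • [Presentation] Building an Example Retrieval System for English Writing Support (2012)

    • Author(s)
      高松優, 水野淳太, 岡崎直観, 乾健太郎
    • Organizer
      18th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Hiroshima City University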
    • Related Report
      2011 Research-status Report
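  • [Presentation] Presenting Evidence for Corrections in Article Error Correction (2012)

    • Author(s)
      梅澤次郎, 水野淳太, 岡崎直観, 乾健太郎
    • Organizer
      18th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Hiroshima City University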
    • Related Report
      2011 Research-status Report
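  • [Presentation] Learning Semantic Relations between Sentences with a Discriminative Model with Latent Variables (2012)

    • Author(s)
      渡邉陽太郎, 水野淳太, 岡崎直観, 乾健太郎
    • Organizer
      18th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Hiroshima City University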
    • Related Report
      2011 Research-status Report
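  • [Presentation] Set Expansion Exploiting Hierarchical Relations between Semantic Categories (2012)

    • Author(s)
      高瀬翔, 岡崎直観, 乾健太郎
    • Organizer
      18th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Hiroshima City University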
    • Related Report
      2011 Research-status Report
  • [Presentation] Automatic Acquisition of Huge Training Data for Bio-Medical Named Entity Recognition (2011)

    • Author(s)
      Yu Usami, Han-Cheol Cho, Naoaki Okazaki, Jun'ichi Tsujii
    • Organizer
      Proceedings of BioNLP 2011 Workshop, pp. 65-73
    • Place of Presentation
      Portland, Oregon, USA
    • Year and Date
      2011-06-23
    • Related Report
      2013 Final Research Report
  • [Presentation] Fast Newton-CG Method for Batch Learning of Conditional Random Fields (2011)

    • Author(s)
      Yuta Tsuboi, Yuya Unno, Hisashi Kashima, Naoaki Okazaki
    • Organizer
      Twenty-Fifth Conference on Artificial Intelligence (AAAI-11)
    • Place of Presentation
      San Francisco, California, USA
    • Related Report
      2011 Research-status Report
  • [Presentation] Automatic Acquisition of Huge Training Data for Bio-Medical Named Entity Recognition (2011)

    • Author(s)
      Yu Usami, Han-Cheol Cho, Naoaki Okazaki, Jun'ichi Tsujii
    • Organizer
      BioNLP 2011 Workshop
    • Place of Presentation
      Portland, Oregon, USA
    • Related Report
      2011 Research-status Report
  • [Presentation] Inducing Context Gazetteers from Encyclopedic Database for Named Entity Recognition

    • Author(s)
      Han-Cheol Cho, Naoaki Okazaki, Kentaro Inui
    • Organizer
      Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2013)
    • Place of Presentation
      Gold Coast, Australia
    • Related Report
      2012 Research-status Report
  • [Presentation] Evidence in Automatic Error Correction Improves Learners’ English Skill

    • Author(s)
      Jiro Umezawa, Junta Mizuno, Naoaki Okazaki, Kentaro Inui
    • Organizer
      Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2013)
    • Place of Presentation
      Samos, Greece
    • Related Report
      2012 Research-status Report
  • [Presentation] Discriminative Learning of First-order Weighted Abduction from Partial Discourse Explanations

    • Author(s)
      Kazeto Yamamoto, Naoya Inoue, Yotaro Watanabe, Naoaki Okazaki, Kentaro Inui
    • Organizer
      Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2013)
    • Place of Presentation
      Samos, Greece
    • Related Report
      2012 Research-status Report
  • [Presentation] Acquiring and Generalizing Causal Inference Rules from Deverbal Noun Constructions

    • Author(s)
      Shohei Tanaka, Naoaki Okazaki, Mitsuru Ishizuka
    • Organizer
      Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)
    • Place of Presentation
      Mumbai, India
    • Related Report
      2012 Research-status Report
  • [Presentation] A Latent Discriminative Model for Compositional Entailment Relation Recognition Using Natural Logic

    • Author(s)
      Yotaro Watanabe, Junta Mizuno, Eric Nichols, Naoaki Okazaki, Kentaro Inui
    • Organizer
      Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)
    • Place of Presentation
      Mumbai, India
    • Related Report
      2012 Research-status Report
  • [Presentation] Set Expansion using Sibling Relations between Semantic Categories

    • Author(s)
      Sho Takase, Naoaki Okazaki, Kentaro Inui
    • Organizer
      Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 26)
    • Place of Presentation
      Bali, Indonesia
    • Related Report
      2012 Research-status Report
  • [Presentation] Exploiting Dependency Context Gazetteers for Named Entity Recognition

    • Author(s)
      Han-Cheol Cho, Naoaki Okazaki, Kentaro Inui
    • Organizer
      19th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Nagoya
    • Related Report
      2012 Research-status Report
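  • [Presentation] Extracting Critical Users Using Social Media Posts and Relationships between Users

    • Author(s)
      高瀬翔, 村上明子, 榎美紀, 岡崎直観, 乾健太郎
    • Organizer
      19th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Nagoya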
    • Related Report
      2012 Research-status Report
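  • [Presentation] Automatically Judging Whether a Quantity Is Large or Small: Is "He Is 2 m Tall" Tall or Short?

    • Author(s)
      成澤克麻, 渡邉陽太郎, 水野淳太, 岡崎直観, 乾健太郎
    • Organizer
      19th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Nagoya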
    • Related Report
      2012 Research-status Report
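  • [Presentation] Technical Support for Identifying Local Disaster Victims among Microblog Users

    • Author(s)
      水野淳太, 岡崎直観, 乾健太郎
    • Organizer
      75th National Convention of the Information Processing Society of Japan
    • Place of Presentation
      Sendai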
    • Related Report
      2012 Research-status Report
  • [Presentation] Online Large-margin Weight Learning for First-order Logic-based Abduction

    • Author(s)
      Naoya Inoue, Kazeto Yamamoto, Yotaro Watanabe, Naoaki Okazaki, Kentaro Inui
    • Organizer
      15th Workshop on Information-Based Induction Sciences (IBISML)
    • Place of Presentation
      University of Tsukuba, Tokyo Campus, Bunkyo School Building
    • Related Report
      2012 Research-status Report

Published: 2011-08-05   Modified: 2019-07-29  
