
Japanese semantic analysis using balanced corpus of contemporary Written Japanese

Planned Research

Project Area Compilation of a balanced corpus of written Japanese: Infrastructure for the coming Japanese linguistics
Project/Area Number 18061003
Research Category

Grant-in-Aid for Scientific Research on Priority Areas

Allocation Type Single-year Grants
Review Section Humanities and Social Sciences
Research Institution Tokyo Institute of Technology

Principal Investigator

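OKUMURA Manabu  Tokyo Institute of Technology, Precision and Intelligence Laboratory, Professor (60214079)

Co-Investigator (Kenkyū-buntansha) SHIRAI Kiyoaki  Japan Advanced Institute of Science and Technology, School of Information Science, Associate Professor (30302970)
SHINNOU Hiroyuki  Ibaraki University, College of Engineering, Associate Professor (10250987)
TAKAMURA Hiroya  Tokyo Institute of Technology, Precision and Intelligence Laboratory, Associate Professor (80361773)
TAKEUCHI Kouichi  Okayama University, Graduate School of Natural Science and Technology, Lecturer (80311174)
SASAKI Minoru  Ibaraki University, College of Engineering, Lecturer (60344834)
NAKAMURA Makoto  Japan Advanced Institute of Science and Technology, School of Information Science, Assistant Professor (50377438)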
Project Period (FY) 2006 – 2010
Project Status Completed (Fiscal Year 2010)
Budget Amount
¥84,700,000 (Direct Cost: ¥84,700,000)
Fiscal Year 2010: ¥18,400,000 (Direct Cost: ¥18,400,000)
Fiscal Year 2009: ¥18,400,000 (Direct Cost: ¥18,400,000)
Fiscal Year 2008: ¥18,400,000 (Direct Cost: ¥18,400,000)
Fiscal Year 2007: ¥18,400,000 (Direct Cost: ¥18,400,000)
Fiscal Year 2006: ¥11,100,000 (Direct Cost: ¥11,100,000)
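Keywords sense-tagged corpus / discovery of new word senses / machine learning / lexical conceptual structure / clustering / polysemy disambiguation / new sense discovery / representativeness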
Research Abstract

1) We constructed a word-sense-annotated corpus based on the Balanced Corpus of Contemporary Written Japanese (BCCWJ).
2) We organized the SemEval-2 Japanese Word Sense Disambiguation (WSD) task by using the corpus that we constructed in 1). Nine systems from four organizations participated in the task.
3) We showed that in domain adaptation for word sense disambiguation (WSD), the most effective adaptation method varies with the properties of the source and target data. We also presented a way to select the most effective domain adaptation method for given data properties using decision tree learning. Average WSD accuracy improved significantly when the automatically selected method was applied to each case individually, compared with applying each of the original methods uniformly to all cases.
4) We proposed a supervised WSD system that uses features obtained from the results of clustering word instances. The approach is novel in that it employs semi-supervised clustering that limits fluctuation of cluster centroids, and, when introducing "must-link" constraints between seed instances, it selects seeds according to the frequency distribution of word senses and excludes outliers. In addition, we improved supervised WSD accuracy by using features computed from the word instances in the clusters generated by the semi-supervised clustering.
5) We proposed a method for detecting new word senses in a corpus. It consists of two steps: (A) word instances are clustered so that instances of the same sense are merged into one cluster; (B) the similarity between each cluster and each sense in a dictionary is then measured to determine the sense of the instances in each cluster.
6) We proposed a method to detect peculiar examples of a target word in a corpus. The method combines the density-based Local Outlier Factor (LOF) and the One Class SVM, two representative outlier detection methods in data mining. It improved precision and recall over LOF and One Class SVM used alone, and we showed, using the noun 'midori' (green), that it can detect new meanings.
7) We presented a co-clustering-based approach to verb synonym extraction that increases the number of meanings of polysemous verbs extracted from a large text corpus. The approach extracts the different meanings of a polysemous verb by recursively eliminating already extracted clusters from the initial data set. In verb synonym extraction experiments, it increased the number of correct verb clusters by about 50% over the previous approach, with a 0.9% increase in precision and a 1.5% increase in recall.
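
Item 4) relies on semi-supervised clustering with "must-link" constraints between seed instances. The sketch below is not the authors' algorithm. It is plain seeded (constrained) k-means, a standard technique in which labelled seed instances initialise the centroids and keep their labels throughout, so seeds of the same sense always stay in one cluster. The centroid-fluctuation control, frequency-aware seed selection and outlier exclusion described above are not modelled. The names `seeded_kmeans`, `seeds` and `max_iter` are hypothetical, and word instances are assumed to be already encoded as numeric feature vectors.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(
    points: List[Vector],
    seeds: Dict[int, int],
    k: int,
    max_iter: int = 20,
) -> List[int]:
    """Seeded (constrained) k-means; a hypothetical illustration only.

    `seeds` maps a point index to a cluster id in range(k). Seeds sharing an
    id act as must-linked instances: they start in, and never leave, that cluster.
    """
    for c in range(k):
        if c not in seeds.values():
            raise ValueError(f"cluster {c} has no seed instance")
    # Initialise each centroid from the seeds assigned to it.
    centroids = [
        _mean([points[i] for i, lab in seeds.items() if lab == c]) for c in range(k)
    ]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            seeds[i] if i in seeds  # constrained: seed labels never change
            else min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Recompute centroids from current members; seeds keep every cluster non-empty.
        for c in range(k):
            centroids[c] = _mean([points[i] for i, lab in enumerate(labels) if lab == c])
    return labels
```

For example, with seeds `{0: 0, 3: 1}` on two well-separated groups of points, each unseeded point takes the label of the group it lies in; a seed placed inside the "wrong" group keeps its given label, which is exactly what the must-link constraint enforces.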
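
Step (B) of item 5) can be illustrated with a simple bag-of-words comparison. The sketch assumes each cluster from step (A) is given as a list of tokenised contexts and each dictionary sense as a tokenised definition. It pools a cluster's context words, scores the pool against every definition by cosine similarity, and treats a cluster whose best score falls below a threshold as a new-sense candidate. The cosine measure, the threshold of 0.2 and the function names are illustrative assumptions, not the similarity measure actually used in the project.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_clusters_to_senses(
    clusters: List[List[List[str]]],  # each cluster: list of tokenised contexts
    glosses: Dict[str, List[str]],    # sense id -> tokenised dictionary definition
    threshold: float = 0.2,           # hypothetical cut-off, not from the project
) -> List[Optional[str]]:
    """Return the best-matching sense id per cluster, or None for a new-sense candidate."""
    gloss_vectors = {sense: Counter(tokens) for sense, tokens in glosses.items()}
    assignments: List[Optional[str]] = []
    for cluster in clusters:
        # Pool the context words of all instances in the cluster.
        profile = Counter(tok for context in cluster for tok in context)
        best_sense, best_sim = None, 0.0
        for sense, vec in gloss_vectors.items():
            sim = cosine(profile, vec)
            if sim > best_sim:
                best_sense, best_sim = sense, sim
        # No definition is similar enough: flag the cluster as a possible new sense.
        assignments.append(best_sense if best_sim >= threshold else None)
    return assignments
```

With invented glosses for a "colour" sense and a "nature" sense, a cluster of contexts about grass and leaves maps to "colour", a cluster about trees and plants maps to "nature", and a cluster sharing no words with either definition is returned as `None`.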
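
Item 6) combines LOF with One Class SVM. The sketch below implements only the LOF half, in plain Python with brute-force distances, to show what the density-based score measures. The One Class SVM needs a quadratic-programming solver and is omitted, and the abstract does not say how the two detectors' outputs were combined, so no combination rule is shown. Representing each usage example of the target word as a numeric feature vector is an assumption, as are the function name `lof_scores` and the default `k`.

```python
import math
from typing import List, Sequence


def lof_scores(points: List[Sequence[float]], k: int = 2) -> List[float]:
    """Local Outlier Factor of every point (brute force, O(n^2) distances).

    Scores near 1 mean a point is about as dense as its neighbours; clearly
    larger scores mark outliers. Assumes there are no duplicate points.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding itself (ties broken by index).
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    def lrd(i: int) -> float:
        """Local reachability density: inverse of the mean reachability distance."""
        reach = [max(k_dist[o], dist[i][o]) for o in neighbours[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrds[o] for o in neighbours[i]) / (k * lrds[i]) for i in range(n)]
```

On four points forming a unit square plus one distant point, the square's corners score 1.0 and the distant point scores roughly 13, the kind of separation a threshold on LOF would exploit.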
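
Item 7) attributes the gain to recursively eliminating extracted clusters from the data set. The sketch below only illustrates that elimination loop. The co-clustering step is replaced by a deliberately crude stand-in (pick the two verbs that share the most noun arguments), and `extract_with_elimination` and `min_shared` are hypothetical names. Because only the (verb, noun) pairs covered by an extracted cluster are removed, a polysemous verb keeps its other uses and can join a second cluster later.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def extract_with_elimination(
    pairs: Set[Tuple[str, str]],  # observed (verb, noun-argument) co-occurrences
    min_shared: int = 2,          # hypothetical stopping threshold
) -> List[Tuple[Set[str], Set[str]]]:
    """Repeatedly extract a verb cluster, then eliminate the pairs it covers."""
    remaining = set(pairs)
    clusters: List[Tuple[Set[str], Set[str]]] = []
    while True:
        nouns_of: Dict[str, Set[str]] = {}
        for verb, noun in remaining:
            nouns_of.setdefault(verb, set()).add(noun)
        # Crude stand-in for co-clustering: the verb pair sharing the most nouns.
        best = max(
            combinations(sorted(nouns_of), 2),
            key=lambda vv: len(nouns_of[vv[0]] & nouns_of[vv[1]]),
            default=None,
        )
        if best is None:
            break
        shared = nouns_of[best[0]] & nouns_of[best[1]]
        if len(shared) < min_shared:
            break
        verbs = set(best)
        clusters.append((verbs, shared))
        # Remove only the covered (verb, noun) pairs, not the verbs themselves,
        # so a polysemous verb's other uses stay available for later rounds.
        remaining -= {(v, n) for v in verbs for n in shared}
    return clusters
```

With invented pairs in which "run" co-occurs both with "marathon"/"race" and with "company"/"business", the loop first returns the cluster {jog, run} and then {manage, run}, so "run" is extracted with two meanings.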

Report

(7 results)
  • 2010 Annual Research Report
  • 2010 Final Research Report (PDF)
  • 2009 Annual Research Report
  • 2008 Annual Research Report
  • 2008 Self-evaluation Report (PDF)
  • 2007 Annual Research Report
  • 2006 Annual Research Report
  • Research Products

    (40 results)


Journal Article (8 results, of which Peer Reviewed: 3) / Presentation (28 results) / Remarks (4 results)

  • [Journal Article] On SemEval-2010 Japanese WSD Task (2011)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono
    • Journal Title

      Journal of Natural Language Processing (自然言語処理) Vol.18, No.3

    • NAID

      130000969397

    • Related Report
      2010 Final Research Report
    • Peer Reviewed
  • [Journal Article] Co-clustering with Recursive Elimination for Verb Synonym Extraction from Large Text Corpus (2009)

    • Author(s)
      Koichi Takeuchi, Hideyuki Takahashi
    • Journal Title

      IEICE Transactions on Information and Systems Vol.E92-D, No.12

      Pages: 2334-2340

    • NAID

      10026812417

    • Related Report
      2010 Final Research Report, 2009 Annual Research Report
    • Peer Reviewed
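  • [Journal Article] Japanese Semantic Analysis Using a Representative Corpus (2009)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai
    • Journal Title

      Journal of the Japanese Society for Artificial Intelligence (人工知能学会誌) Vol.24, No.5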

      Pages: 673-680

    • Related Report
      2009 Annual Research Report
  • [Journal Article] Automatic Identification of Word Meanings in Corpora (2009)

    • Author(s)
      Kiyoaki Shirai
    • Journal Title

      Kokubungaku: Kaishaku to Kanshō (国文学 解釈と鑑賞) Vol.74, No.1

      Pages: 61-69

    • Related Report
      2008 Self-evaluation Report
  • [Journal Article] Automatic Identification of Word Meanings in Corpora (2009)

    • Author(s)
      Kiyoaki Shirai
    • Journal Title

      Kokubungaku: Kaishaku to Kanshō (国文学 解釈と鑑賞) Vol.74, No.1

      Pages: 61-69

    • Related Report
      2008 Annual Research Report
  • [Journal Article] Analysis of Eye Movements and Linguistic Boundaries in a Text for the Investigation of Japanese Reading Processes (2008)

    • Author(s)
      Akemi Tera, Kiyoaki Shirai, Takaya Yuizono, Kozo Sugiyama
    • Journal Title

      IEICE Transactions on Information and Systems, Special Issue on Knowledge, Information and Creativity Support System, Vol.E91-D, No.11

      Pages: 2560-2567

    • Related Report
      2010 Final Research Report
    • Peer Reviewed
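  • [Journal Article] Semantic Analysis Using the Balanced Corpus of Contemporary Written Japanese: Automatic Identification of Word Senses and Discovery of New Word Senses (2008)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai
    • Journal Title

      Gengo (言語) Vol.37, No.8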

      Pages: 66-73

    • Related Report
      2008 Self-evaluation Report
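  • [Journal Article] Semantic Analysis Using the Balanced Corpus of Contemporary Written Japanese: Automatic Identification of Word Senses and Discovery of New Word Senses (2008)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai
    • Journal Title

      Gengo (言語) Vol.37, No.8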

      Pages: 66-73

    • Related Report
      2008 Annual Research Report
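  • [Presentation] Discovery of New Word Senses by Supervised Outlier Detection (2011)

    • Author(s)
      Hiroyuki Shinnou, Minoru Sasaki
    • Organizer
      The 17th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Toyohashi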
    • Year and Date
      2011-03-10
    • Related Report
      2010 Annual Research Report
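  • [Presentation] Performance Analysis of Word Sense Identification Based on Metric Learning (2011)

    • Author(s)
      Minoru Sasaki, Hiroyuki Shinnou
    • Organizer
      The 17th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Toyohashi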
    • Year and Date
      2011-03-09
    • Related Report
      2010 Annual Research Report
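  • [Presentation] Word Sense Identification Based on Similarity Between Examples Defined from Multiple Perspectives (2011)

    • Author(s)
      中西隆一郎, 白井清昭, 中村誠
    • Organizer
      The 17th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Toyohashi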
    • Year and Date
      2011-03-09
    • Related Report
      2010 Annual Research Report
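  • [Presentation] Committee-Based Domain Adaptation for Word Sense Disambiguation Using Classifier Confidence (2011)

    • Author(s)
      Kanako Komiya, Manabu Okumura
    • Organizer
      The 17th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Toyohashi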
    • Year and Date
      2011-03-09
    • Related Report
      2010 Annual Research Report
  • [Presentation] Document Clustering Using Semantic Relationship Between Target Documents And Related Documents (2010)

    • Author(s)
      Minoru Sasaki, Hiroyuki Shinnou
    • Organizer
      The Fourth International Conference on Advances in Semantic Processing
    • Place of Presentation
      Florence, Italy
    • Year and Date
      2010-10-27
    • Related Report
      2010 Final Research Report
  • [Presentation] Document Clustering Using Semantic Relationship Between Target Documents And Related Documents (2010)

    • Author(s)
      Minoru Sasaki, Hiroyuki Shinnou
    • Organizer
      The Fourth International Conference on Advances in Semantic Processing
    • Place of Presentation
      Florence, Italy
    • Year and Date
      2010-10-27
    • Related Report
      2010 Annual Research Report
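  • [Presentation] Acquisition of Verb Synonyms by Graph-Based Clustering (2010)

    • Author(s)
      竹内孔一, 高橋秀幸, 小林大介
    • Organizer
      IEICE Technical Committee on Natural Language Understanding and Models of Communication
    • Place of Presentation
      Kikai Shinko Kaikan, Tokyo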
    • Year and Date
      2010-10-23
    • Related Report
      2010 Annual Research Report
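  • [Presentation] Automatic Selection of Domain Adaptation Methods for Word Sense Disambiguation (2010)

    • Author(s)
      Kanako Komiya, Manabu Okumura
    • Organizer
      IPSJ Special Interest Group on Natural Language Processing
    • Place of Presentation
      National Institute of Informatics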
    • Year and Date
      2010-09-16
    • Related Report
      2010 Annual Research Report
  • [Presentation] A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings (2010)

    • Author(s)
      Koichi Takeuchi, Kentaro Inui, Nao Takeuchi, Atsushi Fujita
    • Organizer
      The 8th Workshop on Asian Language Resources
    • Place of Presentation
      Beijing
    • Year and Date
      2010-08-21
    • Related Report
      2010 Final Research Report
  • [Presentation] A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings (2010)

    • Author(s)
      Koichi Takeuchi, Kentaro Inui, Nao Takeuchi, Atsushi Fujita
    • Organizer
      The 8th Workshop on Asian Language Resources
    • Place of Presentation
      Beijing
    • Year and Date
      2010-08-21
    • Related Report
      2010 Annual Research Report
  • [Presentation] SemEval-2010 Task: Japanese WSD (2010)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono
    • Organizer
      The 5th International Workshop on Semantic Evaluation, pp.67-74
    • Place of Presentation
      Uppsala
    • Year and Date
      2010-07-15
    • Related Report
      2010 Final Research Report
  • [Presentation] JAIST: Clustering and Classification Based Approaches for Japanese WSD (2010)

    • Author(s)
      Kiyoaki Shirai, Makoto Nakamura
    • Organizer
      The 5th International Workshop on Semantic Evaluation, pp.379-382
    • Place of Presentation
      Uppsala
    • Year and Date
      2010-07-15
    • Related Report
      2010 Final Research Report
  • [Presentation] Detection of Peculiar Examples using LOF and One Class SVM (2010)

    • Author(s)
      Hiroyuki Shinnou, Minoru Sasaki
    • Organizer
      LREC-2010
    • Place of Presentation
      Malta
    • Year and Date
      2010-05-21
    • Related Report
      2010 Final Research Report
  • [Presentation] Detection of Peculiar Examples using LOF and One Class SVM (2010)

    • Author(s)
      Hiroyuki Shinnou, Minoru Sasaki
    • Organizer
      LREC-2010
    • Place of Presentation
      Malta
    • Year and Date
      2010-05-21
    • Related Report
      2010 Annual Research Report
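  • [Presentation] Construction of Sets of Semantically Related Words Using a Web Directory (2010)

    • Author(s)
      佐々木稔, 三上健太, 新納浩幸
    • Organizer
      The 16th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      The University of Tokyo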
    • Year and Date
      2010-03-11
    • Related Report
      2009 Annual Research Report
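  • [Presentation] Construction of Genre Vectors for Nouns Using a Web Directory (2010)

    • Author(s)
      林華, 新納浩幸, 佐々木稔
    • Organizer
      The 16th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      The University of Tokyo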
    • Year and Date
      2010-03-10
    • Related Report
      2009 Annual Research Report
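  • [Presentation] Detection of Peculiar Examples Using LOF and One Class SVM (2010)

    • Author(s)
      Hiroyuki Shinnou, Minoru Sasaki
    • Organizer
      The 16th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      The University of Tokyo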
    • Year and Date
      2010-03-10
    • Related Report
      2009 Annual Research Report
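  • [Presentation] Estimation of Predominant Senses of Nouns and Its Application to Word Sense Identification (2010)

    • Author(s)
      江口晃, 新納浩幸, 佐々木稔
    • Organizer
      The 16th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      The University of Tokyo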
    • Year and Date
      2010-03-10
    • Related Report
      2009 Annual Research Report
  • [Presentation] Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation (2009)

    • Author(s)
      Kazunari Sugiyama, Manabu Okumura
    • Organizer
      The 10th International Conference on Intelligent Text Processing and Computational Linguistics(CICLing 2009)
    • Place of Presentation
      Mexico City
    • Year and Date
      2009-03-05
    • Related Report
      2008 Self-evaluation Report
  • [Presentation] Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation (2009)

    • Author(s)
      Kazunari Sugiyama, Manabu Okumura
    • Organizer
      The 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009)
    • Place of Presentation
      Mexico City
    • Year and Date
      2009-03-05
    • Related Report
      2008 Annual Research Report
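  • [Presentation] Mapping Example Clusters to Dictionary Definitions for New Word Sense Discovery (2009)

    • Author(s)
      田中博貴, 中村誠, 白井清昭
    • Organizer
      The 15th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Tottori University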
    • Year and Date
      2009-03-04
    • Related Report
      2008 Annual Research Report
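  • [Presentation] A New Word Sense Disambiguation Task Using BCCWJ (2009)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai
    • Organizer
      The 15th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Tottori University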
    • Year and Date
      2009-03-04
    • Related Report
      2008 Annual Research Report
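  • [Presentation] Verb Synonym Extraction by Co-clustering Considering Polysemy (2009)

    • Author(s)
      Hideyuki Takahashi, Koichi Takeuchi
    • Organizer
      IEICE Technical Committee on Natural Language Understanding and Models of Communication
    • Place of Presentation
      Kurashiki Geibunkan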
    • Year and Date
      2009-01-27
    • Related Report
      2008 Annual Research Report
  • [Presentation] Extraction of Verb Synonyms using Co-clustering Approach (2008)

    • Author(s)
      Koichi Takeuchi
    • Organizer
      Second International Symposium on Universal Communication (ISUC 2008)
    • Place of Presentation
      Osaka International Convention Center
    • Year and Date
      2008-12-16
    • Related Report
      2008 Annual Research Report
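  • [Presentation] Semi-supervised Clustering of Word Instances (2008)

    • Author(s)
      Kazunari Sugiyama, Manabu Okumura
    • Organizer
      IPSJ Special Interest Group on Natural Language Processing
    • Place of Presentation
      National Institute of Information and Communications Technology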
    • Year and Date
      2008-03-27
    • Related Report
      2007 Annual Research Report
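  • [Presentation] A Word Sense Disambiguation Method Using Clustering Results of Word Instances (2008)

    • Author(s)
      Kazunari Sugiyama, Manabu Okumura
    • Organizer
      The 14th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      The University of Tokyo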
    • Year and Date
      2008-03-19
    • Related Report
      2007 Annual Research Report
  • [Presentation] Extraction of Verb Synonyms using Co-clustering Approach (2008)

    • Author(s)
      Koichi Takeuchi
    • Organizer
      Second International Symposium on Universal Communication (ISUC 2008)
    • Place of Presentation
      Osaka
    • Related Report
      2008 Self-evaluation Report
  • [Presentation] Personal Name Disambiguation in Web Search Results Based on a Semi-Supervised Clustering Approach (2007)

    • Author(s)
      Kazunari Sugiyama, Manabu Okumura
    • Organizer
      Proc. of the 10th International Conference on Asian Digital Libraries, Lecture Notes in Computer Science (LNCS), Springer-Verlag
    • Place of Presentation
      Hanoi
    • Related Report
      2008 Self-evaluation Report
  • [Remarks] A New Word Sense Disambiguation Task Using BCCWJ

    • URL

      http://oku-gw.pi.titech.ac.jp/wsd.html

    • Related Report
      2010 Final Research Report
  • [Remarks] Release of a semantic role labeling system

    • URL

      http://cl.cs.okayama-u.ac.jp/study/project/sea.html

    • Related Report
      2010 Final Research Report
  • [Remarks] Release of a conceptual dictionary of verbs

    • URL

      http://cl.cs.okayama-u.ac.jp/rsc/data/index.html

    • Related Report
      2010 Final Research Report
  • [Remarks] Homepage

    • URL

      http://oku-gw.pi.titech.ac.jp/wsd.html

    • Related Report
      2008 Self-evaluation Report


Published: 2006-04-01   Modified: 2018-03-28  


Powered by NII kakenhi