2010 Fiscal Year Final Research Report

Japanese semantic analysis using balanced corpus of contemporary Written Japanese

Planned Research

Project Area Compilation of a balanced corpus of written Japanese: Infrastructure for the coming Japanese linguistics
Project/Area Number 18061003
Research Category

Grant-in-Aid for Scientific Research on Priority Areas

Allocation Type Single-year Grants
Review Section Humanities and Social Sciences
Research Institution Tokyo Institute of Technology

Principal Investigator

OKUMURA Manabu  Tokyo Institute of Technology, Precision and Intelligence Laboratory, Professor (60214079)

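Co-Investigator (Kenkyū-buntansha) SHIRAI Kiyoaki  Japan Advanced Institute of Science and Technology, School of Information Science, Associate Professor (30302970)
SHINNOU Hiroyuki  Ibaraki University, Faculty of Engineering, Associate Professor (10250987)
TAKAMURA Hiroya  Tokyo Institute of Technology, Precision and Intelligence Laboratory, Associate Professor (80361773)
TAKEUCHI Kouichi  Okayama University, Graduate School of Natural Science and Technology, Lecturer (80311174)
SASAKI Minoru  Ibaraki University, Faculty of Engineering, Lecturer (60344834)
NAKAMURA Makoto  Japan Advanced Institute of Science and Technology, School of Information Science, Assistant Professor (50377438)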
Project Period (FY) 2006 – 2010
Keywords word-sense-tagged corpus / discovery of new word senses / machine learning / lexical conceptual structure / clustering
Research Abstract

1) We constructed a corpus with word-sense annotation, based on the Balanced Corpus of Contemporary Written Japanese (BCCWJ).
2) We organized the SemEval-2 Japanese Word Sense Disambiguation (WSD) task by using the corpus that we constructed in 1). Nine systems from four organizations participated in the task.
3) We showed that, in domain adaptation for WSD, the most effective adaptation method varies with the properties of the source and target data. We also presented a way to select the most effective method from these properties using decision tree learning. Average WSD accuracy improved significantly when the automatically selected method was applied to each case individually, compared with applying the original methods uniformly across all cases.
4) We proposed a supervised WSD system that uses features obtained from the results of clustering word instances. Our approach is novel in that it employs semi-supervised clustering that controls the fluctuation of cluster centroids, selects seed instances by considering the frequency distribution of word senses, and excludes outliers when introducing "must-link" constraints between seed instances. In addition, we improved supervised WSD accuracy by using features computed from the word instances in the clusters generated by the semi-supervised clustering.
5) We proposed a method for detecting new word senses in a corpus. It consists of two steps: (A) word instances are clustered so that instances of the same sense are merged; (B) the similarity between each cluster and each sense in a dictionary is then measured to determine the senses of the instances in that cluster.
6) We proposed a method for detecting peculiar examples of a target word in a corpus. The method combines two representative outlier detection methods from the data mining field: the density-based Local Outlier Factor (LOF) and One-Class SVM. The combination improved on the precision and recall of LOF and One-Class SVM. We also showed, using the noun 'midori' (green), that the method can detect new meanings.
7) We presented a co-clustering-based approach to verb synonym extraction that increases the number of meanings of polysemous verbs extracted from a large text corpus. The approach extracts the different meanings of polysemous verbs by recursively eliminating the extracted clusters from the initial data set. Experimental results on verb synonym extraction show that the proposed approach increases the number of correct verb clusters by about 50%, with a 0.9% increase in precision and a 1.5% increase in recall over the previous approach.
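
The two-step procedure in item 5) can be illustrated with a minimal sketch. It is not the project's implementation: the bag-of-words representation, cosine similarity, greedy agglomerative clustering, both thresholds, the `NEW_SENSE` label and the toy English data are all assumptions made for illustration. Instances are clustered first (step A); each cluster is then compared with every dictionary sense, and a cluster that resembles no sense closely enough is flagged as a candidate new sense (step B).

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector for a whitespace-tokenised context (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_instances(vectors, merge_threshold):
    """Step (A): greedy agglomerative clustering of instance vectors.

    Repeatedly merges the two most similar clusters (compared by their
    summed vectors) until no pair reaches merge_threshold.
    """
    clusters = [([i], Counter(v)) for i, v in enumerate(vectors)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s > best:
                    best, pair = s, (i, j)
        if best < merge_threshold:
            break
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def label_clusters(clusters, sense_glosses, new_sense_threshold):
    """Step (B): give each cluster its closest dictionary sense, or flag it as new."""
    sense_vecs = {sense: bow(gloss) for sense, gloss in sense_glosses.items()}
    labelled = []
    for members, summed in clusters:
        sense, score = max(
            ((s, cosine(summed, v)) for s, v in sense_vecs.items()),
            key=lambda item: item[1],
        )
        label = sense if score >= new_sense_threshold else "NEW_SENSE"
        labelled.append((sorted(members), label))
    return labelled


# Hypothetical toy data: contexts of one target word, reduced to content words.
contexts = [
    "rodent cheese kitchen cat",
    "rodent cat kitchen trap",
    "computer button click screen",
    "computer pointer screen click",
    "boxer bruise eye fight",
]
glosses = {"animal": "rodent cat cheese", "device": "computer pointer button"}

clusters = cluster_instances([bow(c) for c in contexts], merge_threshold=0.3)
for members, label in label_clusters(clusters, glosses, new_sense_threshold=0.2):
    print(members, label)
```

In this toy run the last context shares no words with either gloss, so its cluster is the one flagged as a new-sense candidate. Real data would need richer features and tuned thresholds.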

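For item 6), the Local Outlier Factor component can be sketched in a few lines of plain Python. This covers only the standard LOF score; the One-Class SVM and the way the project combined the two detectors are not described in the abstract and are not reproduced here. The function name `lof_scores`, the Euclidean distance and the toy points are assumptions for illustration.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor for each point; values well above 1 suggest outliers.

    Assumes at least k + 1 distinct points and uses exactly k neighbours per
    point (ties are broken by index order).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbour j
        return max(k_dist[j], dist[i][j])

    # local reachability density: inverse of the mean reachability distance
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical 2-D feature vectors: four clustered instances and one far away.
instances = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
for point, score in zip(instances, lof_scores(instances, k=2)):
    print(point, round(score, 2))
```

The isolated instance receives a score well above 1, while the clustered instances score about 1; in the setting of item 6), such high-scoring instances would be the candidates for peculiar usages.
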
  • Research Products

    (11 results)

Journal Article (3 results, of which Peer Reviewed: 3 results) / Presentation (5 results) / Remarks (3 results)

  • [Journal Article] On SemEval-2010 Japanese WSD Task (2011)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono
    • Journal Title

      Journal of Natural Language Processing (自然言語処理) Vol.18, No.3

    • Peer Reviewed
  • [Journal Article] Co-clustering with Recursive Elimination for Verb Synonym Extraction from Large Text Corpus (2009)

    • Author(s)
      Koichi Takeuchi, Hideyuki Takahashi
    • Journal Title

      IEICE Transactions on Information and Systems Vol.E92-D, No.12

      Pages: 2334-2340

    • Peer Reviewed
  • [Journal Article] Analysis of Eye Movements and Linguistic Boundaries in a Text for the Investigation of Japanese Reading Processes (2008)

    • Author(s)
      Akemi Tera, Kiyoaki Shirai, Takaya Yuizono, Kozo Sugiyama
    • Journal Title

      IEICE Transactions on Information and Systems, Special Issue on Knowledge, Information and Creativity Support System Vol.E91-D, No.11

      Pages: 2560-2567

    • Peer Reviewed
  • [Presentation] Document Clustering Using Semantic Relationship Between Target Documents And Related Documents (2010)

    • Author(s)
      Minoru Sasaki, Hiroyuki Shinnou
    • Organizer
      The Fourth International Conference on Advances in Semantic Processing
    • Place of Presentation
      Florence.
    • Year and Date
      2010-10-27
  • [Presentation] A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings (2010)

    • Author(s)
      Koichi Takeuchi, Kentaro Inui, Nao Takeuchi, Atsushi Fujita
    • Organizer
      The 8th Workshop on Asian Language Resources
    • Place of Presentation
      Beijing.
    • Year and Date
      2010-08-21
  • [Presentation] SemEval-2010 Task: Japanese WSD (2010)

    • Author(s)
      Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono.
    • Organizer
      The 5th International Workshop on Semantic Evaluation, pp.67-74
    • Place of Presentation
      Uppsala.
    • Year and Date
      2010-07-15
  • [Presentation] JAIST: Clustering and Classification Based Approaches for Japanese WSD (2010)

    • Author(s)
      Kiyoaki Shirai, Makoto Nakamura.
    • Organizer
      The 5th International Workshop on Semantic Evaluation, pp.379-382
    • Place of Presentation
      Uppsala.
    • Year and Date
      2010-07-15
  • [Presentation] Detection of Peculiar Examples using LOF and One Class SVM (2010)

    • Author(s)
      Hiroyuki Shinnou, Minoru Sasaki
    • Organizer
      LREC-2010
    • Place of Presentation
      Malta.
    • Year and Date
      2010-05-21
  • [Remarks] A new word sense disambiguation task using BCCWJ

    • URL

      http://oku-gw.pi.titech.ac.jp/wsd.html

  • [Remarks] Release of a semantic role labeling system

    • URL

      http://cl.cs.okayama-u.ac.jp/study/project/sea.html

  • [Remarks] Release of a conceptual dictionary of verbs

    • URL

      http://cl.cs.okayama-u.ac.jp/rsc/data/index.html

Published: 2012-02-13   Modified: 2016-04-21  
