Japanese semantic analysis using balanced corpus of contemporary Written Japanese

Planned Research

Project Area	Compilation of a balanced corpus of written Japanese: Infrastructure for the coming Japanese linguistics
Project/Area Number	18061003
Research Category	Grant-in-Aid for Scientific Research on Priority Areas
Allocation Type	Single-year Grants
Review Section	Humanities and Social Sciences
Research Institution	Tokyo Institute of Technology
Principal Investigator	OKUMURA Manabu, Tokyo Institute of Technology, Precision and Intelligence Laboratory, Professor (60214079)
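Co-Investigator (Kenkyū-buntansha)	SHIRAI Kiyoaki, Japan Advanced Institute of Science and Technology, School of Information Science, Associate Professor (30302970); SHINNOU Hiroyuki, Ibaraki University, College of Engineering, Associate Professor (10250987); TAKAMURA Hiroya, Tokyo Institute of Technology, Precision and Intelligence Laboratory, Associate Professor (80361773); TAKEUCHI Kouichi, Okayama University, Graduate School of Natural Science and Technology, Lecturer (80311174); SASAKI Minoru, Ibaraki University, College of Engineering, Lecturer (60344834); NAKAMURA Makoto, Japan Advanced Institute of Science and Technology, School of Information Science, Assistant Professor (50377438)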
Project Period (FY)	2006 – 2010
Project Status	Completed (Fiscal Year 2010)
Budget Amount	¥84,700,000 (Direct Cost: ¥84,700,000); FY2010: ¥18,400,000 (Direct Cost: ¥18,400,000); FY2009: ¥18,400,000 (Direct Cost: ¥18,400,000); FY2008: ¥18,400,000 (Direct Cost: ¥18,400,000); FY2007: ¥18,400,000 (Direct Cost: ¥18,400,000); FY2006: ¥11,100,000 (Direct Cost: ¥11,100,000)
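Keywords	sense-tagged corpus / discovery of new word senses / machine learning / lexical conceptual structure / clustering / ambiguity resolution / new sense discovery / representativeness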
Research Abstract	1) We constructed a word-sense-annotated corpus based on the Balanced Corpus of Contemporary Written Japanese.
2) Using the corpus built in 1), we organized the SemEval-2 Japanese Word Sense Disambiguation (WSD) task, in which nine systems from four organizations participated.
3) We showed that, when domain adaptation is performed for WSD, the most effective adaptation method varies with the properties of the source and target data, and we presented a way to select the most effective method from these properties using decision tree learning. Average WSD accuracy improved significantly when the automatically selected method was applied to each case, compared with applying each of the original methods uniformly.
4) We proposed a supervised WSD system that uses features obtained from clustering results of word instances. The approach is novel in that it employs semi-supervised clustering that controls the fluctuation of cluster centroids, selects seed instances by considering the frequency distribution of word senses, and excludes outliers when introducing "must-link" constraints between seed instances. Features computed from the word instances in clusters generated by this semi-supervised clustering further improved supervised WSD accuracy.
5) We proposed a method for detecting new word senses in a corpus. It consists of two procedures: (A) word instances are clustered so that instances of the same sense are merged; (B) the similarity between each cluster and each dictionary sense is measured to determine the sense of the instances in that cluster.
6) We proposed a method for detecting peculiar examples of a target word in a corpus. The method combines the density-based Local Outlier Factor (LOF) with One Class SVM, two representative outlier detection methods in data mining, and improved on the precision and recall of LOF and One Class SVM used individually. Using the noun 'midori' (green), we also showed that the method can detect new meanings.
7) We presented a co-clustering-based verb synonym extraction approach that increases the number of extracted meanings of polysemous verbs from a large text corpus. The approach extracts the different meanings of polysemous verbs by recursively eliminating already extracted clusters from the initial data set. In verb synonym extraction experiments, it increased the number of correct verb clusters by about 50%, with a 0.9% increase in precision and a 1.5% increase in recall over the previous approach.
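The "must-link" constraints between seed instances in item 4 can be illustrated with a plain seeded (constrained) k-means, a standard textbook form of semi-supervised clustering, not the project's own algorithm. This is a sketch under stated assumptions: word instances are already dense feature vectors, and `constrained_kmeans` and the `seeds` mapping are hypothetical names. It omits the centroid-fluctuation control, the frequency-based seed selection, and the outlier exclusion described above.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def constrained_kmeans(
    instances: List[Vector],
    seeds: Dict[int, str],  # hypothetical: instance index -> annotated sense label
    iterations: int = 10,
) -> List[str]:
    """Assign every word instance to a sense cluster.

    Seeds that share a sense label are must-linked: they are clamped to that
    sense's cluster on every iteration, so they can never be separated.
    """
    senses = sorted(set(seeds.values()))
    # Initialise one centroid per sense from that sense's seed instances.
    centroids = {
        s: centroid([instances[i] for i, lab in seeds.items() if lab == s])
        for s in senses
    }
    labels: List[str] = []
    for _ in range(iterations):
        # Seeds keep their label; other instances go to the nearest centroid.
        labels = [
            seeds[i] if i in seeds else min(senses, key=lambda s: sq_dist(x, centroids[s]))
            for i, x in enumerate(instances)
        ]
        # Recompute centroids; seeds guarantee that no cluster is empty.
        centroids = {
            s: centroid([x for x, lab in zip(instances, labels) if lab == s])
            for s in senses
        }
    return labels
```

Clamping the seeds is the simplest way to satisfy must-link constraints. Softer variants that only penalise violations also exist.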
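Procedure (B) of item 5 can be sketched as follows, assuming the clustering of step (A) is already done. It also assumes that both clusters and dictionary definitions are reduced to bags of words and compared by cosine similarity. The similarity measure, the `threshold` value, and the function names are illustrative assumptions that the abstract does not specify. A cluster whose best similarity falls below the threshold is returned with `None`, marking it as a candidate new sense.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_clusters_to_senses(
    clusters: Dict[str, List[List[str]]],  # cluster id -> tokenised contexts of its instances
    definitions: Dict[str, List[str]],  # sense id -> tokenised dictionary definition
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the project
) -> Dict[str, Tuple[Optional[str], float]]:
    """Map each cluster to its most similar dictionary sense, or None if none is close."""
    sense_vecs = {sense: Counter(tokens) for sense, tokens in definitions.items()}
    result: Dict[str, Tuple[Optional[str], float]] = {}
    for cid, instances in clusters.items():
        # Pool the context words of all instances in the cluster.
        cluster_vec = Counter(tok for inst in instances for tok in inst)
        best_sense, best_sim = max(
            ((sense, cosine(cluster_vec, vec)) for sense, vec in sense_vecs.items()),
            key=lambda pair: pair[1],
        )
        # A cluster resembling no existing definition is a new-sense candidate.
        result[cid] = (best_sense if best_sim >= threshold else None, best_sim)
    return result
```

In practice, definition sentences are short. A richer representation than raw token overlap would likely be needed.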
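For item 6, the density-based half of the method, Local Outlier Factor, can be written out directly. This sketch computes LOF only. The One Class SVM component, and the way the project combines the two detectors, are not described in the abstract, so both are omitted. It assumes distinct points, a neighbourhood of exactly `k` nearest neighbours (ties broken arbitrarily), and a hypothetical function name `lof_scores`. It requires Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence


def lof_scores(points: List[Sequence[float]], k: int = 3) -> List[float]:
    """Local Outlier Factor of every point; values well above 1 suggest outliers."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point, excluding the point itself.
    knn: List[List[int]] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])
        knn.append(others[:k])
    k_distance = [dist[i][knn[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, o) = max(k-distance(o), d(i, o)).
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_distance[o], dist[i][o]) for o in knn[i]]
        lrd.append(k / sum(reach))

    # LOF: average density of the neighbours relative to the point's own density.
    return [sum(lrd[o] for o in knn[i]) / (k * lrd[i]) for i in range(n)]
```

A score near 1 means a point is about as dense as its neighbours. An isolated usage example, far from every cluster of ordinary usages, receives a score well above 1.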

Report

(7 results)

2010 Annual Research Report; Final Research Report
2009 Annual Research Report
2008 Annual Research Report; Self-evaluation Report
2007 Annual Research Report
2006 Annual Research Report

Research Products
(40 results)


Journal Articles: 8 (peer reviewed: 3); Presentations: 28; Remarks: 4

[Journal Article] On SemEval-2010 Japanese WSD Task (2011)
- Author(s)
  Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono
- Journal Title
  Journal of Natural Language Processing Vol.18, No.3
- NAID
  130000969397
- Related Report
  2010 Final Research Report
- Peer Reviewed
[Journal Article] Co-clustering with Recursive Elimination for Verb Synonym Extraction from Large Text Corpus (2009)
- Author(s)
  Koichi Takeuchi, Hideyuki Takahashi
- Journal Title
  IEICE Transactions on Information and Systems Vol.E92-D, No.12
  Pages: 2334-2340
- NAID
  10026812417
- Related Report
  2010 Final Research Report 2009 Annual Research Report
- Peer Reviewed
[Journal Article] Japanese Semantic Analysis Using a Representative Corpus (2009)
- Author(s)
  奥村学, 白井清昭
- Journal Title
  Journal of the Japanese Society for Artificial Intelligence Vol.24, No.5
  Pages: 673-680
- Related Report
  2009 Annual Research Report
[Journal Article] Automatic Identification of Word Meanings in Corpora (2009)
- Author(s)
  白井清昭
- Journal Title
  国文学解釈と鑑賞 (Kokubungaku Kaishaku to Kanshō) Vol.74, No.1
  Pages: 61-69
- Related Report
  2008 Self-evaluation Report
[Journal Article] Automatic Identification of Word Meanings in Corpora (2009)
- Author(s)
  白井清昭
- Journal Title
  国文学解釈と鑑賞 (Kokubungaku Kaishaku to Kanshō) Vol.74, No.1
  Pages: 61-69
- Related Report
  2008 Annual Research Report
[Journal Article] Analysis of Eye Movements and Linguistic Boundaries in a Text for the Investigation of Japanese Reading Processes (2008)
- Author(s)
  Akemi Tera, Kiyoaki Shirai, Takaya Yuizono, Kozo Sugiyama
- Journal Title
  IEICE Transactions on Information and Systems, Special Issue on Knowledge, Information and Creativity Support System Vol.E91-D, No.11
  Pages: 2560-2567
- Related Report
  2010 Final Research Report
- Peer Reviewed
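[Journal Article] Semantic Analysis Using the Balanced Corpus of Contemporary Written Japanese: Automatic Identification of Word Senses and Discovery of New Senses (2008)
- Author(s)
  奥村学, 白井清昭
- Journal Title
  言語 (Gengo) Vol.37, No.8
  Pages: 66-73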
- Related Report
  2008 Self-evaluation Report
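[Journal Article] Semantic Analysis Using the Balanced Corpus of Contemporary Written Japanese: Automatic Identification of Word Senses and Discovery of New Senses (2008)
- Author(s)
  奥村学, 白井清昭
- Journal Title
  言語 (Gengo) Vol.37, No.8
  Pages: 66-73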
- Related Report
  2008 Annual Research Report
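[Presentation] Discovery of New Word Senses through Supervised Outlier Detection (2011)
- Author(s)
  新納浩幸, 佐々木稔
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi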
- Year and Date
  2011-03-10
- Related Report
  2010 Annual Research Report
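[Presentation] Performance Analysis of Word Sense Disambiguation Based on Metric Learning (2011)
- Author(s)
  佐々木稔, 新納浩幸
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi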
- Year and Date
  2011-03-09
- Related Report
  2010 Annual Research Report
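[Presentation] Word Sense Disambiguation Based on Inter-instance Similarities Defined from Multiple Perspectives (2011)
- Author(s)
  中西隆一郎, 白井清昭, 中村誠
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi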
- Year and Date
  2011-03-09
- Related Report
  2010 Annual Research Report
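[Presentation] Committee-Based Domain Adaptation for Word Sense Disambiguation Using Classifier Confidence (2011)
- Author(s)
  古宮嘉那子, 奥村学
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi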
- Year and Date
  2011-03-09
- Related Report
  2010 Annual Research Report
[Presentation] Document Clustering Using Semantic Relationship Between Target Documents And Related Documents (2010)
- Author(s)
  Minoru Sasaki, Hiroyuki Shinnou
- Organizer
  The Fourth International Conference on Advances in Semantic Processing
- Place of Presentation
  Florence, Italy
- Year and Date
  2010-10-27
- Related Report
  2010 Final Research Report
[Presentation] Document Clustering Using Semantic Relationship Between Target Documents And Related Documents (2010)
- Author(s)
  Minoru Sasaki, Hiroyuki Shinnou
- Organizer
  The Fourth International Conference on Advances in Semantic Processing
- Place of Presentation
  Florence, Italy
- Year and Date
  2010-10-27
- Related Report
  2010 Annual Research Report
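[Presentation] Acquisition of Verb Synonyms by Graph-Based Clustering (2010)
- Author(s)
  竹内孔一, 高橋秀幸, 小林大介
- Organizer
  Research Meeting on Natural Language Understanding and Models of Communication
- Place of Presentation
  Kikai Shinko Kaikan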
- Year and Date
  2010-10-23
- Related Report
  2010 Annual Research Report
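[Presentation] Automatic Selection of Domain Adaptation Methods for Word Sense Disambiguation (2010)
- Author(s)
  古宮嘉那子, 奥村学
- Organizer
  IPSJ Special Interest Group on Natural Language Processing
- Place of Presentation
  National Institute of Informatics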
- Year and Date
  2010-09-16
- Related Report
  2010 Annual Research Report
[Presentation] A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings (2010)
- Author(s)
  Koichi Takeuchi, Kentaro Inui, Nao Takeuchi, Atsushi Fujita
- Organizer
  The 8th Workshop on Asian Language Resources
- Place of Presentation
  Beijing
- Year and Date
  2010-08-21
- Related Report
  2010 Final Research Report
[Presentation] A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings (2010)
- Author(s)
  Koichi Takeuchi, Kentaro Inui, Nao Takeuchi, Atsushi Fujita
- Organizer
  The 8th Workshop on Asian Language Resources
- Place of Presentation
  Beijing
- Year and Date
  2010-08-21
- Related Report
  2010 Annual Research Report
[Presentation] SemEval-2010 Task: Japanese WSD (2010)
- Author(s)
  Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono
- Organizer
  The 5th International Workshop on Semantic Evaluation, pp.67-74
- Place of Presentation
  Uppsala
- Year and Date
  2010-07-15
- Related Report
  2010 Final Research Report
[Presentation] JAIST: Clustering and Classification Based Approaches for Japanese WSD (2010)
- Author(s)
  Kiyoaki Shirai, Makoto Nakamura
- Organizer
  The 5th International Workshop on Semantic Evaluation, pp.379-382
- Place of Presentation
  Uppsala
- Year and Date
  2010-07-15
- Related Report
  2010 Final Research Report
[Presentation] Detection of Peculiar Examples using LOF and One Class SVM (2010)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  LREC-2010
- Place of Presentation
  Malta
- Year and Date
  2010-05-21
- Related Report
  2010 Final Research Report
[Presentation] Detection of Peculiar Examples using LOF and One Class SVM (2010)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  LREC-2010
- Place of Presentation
  Malta
- Year and Date
  2010-05-21
- Related Report
  2010 Annual Research Report
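[Presentation] Building Sets of Semantically Related Words Using a Web Directory (2010)
- Author(s)
  佐々木稔, 三上健太, 新納浩幸
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo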
- Year and Date
  2010-03-11
- Related Report
  2009 Annual Research Report
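[Presentation] Building Genre Vectors of Nouns Using a Web Directory (2010)
- Author(s)
  林華, 新納浩幸, 佐々木稔
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo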
- Year and Date
  2010-03-10
- Related Report
  2009 Annual Research Report
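[Presentation] Detection of Peculiar Examples Using LOF and One Class SVM (2010)
- Author(s)
  新納浩幸, 佐々木稔
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo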
- Year and Date
  2010-03-10
- Related Report
  2009 Annual Research Report
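[Presentation] Estimation of the Predominant Senses of Nouns and Its Application to Word Sense Disambiguation (2010)
- Author(s)
  江口晃, 新納浩幸, 佐々木稔
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo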
- Year and Date
  2010-03-10
- Related Report
  2009 Annual Research Report
[Presentation] Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation (2009)
- Author(s)
  Kazunari Sugiyama, Manabu Okumura
- Organizer
  The 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009)
- Place of Presentation
  Mexico City
- Year and Date
  2009-03-05
- Related Report
  2008 Self-evaluation Report
[Presentation] Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation (2009)
- Author(s)
  Kazunari Sugiyama, Manabu Okumura
- Organizer
  The 10th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009)
- Place of Presentation
  Mexico City
- Year and Date
  2009-03-05
- Related Report
  2008 Annual Research Report
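[Presentation] Aligning Instance Clusters with Dictionary Definitions for New Sense Discovery (2009)
- Author(s)
  田中博貴, 中村誠, 白井清昭
- Organizer
  The 15th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Tottori University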
- Year and Date
  2009-03-04
- Related Report
  2008 Annual Research Report
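[Presentation] A New Word Sense Disambiguation Task Using BCCWJ (2009)
- Author(s)
  奥村学, 白井清昭
- Organizer
  The 15th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Tottori University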
- Year and Date
  2009-03-04
- Related Report
  2008 Annual Research Report
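[Presentation] Extraction of Verb Synonyms by Co-clustering Considering Polysemy (2009)
- Author(s)
  高橋秀幸, 竹内孔一
- Organizer
  IEICE, Research Meeting on Natural Language Understanding and Models of Communication
- Place of Presentation
  Kurashiki Geibunkan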
- Year and Date
  2009-01-27
- Related Report
  2008 Annual Research Report
[Presentation] Extraction of Verb Synonyms using Co-clustering Approach (2008)
- Author(s)
  Koichi Takeuchi
- Organizer
  Second International Symposium on Universal Communication (ISUC 2008)
- Place of Presentation
  Osaka International Convention Center
- Year and Date
  2008-12-16
- Related Report
  2008 Annual Research Report
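[Presentation] Semi-supervised Clustering of Word Instances (2008)
- Author(s)
  杉山一成, 奥村学
- Organizer
  IPSJ Special Interest Group on Natural Language Processing
- Place of Presentation
  National Institute of Information and Communications Technology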
- Year and Date
  2008-03-27
- Related Report
  2007 Annual Research Report
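[Presentation] A Word Sense Disambiguation Method Using Clustering Results of Word Instances (2008)
- Author(s)
  杉山一成, 奥村学
- Organizer
  The 14th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo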
- Year and Date
  2008-03-19
- Related Report
  2007 Annual Research Report
[Presentation] Extraction of Verb Synonyms using Co-clustering Approach (2008)
- Author(s)
  Koichi Takeuchi
- Organizer
  Second International Symposium on Universal Communication (ISUC 2008)
- Place of Presentation
  Osaka
- Related Report
  2008 Self-evaluation Report
[Presentation] Personal Name Disambiguation in Web Search Results Based on a Semi-Supervised Clustering Approach (2007)
- Author(s)
  Kazunari Sugiyama, Manabu Okumura
- Organizer
  Proc. of the 10th International Conference on Asian Digital Libraries, Lecture Notes in Computer Science (LNCS) (Springer-Verlag)
- Place of Presentation
  Hanoi
- Related Report
  2008 Self-evaluation Report
[Remarks] A New Word Sense Disambiguation Task Using BCCWJ
- URL
  http://oku-gw.pi.titech.ac.jp/wsd.html
- Related Report
  2010 Final Research Report
[Remarks] Public Release of a Semantic Role Labeling System
- URL
  http://cl.cs.okayama-u.ac.jp/study/project/sea.html
- Related Report
  2010 Final Research Report
[Remarks] Public Release of a Verb Concept Dictionary
- URL
  http://cl.cs.okayama-u.ac.jp/rsc/data/index.html
- Related Report
  2010 Final Research Report
[Remarks] Homepage
- URL
  http://oku-gw.pi.titech.ac.jp/wsd.html
- Related Report
  2008 Self-evaluation Report
