Analysis of the Relationship between Proper nouns in Large Scale Corpus

Research Project

Project/Area Number	15500090
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Toyohashi University of Technology
Principal Investigator	UMEMURA Kyoji, Toyohashi University of Technology, Information and Computer Sciences, Faculty of Engineering, Professor (80273324)
Project Period (FY)	2003 – 2004
Project Status	Completed (Fiscal Year 2004)
Budget Amount	¥3,300,000 (Direct Cost: ¥3,300,000); Fiscal Year 2004: ¥1,700,000 (Direct Cost: ¥1,700,000); Fiscal Year 2003: ¥1,600,000 (Direct Cost: ¥1,600,000)
Keywords	Statistical Analysis / Support Vector Machine / Medical System / Synonym / Related Words / Thesaurus / Statistical Language Processing
Research Abstract	In the first year, we assembled a computer cluster from individual components and developed a specialized software package for frequency analysis. Although most of this work combines existing results, it gave us a powerful environment for analyzing corpora. In the second year, we used a support vector machine (SVM) to detect keywords in the corpus. The SVM takes statistical values computed for many strings as input and judges whether each string is a keyword. Since the method uses no dictionary of any kind, the identical program works for both Japanese and Chinese. It is a notable result that keywords can be extracted without any dictionary: all that is needed is a set of sample keywords in each language. We also applied our environment to analyzing disease names in a medical information system whose data consist of seven years of medical records. Without this environment, it would have been very difficult to analyze the data and extract synonyms of disease names from it.
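The abstract does not describe the frequency-analysis package in detail. As a minimal sketch, assuming the basic operation is counting every character n-gram up to a fixed length, the Python below computes each string's total frequency (tf) and document frequency (df, the number of documents that contain it). Because it works on raw characters, it needs no word segmentation or dictionary, which is consistent with the claim that the same program handles Japanese and Chinese. The function `ngram_stats` and its `max_n` parameter are hypothetical; the project's own package, including the "generalized document frequency" named in its publication titles, is not specified here and may work differently.

```python
from collections import Counter


def ngram_stats(docs, max_n=3):
    """Hypothetical helper: count term frequency (tf) and document
    frequency (df) for every character n-gram of length 1..max_n.
    Operates on raw characters, so no dictionary is required."""
    tf = Counter()
    df = Counter()
    for doc in docs:
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(doc) - n + 1):
                gram = doc[i:i + n]
                tf[gram] += 1      # every occurrence counts toward tf
                seen.add(gram)
        df.update(seen)            # each distinct n-gram counts once per document
    return tf, df


docs = ["東京大学", "東京都と東京", "大学病院"]
tf, df = ngram_stats(docs, max_n=2)
print(tf["東京"], df["東京"])  # 3 2
print(tf["大学"], df["大学"])  # 2 2
print(df["京大"])              # 1
```

Enumerating all n-grams in memory like this only suits small inputs; a large-scale corpus would need more economical data structures.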
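The second-year method feeds statistical values for each candidate string into an SVM that labels it as a keyword or not. The report names neither the features nor the SVM software, so the following is an illustrative assumption throughout: a pure-Python linear SVM trained with the Pegasos stochastic sub-gradient method, applied to hypothetical two-dimensional feature vectors standing in for per-string statistics. Sample keywords are labelled +1 and non-keywords -1, in line with the abstract's point that only sample keywords are needed for each language. `train_linear_svm`, `predict`, and the toy data are illustrative names and values, not the project's.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos: stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    A constant 1.0 feature is appended so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if margin < 1.0:  # hinge loss active: move w toward y * x
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (keyword) or -1 (not a keyword) for feature vector x."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, already-scaled statistics for candidate strings.
# +1 = sample keyword, -1 = non-keyword.
X = [[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
     [-2.0, -1.0], [-1.5, -2.5], [-3.0, -2.0]]
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
print([predict(w, x) for x in X])                        # [1, 1, 1, -1, -1, -1]
print(predict(w, [2.0, 3.0]), predict(w, [-2.0, -2.0]))  # 1 -1
```

Pegasos is used here only because it fits in a few self-contained lines; an established SVM library would be the usual choice in practice.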
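The report does not say how synonyms of disease names were obtained from the medical records. One common dictionary-free approach, offered here only as an illustrative assumption and not as the authors' method, treats two disease names as candidate synonyms when they occur in similar contexts, for example alongside the same drugs or tests. The sketch builds a co-occurrence vector for each disease name from hypothetical records (a disease name plus a set of co-recorded items) and ranks the other names by cosine similarity. All names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(records):
    """Map each disease name to a Counter of items recorded with it."""
    vecs = defaultdict(Counter)
    for disease, items in records:
        vecs[disease].update(items)
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_names(vecs, name, top=3):
    """Rank other disease names by context similarity to `name`."""
    target = vecs.get(name, Counter())
    scores = [(other, cosine(target, vec))
              for other, vec in vecs.items() if other != name]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]


records = [
    ("diabetes mellitus", {"insulin", "metformin", "HbA1c"}),
    ("type 2 diabetes", {"metformin", "HbA1c"}),
    ("DM", {"insulin", "HbA1c"}),
    ("hypertension", {"amlodipine", "blood pressure"}),
    ("essential hypertension", {"amlodipine", "blood pressure", "ECG"}),
]
vecs = context_vectors(records)
for name, score in similar_names(vecs, "DM", top=2):
    print(name, round(score, 3))  # diabetes mellitus 0.816, then type 2 diabetes 0.5
```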

Report

(3 results)

2004 Annual Research Report
2004 Final Research Report Summary
2003 Annual Research Report

Research Products
(16 results)

Journal Articles (12 results), Publications (4 results)

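[Journal Article] Discovering Related Disease Names by Data Mining of a Medical Information System [医療情報システムのデータマイニングによる関連病名の発見] (2005)
- Author(s)
  Pattamon, 梅村
- Journal Title
  IPSJ Programming Symposium (oral presentation)
  Pages: 6-6
- Description
  From the Final Research Report Summary (Japanese edition)
- Related Report
  2004 Final Research Report Summary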
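[Journal Article] A Similarity Measure for Estimating One-to-Many Relations When Frequencies Differ Markedly [頻度差が著しい場合における一対多関係を推定する類似尺度] (2005)
- Author(s)
  岡部, 梅村
- Journal Title
  IPSJ Informatics Symposium 2005 (oral presentation)
  Pages: 8-8
- Description
  From the Final Research Report Summary (Japanese edition)
- Related Report
  2004 Final Research Report Summary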
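[Journal Article] Discovering Related Disease Names by Data Mining of a Medical Information System [医療情報システムのデータマイニングによる関連病名の発見] (2005)
- Author(s)
  Pattamon, 梅村
- Journal Title
  IPSJ Programming Symposium (oral presentation)
  Pages: 187-192
- Related Report
  2004 Annual Research Report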
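[Journal Article] A Similarity Measure for Estimating One-to-Many Relations When Frequencies Differ Markedly [頻度差が著しい場合における一対多関係を推定する類似尺度] (2005)
- Author(s)
  岡部, 梅村
- Journal Title
  IPSJ Informatics Symposium 2005 (oral presentation)
  Pages: 129-136
- Related Report
  2004 Annual Research Report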
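[Journal Article] Keyword Estimation Using SVM and Generalized Document Frequency [SVMと一般化文書頻度によるキーワードの推定] (2004)
- Author(s)
  尾形, 寺尾, 梅村
- Journal Title
  Workshop on Named Entities and Technical Term Extraction, co-located with NLP2004, the 10th Annual Meeting of the Association for Natural Language Processing (oral presentation)
  Pages: 4-4
- NAID
  170000169384
- Description
  From the Final Research Report Summary (Japanese edition)
- Related Report
  2004 Final Research Report Summary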
[Journal Article] Japanese Multiword Extraction using SVM and Adaptation (2004)
- Author(s)
  T.Ogata, K.Terao, K.Umemura
- Journal Title
  LREC 2004 Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications (oral presentation)
  Pages: 4-4
- Description
  From the Final Research Report Summary (Japanese edition)
- Related Report
  2004 Final Research Report Summary
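[Journal Article] Technical Term Extraction Using the Repetition Degree of Bigrams [Bigramの反復度を用いた技術用語抽出] (2004)
- Author(s)
  中瀬, 梅村
- Journal Title
  46th Meeting of the IPSJ SIG on Digital Documents, Vol.2004 No.97
  Pages: 6-6
- NAID
  110002914320
- Description
  From the Final Research Report Summary (Japanese edition)
- Related Report
  2004 Final Research Report Summary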
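[Journal Article] Keyword Estimation Using SVM and Generalized Document Frequency [SVMと一般化文書頻度によるキーワードの推定] (2004)
- Author(s)
  尾形, 寺尾, 梅村
- Journal Title
  Workshop on Named Entities and Technical Term Extraction, co-located with NLP2004, the 10th Annual Meeting of the Association for Natural Language Processing (oral presentation)
  Pages: 44-47
- NAID
  170000169384
- Related Report
  2004 Annual Research Report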
[Journal Article] Japanese Multiword Extraction using SVM and Adaptation (2004)
- Author(s)
  T.Ogata, K.Terao, K.Umemura
- Journal Title
  LREC 2004 Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications (oral presentation)
  Pages: 8-11
- Related Report
  2004 Annual Research Report
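[Journal Article] Technical Term Extraction Using the Repetition Degree of Bigrams [Bigramの反復度を用いた技術用語抽出] (2004)
- Author(s)
  中瀬, 梅村
- Journal Title
  46th Meeting of the IPSJ SIG on Digital Documents, IPSJ-DD04046003, Vol.2004 No.97
  Pages: 15-20
- NAID
  110002914320
- Related Report
  2004 Annual Research Report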
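[Journal Article] A Method for Counting Generalized Document Frequency in Large-Scale Corpora [大規模コーパスに対する一般化文書頻度の計数手法] (2003)
- Author(s)
  寺尾健一郎, 梅村恭司
- Journal Title
  IPSJ Summer Programming Symposium (oral presentation)
  Pages: 12-12
- Description
  From the Final Research Report Summary (Japanese edition)
- Related Report
  2004 Final Research Report Summary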
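[Journal Article] A Method for Counting Generalized Document Frequency in Large-Scale Corpora [大規模コーパスに対する一般化文書頻度の計数手法] (2003)
- Author(s)
  寺尾健一郎, 梅村恭司
- Journal Title
  IPSJ Summer Programming Symposium (oral presentation)
  Pages: 25-36
- Related Report
  2004 Annual Research Report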
[Publications] Yinghuo XU, Kyoji Umemura: "Optimal Local Dimension Analysis of Latent Semantic Indexing Query NeighborSpace", IEICE Transactions on Information and Systems, No. 135, pp. 1762-1772 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Yoshiyuki Takeda, Kyoji Umemura, Eiko Yamamoto: "Determining Indexing Strings with Statistical Analysis", IEICE Transactions on Information and Systems, No. 135, pp. 1781-1787 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Junan Chakma, Kyoji Umemura: "Factor Controlled Hierarchical SOM Visualization for Large Set of Data", IEICE Transactions on Information and Systems, No. 135, pp. 1796-1803 (2003)
- Related Report
  2003 Annual Research Report
[Publications] 武田善行, 梅村恭司, 藤井敦: "Web Mining" [Webマイニング], Kyoritsu Shuppan (共立出版), 197 (2004)
- Related Report
  2003 Annual Research Report