2007 Fiscal Year Final Research Report Summary
Development of Language Information System and Systematic Extraction of Latent Knowledge, Based on Graph Computation of Semantic Network
Project/Area Number |
18500192
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Library and Information Science / Humanistic Social Informatics
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
AKAMA Hiroyuki Tokyo Institute of Technology, Graduate School of Decision Science and Technology, Associate Professor (60242301)
|
Co-Investigator(Kenkyū-buntansha) |
NISHINA Kikuko Tokyo Institute of Technology, International Student Center, Professor (40198479)
SHIMIZU Yumiko Musashi Institute of Technology, Faculty of Environmental and Information Studies, Assistant Professor (30298020)
MIYAKE Maki Osaka University, Department of Language and Culture, Assistant Professor (80448018)
|
Project Period (FY) |
2006 – 2007
|
Keywords | Semantic Network / Graph Theory / Latent Knowledge / Educational Information System / Hidden Knowledge |
Research Abstract |
We developed a new method for solving the cluster-size imbalance problem observed when documents and corpora are processed with Markov Clustering (MCL). Branching MCL (BMCL), or the latent adjacency matrix, can divide overly inclusive Markov clusters (core clusters) into appropriately sized subsets. This method was applied to a semantic network built from the large-scale corpus of Gakken's Large Dictionary of Japanese (GLDJ), which covers 100,000 words together with their definitions, examples, and grammatical explanations. The effectiveness of these techniques is currently being tested by creating a clustered semantic network for the GLDJ. As applications of graph clustering to the humanities, we built semantic networks from the lexical co-occurrence data of historical documents and novels: the books of two thinkers who were contemporaries, Cabanis and Mesmer, to measure the similarity of their thinking; and Saint-Exupéry's famous novel "Le Petit Prince", to propose an objective method of word sense disambiguation applicable to his enigmatic word usage. In this study we also proposed, as an alternative to keyword-based clustering, a new windowing method called the Incrementally Advancing Window (IAW), which generates co-occurring word pairs that can be used as inputs to the Incremental Routing Algorithm. The results of MCL applied to co-occurrence and/or adjacency data matrices were evaluated using indices such as weighted curvature, modularity Q, and the F-measure.
|
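The pipeline outlined in the abstract (co-occurring word pairs from a moving window, then MCL on the resulting adjacency matrix) can be illustrated with a minimal sketch. It rests on several assumptions: it implements only the plain MCL loop of expansion and inflation, not BMCL or the latent adjacency matrix; the fixed-width sliding window is a generic stand-in, not the project's Incrementally Advancing Window; and the names `window_pairs`, `mcl`, and `cluster_words`, as well as the default `width` and `inflation` values, are hypothetical choices made for illustration.

```python
from collections import Counter


def window_pairs(tokens, width=3):
    """Count unordered word pairs co-occurring inside a sliding window.

    `width` is the window size in tokens, including the current token.
    (Generic fixed-width window; an assumption, not the project's method.)
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + width]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


def normalize_columns(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Plain MCL: alternate expansion and inflation until the matrix settles."""
    n = len(adjacency)
    # Self-loops are a common MCL convention that stabilizes the iteration.
    m = [[float(adjacency[r][c]) + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)                       # expansion: random-walk step
        inflated = normalize_columns(                 # inflation: sharpen columns
            [[v ** inflation for v in row] for row in expanded])
        delta = max((abs(inflated[r][c] - m[r][c])
                     for r in range(n) for c in range(n)), default=0.0)
        m = inflated
        if delta < tol:
            break
    # Rows that retain mass act as attractors; their non-zero columns
    # are the members of one cluster.
    clusters = set()
    for r in range(n):
        members = frozenset(c for c in range(n) if m[r][c] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


def cluster_words(tokens, width=3, inflation=2.0):
    """Build a co-occurrence graph from tokens and cluster it with MCL."""
    pairs = window_pairs(tokens, width)
    vocab = sorted({w for pair in pairs for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    adjacency = [[0.0] * len(vocab) for _ in vocab]
    for (a, b), weight in pairs.items():
        adjacency[index[a]][index[b]] = weight
        adjacency[index[b]][index[a]] = weight
    return [sorted(vocab[i] for i in members)
            for members in mcl(adjacency, inflation)]
```

In this sketch, raising `inflation` tends to produce smaller, more numerous clusters, while lower values tend to leave large clusters intact. The latter behavior is related to the cluster-size imbalance that the abstract says BMCL was designed to address.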
Research Products
(33 results)