2002 Fiscal Year Final Research Report Summary

Comparative Study on the Methodologies of Analyzing Textual Genres and Styles by Means of Multivariate Analysis

Research Project

Project/Area Number	13610579
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	English language, English and American literature
Research Institution	The University of Tokushima
Principal Investigator	NAKAMURA Junsaku, The University of Tokushima, Faculty of Integrated Arts and Sciences, Professor (20035695)
Co-Investigator(Kenkyū-buntansha)	TABATA Tomoji, Osaka University, Faculty of Language and Culture, Associate Professor (10249873)
Project Period (FY)	2001 – 2002
Keywords	Corpus / Quantification of Contingency Table / Correspondence Analysis / Principal Component Analysis / BNC World Edition / Dickens / Manner Adverbs / Style
Research Abstract	Multivariate analysis such as Factor Analysis has long been used in analyzing corpus data. Biber (1988 and others) is a typical example of its use in text typology and seems to have been successful in explaining differences between registers. However, other multivariate methods such as Principal Component Analysis (PCA) and Quantification of Contingency Table (QCT) have also been employed for more or less the same purposes. Burrows (1989 and others) used PCA for stylistic investigations of Jane Austen's novels, and Sigley (1998) used it in proposing a formality index for different text types. One of the investigators of the present research, Tabata (1995 and others), also used it in investigating the style of Dickens's texts. These studies also seem to have been successful in their ways of analyzing textual data. The same can be said of Nakamura (2002 and others), which made use of QCT in determining the structures of corpora based upon the distributions of various kinds of linguistic features or items. Pilot studies by means of QCT concerning the distributions of degree, frequency and manner adverbs across different text categories of the BNC Sampler revealed that manner adverbs behaved differently from the others: the distributions of degree and frequency adverbs were mainly ascribed to the dichotomy of spoken vs. written texts, whereas imaginative texts played the main role in explaining the behavior of manner adverbs, and the dichotomy of registers turned out to be a secondary factor. In the present research, the behavior of -ly manner adverbs across different text categories in the BNC World Edition was further examined by both PCA and QCT. Column-wise analysis by PCA turned out to be effective in classifying text domains, but the distribution of adverbs was not so effective in separating them into meaningful groups. Row-wise analysis could not extract factors that yielded meaningful interpretations.
In contrast, QCT provided quite reasonable interpretations for both the distribution of textual domains and that of adverbs, with the primary factor being narrative vs. expository style and the secondary factor being informal vs. formal style. Tabata also used Dickens's novels and sketches and conducted several studies: 1) correspondence analysis (ANACOR: the same as QCT in principle) of word-class distributions across texts, 2) PCA of the 30-60 most common word-types in the dialogue of Dickens's fiction, and 3) ANACOR of 1246 types of -ly adverbs. These studies reveal various aspects of Dickens's texts: variation across text categories, variation over time, differentiation of idiolects in fictional discourse due to social dynamics, and formal vs. colloquial styles. Further analysis of manner adverbs, the main interest of the present research, revealed that ANACOR generally works better than PCA. In conclusion, in order to choose an appropriate method of multivariate analysis, the types of data in question, such as the type of variable (i.e., quantitative or qualitative), whether raw scores are normalized or not, the number of variables, etc., should be taken into consideration.
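As a rough illustration of the kind of computation that correspondence analysis (which the abstract treats as the same in principle as QCT) involves, the sketch below applies the textbook SVD formulation to a small adverb-by-text-category contingency table. The function name `correspondence_analysis`, the adverbs, the category labels, and all counts are invented for illustration; they are not data from this project, and the code is not claimed to reproduce the investigators' procedure or software.

```python
import numpy as np

def correspondence_analysis(table):
    """Textbook correspondence analysis of a two-way contingency table via SVD."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                     # row masses (adverbs)
    c = P.sum(axis=0)                     # column masses (text categories)
    expected = np.outer(r, c)             # expected proportions under independence
    S = (P - expected) / np.sqrt(expected)    # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: rows and columns are scored on the same axes.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    inertia = sv ** 2                     # inertia carried by each axis
    return row_coords, col_coords, inertia

# Hypothetical raw frequencies (illustrative only).
adverbs = ["slowly", "quietly", "significantly", "respectively"]
categories = ["imaginative", "spoken", "informative"]
counts = np.array([
    [120, 30, 25],
    [ 90, 40, 35],
    [ 20, 80, 60],
    [ 15, 70, 95],
])

rows, cols, inertia = correspondence_analysis(counts)
share = inertia / inertia.sum()           # proportion of inertia per axis

for name, xy in zip(adverbs, rows[:, :2]):
    print(f"{name:15s} axis1={xy[0]: .3f} axis2={xy[1]: .3f}")
for name, xy in zip(categories, cols[:, :2]):
    print(f"{name:15s} axis1={xy[0]: .3f} axis2={xy[1]: .3f}")
print("share of inertia per axis:", np.round(share, 3))
```

Because this method works directly on raw counts and weights rows and columns by their masses, it does not require the frequencies to be normalized beforehand, which is one of the data properties the abstract says should guide the choice of method.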

Research Products
(2 results)


[Publications] Junsaku Nakamura, Tomoji Tabata: "The Structure of the BNC World Edition Based upon the Distribution of -ly Manner Adverbs: Cross-Examination by Means of Principal Component Analysis and Quantification of Contingency Table" ICAME 2002, 22-26 May 2002, Goteborg, Sweden. (2002)
- Description
  From "Summary of Research Results Report (Japanese)"
[Publications] Junsaku NAKAMURA and Tomoji TABATA: "The Structure of the BNC World Edition Based upon the Distribution of -ly Manner Adverbs: Cross Examination by Means of Principal Component Analysis and Quantification of Contingency Table", at ICAME (International Computer Archive of Modern and Medieval English) 2002, held on 22-26 May 2002 at Goteborg, Sweden. Paper to be submitted to the Anthology Commemorating the 10th Anniversary of the JAECS (Japan Association for English Corpus Studies) (in preparation).
- Description
  From "Summary of Research Results Report (English)"
