Comparative Study on the Methodologies of Analyzing Textual Genres and Styles by Means of Multivariate Analysis
Project/Area Number |
13610579
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
English language / English and American literature
|
Research Institution | The University of Tokushima |
Principal Investigator |
NAKAMURA Junsaku The University of Tokushima, Faculty of Integrated Arts and Sciences, Professor (20035695)
|
Co-Investigator (Kenkyū-buntansha) |
TABATA Tomoji Osaka University, Faculty of Language and Culture, Associate Professor (10249873)
|
Project Period (FY) |
2001 – 2002
|
Project Status |
Completed (Fiscal Year 2002)
|
Budget Amount |
¥2,300,000 (Direct Cost: ¥2,300,000)
Fiscal Year 2002: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 2001: ¥1,800,000 (Direct Cost: ¥1,800,000)
|
Keywords | Corpus / Quantification of Contingency Table / Correspondence Analysis / Principal Component Analysis / BNC World Edition / Dickens / Manner Adverbs / Style / Genre |
Research Abstract |
Multivariate methods such as Factor Analysis have long been used in analyzing corpus data. Biber (1988 and others) is a typical example of its use in text typology and seems to have been successful in explaining the differences between registers. However, other multivariate methods such as Principal Component Analysis (PCA) and Quantification of Contingency Table (QCT) have also been employed for more or less the same purposes. Burrows (1989 and others) used PCA for stylistic investigations of Jane Austen's novels, and Sigley (1998) used it in proposing a formality index for different text types. One of the investigators of the present research, Tabata (1995 and others), also used it in investigating the style of Dickens's texts. These studies also seem to have been successful in their ways of analyzing textual data. The same can be said of Nakamura (2002 and others), which made use of QCT in determining the structures of corpora based on the distributions of various kinds of linguistic features or items.
Pilot studies by means of QCT concerning the distributions of degree, frequency and manner adverbs across different text categories of the BNC Sampler revealed that manner adverbs behaved differently from the others: the distributions of degree and frequency adverbs were mainly ascribed to the dichotomy of spoken vs. written texts, whereas imaginative texts played the main role in explaining the behavior of manner adverbs, and the dichotomy of registers turned out to be a secondary factor. In the present research, the behavior of -ly manner adverbs across different text categories in the BNC World Edition was further examined by both PCA and QCT. Column-wise analysis by PCA turned out to be effective in classifying text domains, but the distributions of adverbs were not so effective in separating them into meaningful groups. Row-wise analysis could not extract factors that produced meaningful interpretations. In contrast, QCT provided quite reasonable interpretations for both the distribution of text domains and that of adverbs, with the primary factor being narrative vs. expository style and the secondary factor being informal vs. formal style.
Tabata also used Dickens's novels and sketches and conducted several studies: 1) correspondence analysis (ANACOR: the same as QCT in principle) of word-class distributions across texts, 2) PCA of the 30-60 most common word-types in the dialogue of Dickens's fiction, and 3) ANACOR of 1246 types of -ly adverbs. These studies reveal various aspects of Dickens's texts: variation across text categories, variation over time, differentiation of idiolects in fictional discourse due to social dynamics, and formal vs. colloquial styles. Further analysis of manner adverbs, the main interest of the present research, revealed that ANACOR generally works better than PCA.
In conclusion, in order to choose an appropriate method of multivariate analysis, the nature of the data in question, such as the types of variables (i.e., quantitative or qualitative), whether raw scores are normalized or not, the number of variables, etc., should be taken into consideration.
|
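The abstract compares PCA with correspondence analysis (ANACOR, which it describes as the same as QCT in principle) on tables of adverb frequencies by text category. As a rough illustration of what correspondence analysis does with such a contingency table, the following is a minimal sketch assuming NumPy and the standard SVD formulation. The function name, the adverbs, the category labels and all counts are invented for illustration and are not taken from the project's data or software.

```python
import numpy as np

def correspondence_analysis(counts):
    """Simple correspondence analysis of a contingency table via SVD.

    Returns principal coordinates for rows and columns and the
    principal inertias (squared singular values) of each axis.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()            # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    expected = np.outer(r, c)  # frequencies expected under independence
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2

# Hypothetical toy table: rows are adverbs, columns are text categories.
adverbs = ["slowly", "quietly", "clearly", "significantly"]
categories = ["imaginative", "spoken", "academic"]
counts = np.array([
    [120, 30, 10],
    [150, 20, 5],
    [40, 60, 90],
    [5, 10, 110],
])

rows, cols, inertia = correspondence_analysis(counts)
share = inertia / inertia.sum()  # proportion of inertia carried by each axis
for name, xy in zip(adverbs, rows[:, :2]):
    print(f"{name:15s} {xy[0]: .3f} {xy[1]: .3f}")
for name, xy in zip(categories, cols[:, :2]):
    print(f"{name:15s} {xy[0]: .3f} {xy[1]: .3f}")
print("share of inertia per axis:", np.round(share, 3))
```

Because the table is scaled by its row and column totals before decomposition, raw counts can be used directly and rows and columns are placed in the same space. PCA, by contrast, is usually applied to normalized rates and requires choosing whether texts or words are treated as the variables.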
Report
(3 results)
Research Products
(4 results)