Outline of Research |
Preparation, data processing, and basic analysis of the data went as planned. The number of messages in the corpus was reduced to 434. After grammatical tokenization, the corpus contained a total of 83,379 "words" (tokens), representing 5,771 word types. After examining a few basic quantitative aspects (message length, frequent vocabulary items and collocations, sender and title names, etc.), I proceeded to a more detailed look at individual messages with respect to how they deal with taboo expressions and how they portray female sexuality. A paper on this analysis has already been published in a peer-reviewed journal (cf. below).
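The basic quantitative measures mentioned above (token and type counts, message length, frequent items, and adjacent-word collocations) can be illustrated with a minimal sketch. It is not the procedure actually used in the project: the regex-based `tokenize` function is a crude stand-in for the grammatical tokenization, the function names are hypothetical, and the two sample messages are invented for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a simple lowercase, regex-based tokenizer; the project's
    # grammatical tokenization is not specified and will differ.
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def corpus_stats(messages, top_n=3):
    # Tokenize each message separately so bigrams never span two messages.
    token_lists = [tokenize(m) for m in messages]
    tokens = [t for toks in token_lists for t in toks]
    types = Counter(tokens)
    bigrams = Counter(
        pair for toks in token_lists for pair in zip(toks, toks[1:])
    )
    return {
        "messages": len(messages),
        "tokens": len(tokens),            # running words
        "types": len(types),              # distinct word forms
        "mean_length": len(tokens) / len(messages) if messages else 0.0,
        "top_words": types.most_common(top_n),
        "top_bigrams": bigrams.most_common(top_n),
    }


# Invented sample messages, not taken from the corpus.
sample = ["Hello there, how are you?", "Hello again, are you there?"]
stats = corpus_stats(sample)
print(stats["tokens"], stats["types"], stats["mean_length"])
```

Raw adjacent-pair counts are only a first approximation of collocation; an association measure such as pointwise mutual information would normally be applied on top of them.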
Strategy for Future Research Activity |
The second phase of the project will be dedicated mainly to a more in-depth analysis of the data, plus triangulation. Based on feedback I received when first presenting the data to a scientific audience in December last year (cf. below), I plan to add a separate empirical dimension by collecting recipients' evaluations of a selected set of spam messages from the corpus. More specifically, I intend to show these messages to between 50 and 100 undergraduate students and ask them to rate (1) the overall credibility of a message's contents, (2) the naturalness of its language, and (3) the likelihood that a recipient would react to it. Of further interest will be whether the evaluations differ recognizably according to the gender of the participating students.