Budget Amount
¥3,500,000 (Direct Cost: ¥3,500,000)
Fiscal Year 2005: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2004: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2003: ¥1,300,000 (Direct Cost: ¥1,300,000)
Research Abstract
Data mining on the Internet grows in importance as the Internet becomes an increasingly important social infrastructure. In this study, we investigate the following data mining techniques to realize new marketing tools for the networked society:
●A method for creating viewer-side annotations that reflect viewers' attention to TV dramas. Internet bulletin boards are filled with a large volume of viewer dialog about TV programs. Our approach extracts the viewers' attention embedded in these dialogs and expresses it as a graphical structure called an "attention graph". Test results demonstrate that attention graphs work well as viewer-side annotations: they point out which scenes viewers pay attention to and clarify how deeply viewers are impressed by each scene. Although further research is required, this result is an important step toward realizing Internet marketing.
●A method for detecting spam mail. The volume of mass unsolicited electronic mail, often called spam, has recently increased enormously and has become a serious threat not only to the Internet but also to society. We propose a new spam detection method that uses document space density information. Although the proposed method requires extensive e-mail traffic to acquire the necessary information, it can achieve perfect detection (i.e., both recall and precision are 100%) under practical conditions. A direct-mapped cache enables the method to handle over 13,000 e-mail messages per second. Experimental results obtained with over 50 million actual e-mail messages are also reported.
●Network security is an important issue in maintaining the Internet as a social infrastructure. In particular, detecting excessive consumption of network bandwidth caused by P2P mass flows is important, as is detecting Internet viruses. Although stream mining techniques seem promising for finding P2P traffic and Internet viruses, the vast volume of network flows prevents their straightforward application. A mining technique is required that works with extremely limited memory and also supports real-time analysis. We propose a cache-based mining method to realize such a technique. By analyzing the characteristics of the proposed method with real Internet backbone flow data, we show its advantages, i.e., low memory consumption together with real-time analysis capability. We also show that the proposed method can find mass flow information in Internet backbone flow data.
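The attention-graph idea in the TV drama item can be illustrated with a minimal sketch. The abstract does not say how attention graphs are built, so everything below is an assumption: scenes are identified by a hypothetical keyword dictionary, a scene's attention is the number of posts mentioning it, "how deeply viewers are impressed" is approximated by counting words from a hypothetical emotive-word list, and edges link scenes mentioned in the same post. The names `build_attention_graph`, `scene_keywords`, and `emotive_words` are illustrative, not taken from the study.

```python
import re
from collections import Counter
from itertools import combinations


def build_attention_graph(posts, scene_keywords, emotive_words):
    """Hypothetical sketch of an attention graph built from bulletin-board posts.

    scene_keywords: maps a scene label to words that identify it in a post.
    emotive_words:  words treated (by assumption) as signs a viewer was impressed.
    Returns (attention, impression, edges):
      attention[scene]  = number of posts mentioning the scene
      impression[scene] = emotive words found in posts mentioning the scene
      edges[(a, b)]     = number of posts mentioning both scenes a and b
    """
    attention, impression, edges = Counter(), Counter(), Counter()
    for post in posts:
        text = post.lower()
        # ASCII word tokenization is a simplification; Japanese posts would
        # need a morphological analyzer instead.
        tokens = re.findall(r"[a-z']+", text)
        mentioned = sorted(
            scene for scene, words in scene_keywords.items()
            if any(w.lower() in text for w in words)
        )
        emotive = sum(1 for t in tokens if t in emotive_words)
        for scene in mentioned:
            attention[scene] += 1
            impression[scene] += emotive
        # Sorted scene list gives each co-mention a canonical (a, b) edge key.
        edges.update(combinations(mentioned, 2))
    return attention, impression, edges


posts = [
    "The rooftop confession made me cry",
    "Rooftop scene, then the airport farewell. Wow",
    "The airport farewell felt too short",
]
scenes = {"rooftop": ["rooftop"], "airport": ["airport"]}
attention, impression, edges = build_attention_graph(posts, scenes, {"cry", "wow"})
print(attention, impression, edges)
```

Here both scenes receive attention 2, the rooftop scene scores higher on the emotive proxy (2 versus 1), and the single edge (airport, rooftop) has weight 1. A real system would need far better scene identification than substring matching.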
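The spam item combines two ideas: mass mailings produce many near-identical messages, so they sit in dense regions of document space, and a direct-mapped cache keeps the bookkeeping fast and bounded. The following is a simplified stand-in, not the authors' method. It assumes messages are represented by CRC-32 hashes of word trigrams ("shingles"), approximates density as the fraction of a message's shingles already seen at least `min_count` times, and uses hypothetical thresholds (`min_count=3`, `threshold=0.5`).

```python
import zlib


class DirectMappedCache:
    """Fixed-size direct-mapped cache of (tag, count) slots.

    Each key hash maps to exactly one slot (hash mod size). A different key
    landing in an occupied slot evicts the previous occupant, so memory stays
    constant no matter how many messages pass through.
    """

    def __init__(self, size):
        self.size = size
        self.tags = [None] * size
        self.counts = [0] * size

    def hit(self, key_hash):
        """Record one occurrence of key_hash and return its current count."""
        slot = key_hash % self.size
        if self.tags[slot] == key_hash:
            self.counts[slot] += 1
        else:  # empty slot or collision: start counting the new key
            self.tags[slot] = key_hash
            self.counts[slot] = 1
        return self.counts[slot]


def shingle_hashes(text, n=3):
    """Hash each distinct word n-gram ("shingle") of a message with CRC-32."""
    words = text.lower().split()
    grams = {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}
    return [zlib.crc32(g.encode("utf-8")) for g in grams]


def density(cache, text, min_count=3):
    """Fraction of the message's shingles seen at least min_count times,
    counting this message. A rough proxy for local document-space density."""
    counts = [cache.hit(h) for h in shingle_hashes(text)]
    return sum(c >= min_count for c in counts) / len(counts)


def is_spam(cache, text, min_count=3, threshold=0.5):
    """Flag a message once enough of its content has been seen repeatedly."""
    return density(cache, text, min_count) >= threshold


cache = DirectMappedCache(1 << 20)
blast = "buy cheap watches now limited offer click here"
print([is_spam(cache, blast) for _ in range(3)])
print(is_spam(cache, "quarterly report draft attached for review"))
```

Because a lookup is one modulo operation and one comparison, each message costs time proportional to its length, which is the property that makes a direct-mapped cache attractive for high message rates. The price is that colliding keys evict each other, so counts are approximate.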
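For the network security item, the essential constraint is a fixed memory budget with constant-time updates per flow record. A minimal sketch, assuming an LRU-managed cache of per-flow byte counters: the abstract does not state the replacement policy, the flow key, or the reporting threshold, so `FlowCache`, the `(src, dst, dst_port)` key, and `min_bytes` are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class FlowCache:
    """Bounded per-flow byte counters with least-recently-used eviction.

    A flow that keeps sending packets stays near the "recent" end and keeps
    accumulating bytes; short-lived flows are evicted when the cache is full.
    Each update is O(1), so records can be processed as they arrive.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.bytes = OrderedDict()

    def observe(self, flow_key, nbytes):
        if flow_key in self.bytes:
            self.bytes[flow_key] += nbytes
            self.bytes.move_to_end(flow_key)  # mark as most recently seen
        else:
            if len(self.bytes) >= self.capacity:
                self.bytes.popitem(last=False)  # evict least recently seen flow
            self.bytes[flow_key] = nbytes

    def mass_flows(self, min_bytes):
        """Return cached flows whose accumulated byte count reaches min_bytes."""
        return {k: v for k, v in self.bytes.items() if v >= min_bytes}


# Hypothetical flow keys: (source address, destination address, destination port).
heavy = ("192.0.2.1", "198.51.100.7", 6881)
stream = [
    (heavy, 1000),
    (("192.0.2.9", "203.0.113.5", 80), 40),
    (heavy, 1000),
    (("192.0.2.10", "203.0.113.6", 443), 40),
    (heavy, 1000),
    (("192.0.2.11", "203.0.113.7", 53), 40),
    (heavy, 1000),
]
cache = FlowCache(capacity=2)
for key, nbytes in stream:
    cache.observe(key, nbytes)
print(cache.mass_flows(min_bytes=2000))
```

With room for only two flows, the repeatedly active flow survives every eviction and reaches 4000 bytes while each short flow is discarded. The weakness of plain LRU is that a heavy flow can still be evicted if many distinct flows arrive between two of its packets, so the cache size must be matched to the traffic mix.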