Systemization of audio-visual knowledge resources using graphical models
Project/Area Number | 17300059
Research Category | Grant-in-Aid for Scientific Research (B)
Allocation Type | Single-year Grants
Section | General
Research Field | Perception information processing / Intelligent robotics
Research Institution | Tokyo Institute of Technology
Principal Investigator | SHINODA Koichi, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Associate Professor (10343097)
Co-Investigator | FURUI Sadaoki, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (90293076)
Project Period (FY) | 2005 – 2007
Project Status | Completed (Fiscal Year 2007)
Budget Amount | ¥14,780,000 (Direct Cost: ¥13,700,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2007: ¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2006: ¥4,300,000 (Direct Cost: ¥4,300,000)
Fiscal Year 2005: ¥5,800,000 (Direct Cost: ¥5,800,000)
Keywords | Multimodal recognition / Speech recognition / Video recognition / Large knowledge resources / Sequence data modeling / Time-series modeling / Graphical modeling / Multimedia content / CBVIR / Systemization of knowledge / Semantics
Research Abstract
Recent advances in computer technology, particularly in storage technology, have greatly increased the number and quality of audio-visual knowledge resources. Most of these resources lack index information, so it has become difficult for ordinary users to browse the entire content of each database. Techniques for systemizing and utilizing audio-visual knowledge resources are in strong demand. However, statistical pattern recognition techniques have not yet achieved sufficient performance for this purpose, and it is not always clear what kinds of indexing are useful. In this study, we took the approach of indexing these databases in several different ways in an unsupervised manner and extracting dependencies among the resulting labels. First, we carried out scene recognition for baseball video. Together with NHK Science & Technical Research Laboratories, we constructed an annotated database of 43 Major League Baseball games and used it for our evaluation. We exploited various relationships between scene labels, such as scene contexts, and unified audio and visual information. We achieved 60% accuracy in 16-class scene recognition and a 90% recall rate in score-scene detection. These techniques are expected to contribute substantially to automatic highlight-extraction systems for broadcast companies. Second, we participated in the TRECVID workshop organized by NIST, USA, to study the high-level feature extraction task. We constructed tree-structured dictionaries of "visual words" by unsupervised clustering of video features, and selected a tree-cut as the dictionary for each word. Using a bag-of-words approach, we built an extraction system robust against differences in the amount of data available for each feature. We also extracted effective "motion words" for dynamic features. Our method achieved significant improvements in the task of extracting 39 features. Other research topics include robust speech recognition using graphical models, a multi-modal interface for asynchronous multi-modal inputs, and human-gait modeling.
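The bag-of-visual-words representation mentioned above can be sketched as follows. This is an illustrative reconstruction only, not the project's actual implementation: the project used tree-structured dictionaries with tree-cut selection, whereas this sketch uses a naive flat k-means vocabulary; all function names and parameters are hypothetical.

```python
import numpy as np

def build_visual_vocabulary(descriptors, k=8, n_iter=20, seed=0):
    """Cluster local feature descriptors into k 'visual words' (naive k-means).

    descriptors: (n, d) array of local features extracted from video keyframes.
    Returns the (k, d) array of cluster centers, i.e. the visual-word dictionary.
    """
    rng = np.random.default_rng(seed)
    # initialize centers with k distinct descriptors chosen at random
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign each descriptor to its nearest center (Euclidean distance)
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """L1-normalized histogram of visual-word occurrences for one shot/keyframe."""
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Example with synthetic descriptors standing in for real video features:
rng = np.random.default_rng(1)
desc = rng.normal(size=(200, 16))
centers = build_visual_vocabulary(desc, k=8)
hist = bag_of_words(desc, centers)
```

In practice a vocabulary has thousands of words; a tree-structured dictionary, as used in this project, makes nearest-word lookup and dictionary-size selection tractable at that scale.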
Report (4 results)
Research Products (79 results)
[Presentation] TokyoTech's TRECVID2006 Notebook (2006)
Author(s)
Taichi Nakamura, Yuichi Miyamura, Koichi Shinoda, Sadaoki Furui
Organizer
Online Proceedings of the TRECVID Workshops
Place of Presentation
Washington, USA
Year and Date
2006-11-13
[Presentation] Robust Scene Recognition Using Language Models (2006)
Author(s)
Ryoichi Ando, Koichi Shinoda, Sadaoki Furui, Takahiro Mochizuki
Organizer
MIR 2006, ACM Workshop 2006
Place of Presentation
Santa Barbara, California, USA
Year and Date
2006-10-27
[Presentation] Scene Recognition for TV Baseball Program Using Acoustic Information (2006)
Author(s)
Taro Miyazaki, Hiromitsu Nakagawa, Ryuta Nakagawa, Koji Iwano, Koichi Shinoda, Sadaoki Furui
Organizer
The 2006 Spring meeting of ASJ
Place of Presentation
Tokyo
Year and Date
2006-03-14
[Presentation] Study of Interface Having Simultaneous Inputs of Speech and Handwritten Characters (2005)
Author(s)
Ryuta Nakagawa, Yui Kobayashi, Ryuji Kobayashi, Koichi Shinoda, Sadaoki Furui
Organizer
Meeting of IPSJ-SLP
Place of Presentation
Hachioji, Tokyo
Year and Date
2005-05-26