Project/Area Number |
17K12739
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
Drozd Aleksandr, Tokyo Institute of Technology, Global Scientific Information and Computing Center, Researcher (90740126)
|
Project Period (FY) |
2017-04-01 – 2019-03-31
|
Keywords | benchmarking / evaluation / embeddings |
Outline of Annual Research Achievements |
During FY 2017 we implemented a set of benchmarks applicable to evaluating the quality of text corpora. Our toolbox includes benchmarks for part-of-speech tagging, named entity recognition, chunking, sentiment polarity classification, word similarity judgment, and analogical reasoning. All of these benchmarks are integrated into a publicly available library called VSMlib. Additionally, we worked on the interpretability of the benchmark results, i.e. linking them to specific linguistic properties captured by word embeddings. We also did preparatory engineering work on extracting raw text from Common Crawl archives, performing deduplication, etc.
|
Current Status of Research Progress (Category) |
2: Generally progressing smoothly
Reason
The project is being implemented roughly as planned in the original proposal. The only impediment is that the amount of funding was reduced compared to what was requested in the application, particularly
|
Strategy for Future Research Activity |
In FY 2018 we will focus on using the evaluation tools we developed to produce actual corpora. We will keep improving the quality of the text filtering algorithms and evaluation benchmarks. Additionally, we will work on the performance and scalability of our toolchain.
|
Reason for the Amount Carried Over to the Next Fiscal Year |
We had to move 100,000 yen from the FY 2018 budget to the FY 2017 budget, as this was the minimum transferable amount, but we needed only about half of this sum. Transferring the remaining 48,143 yen to the FY 2018 budget would be very helpful, as the purchase of additional equipment is planned for FY 2018.
|