20th IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Post published: November 5, 2021
Post category: News / Publication

December 14-17, 2021, Melbourne, Australia

New Paper Accepted: “On the Impact of Dataset Size: A Twitter Classification Case Study”

Authors: Thi Huyen Nguyen, Hoang H. Nguyen, Zahra Ahmadi, Tuan-Anh Hoang, Thanh-Nam Doan

The recent advent and evolution of deep learning models and pre-trained embedding techniques have created a breakthrough in supervised learning. Typically, we expect that adding more labeled data improves the predictive performance of supervised models, including deep neural networks. On the other hand, collecting more labeled data is not an easy task because of several difficulties, such as manual labor costs, data privacy, and storage limitations. Hence, a comprehensive study of the relation between training set size and the classification performance of different methods would be useful when selecting a learning model for a specific task. However, the literature lacks such a thorough and systematic study. In this paper, we concentrate on this relationship in the context of short, noisy texts from Twitter. We design a systematic mechanism to comprehensively observe how the performance of supervised learning models improves as the training set grows, on three well-known Twitter tasks: sentiment analysis, informativeness detection, and information relevance. In addition, we study how much better recent deep learning models perform than traditional machine learning approaches across various data sizes. Our extensive experiments show that (a) recent pre-trained models have overcome big data requirements, (b) a good choice of text representation has more impact than adding more data, and (c) adding more data is not always beneficial in supervised learning.

For more information about the conference, click here:

Migration-Related Risks Caused by Misconceptions of Opportunities and Requirement

MIRROR has received funding from the European Union’s Horizon 2020 research and innovation action program under grant agreement No 832921.

© All rights reserved

Imprint | Privacy Policy