Identification of Misconceptions about Corona Outbreak Using Trigrams and Weighted TF-IDF Model

Sujatha Arun Kokatnoor and Balachandran Krishnan

Misconceptions about issues such as health, diseases, politics, government policies, epidemics and pandemics have been a social problem for many years, particularly since the advent of social media, and they often spread faster than the truth. Social media platforms such as Twitter are now among the most prominent news outlets and remain a major source of information, particularly of information circulated across the network. In this paper, the efficacy of a Misconception Detection System was tested on a Corona Pandemic dataset extracted from Twitter posts. A trigram and weighted TF-IDF model followed by a supervised classifier was used to categorize the dataset into two classes: one containing misconceptions about the COVID-19 virus and the other comprising correct and authenticated information. Trigrams were more reliable because the functional words related to coronavirus appeared more frequently in the corpus created. The proposed system, combining trigrams with weighted TF-IDF, gave relevant and normalized scores, leading to an efficient construction of the vector space model. It yielded good performance when compared with traditional approaches using Bag of Words and the Count Vectorizer technique, in which the vector space model was created only from word counts.
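As a rough illustration of the vectorization step described in the abstract, the following minimal sketch builds L2-normalized TF-IDF vectors over word trigrams in plain Python. It is not the authors' implementation: the abstract does not specify the weighting scheme, so the smoothed IDF formula, the whitespace tokenizer, the function names (`word_trigrams`, `tfidf_vectors`) and the sample tweets are all assumptions made for illustration, and the supervised classifier is omitted.

```python
import math
from collections import Counter


def word_trigrams(text):
    """Return the word trigrams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors over word trigrams.

    The smoothed IDF used here is an assumed weighting, not the one
    from the paper, which the abstract does not specify.
    """
    grams_per_doc = [Counter(word_trigrams(d)) for d in docs]
    vocab = sorted(set().union(*grams_per_doc))
    n_docs = len(docs)

    # Document frequency: in how many documents each trigram occurs.
    df = {g: sum(1 for counts in grams_per_doc if g in counts) for g in vocab}
    # Smoothed inverse document frequency (assumption).
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}

    vectors = []
    for counts in grams_per_doc:
        total = sum(counts.values()) or 1
        raw = [counts[g] / total * idf[g] for g in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        vectors.append([x / norm for x in raw])
    return vocab, vectors


# Invented sample posts, for illustration only.
tweets = [
    "drinking hot water cures the virus",
    "wash your hands to stop the virus",
    "drinking hot water does not cure it",
]
vocab, vectors = tfidf_vectors(tweets)
```

Unlike a pure count-based vector, a trigram that occurs in several posts (here "drinking hot water") receives a lower weight than one that is unique to a single post, and every vector is scaled to unit length.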

Volume 12 | 05-Special Issue

Pages: 524-533

DOI: 10.5373/JARDCS/V12SP5/20201788