
A Supervised Term Relevance Weighting Method for Arabic Text Classification


Ghassan Khazal Ali and Alexander Zamyatin
Abstract

The vector space model represents textual data as a vector of terms in a term space so that the text can be classified using a classification algorithm. Terms may be either words or phrases. Since terms have different levels of importance in a text, term weighting is needed to assign appropriate weights to the terms and thereby improve the performance of text classification or information retrieval. Most Arabic text classification researchers represent the text using the term frequency–inverse document frequency (TF–IDF) method. We propose a supervised term weighting method (TF–RF) to improve the accuracy of Arabic text classification. In the experimental results, this term-weighting method performed better than the other methods, while methods based on information theory or statistical metrics performed the worst in all experiments. By contrast, the popularly used TF–IDF method did not perform uniformly well on different data sets. The goal of this work is to compare feature representation with TF–IDF and TF–RF using two kinds of text classification algorithms: support vector machines and neural networks. The results showed that TF–RF with a support vector machine performed best, achieving an F1-measure of 83.27%.
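The abstract does not state the TF–RF formula, so the following is only a minimal sketch. It assumes the relevance frequency definition commonly used in the term-weighting literature, rf = log2(2 + a / max(1, c)), where a is the number of positive-category documents containing the term and c is the number of negative-category documents containing it. The function names and the toy documents are hypothetical, and the authors' preprocessing and exact weighting may differ.

```python
import math
from collections import Counter


def relevance_frequency(a, c):
    """Assumed rf definition: log2(2 + a / max(1, c)).

    a: positive-category documents that contain the term
    c: negative-category documents that contain the term
    """
    return math.log2(2 + a / max(1, c))


def tf_rf_weights(doc_tokens, pos_docs, neg_docs):
    """Weight each term of one document by raw term frequency times rf.

    doc_tokens: list of tokens for the document being represented
    pos_docs, neg_docs: training documents, each given as a set of tokens
    """
    tf = Counter(doc_tokens)
    weights = {}
    for term, count in tf.items():
        a = sum(1 for d in pos_docs if term in d)
        c = sum(1 for d in neg_docs if term in d)
        weights[term] = count * relevance_frequency(a, c)
    return weights


# Hypothetical toy data: placeholder English tokens stand in for
# preprocessed Arabic terms.
pos_docs = [{"goal", "match"}, {"goal"}]
neg_docs = [{"match", "vote"}]
weights = tf_rf_weights(["goal", "goal", "match"], pos_docs, neg_docs)
```

Unlike IDF, which depends only on how many documents contain a term, this weight uses the category labels: a term concentrated in the positive category ("goal" here) receives a larger factor than one spread evenly across both categories ("match").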

Volume 11 | Issue 11

Pages: 206-212

DOI: 10.5373/JARDCS/V11I11/20193189