A Feature Selection based Approach for Authorship Attribution

V Sai Krishna, K Durga Prasad, A Phaneendra

Authorship Attribution (AA) is treated as a text classification problem in which an author is assigned as the class label for a given text by examining the writing styles of different authors. To differentiate the authors' writing styles, researchers have extracted a wide variety of features in Authorship Attribution approaches, including content based, lexical, character based, syntactic, structural and readability features. Researchers have found that content based features play a more important role than the other feature types in identifying the author. In this work, content based features are used in the experiment. A feature selection algorithm is used to find the most informative terms for differentiating the writing styles of the authors. These terms are treated as a bag of terms, and every document is represented as a vector over them. In existing approaches, documents are represented as vectors whose entries are term frequencies. We observe that term frequency alone is not suitable for weighting terms in the vector representation, and that the distribution of the terms within the corpus must also be considered. We use two supervised term weight measures to compute the weight of each term in the vector representation of documents. These vectors are passed to machine learning algorithms to generate a learning model, which is then used to predict the author of a new document. The experiment was performed on a dataset of 5000 reviews by ten authors, collected from the Amazon website. The accuracy of our approach to Authorship Attribution compares favorably with that of several existing approaches.
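
The pipeline described in the abstract (select informative terms, weight them with a supervised measure, build document vectors, train a classifier) can be illustrated with a minimal sketch. The abstract does not name the feature selection algorithm, the two term weight measures, or the classifiers, so everything concrete here is an assumption made for illustration: chi-square stands in for the feature selection algorithm, relevance frequency (tf.rf) stands in for a supervised term weight, a nearest-centroid rule with cosine similarity stands in for the machine learning algorithm, and the four-review corpus is hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (review text, author label). Not the paper's data.
TRAIN = [
    ("great plot and great characters", "A"),
    ("great story great pacing", "A"),
    ("battery died quickly", "B"),
    ("battery life is poor", "B"),
]

def tokenize(text):
    return text.lower().split()

DOCS = [tokenize(text) for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
AUTHORS = sorted(set(LABELS))

def doc_counts(term, author):
    """Number of documents containing `term` by `author` and by all other authors."""
    pos = sum(1 for d, l in zip(DOCS, LABELS) if l == author and term in d)
    neg = sum(1 for d, l in zip(DOCS, LABELS) if l != author and term in d)
    return pos, neg

def chi_square(term, author):
    """Chi-square statistic of term presence vs. authorship (assumed selector)."""
    a, b = doc_counts(term, author)
    c = LABELS.count(author) - a               # author's docs without the term
    d = len(DOCS) - LABELS.count(author) - b   # other docs without the term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_terms(k):
    """Keep the k terms with the highest chi-square score over any author."""
    vocab = sorted({t for d in DOCS for t in d})
    score = {t: max(chi_square(t, au) for au in AUTHORS) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

def relevance_frequency(term, author):
    """Supervised weight (assumed measure): grows when the term favours one author."""
    pos, neg = doc_counts(term, author)
    return math.log2(2 + pos / max(1, neg))

terms = select_terms(2)
weights = {t: max(relevance_frequency(t, au) for au in AUTHORS) for t in terms}

def vectorize(text):
    """Document vector: term frequency scaled by the supervised weight."""
    tf = Counter(tokenize(text))
    return [tf[t] * weights[t] for t in terms]

def cosine(u, v):
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0

# One centroid per author: the mean of that author's training vectors.
centroids = {}
for au in AUTHORS:
    vecs = [vectorize(text) for text, label in TRAIN if label == au]
    centroids[au] = [sum(col) / len(vecs) for col in zip(*vecs)]

def predict(text):
    """Assign the author whose centroid is most similar to the new document."""
    vec = vectorize(text)
    return max(AUTHORS, key=lambda au: cosine(vec, centroids[au]))

print(terms, predict("the battery is poor"))
```

The sketch shows why a corpus-aware weight can differ from raw term frequency: a term that appears mostly in one author's documents receives a larger weight than one spread evenly across authors, even when both occur equally often in a given document.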

Volume 12 | Issue 2

Pages: 744-755

DOI: 10.5373/JARDCS/V12I2/S20201091