Authorship profiling aims to infer characteristics of an author, such as age, gender, native language, and educational background, from patterns in their writing. Blog authors write about a wide range of topics, including purchase decisions, digital advertising, personality development, fitness, and technology updates, and they exert considerable influence on their readers. In this paper, we classify blog authors into three age groups based on the content of their blogs. We use the Natural Language Toolkit (NLTK), a set of libraries for natural language processing, to distinguish the writing patterns of authors in different age groups. NLTK supports word-level analysis of the blogs, which is an important feature in our research. We also conduct sentiment analysis on the blogs to understand how authors feel about the topics they write about. For this we use a Naïve Bayes classifier with two sentiment classes: positive and negative. We achieve an average accuracy of 66.78% in predicting the age group of authors. The sentiment analysis indicates that older authors tend to express more positivity in their blogs than younger authors.
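As an illustration of the two-class sentiment step mentioned above, the following is a minimal sketch of a bag-of-words Naïve Bayes classifier with add-one smoothing. It is not the authors' code: it is written in plain Python rather than with NLTK's own classifier, the whitespace tokenizer is a simplification, and the function names (`tokenize`, `train`, `classify`) and the four toy posts are hypothetical and invented only for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training documents with this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log-likelihood of the word.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
TOY_POSTS = [
    ("i love this wonderful phone", "positive"),
    ("great workout and a happy day", "positive"),
    ("i hate this terrible update", "negative"),
    ("awful service and a sad day", "negative"),
]

model = train(TOY_POSTS)
print(classify("a wonderful happy workout", model))  # positive
print(classify("terrible awful phone", model))       # negative
```

The same scheme could in principle be applied to age-group labels instead of sentiment labels by changing the label set in the training data; the features and preprocessing actually used in the study are not specified here.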
Volume 11 | 06-Special Issue