Assamese newspapers, media agencies, and bloggers are the fastest growing in Northeast India. Assamese is read and spoken in Assam, Meghalaya, Arunachal Pradesh, and Nagaland, and in some bordering areas of Bhutan and Bangladesh. These news, media, and blogging outlets report on practically everything and therefore generate a large volume of digitized text. This paper addresses the problem of event detection in the Assamese language, i.e., identifying events of interest in Assamese text. Examining this kind of content can be extremely valuable, allowing a user, institute, or organisation to acquire knowledge. The model aims to observe real-world happenings in space and time. The researchers introduce an event extraction rule together with a powerful learning algorithm, the Conditional Random Field (CRF) method, to capture the events. A large dataset has been used to train the model. The main objective of the paper is to draw out specific knowledge in order to predict the events described in digital text. Some text is noisy and may not carry any valuable information, which makes information extraction more difficult. CRFs are used to label a sequence of tokens observed in a text. They are trained on a set of features, and choosing these features is an important step in the learning process. This paper proposes an algorithm for finding features relevant to event extraction from Assamese text.
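The token-level features that a CRF consumes can be illustrated with a minimal Python sketch. The abstract does not list the features used in the paper, so the feature names below, the `B-EVENT`/`O` label scheme, and the toy sentence (roughly "flooding has occurred in Guwahati") are assumptions made only for illustration, not the authors' method.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "token": tok,
        # Slices operate on Unicode code points, not grapheme clusters.
        "prefix2": tok[:2],
        "suffix2": tok[-2:],
        "suffix3": tok[-3:],
        "length": len(tok),
        "is_digit": tok.isdigit(),
    }
    # Context features: neighbouring tokens, or sentence-boundary markers.
    if i > 0:
        feats["prev_token"] = tokens[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_token"] = tokens[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Toy example; the labels are assumed, not taken from the paper's dataset.
tokens = ["গুৱাহাটীত", "বানপানী", "হৈছে"]
labels = ["O", "B-EVENT", "O"]
X = sentence_features(tokens)
```

Each sentence becomes a list of feature dictionaries aligned with a list of labels, which is the general shape of input a sequence labeler such as a CRF is trained on. Which features are actually informative for Assamese event detection is the question the paper's feature selection algorithm addresses.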
Volume 12 | 04-Special Issue
Pages: 1370-1375
DOI: 10.5373/JARDCS/V12SP4/20201615