Abhijit Debbarma, Paritosh Bhattacharya, Bipul Syam Purkayastha


Named entity recognition (NER) is the task of identifying entities in a text corpus and classifying them into predefined categories. Such entities may be names of locations, persons, organizations, products, drugs, and so on. In this study we consider five entity types: location, person, organization, number, and day name. We study NER for Kokborok, a low-resource language. Kokborok is the official language of the northeastern Indian state of Tripura. It is also spoken in the states of Mizoram and Assam and in the hill areas of Bangladesh, in the Chittagong Hill Tracts region. NER can be approached with rule-based methods or with machine learning. In this paper we explore the scope of a machine learning approach to Kokborok NER by implementing a support vector machine (SVM) classifier and experimenting with features for Kokborok named entities. In our experiments, combining several features yields better results. We also use a frequency-based dictionary lookup to find named entities, and we study a hybrid model that combines the two approaches for Kokborok NER. Given the limitations of a low-resource language, chiefly the scarcity of digital resources, the machine learning approach is more suitable. The hybrid approach is found to perform better at identifying named entities, achieving an F1 score of 83.3.
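The hybrid scheme described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The abstract does not list the features, the dictionary threshold, how the dictionary and the SVM are combined, or whether F1 is computed per token or per entity, so the following are all assumptions: a hypothetical feature set (`token_features`: lowercased word, 3-character prefix and suffix, digit and title-case flags, one word of context on each side), a hypothetical `min_count` cut-off for the frequency dictionary, dictionary lookup applied first with the classifier as a fallback, and token-level F1. The `toy_classifier` function is only a stand-in; in practice it would be a trained SVM over the same feature dictionaries (for example scikit-learn's `DictVectorizer` feeding a `LinearSVC`, which the sketch does not use). The training tokens are English placeholders, not Kokborok data.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


# Hypothetical per-token features (word, affixes, shape, +/-1 context window);
# the paper's actual Kokborok feature set is not given in the abstract.
def token_features(tokens: Sequence[str], i: int) -> Dict[str, object]:
    w = tokens[i]
    return {
        "word": w.lower(),
        "prefix3": w[:3].lower(),
        "suffix3": w[-3:].lower(),
        "is_digit": w.isdigit(),
        "is_title": w.istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }


def build_frequency_dictionary(
    tagged_sentences: List[List[Tuple[str, str]]], min_count: int = 2
) -> Dict[str, str]:
    """Map a word to its most frequent tag, kept only if it is an entity tag seen >= min_count times."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    lexicon: Dict[str, str] = {}
    for word, tag_counts in counts.items():
        tag, freq = tag_counts.most_common(1)[0]
        if tag != "O" and freq >= min_count:  # "O" marks non-entity tokens
            lexicon[word] = tag
    return lexicon


Classifier = Callable[[Dict[str, object]], str]


def hybrid_tag(tokens: Sequence[str], lexicon: Dict[str, str], classifier: Classifier) -> List[str]:
    """Assumed combination: dictionary lookup first, statistical classifier as fallback."""
    tags = []
    for i, w in enumerate(tokens):
        tag = lexicon.get(w.lower())
        if tag is None:
            tag = classifier(token_features(tokens, i))
        tags.append(tag)
    return tags


def token_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Token-level F1 over entity tags (every tag except "O")."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Stand-in for a trained SVM: tags digit strings as NUM, everything else as O.
def toy_classifier(features: Dict[str, object]) -> str:
    return "NUM" if features["is_digit"] else "O"


# Toy English placeholder data, not Kokborok training material.
train = [
    [("Agartala", "LOC"), ("is", "O")],
    [("Agartala", "LOC"), ("Monday", "DAY")],
    [("Monday", "DAY")],
]
lexicon = build_frequency_dictionary(train, min_count=2)
tokens = ["Agartala", "2017", "Tuesday", "visit"]
gold = ["LOC", "NUM", "DAY", "O"]
predicted = hybrid_tag(tokens, lexicon, toy_classifier)
score = token_f1(gold, predicted)
```

Here the dictionary covers "Agartala" and "Monday", and the stand-in classifier catches the number "2017". "Tuesday" is missed because it never occurs in the toy training data and the stand-in recognises only digits, so the token-level F1 comes out at about 0.8. This kind of gap is what a trained classifier with richer features is meant to fill, and it is one plausible reason a hybrid of lookup and learning can outperform either component alone.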

Issue: 02-Special Issue

Year: 2017

Pages: 738-747
