
Monday, December 20, 2021

Sentiment Analysis for an Indic Language: Hindi

This article shows how to use the VADER library to perform sentiment analysis on Hindi text.

Sentiment analysis measures how positive, negative or neutral a piece of text is. It is applied to textual data to help businesses monitor brand and product sentiment in customer feedback and to understand customer needs, and it is a time-efficient, cost-effective way to analyse large volumes of data.

Python offers good support for sentiment analysis. A few of the libraries available for this purpose are NLTK, TextBlob and VADER.

To perform sentiment analysis on text in an Indic language such as Hindi, we need to carry out the following tasks.

1. Read the text file, which is in Hindi.
2. Translate the Hindi sentences into English, because these Python libraries support text analysis in English. (VADER's lexicon is English, so if Hindi sentences are passed to it directly, the words are not recognised and the 'compound' score, which measures the sentiment of a sentence, does not reflect the actual sentiment. Converting each sentence to its English equivalent first is therefore appropriate.) Google Translate, accessed through the deep_translator package, helps with this task.
3. Perform sentiment analysis on the translated text using any of the libraries mentioned above.
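The three tasks above can be sketched as one small function. This is a minimal sketch, not the author's code: the names `analyse_lines`, `TOY_TRANSLATIONS` and `toy_score` are hypothetical, and the translator and scorer are passed in as parameters so the structure can be tried offline. In the real program they would wrap deep_translator's `GoogleTranslator(...).translate` and VADER's `polarity_scores()`, and `lines` could be an open file object (task 1).

```python
from typing import Callable, Iterable, List, Tuple


def analyse_lines(
    lines: Iterable[str],
    translate: Callable[[str], str],
    score: Callable[[str], float],
) -> List[Tuple[str, str, float]]:
    """Run tasks 2 and 3 on lines that have already been read (task 1).

    `translate` and `score` are hypothetical stand-ins for a Hindi-to-English
    translator and an English sentiment scorer. Returns
    (original, translated, score) tuples for every non-blank line.
    """
    results = []
    for line in lines:
        text = line.strip()
        if not text:  # skip blank lines rather than scoring empty strings
            continue
        english = translate(text)  # task 2: Hindi -> English
        results.append((text, english, score(english)))  # task 3: score it
    return results


# Toy stand-ins so the sketch runs offline; these are NOT real services.
TOY_TRANSLATIONS = {"गोवा की यात्रा बहुत अच्छी रही।": "The trip to Goa was very good."}


def toy_score(text: str) -> float:
    """Hypothetical keyword scorer: 0.5 if 'good' appears, else 0.0."""
    return 0.5 if "good" in text.lower() else 0.0


print(analyse_lines(["गोवा की यात्रा बहुत अच्छी रही।\n", "\n"], TOY_TRANSLATIONS.get, toy_score))
```

Passing the translator and scorer as arguments keeps the reading, translating and scoring steps separate, so a different library (for example TextBlob instead of VADER) can be swapped in without touching the loop.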

The steps are as follows.

Step 1: Import the necessary libraries / packages.


# codecs provides access to the internal Python codec registry
import codecs

# This is to translate the text from Hindi to English
from deep_translator import GoogleTranslator

# This is to analyse the sentiment of text
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
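Apart from codecs, both imports are third-party packages. They are commonly installed with `pip install deep-translator vaderSentiment` (PyPI names as given in the packages' own documentation; check them against your environment). The sketch below uses only the standard library to report which of the Step 1 modules cannot be found before the script runs; the helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import List, Sequence

# Top-level module names imported in Step 1 (codecs ships with Python itself)
REQUIRED = ("codecs", "deep_translator", "vaderSentiment")


def missing_modules(names: Sequence[str]) -> List[str]:
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install deep-translator vaderSentiment")
    else:
        print("All required modules are available.")
```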

Step 2: Read the file data. The 'codecs' library provides access to the internal Python codec registry. Most standard codecs are text encodings, which encode text to bytes; custom codecs may encode and decode between arbitrary types. Here the file is opened with the UTF-8 codec so that the Devanagari (Hindi) characters are decoded correctly.

# Read the Hindi text into 'sentences', dropping newlines and blank lines
with codecs.open('SampleHindiText.txt', encoding='utf-8') as f:
    sentences = [line.strip() for line in f if line.strip()]
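To make the encoding point concrete: UTF-8 stores each Devanagari code point in three bytes, so the four-code-point word "गोवा" encodes to 12 bytes and decodes back to the same text. The sketch also shows that, in Python 3, the built-in `open()` with `encoding='utf-8'` reads the file in the same way as `codecs.open()`. Skipping blank lines is an assumption made so that empty strings are not sent for translation, and the helper name `read_sentences` is hypothetical.

```python
from typing import List

word = "गोवा"  # "Goa" in Devanagari: 4 code points
encoded = word.encode("utf-8")  # text -> bytes
print(len(word), len(encoded))  # prints: 4 12 (3 bytes per Devanagari code point)
assert encoded.decode("utf-8") == word  # bytes -> text round-trips losslessly


def read_sentences(path: str) -> List[str]:
    """Read a UTF-8 text file and return its non-blank lines, stripped."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```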

Step 3: Translate the sentences into English so that the VADER library can process the translated text for sentiment analysis. The polarity_scores() method returns a sentiment dictionary for the text, containing the 'neg', 'neu' and 'pos' scores and the 'compound' score, which classifies the sentiment of the sentence as follows:

* Positive sentiment: compound score >= 0.05
* Neutral sentiment: -0.05 < compound score < 0.05
* Negative sentiment: compound score <= -0.05
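The thresholds above can be written as a small function. For context, VADER's compound score comes from summing the adjusted valence scores of the words and squashing that sum into [-1, 1] with x / sqrt(x² + α), where α = 15 in the vaderSentiment source (its `normalize` helper). The sketch below re-implements both steps in plain Python for illustration only; it is not the library's own code, and the names `normalize` and `label` here are illustrative.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1] as x / sqrt(x^2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))  # clamp, as a safeguard


def label(compound: float) -> str:
    """Apply the compound-score thresholds listed above."""
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"


# Made-up valence sums, chosen only to exercise each branch
for raw in (3.1, 0.1, -0.2, -2.5):
    c = normalize(raw)
    print(f"sum={raw:+.1f}  compound={c:+.4f}  -> {label(c)}")
```

Because sqrt(x² + 15) grows with |x|, large sums approach but never reach ±1, so very emphatic sentences still land inside the range the thresholds expect.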

# Create the translator and the analyzer once, outside the loop
translator = GoogleTranslator(source='auto', target='en')
analyzer = SentimentIntensityAnalyzer()

for sentence in sentences:
    translated_text = translator.translate(sentence)
    sentiment_dict = analyzer.polarity_scores(translated_text)
    print("\nTranslated Sentence =", translated_text, "\nDictionary =", sentiment_dict)

    # Classify using the compound-score thresholds listed above
    if sentiment_dict['compound'] >= 0.05:
        print("It is a Positive Sentence")
    elif sentiment_dict['compound'] <= -0.05:
        print("It is a Negative Sentence")
    else:
        print("It is a Neutral Sentence")
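Instead of only printing a verdict per sentence, the loop above could also collect the results and summarise the whole file. The sketch below assumes the `sentiment_dict['compound']` values have been gathered into a list; the names `classify` and `summarise` are hypothetical, and the example scores are placeholders, not output from the sample file.

```python
from collections import Counter
from typing import Dict, Iterable


def classify(compound: float) -> str:
    """Same compound-score thresholds as in Step 3."""
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"


def summarise(compounds: Iterable[float]) -> Dict[str, int]:
    """Count how many sentences fall into each sentiment class."""
    counts = Counter(classify(c) for c in compounds)
    return {name: counts.get(name, 0) for name in ("Positive", "Neutral", "Negative")}


# Placeholder scores for illustration only, not results from the sample file
print(summarise([0.7, 0.0, 0.3, -0.4]))
```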

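The source file 'SampleHindiText.txt' is given below. In English, the four sentences roughly mean "The trip to Goa was very good.", "The beaches were very hot.", "I really enjoyed playing on the beach." and "My daughter was very angry."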

गोवा की यात्रा बहुत अच्छी रही।
समुद्र तट बहुत गर्म थे।
मुझे समुद्र तट पर खेलने में बहुत मजा आया।
मेरी बेटी बहुत गुस्से में थी।
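To reproduce the example, the sample file can also be created from Python. This sketch writes the four sentences above as UTF-8 text, one per line, under the file name opened in Step 2; the helper name `write_sample` is hypothetical.

```python
from pathlib import Path

# The four Hindi sentences from SampleHindiText.txt, one per line
SAMPLE_SENTENCES = [
    "गोवा की यात्रा बहुत अच्छी रही।",
    "समुद्र तट बहुत गर्म थे।",
    "मुझे समुद्र तट पर खेलने में बहुत मजा आया।",
    "मेरी बेटी बहुत गुस्से में थी।",
]


def write_sample(path: str = "SampleHindiText.txt") -> Path:
    """Write the sample sentences as UTF-8 text and return the file path."""
    target = Path(path)
    target.write_text("\n".join(SAMPLE_SENTENCES) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    print("Wrote", write_sample())
```

Writing with an explicit `encoding="utf-8"` matters on systems whose default encoding is not UTF-8, where Devanagari text could otherwise fail to encode.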

 

The output of the code is shown below.



This article was contributed by Rupali Kulkarni.

rdkulkarni21@kkwagh.edu.in

90118966811


