Implementation of Text Mining (baby steps)

Steps:

Tokenizing
Stemming (the code in this post uses lemmatization instead)
Analyzing
Result / Knowledge
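To make the first two steps concrete before bringing in NLTK, here is a minimal pure-Python sketch. The stop word list, the suffix rules, and the function names are all made up for illustration; NLTK's tokenizer, stop word corpus, and stemmers are far more thorough.

```python
import re

# Tiny hand-written stop word list (hypothetical; NLTK ships a much longer one).
STOP_WORDS = {"the", "a", "an", "is", "and", "it", "this", "i"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_stem(token):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what is left."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in kept]


print(preprocess("This charger is working and it charges phones"))
# ['charger', 'work', 'charge', 'phone']
```

The output keeps only the content words, reduced to a rough base form, which is the shape of input the analyzing step works on.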

Install nltk: pip install nltk
Install sklearn: pip install scikit-learn
Install pandas (if you don't have it yet): pip install pandas

Open Visual Studio Code, create a new Python file, then type this:

import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download the NLTK corpora (first time only)
nltk.download("all")

# Load the Amazon review dataset
df = pd.read_csv(
    "https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/amazon.csv"
)

# Build the stop word set once instead of reloading the list for every token
stop_words = set(stopwords.words("english"))

def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text.lower())
    # Remove stop words
    filtered_tokens = [token for token in tokens if token not in stop_words]
    # Lemmatize the tokens
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]
    # Join the tokens back into a string
    processed_text = " ".join(lemmatized_tokens)
    return processed_text

analyzer = SentimentIntensityAnalyzer()

# Create the get_sentiment function: 1 = positive, 0 = not positive
def get_sentiment(text):
    scores = analyzer.polarity_scores(text)
    sentiment = 1 if scores["pos"] > 0 else 0
    return sentiment

# Apply the get_sentiment function
# (as written, preprocess_text is not applied, so the raw review text is scored)
df["sentiment"] = df["reviewText"].apply(get_sentiment)

from sklearn.metrics import confusion_matrix, classification_report

# Compare the true labels ("Positive") with the predicted ones ("sentiment")
print(confusion_matrix(df["Positive"], df["sentiment"]))
print(classification_report(df["Positive"], df["sentiment"]))
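A side note on the labeling rule in get_sentiment: it marks a review as positive as soon as the "pos" score is above zero, even if the review is mostly negative. The sketch below compares that rule with an alternative based on the "compound" score and a 0.05 cutoff, which is the threshold commonly suggested for VADER. The alternative function and the example scores are assumptions for illustration, not part of the original script, and the dictionary is hand-written rather than real VADER output.

```python
def label_from_pos(scores):
    """Rule used in the script: positive if there is any positive share at all."""
    return 1 if scores["pos"] > 0 else 0


def label_from_compound(scores, threshold=0.05):
    """Alternative rule (an assumption): positive only if compound >= threshold."""
    return 1 if scores["compound"] >= threshold else 0


# Hypothetical scores using the same keys that polarity_scores returns.
mostly_negative = {"neg": 0.45, "neu": 0.40, "pos": 0.15, "compound": -0.62}

print(label_from_pos(mostly_negative))       # 1
print(label_from_compound(mostly_negative))  # 0
```

With the first rule, a review that contains even one positive word ends up in class 1, which may be part of why few reviews get predicted as class 0.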

Running the script prints the confusion matrix followed by this classification report:

              precision    recall  f1-score   support

           0       0.69      0.29      0.41      4767
           1       0.81      0.96      0.88     15233

    accuracy                           0.80     20000
   macro avg       0.75      0.62      0.64     20000
weighted avg       0.78      0.80      0.77     20000
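To see where the summary rows come from, the sketch below recomputes a few of them from the per-class rows of the report. The numbers are copied from the report (already rounded to two decimals), so this is only a sanity check; the helper names are made up for this example.

```python
# Per-class values copied from the report: (precision, recall, f1, support)
report = {
    0: (0.69, 0.29, 0.41, 4767),
    1: (0.81, 0.96, 0.88, 15233),
}

total = sum(row[3] for row in report.values())


def macro_avg(col):
    """Plain average over classes; every class counts the same."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(col):
    """Average over classes, weighted by how many reviews each class has."""
    return sum(row[col] * row[3] for row in report.values()) / total


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(total)                                               # support total
print(round(macro_avg(0), 2), round(weighted_avg(0), 2))   # precision averages
print(round(weighted_avg(1), 2))                           # weighted recall
print(round(f1(0.69, 0.29), 2), round(f1(0.81, 0.96), 2))  # per-class f1
```

The weighted recall comes out the same as the accuracy row. Class 1 has about three times as many reviews as class 0, so the weighted averages sit closer to the class 1 scores, while the macro averages show more clearly how weak the class 0 recall is.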

Dyo Rizqal Pahlevi - 21572001, Informatics Engineering
