–
Hinglish Comments
Code-mixed social media
–
Test Accuracy
CV: – ± –
–
ROC-AUC
Weighted one-vs-rest
–
Feature Dimensions
Unigram + bigram
150
Hinglish Entries
Custom built lexicon
–
Test Samples
80/20 stratified split
Normalization Impact: Before vs After
Accuracy improvement from applying the Hinglish text normalization pipeline
–
–
–
Gain from Normalization
–
Avg Substitutions/Comment
Model Performance
Precision · Recall · F1 per Class
Classification report breakdown
Per Class
Confusion Matrix
Actual vs predicted · test set
10,000 samples
Sentiment Distribution
50,000 labeled comments
Dataset
Language Mix Analysis
Mix Type Distribution
Language mixing patterns
Token-level
Accuracy by Mix Type
Model performance per mix category
Mix × Accuracy
Accuracy by Domain
Model performance per topic domain
Domain
Platform Distribution
Comments by social platform
Platforms
Data Insights
Monthly Sentiment Trend
Positive / Negative / Neutral over 24 months
2022–2024
Sentiment by Domain
Distribution across 8 topic domains
Stacked
Normalization Pipeline: Before & After
Sample Normalization Transformations
Raw Hinglish → Normalized English: rule-based + lexicon substitution
Examples
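A minimal sketch of what a rule-based plus lexicon-substitution step might look like. The `LEXICON` entries, the character-squeezing rule, and the `normalize` function are all assumptions made for illustration; the dashboard's actual 150-entry lexicon and rules are not shown here. The function also returns a substitution count, the kind of figure an "Avg Substitutions/Comment" metric could be averaged from.

```python
import re

# Hypothetical mini-lexicon (Hinglish -> English); illustrative entries only.
LEXICON = {
    "bahut": "very",
    "accha": "good",
    "nahi": "not",
    "hai": "is",
    "yeh": "this",
}


def normalize(comment):
    """Return (normalized_text, substitution_count) for one comment."""
    text = comment.lower()
    # Rule: squeeze runs of 3+ identical characters ("bahuttt" -> "bahut").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Keep alphabetic tokens only; punctuation and digits are dropped.
    tokens = re.findall(r"[a-z']+", text)
    out, subs = [], 0
    for tok in tokens:
        if tok in LEXICON:
            out.append(LEXICON[tok])  # lexicon substitution
            subs += 1
        else:
            out.append(tok)  # already English or unknown: keep as-is
    return " ".join(out), subs


normalized, n_subs = normalize("Yeh movie bahuttt accha hai!!!")
print(normalized, n_subs)
```

Word-for-word substitution does not reorder tokens, so the output keeps the Hindi word order. For a bag-of-n-grams classifier that is usually acceptable.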
Top TF-IDF Features per Sentiment
Most Discriminative Features: Logistic Regression Coefficients
Highest weighted unigrams and bigrams driving each sentiment class
Coefficients
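One way the highest-weighted features per class could be read off a linear model is sketched below. The feature names and coefficient values are toy numbers invented for illustration, not results from this dashboard, and `top_features` is a hypothetical helper. With scikit-learn, the equivalent inputs would typically come from `TfidfVectorizer.get_feature_names_out()` and the rows of `LogisticRegression.coef_`.

```python
def top_features(feature_names, coef_by_class, k=3):
    """For each class, return the k features with the largest coefficients."""
    top = {}
    for label, coefs in coef_by_class.items():
        # Pair each feature with its weight and sort by weight, descending.
        ranked = sorted(zip(feature_names, coefs), key=lambda p: p[1], reverse=True)
        top[label] = [name for name, _ in ranked[:k]]
    return top


# Toy values for illustration only; not taken from the dashboard's model.
features = ["very good", "not good", "okay", "worst", "love"]
coefs = {
    "positive": [1.8, -1.2, 0.1, -1.5, 1.4],
    "negative": [-1.6, 1.3, -0.2, 1.9, -1.1],
    "neutral": [-0.2, -0.1, 0.9, -0.4, -0.3],
}

result = top_features(features, coefs, k=2)
for label, names in result.items():
    print(label, names)
```

Ranking by signed coefficient gives the features that push towards a class. Ranking by the most negative values would instead give the features that push away from it.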
Sample Predictions