Comparison of confusion matrices across all evaluated models.

Baseline DistilBERT confusion matrices on the TweetEval emotion prediction task. The model appears heavily biased toward predicting optimism by default.
Baseline DistilRoBERTa confusion matrices on the TweetEval emotion prediction task. Its predictions are spread slightly more evenly across classes than those of the DistilBERT baseline, but it still tends to predict sadness by default.
Confusion matrices for bert-tweeteval-distilbert. Compared with the baseline, the predictions now more closely follow the class distribution of the original training data.
Confusion matrices for bert-tweeteval-distilroberta. Compared with the baseline, the predictions now more closely follow the class distribution of the original training data.
Confusion matrices for all LLMs and prompting strategies. Structured prompts clearly yielded better results.
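The captions above judge a model's default bias by whether its predictions pile up in one class, and judge fine-tuning by whether predictions move toward the training label distribution. One way to put a number on this (a sketch, not the method used to produce these figures) is to compare the column sums of a confusion matrix (how often each class is predicted) with its row sums (how often each class actually occurs), using total variation distance. The sketch assumes the common rows = true, columns = predicted convention and the four TweetEval emotion labels; the matrix below is hypothetical toy data, not a result from this evaluation.

```python
import numpy as np

# TweetEval emotion labels; index order here is an assumption for illustration.
LABELS = ["anger", "joy", "optimism", "sadness"]


def predicted_distribution(cm) -> np.ndarray:
    """Fraction of all predictions assigned to each class (column sums, normalized)."""
    cm = np.asarray(cm, dtype=float)
    return cm.sum(axis=0) / cm.sum()


def true_distribution(cm) -> np.ndarray:
    """Fraction of examples whose true label is each class (row sums, normalized)."""
    cm = np.asarray(cm, dtype=float)
    return cm.sum(axis=1) / cm.sum()


def total_variation(p, q) -> float:
    """Total variation distance: 0 means identical distributions, 1 means disjoint."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())


# Hypothetical toy matrix (rows = true class, columns = predicted class)
# for a model that over-predicts "optimism". NOT data from this paper.
toy_cm = np.array([
    [10, 5, 80, 5],
    [5, 20, 70, 5],
    [2, 3, 40, 5],
    [5, 5, 60, 30],
])

pred = predicted_distribution(toy_cm)
true = true_distribution(toy_cm)
print("most-predicted class:", LABELS[int(np.argmax(pred))])
print("TV(predicted, true):", round(total_variation(pred, true), 3))
```

A value near 0 would indicate that predictions track the label distribution; a value near 1 would indicate a model that defaults to one class. To compare against the training distribution rather than the test set, replace `true` with the normalized training label counts.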