BERT-TweetEval training loss diagrams.

Loss curve for DistilBERT TweetEval training. The training loss declines steadily toward zero, and by Epoch 9 the model predicts the training set almost perfectly. However, the validation loss rises consistently, a clear sign of severe overfitting, and the F1 score fluctuates dramatically, though this may be partly due to the validation set being relatively scarce at fewer than 400 samples. Future training of this model calls for stronger regularization, augmentation techniques, and more data.
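For reference, a minimal sketch of how a Macro F1 score like the one plotted above is computed (the per-class loop below is a plain-Python illustration, not the evaluation code used in these runs; the two-class example values are hypothetical):

```python
def macro_f1(y_true, y_pred, num_classes):
    """Macro F1: per-class F1 scores averaged with equal weight per class.

    On a small validation set (fewer than 400 samples here), a handful of
    flipped predictions in a rare class can swing the macro average
    noticeably, which is one reason the F1 curve can look unstable.
    """
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / num_classes

# Hypothetical two-class example: class 0 has F1 = 2/3, class 1 has F1 = 4/5.
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```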
Loss curve for DistilRoBERTa TweetEval training. The overfitting is similar in form to DistilBERT's, and training actually stops early at just 4 epochs: the model reached its best Macro F1 score at Epoch 1 and then yielded worse scores 3 times. This also indicates that the model needs stronger regularization, as it lacks the sustained training that would allow it to learn more about its domain.
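The early-stop behaviour described above can be sketched as a simple patience rule: halt once the best Macro F1 has gone unimproved for three epochs. The function name and the score values below are illustrative, not taken from the actual run:

```python
def should_stop(f1_history, patience=3):
    """Return True once the best Macro F1 so far has not been beaten
    for `patience` consecutive epochs after it was recorded."""
    if not f1_history:
        return False
    best_epoch = f1_history.index(max(f1_history))
    epochs_since_best = len(f1_history) - 1 - best_epoch
    return epochs_since_best >= patience

# Mirrors the DistilRoBERTa run: best score at Epoch 1, then three worse
# epochs, so training halts after Epoch 4. (Scores are hypothetical.)
scores = [0.61, 0.58, 0.57, 0.55]
```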