ResNet10 Run Report
Summarized Findings
We modify the training pipeline to apply MixUp on top of our base transformations (the standard resize crop, horizontal flip, and color jitter); MixUp is applied randomly 50% of the time.
In a regular batch, we set:

$$\mathcal{L} = \mathrm{CE}(f(x),\, y),$$

which is just our standard cross-entropy loss. When MixUp is applied, the input becomes $\tilde{x} = \lambda x_a + (1 - \lambda) x_b$ and this loss is altered to:

$$\mathcal{L} = \lambda\, \mathrm{CE}(f(\tilde{x}),\, y_a) + (1 - \lambda)\, \mathrm{CE}(f(\tilde{x}),\, y_b),$$

where $\lambda \in [0, 1]$ is the mixing coefficient. This is the standard MixUp loss. Our training loss is a sliding average over batches of these two behaviors.
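A minimal sketch of this scheme in PyTorch, assuming $\lambda$ is drawn from a Beta distribution as in the original MixUp paper (the `alpha=0.2` parameter is an assumption, not necessarily this run's setting):

```python
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2):
    """Mix a batch with a shuffled copy of itself.

    alpha is an assumed Beta-distribution parameter; the run's actual
    value is not stated in the report.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[idx]
    return x_mixed, y, y[idx], lam

def mixup_loss(logits, y_a, y_b, lam):
    # Standard MixUp loss: convex combination of the two cross entropies.
    return (lam * F.cross_entropy(logits, y_a)
            + (1 - lam) * F.cross_entropy(logits, y_b))
```

With `lam = 1` this reduces exactly to the regular cross-entropy branch above, which is why a 50/50 random application of the two modes mixes cleanly in one training loop.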
From our 69.50% baseline, we saw an increase to 72.00%.
Possible Improvements
Introducing MixUp into the classification loss creates a divergence problem [1]. Yu-Ting Chang et al. attempt to resolve this by sharpening the probability distribution with a temperature $T < 1$:

$$\tilde{p}_c = \frac{p_c^{1/T}}{\sum_{c'} p_{c'}^{1/T}}$$
Standard interpolation smooths the distribution, while their approach encourages confident predictions.
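The sharpening step can be sketched as follows; the temperature value `T=0.5` here is an assumption for illustration, not the authors' reported setting.

```python
import torch

def sharpen(p, T=0.5):
    """Temperature-sharpen a probability distribution along the last dim.

    T < 1 raises the mode's probability and suppresses the tail,
    counteracting the smoothing introduced by MixUp interpolation.
    """
    p_sharp = p ** (1.0 / T)
    return p_sharp / p_sharp.sum(dim=-1, keepdim=True)
```

For example, with `T=0.5` the distribution `[0.6, 0.3, 0.1]` sharpens to roughly `[0.78, 0.20, 0.02]`: the same ranking, but more confident.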
To prevent the attention maps from scattering non-uniformly across the image, they apply a concentration loss directly to the Class Activation Map (CAM):

$$\mathcal{L}_{con} = \sum_{c} \frac{\sum_{u,v} M_c(u,v)\, \big\| (u,v) - (\bar{u}_c, \bar{v}_c) \big\|^2}{\sum_{u,v} M_c(u,v)}, \qquad (\bar{u}_c, \bar{v}_c) = \frac{\sum_{u,v} (u,v)\, M_c(u,v)}{\sum_{u,v} M_c(u,v)},$$

where $M_c$ is the activation map for class $c$.
This calculates the spatial center of mass for a given category’s activation. It then heavily penalizes activated pixels that are physically far away from that center, encouraging the model to group related features together.
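A sketch of this idea on a CAM tensor of shape `(C, H, W)`; this follows the description above (center of mass, then distance-weighted penalty), not necessarily the authors' exact implementation:

```python
import torch

def concentration_loss(cam):
    """Penalize CAM activation mass that lies far from each class's
    spatial center of mass. cam: non-negative tensor of shape (C, H, W)."""
    C, H, W = cam.shape
    ys = torch.arange(H, dtype=cam.dtype).view(1, H, 1)
    xs = torch.arange(W, dtype=cam.dtype).view(1, 1, W)
    mass = cam.sum(dim=(1, 2)).clamp_min(1e-8)        # total activation per class
    cy = (cam * ys).sum(dim=(1, 2)) / mass            # center of mass (row)
    cx = (cam * xs).sum(dim=(1, 2)) / mass            # center of mass (col)
    # Squared distance of every pixel from its class's center.
    d2 = (ys - cy.view(C, 1, 1)) ** 2 + (xs - cx.view(C, 1, 1)) ** 2
    return ((cam * d2).sum(dim=(1, 2)) / mass).mean()
```

A CAM concentrated in one blob scores near zero; a uniformly scattered CAM scores high, which is exactly the gradient pressure the authors want.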
They combine these into their complete loss calculation, a weighted sum of the terms above:

$$\mathcal{L}_{total} = \mathcal{L}_{cls} + \beta\, \mathcal{L}_{con},$$

where $\mathcal{L}_{cls}$ is the MixUp classification loss computed on the sharpened distributions and $\beta$ weights the concentration term.
Notebook
Please see the companion notebook. It contains:
- Training Confusion Matrix
- Validation Confusion Matrix
- Training Classification Report
- Validation Classification Report
- Training Grad-CAM visualizations
- Validation Grad-CAM visualizations