ResNet10 Run Report

Summarized Findings

We modify the training pipeline to perform additional augmentation on top of our standard base transformations (resize crop, horizontal flip, color jitter): we randomly apply MixUp 50% of the time.
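A minimal NumPy sketch of the "apply MixUp 50% of the time" step. The helper name, the `alpha=0.2` Beta-distribution parameter, and the per-sample sampling are assumptions for illustration, not taken from the actual pipeline code:

```python
import numpy as np

def maybe_mixup(images, labels, rng, alpha=0.2, p=0.5):
    # Hypothetical helper: with probability 1 - p, return the batch
    # unchanged (equivalent to lambda == 1 everywhere).
    n = images.shape[0]
    if rng.random() >= p:
        return images, labels, labels, np.ones(n)
    # Per-sample mixing coefficients in [0, 1] from a Beta distribution.
    lam = rng.beta(alpha, alpha, size=n)
    perm = rng.permutation(n)
    lam_x = lam.reshape(-1, 1, 1, 1)  # broadcast over (C, H, W)
    mixed = lam_x * images + (1.0 - lam_x) * images[perm]
    # y_a are the original labels, y_b the labels of the mixed-in images.
    return mixed, labels, labels[perm], lam
```

Returning `(images, y_a, y_b, lam)` in both branches lets a single loss function cover the regular and MixUp cases.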

In a regular batch, we set:

$$Loss = \text{mean}(CE(\text{logits}, \text{labels}) \cdot 1.0 + CE(\text{logits}, \text{labels}) \cdot 0.0)$$

which is just our standard cross-entropy loss. When MixUp is applied, the loss becomes:

$$Loss = \text{mean}(\lambda \cdot CE(\text{logits}, y_a) + (1 - \lambda) \cdot CE(\text{logits}, y_b))$$

where $\lambda$ is a vector of per-sample values between 0 and 1. This is the standard MixUp loss. Our observed training loss is a running average over these two behaviors.
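The two loss formulas above can be sketched as one function: with $\lambda = 1$ and $y_a = y_b$ it reduces to standard cross entropy, and with a per-sample $\lambda$ vector it gives the MixUp loss. This is a NumPy illustration, not the report's actual implementation:

```python
import numpy as np

def mixup_ce_loss(logits, y_a, y_b, lam):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    idx = np.arange(logits.shape[0])
    ce_a = -log_p[idx, y_a]  # CE(logits, y_a) per sample
    ce_b = -log_p[idx, y_b]  # CE(logits, y_b) per sample
    # mean(lambda * CE_a + (1 - lambda) * CE_b)
    return np.mean(lam * ce_a + (1.0 - lam) * ce_b)
```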

From our 69.50% baseline, we saw an increase to 72.00%.

Possible Improvements

Introducing MixUp into the classification loss creates a divergence problem [1]. Yu-Ting Chang et al. attempt to resolve this by sharpening the predicted probability distribution with an entropy-minimization term:

$$\mathcal{L}_{ent} = -\frac{1}{HW}\sum_{h,w}\sum_{c\in C} P^c(h,w)\log P^c(h,w)$$

Standard λ\lambda interpolation smooths the distribution, while their approach encourages confident predictions.
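A small NumPy sketch of $\mathcal{L}_{ent}$, assuming `prob_maps` is a `(C, H, W)` array of class probabilities $P^c(h,w)$ that sums to 1 over classes at every pixel (the `eps` term is an assumption for numerical safety, not from the paper):

```python
import numpy as np

def entropy_loss(prob_maps, eps=1e-8):
    # Pixel-wise entropy: -sum_c P^c(h,w) * log P^c(h,w), shape (H, W).
    ent = -(prob_maps * np.log(prob_maps + eps)).sum(axis=0)
    # Average over all H*W pixels, matching the 1/(HW) factor.
    return ent.mean()
```

Minimizing this term drives each pixel's distribution toward a confident, low-entropy prediction, counteracting the smoothing that $\lambda$ interpolation introduces.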

To prevent the attention maps from scattering non-uniformly across the image, they apply a concentration loss directly to the Class Activation Map (CAM):

$$\mathcal{L}_{con}(M) = \sum_{c\in \bar{C}}\sum_{h,w}\left\| \langle h,w \rangle - \langle \mu_h^c, \mu_w^c \rangle \right\|^2 \cdot \hat{M}^c(h,w)$$

This calculates the spatial center of mass $(\mu_h^c, \mu_w^c)$ for a given category's activation. It then heavily penalizes activated pixels $\hat{M}^c(h,w)$ that are physically far from that center, encouraging the model to group related features together.
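A NumPy sketch of $\mathcal{L}_{con}$. It assumes each class map $\hat{M}^c$ has already been normalized to sum to 1, so the center of mass is a simple weighted mean; that normalization convention is an assumption of this sketch:

```python
import numpy as np

def concentration_loss(cams):
    # cams: (C, H, W) non-negative maps, each assumed to sum to 1.
    C, H, W = cams.shape
    hs, ws = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    loss = 0.0
    for c in range(C):
        m = cams[c]
        # Spatial center of mass (mu_h, mu_w) of this class's activation.
        mu_h = (hs * m).sum()
        mu_w = (ws * m).sum()
        # Squared distance of every pixel from the center, weighted by
        # that pixel's activation.
        dist2 = (hs - mu_h) ** 2 + (ws - mu_w) ** 2
        loss += (dist2 * m).sum()
    return loss
```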

They combine these into their complete loss calculation:

$$\mathcal{L}_{all} = \mathcal{L}_{cls}(I', Y') + \lambda_{ent}\mathcal{L}_{ent}(I') + \lambda_{con}\mathcal{L}_{con}(M)$$
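The combination itself is a weighted sum. A trivial sketch, where the weight values are placeholders rather than the paper's settings:

```python
def total_loss(l_cls, l_ent, l_con, lam_ent=1.0, lam_con=0.1):
    # L_all = L_cls + lam_ent * L_ent + lam_con * L_con
    # (lam_ent and lam_con values here are placeholders, not the
    # hyperparameters reported in the paper).
    return l_cls + lam_ent * l_ent + lam_con * l_con
```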

Notebook

Please see the companion notebook. It contains:

[1]
Y.-T. Chang, Q. Wang, W.-C. Hung, R. Piramuthu, Y.-H. Tsai, and M.-H. Yang, “Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization.” 2020. [Online]. Available: https://arxiv.org/abs/2008.01201