ResNet10 Run Report

Summarized Findings

We modify the training pipeline to perform additional augmentation on top of our standard base transformations (resize crop, horizontal flip, color jitter): we randomly apply MixUp 50% of the time.
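A minimal NumPy sketch of the "apply MixUp 50% of the time" step. The helper name, the `alpha=0.2` Beta-distribution parameter, and the per-sample sampling are assumptions for illustration, not taken from the actual pipeline code:

```python
import numpy as np

def maybe_mixup(images, labels, rng, alpha=0.2, p=0.5):
    # Hypothetical helper: with probability 1 - p, return the batch
    # unchanged (equivalent to lambda == 1 everywhere).
    n = images.shape[0]
    if rng.random() >= p:
        return images, labels, labels, np.ones(n)
    # Per-sample mixing coefficients in [0, 1] from a Beta distribution.
    lam = rng.beta(alpha, alpha, size=n)
    perm = rng.permutation(n)
    lam_x = lam.reshape(-1, 1, 1, 1)  # broadcast over (C, H, W)
    mixed = lam_x * images + (1.0 - lam_x) * images[perm]
    # y_a are the original labels, y_b the labels of the mixed-in images.
    return mixed, labels, labels[perm], lam
```

Returning `(images, y_a, y_b, lam)` in both branches lets a single loss function cover the regular and MixUp cases.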

In a regular batch, we set:

$$Loss = \text{mean}(CE(\text{logits}, \text{labels}) \cdot 1.0 + CE(\text{logits}, \text{labels}) \cdot 0.0)$$

which is just our standard cross-entropy loss. When MixUp is applied, the loss becomes:

$$Loss = \text{mean}(\lambda \cdot CE(\text{logits}, y_a) + (1 - \lambda) \cdot CE(\text{logits}, y_b))$$

where $\lambda$ is a vector of per-sample values between 0 and 1. This is the standard MixUp loss. Our observed training loss is a running average over these two behaviors.
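The two loss formulas above can be sketched as one function: with $\lambda = 1$ and $y_a = y_b$ it reduces to standard cross entropy, and with a per-sample $\lambda$ vector it gives the MixUp loss. This is a NumPy illustration, not the report's actual implementation:

```python
import numpy as np

def mixup_ce_loss(logits, y_a, y_b, lam):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    idx = np.arange(logits.shape[0])
    ce_a = -log_p[idx, y_a]  # CE(logits, y_a) per sample
    ce_b = -log_p[idx, y_b]  # CE(logits, y_b) per sample
    # mean(lambda * CE_a + (1 - lambda) * CE_b)
    return np.mean(lam * ce_a + (1.0 - lam) * ce_b)
```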

From our 69.50% baseline, we saw an increase to 72.00%.

Possible Improvements

Introducing MixUp into the classification loss creates a divergence problem [1]. Yu-Ting Chang et al. attempt to resolve this by sharpening the predicted probability distribution with an entropy-minimization term:

$$\mathcal{L}_{ent} = -\frac{1}{HW}\sum_{h,w}\sum_{c\in C} P^c(h,w)\log P^c(h,w)$$

Standard λ\lambda interpolation smooths the distribution, while their approach encourages confident predictions.
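A small NumPy sketch of $\mathcal{L}_{ent}$, assuming `prob_maps` is a `(C, H, W)` array of class probabilities $P^c(h,w)$ that sums to 1 over classes at every pixel (the `eps` term is an assumption for numerical safety, not from the paper):

```python
import numpy as np

def entropy_loss(prob_maps, eps=1e-8):
    # Pixel-wise entropy: -sum_c P^c(h,w) * log P^c(h,w), shape (H, W).
    ent = -(prob_maps * np.log(prob_maps + eps)).sum(axis=0)
    # Average over all H*W pixels, matching the 1/(HW) factor.
    return ent.mean()
```

Minimizing this term drives each pixel's distribution toward a confident, low-entropy prediction, counteracting the smoothing that $\lambda$ interpolation introduces.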

To prevent the attention maps from scattering non-uniformly across the image, they apply a concentration loss directly to the Class Activation Map (CAM):

$$\mathcal{L}_{con}(M) = \sum_{c\in \bar{C}}\sum_{h,w}\left\| \langle h,w \rangle - \langle \mu_h^c, \mu_w^c \rangle \right\|^2 \cdot \hat{M}^c(h,w)$$

This calculates the spatial center of mass $(\mu_h^c, \mu_w^c)$ for a given category's activation. It then heavily penalizes activated pixels $\hat{M}^c(h,w)$ that are physically far from that center, encouraging the model to group related features together.
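A NumPy sketch of $\mathcal{L}_{con}$. It assumes each class map $\hat{M}^c$ has already been normalized to sum to 1, so the center of mass is a simple weighted mean; that normalization convention is an assumption of this sketch:

```python
import numpy as np

def concentration_loss(cams):
    # cams: (C, H, W) non-negative maps, each assumed to sum to 1.
    C, H, W = cams.shape
    hs, ws = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    loss = 0.0
    for c in range(C):
        m = cams[c]
        # Spatial center of mass (mu_h, mu_w) of this class's activation.
        mu_h = (hs * m).sum()
        mu_w = (ws * m).sum()
        # Squared distance of every pixel from the center, weighted by
        # that pixel's activation.
        dist2 = (hs - mu_h) ** 2 + (ws - mu_w) ** 2
        loss += (dist2 * m).sum()
    return loss
```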

They combine these into their complete loss calculation:

$$\mathcal{L}_{all} = \mathcal{L}_{cls}(I', Y') + \lambda_{ent}\mathcal{L}_{ent}(I') + \lambda_{con}\mathcal{L}_{con}(M)$$
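The combination itself is a weighted sum. A trivial sketch, where the weight values are placeholders rather than the paper's settings:

```python
def total_loss(l_cls, l_ent, l_con, lam_ent=1.0, lam_con=0.1):
    # L_all = L_cls + lam_ent * L_ent + lam_con * L_con
    # (lam_ent and lam_con values here are placeholders, not the
    # hyperparameters reported in the paper).
    return l_cls + lam_ent * l_ent + lam_con * l_con
```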

Notebook

Please see the companion notebook. It contains:

[1]
Y.-T. Chang, Q. Wang, W.-C. Hung, R. Piramuthu, Y.-H. Tsai, and M.-H. Yang, “Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization.” 2020. [Online]. Available: https://arxiv.org/abs/2008.01201