[Paper] (MoCo) Momentum Contrast for Unsupervised Visual Representation Learning

구코딩 2025. 1. 28. 12:52

A mechanism for building a dynamic dictionary for contrastive learning.

https://arxiv.org/pdf/1911.05722

Key points

Unsupervised visual representation learning.
Frames contrastive learning as a dictionary look-up.
Uses a queue and a moving-average encoder.
The learned representations transfer well to downstream tasks.

Dictionary

Large

The dictionary should be large: a larger dictionary samples the underlying continuous, high-dimensional visual space better.

Queue

The encoded representations of the current mini-batch are enqueued, and the oldest ones are dequeued.
Keys from preceding mini-batches can therefore be reused.
The dictionary size is decoupled from the mini-batch size → the dictionary can be kept large.
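
The queue behaviour described above can be sketched with a fixed-length `deque`. This is only an illustration, not the paper's implementation: the sizes `K` and `batch_size` and the string stand-ins for encoded keys are made up for the example.

```python
from collections import deque

K = 6           # dictionary size (hypothetical; real dictionaries are far larger)
batch_size = 2  # mini-batch size (hypothetical)

# A deque with maxlen=K drops its oldest items automatically when it is full,
# which mimics "enqueue the newest keys, dequeue the oldest ones".
queue = deque(maxlen=K)

for step in range(5):
    # Stand-ins for the encoded keys of the current mini-batch.
    keys = [f"k{step}_{i}" for i in range(batch_size)]
    queue.extend(keys)

# The dictionary now holds keys from the three most recent mini-batches,
# even though each mini-batch only contributed two keys.
dictionary = list(queue)
```

Because `K` is chosen independently of `batch_size`, the dictionary can be made much larger than a single mini-batch.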

Consistency

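Because the keys in the dictionary come from several preceding mini-batches, MoCo keeps them consistent by encoding them with a slowly progressing key encoder, implemented as a momentum-based moving average of the query encoder.
When the keys in the dictionary are consistent, comparisons against the query are consistent too, which leads to better performance.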

Update

Because the dictionary is large, back-propagating through all of its representations is not feasible.
Naively copying the query encoder $f_q$ into the key encoder $f_k$ → badly hurts consistency.
Instead, $f_k$ is updated gradually as a momentum moving average of $f_q$.

Method

Suppose we have an encoded query $q$ and a dictionary of encoded keys $\{k_0, k_1, k_2, ...\}$.

Suppose a single key $k_+$ in the dictionary matches $q$.
The contrastive loss is low when $q$ is similar to $k_+$ and dissimilar to all other keys.

Loss function: InfoNCE

$\mathcal L_q = -\log \cfrac{\exp\left(q \cdot k_+/{\tau}\right)}{\sum_{i=0}^{K} \exp\left(q \cdot k_i/{\tau}\right)}$

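$\tau$: temperature hyperparameter
- Set to 0.07.
- Scales the similarity scores.
- Small value: sharpens the softmax and emphasizes the gap between positive and negative keys → sensitive to small differences in similarity.
- Large value: flattens the softmax, so the negatives are treated more uniformly → a gentler training signal.
Denominator: a sum over one positive sample and $K$ negative samples.
The loss can be read as the log loss of a $(K+1)$-way softmax-based classifier that tries to classify $q$ as $k_+$.
$q = f_q(x^q)$
$k = f_k(x^k)$
$f_q, f_k$: encoder networks for the query and key samples.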
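
To make the formula concrete, here is a minimal sketch of the InfoNCE loss for a single query, written in plain Python. The function name `info_nce` and the toy 2-D vectors are assumptions for illustration. A real implementation would work on batched tensors.

```python
import math

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query q, one positive key and K negative keys."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Logit 0 belongs to the positive key, the remaining K to the negatives.
    logits = [dot(q, k_pos) / tau] + [dot(q, k) / tau for k in k_negs]

    # log(sum(exp(logits))), with the max subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))

    # -log(exp(l_pos) / sum_i exp(l_i)) = log_denominator - l_pos
    return log_denominator - logits[0]

q = [1.0, 0.0]
# Positive key aligned with q, negatives pointing elsewhere: low loss.
loss_match = info_nce(q, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
# Positive key orthogonal to q while a negative is aligned with q: high loss.
loss_mismatch = info_nce(q, [0.0, 1.0], [[1.0, 0.0]])
# All K+1 keys identical: the classifier is at chance, loss = log(K+1).
loss_chance = info_nce(q, [1.0, 0.0], [[1.0, 0.0]] * 3)
```

The last case shows the $(K+1)$-way classification view: with three negatives that are indistinguishable from the positive, the loss equals $\log 4$.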

Momentum update

$\theta_k \gets m\theta_k + (1 - m)\theta_q$

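The update makes the key encoder evolve smoothly, which keeps the keys consistent.

The key encoder is not updated by back-propagation; only $\theta_q$ receives gradients, and $\theta_k$ follows it slowly.
A large $m$ (the paper's default is $m = 0.999$), which makes the update very slow, worked better than a smaller one.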
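
The update rule can be sketched on plain lists of numbers. The helper `momentum_update` and the toy parameter values are hypothetical. In practice $\theta_q$ also changes at every step through SGD, whereas here it is held fixed to isolate the effect of $m$.

```python
def momentum_update(theta_k, theta_q, m=0.999):
    """Return m * theta_k + (1 - m) * theta_q, element by element."""
    return [m * wk + (1.0 - m) * wq for wk, wq in zip(theta_k, theta_q)]

theta_q = [1.0, -2.0]  # query-encoder parameters (held fixed for this demo)
theta_k = [0.0, 0.0]   # key-encoder parameters

steps = 1000
for _ in range(steps):
    theta_k = momentum_update(theta_k, theta_q)

# With theta_q fixed and theta_k starting at zero, the closed form after n
# steps is theta_k = (1 - m**n) * theta_q, so theta_k is still well short of
# theta_q after 1000 steps.
fraction_covered = theta_k[0] / theta_q[0]
```

With $m = 0.999$ the key encoder changes by only 0.1% of the gap per step, which is what keeps keys from neighbouring mini-batches comparable.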

Relations to previous mechanisms

End-to-End

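Positive and negative samples are formed within the current mini-batch and passed through the encoders, and the encoders are trained end to end with the contrastive loss.
The keys are all produced by the same encoder state → high consistency.
The dictionary size is tied to the mini-batch size → limited by GPU memory.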

Memory Bank

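A memory bank stores the representations of all samples, and the keys are randomly sampled from it.
The sampled keys are used without back-propagation → a large dictionary is possible.
The memory bank holds representations computed at many different past steps → low consistency.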

Pretext Task

Instance Discrimination

Positive pair: a query and a key that come from the same image.
- Generate random views of the same image (augmentation).
- Encode the query and the key with the query encoder $f_q$ and the key encoder $f_k$, respectively.
Negative pair: a query and a key that come from different images.

Encoder

ResNet
Last FC layer: 128-dimensional output
Output normalized by its L2-norm
Augmentation: 224×224 random resized crop, random color jittering, random horizontal flip, random grayscale conversion
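
A small sketch of why the L2 normalization matters: once the encoder outputs have unit length, the dot product $q \cdot k$ used in the loss is just cosine similarity and stays within $[-1, 1]$. The helper names and the 2-D toy vectors below are assumptions for illustration (the real outputs are 128-D).

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = l2_normalize([3.0, 4.0])  # -> [0.6, 0.8]
k = l2_normalize([4.0, 3.0])  # -> [0.8, 0.6]

self_similarity = dot(q, q)   # a unit vector has similarity 1 with itself
similarity = dot(q, k)        # cosine of the angle between the raw vectors
```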

Shuffling BN

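A way to prevent the model from "cheating" on the pretext task while keeping the benefits of BN. The cheating is possibly caused by BN statistics leaking information between samples in the same batch.
Batch normalization is performed independently on each GPU.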

key encoder $f_k$

The sample order of the current mini-batch is randomly shuffled before it is distributed among the GPUs.
After encoding, the original order is restored.

query encoder $f_q$

The sample order is not changed.
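
The shuffle and restore steps on the key side can be sketched as below. This only demonstrates the index bookkeeping: the "GPUs" are simulated by slicing a list, the encoder is a string stand-in, and the helper names `shuffle_for_keys` and `unshuffle` are hypothetical.

```python
import random

def shuffle_for_keys(batch, rng):
    """Return the batch in random order together with the permutation used."""
    perm = list(range(len(batch)))
    rng.shuffle(perm)
    return [batch[i] for i in perm], perm

def unshuffle(encoded, perm):
    """Put encoded[j] back at its original position perm[j]."""
    restored = [None] * len(encoded)
    for j, i in enumerate(perm):
        restored[i] = encoded[j]
    return restored

rng = random.Random(0)
batch = [f"img{i}" for i in range(8)]

# 1) Shuffle before splitting the mini-batch across devices.
shuffled, perm = shuffle_for_keys(batch, rng)

# 2) Simulated per-GPU encoding: each chunk would get its own BN statistics.
num_gpus = 4
chunk = len(shuffled) // num_gpus
chunks = [shuffled[g * chunk:(g + 1) * chunk] for g in range(num_gpus)]
encoded = [f"key({x})" for c in chunks for x in c]

# 3) Restore the original order so keys[i] again corresponds to batch[i].
keys = unshuffle(encoded, perm)
```

Since the query side is not shuffled, a query and its positive key end up being normalized with batch statistics from different subsets of samples.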

Experiments

Dataset: ImageNet-1M, Instagram-1B

Training

SGD / Weight decay: 0.0001 / SGD momentum: 0.9 (not the same as the MoCo momentum coefficient $m$)
ImageNet-1M:
- mini-batch size: 256 (on 8 GPUs)
- initial learning rate: 0.03
- training epochs: 200 (learning rate multiplied by 0.1 at epochs 120 and 160)
- ResNet-50 training time: about 53 hours
Instagram-1B:
- mini-batch size: 1024 (on 64 GPUs)
- initial learning rate: 0.12 (multiplied by 0.9 every 62,500 iterations)
- ResNet-50 training time: about 6 days
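
The two learning-rate schedules listed above can be written as small functions. The function names are hypothetical, and the sketch assumes 0-indexed epochs and iterations, with each decay taking effect exactly at the stated milestone.

```python
def lr_imagenet(epoch, base_lr=0.03, milestones=(120, 160), gamma=0.1):
    """Step schedule: multiply the rate by gamma at each milestone epoch."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

def lr_instagram(iteration, base_lr=0.12, step=62_500, gamma=0.9):
    """Exponential schedule: multiply the rate by gamma every `step` iterations."""
    return base_lr * gamma ** (iteration // step)

# Learning rate at a few points of the 200-epoch ImageNet-1M run.
imagenet_rates = [lr_imagenet(e) for e in (0, 119, 120, 160, 199)]

# Learning rate around the first two decay points of the Instagram-1B run.
instagram_rates = [lr_instagram(i) for i in (0, 62_499, 62_500, 125_000)]
```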

