[Paper] SelfD: Self-Learning Large-Scale Driving Policies From the Web

구코딩 2025. 2. 3. 12:49

A mechanism for learning effectively from large, unlabeled online data.

Online data
- A vast amount of online ego-centric navigation data: large-scale YouTube videos
- Uses unconstrained, unlabeled online demonstration data to train a generalized model for robust vision-based navigation in complex, dynamic environments

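Iterative semi-supervised training
- To make use of the unlabeled data, first run imitation learning on a small labeled dataset.
- Use the resulting model to pseudo-label the unlabeled data, then train the imitation agent on the pseudo-labeled data.
- This effectively augments the knowledge and robustness of the initially trained policy.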

Learn planning directly in BEV (bird's-eye view) space → direct reasoning
- Not tied to the platform or camera perspective of the data.
- A dataset-agnostic and platform-agnostic model design

Effect
- A scalable decision-making model that works effectively across diverse environments and scenarios
- Robust and generalizable

Contributions

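A new model that can learn from unconstrained images
- Maps to a BEV planning space → can learn across diverse environments without camera calibration
A new semi-supervised learning approach
- Introduces a self-training scheme that includes hypothetical data augmentation.
- Develops a new sampling technique so that demonstration data of varying quality can be used effectively
Cross-dataset experiments
- Analyze whether an initially trained decision-making model can improve itself and generalize better under minimal data assumptions
- Evaluated across diverse environments, achieving SOTA-level generalization performance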

Method

Overview

Initial BEV policy $f_\theta$ imitation learning

Use a small, labeled, domain-specific dataset $\mathcal D$ to train the initial BEV policy $f_\theta$.
Imitation learning is carried out as supervised learning.

Use $f_\theta$ to generate a large pseudo-labeled dataset $\hat{\mathcal D}$

Use the trained $f_\theta$ to generate pseudo-labels for the unlabeled data.
This builds the large pseudo-labeled dataset $\hat{\mathcal D}$.

Pre-training on $\hat{\mathcal D}$ & fine-tuning on $\mathcal D$

Pre-train a generalized policy $f_\theta$ on $\hat{\mathcal D}$.
Fine-tune on $\mathcal D$ to improve performance.

Problem Setting

observations $\mathbf x = (\mathbf I, v, c) \in \mathcal X$

The agent learns to map these observations to driving navigational decisions.
$\mathbf I \in \mathbb{R}^{W \times H \times 3}$: front camera image
$v \in \mathbb{R}$: ego-vehicle speed
$c \in \mathbb{N}$: categorical navigational command

$\mathbf y \in \mathcal Y$: waypoint trajectory → strong interpretability and generalization.

$f_\theta : \mathcal X \to \mathcal Y$ waypoint prediction function

$\theta \in \mathbb{R}^d$: learnable parameter
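
As a rough illustration of the problem setting above, the observation tuple and the policy signature could be written as plain Python types. This is only a sketch: the class name `Observation`, its field names, and the placeholder `straight_policy` are hypothetical and do not come from the paper, and the one-waypoint-per-second spacing is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (lateral, forward) offset in the BEV plane, meters


@dataclass
class Observation:
    """x = (I, v, c). Field names are illustrative, not from the paper."""
    image: List[List[Tuple[int, int, int]]]  # I: rows of RGB pixels
    speed: float                              # v: ego-vehicle speed
    command: int                              # c: categorical navigational command


def straight_policy(x: Observation, horizon: int = 5) -> List[Waypoint]:
    """Placeholder for f_theta: ignores the image and drives straight.

    Assumes one waypoint per second of constant-speed travel.
    """
    return [(0.0, x.speed * (t + 1)) for t in range(horizon)]


obs = Observation(image=[[(0, 0, 0)] * 4 for _ in range(3)], speed=2.0, command=2)
trajectory = straight_policy(obs)
print(trajectory)
```

A real $f_\theta$ would be a learned network over the image; the point here is only the shape of the mapping $\mathcal X \to \mathcal Y$.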

Conditional Imitation Learning from Observations(CILfO):

Applies label recovery so that learning is also possible from unlabeled online data $\mathcal U = \{\mathbf I_i\}_{i=1}^M$

Recovers suitable waypoints, navigation commands, speeds, etc. to build a large pseudo-labeled dataset

Initial Data Assumption

A small labeled dataset is needed to start learning

Initial policy learning
- Uses a small labeled dataset of human expert demonstrations.
- Conditional imitation learning based on supervised learning
- Train the waypoint prediction function $f_\theta$ on the dataset $\mathcal D = \{(\mathbf x_i, \mathbf y_i)\}_{i=1}^N$
  - loss function optimization: $\displaystyle \min_{\theta} \mathbb{E}_{(\mathbf x, \mathbf y) \sim \mathcal D} \left[ \mathcal{L}(\mathbf y, f_\theta(\mathbf x)) \right]$
Pseudo-label generation
- Use the trained model to generate pseudo-labels for the large unlabeled data, then use them to learn a more robust navigation policy
- Data augmentation → greatly improves the robustness of the policy
- Estimate $\hat{\mathbf y}, \hat{c}, \hat{v}$ (predicted trajectory, navigation command, speed) to build the dataset $\hat{\mathcal D} = \{((\mathbf I_i, \hat{v}_i, \hat{c}_i), \hat{\mathbf y}_i)\}_{i=1}^M$.
- Apply CIL to this recovered dataset for policy learning.
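
The imitation objective above is an expectation over the labeled dataset, which in practice is replaced by a sample average. The following toy sketch shows that idea only: the observation is reduced to a scalar speed, the policy has a single parameter `theta`, and a grid search stands in for gradient-based training. All function names and the toy data are assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, List, Tuple

Waypoints = List[Tuple[float, float]]
Sample = Tuple[float, Waypoints]  # (x, y); x is reduced to a scalar speed here


def l1_loss(y_true: Waypoints, y_pred: Waypoints) -> float:
    """Mean absolute error over all waypoint coordinates."""
    total = sum(abs(a - b) for p, q in zip(y_true, y_pred) for a, b in zip(p, q))
    return total / (2 * len(y_true))


def empirical_risk(policy: Callable[[float], Waypoints], dataset: List[Sample]) -> float:
    """Sample average that stands in for the expectation over D."""
    return sum(l1_loss(y, policy(x)) for x, y in dataset) / len(dataset)


def make_policy(theta: float) -> Callable[[float], Waypoints]:
    """Toy f_theta: straight-line waypoints spaced theta * speed apart."""
    return lambda v: [(0.0, theta * v * t) for t in (1, 2, 3)]


# Toy expert data generated with theta = 1.0 (hypothetical, for illustration).
dataset = [(v, [(0.0, v * t) for t in (1, 2, 3)]) for v in (1.0, 2.0, 4.0)]

# Grid search stands in for gradient-based minimization over theta.
candidates = [0.5, 1.0, 1.5]
best_theta = min(candidates, key=lambda th: empirical_risk(make_policy(th), dataset))
print(best_theta, empirical_risk(make_policy(best_theta), dataset))
```

The same `empirical_risk` could be evaluated on a pseudo-labeled dataset by swapping the expert labels for predicted ones.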

BEV Plan Network

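Predict waypoints directly in BEV space + train with quality estimates taken into account

Uses a monocular image-based planner to make decisions directly in BEV space
Robust and generalizes across diverse perspectives
The predicted BEV waypoints can be combined with a low-level controller such as a PID controller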
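
To make the coupling with a low-level controller concrete, here is a minimal PID sketch that turns BEV waypoints into a throttle and a steering signal. It is a generic textbook controller, not the one used in the paper; the gains, the choice of the middle waypoint as the steering target, and the one-second waypoint spacing are all assumptions.

```python
import math
from typing import List, Optional, Tuple

Waypoint = Tuple[float, float]  # (lateral, forward) in meters, ego at the origin


class PID:
    """Textbook PID controller; gains are hypothetical."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def target_speed(waypoints: List[Waypoint], seconds_per_waypoint: float) -> float:
    """Average speed implied by the path from the ego position through the waypoints."""
    path = [(0.0, 0.0)] + list(waypoints)
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    return length / (len(waypoints) * seconds_per_waypoint)


def heading_error(waypoints: List[Waypoint]) -> float:
    """Angle (radians) from the forward axis to the middle waypoint."""
    lateral, forward = waypoints[len(waypoints) // 2]
    return math.atan2(lateral, forward)


waypoints = [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0)]
speed_pid = PID(kp=0.5, ki=0.1, kd=0.0)
steer_pid = PID(kp=1.0, ki=0.0, kd=0.0)

current_speed = 1.0
throttle = speed_pid.step(target_speed(waypoints, 1.0) - current_speed, dt=0.1)
steer = steer_pid.step(heading_error(waypoints), dt=0.1)
print(throttle, steer)
```

Because the planner outputs metric BEV waypoints, the controller needs no knowledge of the camera that produced the image.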

Monocular → BEV space

Applies confidence-aware learning + takes quality estimates into account
Extend the model to $f_{\theta} : \mathcal X \to \mathcal Y \times \mathcal R$, adding quality estimates $\sigma \in \mathcal R$

Loss Function

$\mathcal L = \mathcal L_{\text{plan}} + \lambda \mathcal L_{\text{quality}}$

$\mathcal L_{\text{plan}}$: L1 loss between the predicted waypoints and the ground truth.
$\mathcal L_{\text{quality}}$: binary cross-entropy between the predicted quality and the actual quality
$\lambda$: hyper-parameter that balances the two learning objectives.
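
The combined loss can be checked by hand on a single sample. The sketch below computes an L1 plan loss and a binary cross-entropy quality loss in plain Python. How the target quality is derived is not stated in these notes, so `target_quality` is simply passed in as a number in [0, 1], and `lam=0.5` is an arbitrary value chosen for the example.

```python
import math
from typing import List, Tuple

Waypoints = List[Tuple[float, float]]


def plan_loss(pred: Waypoints, target: Waypoints) -> float:
    """L1 loss: mean absolute error over all waypoint coordinates."""
    total = sum(abs(a - b) for p, q in zip(pred, target) for a, b in zip(p, q))
    return total / (2 * len(target))


def quality_loss(pred_quality: float, target_quality: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy; the prediction is clipped to avoid log(0)."""
    p = min(max(pred_quality, eps), 1.0 - eps)
    return -(target_quality * math.log(p) + (1.0 - target_quality) * math.log(1.0 - p))


def total_loss(pred: Waypoints, target: Waypoints,
               pred_quality: float, target_quality: float, lam: float = 1.0) -> float:
    """L = L_plan + lambda * L_quality."""
    return plan_loss(pred, target) + lam * quality_loss(pred_quality, target_quality)


target = [(0.0, 1.0), (0.0, 2.0)]
pred = [(0.5, 1.0), (0.0, 2.5)]
loss = total_loss(pred, target, pred_quality=0.5, target_quality=1.0, lam=0.5)
print(loss)
```

Here the plan term is 0.25 and the quality term is $\ln 2$, so the total is $0.25 + 0.5 \ln 2$.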

“What If” Pseudo-Labeling of Unlabeled Data

Generate pseudo-labels for the unlabeled images $\mathcal U$ via self-training

"What If" Augmentation

Online video data: trajectories are very noisy and unreliable, and the demonstrations can be unsafe and hard to recover.
For an unlabeled single frame, use $f_\theta$ to generate hypothetical future trajectories and use them as pseudo-labels → expands the training data

Method

Randomly sample a speed $\hat v$ and a command $\hat c$ (hypothetical values).
Pseudo-labels $(\hat{\mathbf y}, \hat{\sigma}) = f_\theta(\mathbf I, \hat v, \hat c)$ → hypothetical labeling across diverse driving scenarios
Label quality estimates $\hat \sigma$: filter out very noisy trajectories → build a high-confidence dataset

Future Trajectory

Trajectories are in meters in the BEV frame; speed is in m/s.
Conditional Commands: left = 1 / forward = 2 / right = 3
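
The sampling-and-filtering procedure above can be sketched as a short loop. This is a hedged illustration rather than the paper's code: `stub_policy` is a hypothetical stand-in for the trained $f_\theta$, and the speed range (`max_speed=15.0` m/s), the quality threshold (`min_quality=0.5`), and the number of samples per frame are assumptions. Only the command encoding (left = 1, forward = 2, right = 3) comes from the notes.

```python
import random
from typing import Callable, List, Tuple

Waypoints = List[Tuple[float, float]]
Policy = Callable[[str, float, int], Tuple[Waypoints, float]]
COMMANDS = (1, 2, 3)  # left, forward, right


def what_if_pseudo_label(images: List[str], policy: Policy, max_speed: float = 15.0,
                         min_quality: float = 0.5, samples_per_image: int = 3,
                         seed: int = 0):
    """Sample hypothetical (speed, command) pairs per frame and keep confident labels."""
    rng = random.Random(seed)
    dataset = []
    for image in images:
        for _ in range(samples_per_image):
            v_hat = rng.uniform(0.0, max_speed)   # hypothetical speed, m/s
            c_hat = rng.choice(COMMANDS)          # hypothetical command
            y_hat, sigma_hat = policy(image, v_hat, c_hat)
            if sigma_hat >= min_quality:          # drop low-quality trajectories
                dataset.append(((image, v_hat, c_hat), y_hat))
    return dataset


def stub_policy(image: str, speed: float, command: int) -> Tuple[Waypoints, float]:
    """Hypothetical f_theta: only 'forward' plans are reported as high quality."""
    lateral = {1: -1.0, 2: 0.0, 3: 1.0}[command]
    waypoints = [(lateral * t, speed * t) for t in (1, 2, 3)]
    quality = 0.9 if command == 2 else 0.2
    return waypoints, quality


pseudo_labeled = what_if_pseudo_label(["frame_0", "frame_1"], stub_policy)
print(len(pseudo_labeled))
```

Each kept entry has the same `((I, v, c), y)` layout as a labeled sample, so it can be fed to the same conditional imitation learning loss.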

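Effect

Can learn a wide variety of scenarios
Fills in the missing speed and command inputs + provides additional supervision
Helps the conditional agent better judge how to act in a given situation
Improves generalization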

Model Pre-Training and Fine-Tuning

Train separately on the two datasets $\hat{\mathcal D}$ and $\mathcal D$; knowledge is transferred through the learned representations.

Re-train $f_{\theta}$ on $\hat{\mathcal D}$ (the pre-training step).
Further fine-tune the pre-trained policy $f_\theta$ on $\mathcal D$.
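
The order of the stages (initial training on $\mathcal D$, pseudo-labeling, re-training on $\hat{\mathcal D}$, fine-tuning on $\mathcal D$) can be illustrated with a deliberately tiny example. The model here is a one-parameter linear map fitted with plain SGD on a squared error, which is an assumption for illustration only; the paper's policy is a neural network trained with the loss described earlier, and all numbers below are made up.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]


def sgd_fit(theta: float, data: Data, lr: float = 0.01, epochs: int = 200) -> float:
    """Fit y = theta * x with per-sample SGD on squared error, starting from theta."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (theta * x - y) * x
            theta -= lr * grad
    return theta


# Small labeled dataset D (toy ground truth: y = 2x).
labeled: Data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
# Larger unlabeled set U: inputs only.
unlabeled_inputs = [0.5, 1.5, 2.5, 4.0]

# Step 1: initial policy from D (stopped early, so it is imperfect).
theta_init = sgd_fit(0.0, labeled, epochs=5)

# Step 2: pseudo-label U with the initial policy to get D-hat.
pseudo_labeled: Data = [(x, theta_init * x) for x in unlabeled_inputs]

# Step 3: re-train from scratch on D-hat, then fine-tune the result on D.
theta_pre = sgd_fit(0.0, pseudo_labeled)
theta_final = sgd_fit(theta_pre, labeled)
print(theta_init, theta_pre, theta_final)
```

Note that the two datasets are never mixed in one batch: the second call to `sgd_fit` simply starts from the pre-trained parameter, which is why no mixing ratio has to be tuned.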

Effect

Improves performance using the additional knowledge gained from $\hat{\mathcal D}$
No need for fine-grained hyper-parameter tuning such as the learning rate
No need to carefully tune the ratio for mixing labeled and pseudo-labeled data

Experiments

Experimental Setup

Dataset

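YouTube driving videos (100 hours): driving data collected across diverse cities, weather, and day/night conditions
Public autonomous driving datasets
- nuScenes (Boston, Singapore) → split into the two regions for the experiments
- Waymo (8 cities)
- Argoverse
Goal: evaluate how well the model generalizes given the domain gap between cities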

Evaluation Metrics

Open-Loop (BEV waypoint prediction accuracy)

ADE (Average Displacement Error)
- Average L2 distance error over the future waypoints
FDE (Final Displacement Error)
- L2 distance error at the final future waypoint
- Measures the accuracy of the last predicted point
Collision Rate
- Probability that the predicted waypoints collide with other vehicles
- Can only be evaluated on nuScenes and Argoverse
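
ADE and FDE follow directly from their definitions. A minimal sketch for a single predicted trajectory, with made-up waypoints in meters, is shown here; averaging over a dataset would simply take the mean of these per-sample values.

```python
import math
from typing import List, Tuple

Waypoints = List[Tuple[float, float]]


def ade(pred: Waypoints, gt: Waypoints) -> float:
    """Average L2 distance between matching predicted and ground-truth waypoints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred: Waypoints, gt: Waypoints) -> float:
    """L2 distance at the final waypoint only."""
    return math.dist(pred[-1], gt[-1])


gt = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
pred = [(0.0, 1.0), (0.0, 2.0), (3.0, 7.0)]
print(ade(pred, gt), fde(pred, gt))
```

In this example only the last waypoint is off (by 5 m), so FDE is 5 while ADE is 5/3, which shows why the two metrics are reported together.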

Closed-Loop (driving performance in CARLA simulation)

SR (Success Rate)
- Fraction of runs that reach the goal
RC (Route Completion)
- How much of the full route was driven (%)
Collision Frequency (per 10 km)
- Number of collisions per 10 km driven
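
The closed-loop metrics can be aggregated from per-episode logs roughly as follows. The `Episode` record and its fields are hypothetical, and two aggregation choices are assumptions rather than facts from the notes: route completion is averaged per episode, and collisions are normalized by the distance actually driven.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """Hypothetical per-episode log; field names are illustrative."""
    reached_goal: bool
    route_length_m: float
    distance_completed_m: float
    collisions: int


def success_rate(episodes: List[Episode]) -> float:
    """Percentage of episodes that reached the goal."""
    return 100.0 * sum(e.reached_goal for e in episodes) / len(episodes)


def route_completion(episodes: List[Episode]) -> float:
    """Mean per-episode percentage of the route that was driven."""
    ratios = [e.distance_completed_m / e.route_length_m for e in episodes]
    return 100.0 * sum(ratios) / len(ratios)


def collisions_per_10km(episodes: List[Episode]) -> float:
    """Total collisions normalized to 10 km of driven distance."""
    driven_km = sum(e.distance_completed_m for e in episodes) / 1000.0
    return 10.0 * sum(e.collisions for e in episodes) / driven_km


episodes = [
    Episode(reached_goal=True, route_length_m=1000.0, distance_completed_m=1000.0, collisions=0),
    Episode(reached_goal=False, route_length_m=1000.0, distance_completed_m=500.0, collisions=1),
]
print(success_rate(episodes), route_completion(episodes), collisions_per_10km(episodes))
```

With these two toy episodes the success rate is 50%, route completion is 75%, and one collision over 1.5 km corresponds to about 6.67 collisions per 10 km.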

Model Architecture

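Baseline CIL model
- Predicts 2D waypoints in the image
- Converts them to BEV via a fixed projective transform (low accuracy)
- ADE (Average Displacement Error): 1.86
New BEV Planner (proposed method)
- Predicts BEV waypoints directly from the image
- Trained without camera calibration information (intrinsic/extrinsic parameters)
- ADE: 1.14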

Pseudo-Labeling

Additional training by applying pseudo-labeling to the YouTube data
A Visual Odometry (VO) model was used to generate pseudo-labels, but performance dropped
Applying the proposed "What If" augmentation improves performance
Training without self-training: ADE 1.18
Self-training + "What If" augmentation: ADE 1.14

CARLA Closed-Loop Evaluation
