[E2E Autonomous Driving] (7)-5 Challenges: Policy Distillation


구코딩 2024. 12. 16. 23:46

Other posts related to End-to-End Autonomous Driving
can be found in the Introduction.

Policy Distillation

Imitation learning (or its sub-category, behavior cloning) is supervised learning that mimics an expert's behavior, so it generally follows the Teacher-Student paradigm.
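Written as a formula, behavior cloning is an ordinary supervised objective. The notation is introduced here for illustration only and does not come from the original post: D is the recorded expert dataset, o an observation, a* the expert's action, π_θ the student policy, and ℓ a per-sample loss such as L1 or L2.

```latex
\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{(o,\, a^{*}) \sim \mathcal{D}}
\left[ \ell\big(\pi_{\theta}(o),\, a^{*}\big) \right]
```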

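Two main challenges with this paradigm:

Teacher (expert): not a perfect driver, but has access to the ground truth of the surrounding agents and the map.
Student: supervised only by the recorded outputs, with sensor input alone, so it must extract perceptual features and learn the policy from scratch at the same time.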

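The privileged agent first learns how to act with access to the environment state.
It learns a robust policy that has access to privileged ground-truth information.
Using a more compact BEV representation as its input provides stronger generalization and supervision than the original expert.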

The student imitates the privileged agent through output imitation (distillation at the output stage) and feature distillation.
Besides supervising the planning results, knowledge is also distilled at the feature level.
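The privileged-teacher-then-student setup described above can be sketched with a toy example. Everything in it is hypothetical: a 1-D "lane offset" state, a hand-written teacher that reads the true state, and a linear student that only sees a noisy sensor reading and is fit by least squares to the teacher's recorded actions. It is a minimal sketch of output-level imitation under these assumptions, not the method of any specific paper.

```python
import random

def privileged_teacher(true_offset):
    """Hypothetical teacher: steers proportionally to the ground-truth lane offset."""
    return -0.5 * true_offset

def collect_dataset(n, noise_std, seed=0):
    """Roll out the teacher and record (sensor reading, teacher action) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        true_offset = rng.uniform(-2.0, 2.0)
        # The student never sees true_offset, only this noisy measurement.
        sensor = true_offset + rng.gauss(0.0, noise_std)
        data.append((sensor, privileged_teacher(true_offset)))
    return data

def fit_student(data):
    """Output imitation: closed-form least squares for action = w * sensor."""
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * a for x, a in data)
    return sxy / sxx

def imitation_loss(w, data):
    """Mean squared error between student and teacher actions."""
    return sum((w * x - a) ** 2 for x, a in data) / len(data)

data = collect_dataset(n=2000, noise_std=0.1)
w = fit_student(data)
print(f"student gain: {w:.3f}, imitation loss: {imitation_loss(w, data):.5f}")
```

Even in this toy setting the student's gain ends up slightly smaller in magnitude than the teacher's -0.5, because the sensor noise hides part of the state. This is a small-scale version of the gap between privileged and sensor-based agents.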

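The decoupled paradigm makes good use of the teacher's knowledge and improves the student's training efficiency.
- Distillation is still inefficient.
- Ex) privileged agent: has access to the ground truth of traffic lights (small and hard to find in images).
- visuomotor agent: shows a large performance gap compared to the privileged agent.
- → may cause causal confusion in the student.
Minimizing this gap (privileged agent vs. visuomotor agent) is a direction for future research.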

FM-Net: uses segmentation and optical flow models as auxiliary teachers to guide feature training.
SAM: adds an L2 feature loss between the teacher and student networks.
CaT: aligns features in BEV.
WoR: learns a model-based action-value function and uses it to supervise the visuomotor policy.
Roach: trains a stronger privileged expert with RL, which removes the upper bound of BC.
- Includes multiple distillation targets: action distribution, values/rewards, and latent features.
TCP: leverages the strong RL expert to achieve SOTA on CARLA with a single camera as input.
DriveAdapter: learns a perception-only student and an adapter with a feature alignment objective.
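Several of the methods listed above combine more than one distillation target. The sketch below shows one way such a combined loss could look: a KL term on the action distribution, a squared error on the value estimate, and an L2 term on latent features. The categorical action distribution, the dictionary layout, and the loss weights are all assumptions made for illustration; they are not taken from Roach, SAM, or any other paper named here.

```python
import math

def kl_categorical(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(teacher, student, w_action=1.0, w_value=0.5, w_feat=0.1):
    """Weighted sum of action, value, and latent-feature distillation terms.

    The weights are illustrative assumptions, not values from any paper.
    """
    action_term = kl_categorical(teacher["action_probs"], student["action_probs"])
    value_term = (teacher["value"] - student["value"]) ** 2
    feat_term = mse(teacher["latent"], student["latent"])
    return w_action * action_term + w_value * value_term + w_feat * feat_term

# Hypothetical outputs of a teacher and a student for one observation.
teacher = {"action_probs": [0.7, 0.2, 0.1], "value": 1.0, "latent": [0.5, -0.3, 0.8]}
student = {"action_probs": [0.5, 0.3, 0.2], "value": 0.6, "latent": [0.4, -0.1, 0.9]}

print(f"loss vs. student: {distillation_loss(teacher, student):.4f}")
print(f"loss vs. itself:  {distillation_loss(teacher, teacher):.4f}")
```

The loss is zero only when the student matches the teacher on all three targets, so the weights decide which kind of mismatch training penalizes most.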
