[E2E Autonomous Driving] (7)-3 Challenges: Visual Abstraction / Representation Learning

구코딩 2024. 12. 14. 11:02

Other posts on End-to-End Autonomous Driving are listed in the Introduction post.

Dependence on Visual Abstraction

The two stages of an End-to-End autonomous driving system:

Encode the state into a latent feature representation.
Decode a driving policy from the intermediate features.
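
The two stages above can be sketched as a pair of functions. This is only a minimal illustration, not a real driving model: the dimensions, the random linear weights, and the names `encode` and `decode_policy` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, LATENT_DIM, ACTION_DIM = 64, 8, 2

# Stage 1 weights: state -> latent feature representation.
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))
# Stage 2 weights: intermediate features -> driving action.
W_dec = rng.normal(scale=0.1, size=(ACTION_DIM, LATENT_DIM))


def encode(state):
    """Stage 1: compress the high-dimensional state into a latent vector."""
    return np.tanh(W_enc @ state)


def decode_policy(latent):
    """Stage 2: map the latent vector to (steer, throttle), each in [-1, 1]."""
    return np.tanh(W_dec @ latent)


state = rng.normal(size=STATE_DIM)   # stand-in for sensor input + ego state
latent = encode(state)
action = decode_policy(latent)
```

The policy only ever sees `latent`, so whatever the encoder drops is unavailable to the decoder. That is why the design of the intermediate representation matters.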

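Problem

In urban driving, the surrounding environment and the ego state are highly diverse and high-dimensional.
This can cause a misalignment between the representation and the attention areas that policy making actually needs.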

Solution

Design the intermediate perception representation carefully.
Pre-train the visual encoder on a proxy task.

Effect

Extracts the important information efficiently.
Strengthens the subsequent policy stage.
Can improve the sample efficiency of reinforcement learning.

Representation Design

CNN-based: offers translation equivariance and efficiency. Still the dominant choice. CNNs pre-trained on depth improve performance substantially.
Transformer-based: scales well, but has not yet been widely adopted in E2E driving.
Bird's-Eye View (BEV): integrates sensor modalities and temporal information into a unified representation in 3D space; well suited to downstream tasks.
Grid-based 3D occupancy: captures irregular objects and is used for collision avoidance planning; computationally more expensive than BEV.
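
To make the BEV vs. 3D occupancy trade-off concrete, here is a small sketch that voxelizes a few points into an occupancy grid and collapses it along height to get a BEV grid. The ranges, the cell size, the sample points, and the `voxelize` helper are assumptions for illustration only.

```python
import numpy as np


def voxelize(points, x_range, y_range, z_range, cell):
    """Return a boolean 3D occupancy grid for an (N, 3) array of points (metres)."""
    lo = np.array([x_range[0], y_range[0], z_range[0]])
    hi = np.array([x_range[1], y_range[1], z_range[1]])
    shape = tuple(np.round((hi - lo) / cell).astype(int))
    # Keep only points that fall inside the grid volume.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    idx = ((points[inside] - lo) / cell).astype(int)
    grid = np.zeros(shape, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


points = np.array([
    [1.2, 0.3, 0.1],     # low obstacle
    [1.2, 0.3, 1.6],     # overhanging part above the same (x, y) cell
    [5.0, -2.0, 0.4],
    [100.0, 0.0, 0.0],   # outside the grid, ignored
])

occ = voxelize(points, (0.0, 8.0), (-4.0, 4.0), (0.0, 2.0), cell=0.5)
bev = occ.any(axis=2)    # collapse the height axis: 3D occupancy -> BEV

print(occ.shape, bev.shape)   # 3D grid vs. 2D grid
print(occ.sum(), bev.sum())   # occupied cells in each
```

The occupancy grid keeps the overhanging point as a separate cell, while BEV merges it with the cell below. The price is one extra axis of cells to store and process.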

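Map representations: HD Maps

Frequently used in traditional autonomous driving.
Because HD maps are costly, various alternatives have been proposed, such as BEV segmentation, vectorized lanelines, centerlines and their topology, and lane segments.
Which of these suits E2E driving best has not yet been validated.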
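
Two of the alternatives above, a vectorized line and a BEV segmentation mask, describe the same geometry in different forms. The sketch below rasterizes a polyline into a BEV mask to show the relationship. The `rasterize_polyline` helper, the grid extent, and the example centerline are hypothetical.

```python
import numpy as np


def rasterize_polyline(polyline, x_range, y_range, cell, step=0.1):
    """Rasterize a 2D polyline (points in metres) into a boolean BEV mask."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    mask = np.zeros((nx, ny), dtype=bool)
    pts = np.asarray(polyline, dtype=float)
    for p, q in zip(pts[:-1], pts[1:]):
        # Sample each segment densely enough that no cell is skipped.
        n = max(int(np.ceil(np.linalg.norm(q - p) / step)), 1)
        for t in np.linspace(0.0, 1.0, n + 1):
            x, y = p + t * (q - p)
            i = int(np.floor((x - x_range[0]) / cell))
            j = int(np.floor((y - y_range[0]) / cell))
            if 0 <= i < nx and 0 <= j < ny:
                mask[i, j] = True
    return mask


# Hypothetical lane centerline: straight ahead for 4 m, then 3 m to the side.
centerline = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
mask = rasterize_polyline(centerline, (0.0, 8.0), (-4.0, 4.0), cell=1.0)
```

The vectorized form needs three points here, while the mask needs a full 8 x 8 grid. The mask, on the other hand, plugs directly into convolutional BEV features.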

Representation Learning

Representation learning incorporates inductive biases and prior information.

The learned representation can contain an information bottleneck, and redundant context unrelated to the decision may be removed.

Segmentation Mask

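Early methods used segmentation masks from a pre-trained network as the input representation for policy training.
SESR: uses a VAE to encode segmentation masks into class-disentangled representations.
Affordance indicators such as traffic light state, offset from the lane center, and distance to the leading vehicle are used for policy learning.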
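
A sketch of what an affordance-based representation might look like as policy input. The field names, the normalization constants, and `toy_policy` are assumptions; the hand-written controller is only a stand-in for a learned policy.

```python
from dataclasses import dataclass


@dataclass
class Affordances:
    """Hypothetical affordance indicators predicted by a perception network."""
    red_light: bool
    lane_center_offset_m: float      # signed offset from the lane center
    lead_vehicle_distance_m: float   # distance to the leading vehicle

    def to_features(self, max_offset_m=2.0, max_distance_m=50.0):
        """Pack the indicators into a small normalized feature vector."""
        offset = max(-1.0, min(1.0, self.lane_center_offset_m / max_offset_m))
        dist = max(0.0, min(1.0, self.lead_vehicle_distance_m / max_distance_m))
        return [1.0 if self.red_light else 0.0, offset, dist]


def toy_policy(features, steer_gain=0.5):
    """Stand-in for a learned policy: steer back to center, stop on red."""
    red, offset, dist = features
    steer = -steer_gain * offset
    throttle = 0.0 if red else dist
    return steer, throttle


features = Affordances(False, 1.0, 25.0).to_features()
steer, throttle = toy_policy(features)
```

The policy input is only three numbers. That compactness is the appeal of this approach, and it is also the source of the bottleneck: anything not covered by a hand-picked indicator never reaches the policy.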

Representations like these can create a human-defined bottleneck (an information bottleneck) and lose useful information.

For this reason, several methods instead take intermediate features from pre-training tasks as effective representations for RL training.

Latent features of a VAE are augmented with attention maps obtained from the diffused boundaries of segmentation and depth maps, in order to highlight important regions.
TARP: obtains useful representations by using data from a series of previous tasks to perform different task-related prediction tasks.
A latent representation is learned by approximating the $\pi$-bisimulation metric, which is composed of the difference of rewards and the outputs of a dynamics model.
ACO: learns discriminative features by adding steering angle categorization to the contrastive learning structure.
PPGeo: learns representations in a self-supervised way from uncalibrated driving videos by combining motion prediction with depth estimation.
ViDAR: uses raw image-point cloud pairs and pre-trains the visual encoder with a point cloud forecasting pre-task.
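
As a rough illustration of the $\pi$-bisimulation item above, the sketch below builds a target from a reward difference plus a distance between dynamics model outputs, then penalizes the latent distance for deviating from it. The notes only say the metric combines these two ingredients. The L1 latent distance, the diagonal-Gaussian dynamics output, the 2-Wasserstein term, and the discount `gamma` are assumptions, following a form commonly used in bisimulation-based representation learning.

```python
import numpy as np


def bisim_target(r_i, r_j, mu_i, sigma_i, mu_j, sigma_j, gamma=0.99):
    """Reward difference plus discounted distance between predicted next latents.

    Assumes the dynamics model outputs diagonal Gaussians (mean, std), for
    which the 2-Wasserstein distance has the closed form used below.
    """
    w2 = np.sqrt(np.sum((mu_i - mu_j) ** 2) + np.sum((sigma_i - sigma_j) ** 2))
    return abs(r_i - r_j) + gamma * w2


def bisim_loss(z_i, z_j, target):
    """Squared error between the L1 latent distance and the bisimulation target."""
    return (np.sum(np.abs(z_i - z_j)) - target) ** 2


# Two hypothetical states: rewards, predicted next-latent Gaussians, latents.
target = bisim_target(
    r_i=1.0, r_j=0.5,
    mu_i=np.array([0.0, 0.0]), sigma_i=np.array([1.0, 1.0]),
    mu_j=np.array([3.0, 0.0]), sigma_j=np.array([1.0, 5.0]),
    gamma=0.5,
)
loss = bisim_loss(np.array([0.0, 0.0]), np.array([1.0, 1.0]), target)
```

Minimizing this loss over the encoder pulls together states that yield similar rewards and similar predicted futures, regardless of how different their images look.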

Self-supervised representation learning from large-scale unlabeled data is promising for policy learning and needs further research.
