Bellman

#Reinforcement Learning #Attention #Transformer #ViT #Algorithm #Artificial Intelligence #Machine Learning #Deep Learning #Deep Reinforcement Learning #Deep Learning #Deep RL #Machine Learning(1)

Decision Transformer: Attention is all RL Need?
https://arxiv.org/pdf/2106.01345.pdf Instead of training a policy through conventional RL algorithms like temporal difference (TD) learning, we will train transformer models on collected experience using a sequence modeling objective. 0. How conventional RL learns, and the Credit Assignment Problem. Conventional RL has learned with TD, which is built on the Bellman Equation formulated by the great mathematician Bellman. The idea behind TD learning is very simple. At step t, the reward at step t+1 is not yet known. Therefore ..
2021.06.12
