반응형 reinforcement Learning2 [딥러닝 with Python] GRPO란? (Group Relative Policy Optimization) 오늘 알아볼 것은 GRPO(Group Relative Policy Optimization)입니다. 1. GRPO란?- Group Relative Policy Optimization(GRPO)는 강화학습(Reinforcement Learning, RL)에서 정책 최적화(Policy Optimization)를 수행할 때, 기존의 Proximal Policy Optimization(PPO)과 달리,상대적인 그룹 기준(Group-relative criterion)**을 활용하여 안정적인 학습을 유도하는 방법입니다. - 이는 정책(policy)의 업데이트를 더 효과적으로 제어하고, 불필요한 정책 변화(excessive policy shifts)를 방지하며, 샘플 효율성(sample efficiency)을 향상시키.. 2025. 1. 30. [Machine Learning] What is machine learning? What is ML? 1. What is machine learning 1) Machine Learning : Finding Regularity in massive datasets 2) Regularities : Knowledge forms (rules, decision trees) - Machine Learning usually uses inductive knowledge to make predictions. - The procedure of ML : Data -> Finding regularity -> Representation as diverse forms -> Prediction 3) Machine Learning (Compared to traditional programming) - ML : Input -> ML -.. 2024. 3. 6. 이전 1 다음 반응형