Document number:
6398
Call number:
5972
Author:
Givchi, Arash
Title:

Humanoid robot stepping using policy gradients in reinforcement learning

Degree level:
M.Sc.
Field of study:
Artificial Intelligence
Place of study:
Isfahan: Isfahan University of Technology, Department of Electrical and Computer Engineering
Year of defense:
1390 (2011)
Pagination:
nine, 102 pp.: illustrated, tables, charts
Note:
Title page in Persian and English
Supervisor:
Maziar Palhang
Descriptors:
Gyroscope, Hill climbing
Indexing date:
9/9/90
Examiners:
Mohammad-Ali Montazeri, Farid Sheikholeslam
Department:
Electrical and Computer Engineering
Irandoc code:
ID5972
Persian abstract:
In Persian and English: viewable in the digital copy
English abstract:
Humanoid Locomotion Using Policy Gradients in Reinforcement Learning
Arash Givchi (a.givchi@ec.iut.ac.ir)
Date of submission: September 20, 2011
Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
Degree: M.Sc.
Language: Persian
Supervisor: Dr. Maziar Palhang (palhang@cc.iut.ac.ir)

Abstract: Implementation of a humanoid robot that can behave like a human being is one of the main interests in robotics. Among all the mechanical behaviors of a human, the most noticeable is walking. Humanoid walking is a research domain that encompasses various areas of science, such as biology, physiology, mechanics, control, artificial intelligence, and robotics. From the perspective of artificial intelligence, reinforcement learning in continuous spaces is an appropriate control method for learning behavioral patterns that must be performed smoothly and continuously. In recent years this machine learning method has received wide attention in the control and AI communities; as a trial-and-error method, it lets the agent gain experience by interacting with the environment, and it has been applied successfully to various control tasks. The current research proposes a learning algorithm based on the class of reinforcement learning methods named policy gradients, which is used to make the robot learn how to step, the basic action in the walking process. The research focuses on motion planning based on dividing a single step into two sub-behaviors. These sub-behaviors are transition functions that are optimized among three semi-dynamic frames defined in advance. In addition, the learning algorithm is divided into two learning processes: learning the balance of the robot, and minimizing the gyroscope error. The gyroscope reward function introduced in this research is a function of the angular speeds in each dimension of the humanoid robot's motion space. The first learning process is based on the policy gradients algorithm; in the second, we devise a hill-climbing search alongside policy gradients to enhance the quality of learning. The policy used in the proposed method is the stochastic policy common to policy gradient methods. The results show that, while learning balance, the robot minimizes the gyroscope error as a fitness measure for tension minimization. They also show that although the environment is stochastic, the final learned behavior is stable enough to overcome the noise of the system. The humanoid can adjust its motor speed while learning the motion functions; increased motion speed and faster convergence are further consequences of this learning algorithm. Although there is no convergence proof for policy gradient methods in reinforcement learning, it is demonstrated that these methods, together with frame-based motion planning and the gyroscope reward function, not only lead to a stable walking manner based on the learned step behaviors but also converge toward a suitable local optimum in the policy space.

Keywords: Humanoid Robots, Reinforcement Learning, Policy Gradients, Gyroscope, Hill Climbing
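The abstract describes the gyroscope reward as a function of the angular speeds in each dimension of the robot's motion space, but this record does not give the exact formula. A minimal sketch, assuming a quadratic penalty on angular speed; the function name, the optional weights, and the quadratic form are illustrative assumptions, not the thesis's actual definition:

```python
def gyroscope_reward(angular_speeds, weights=None):
    """Hypothetical gyroscope reward: penalizes the squared angular speed
    in each dimension of the motion space, so a perfectly balanced robot
    (all angular speeds zero) receives the maximum reward of 0."""
    if weights is None:
        weights = [1.0] * len(angular_speeds)
    return -sum(w * v * v for w, v in zip(weights, angular_speeds))

# A wobbling robot is penalized more heavily than a steady one.
steady = gyroscope_reward([0.01, 0.02, 0.00])
wobbly = gyroscope_reward([0.50, 0.80, 0.30])
```

Maximizing such a reward drives all angular speeds toward zero, which matches the abstract's framing of gyroscope-error minimization as a fitness measure for balance.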
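The two learning processes the abstract names, policy gradient updates followed by a hill-climbing search over the policy parameters, can be sketched on a toy one-dimensional problem. Everything here (the Gaussian stochastic policy, the quadratic stand-in reward, the baseline and update clipping, and all parameter values) is an illustrative assumption showing the general shape of the technique, not the thesis's algorithm:

```python
import random

def reward(action, target=1.0):
    # Toy stand-in for a balance reward: maximal (0) when action == target.
    return -(action - target) ** 2

def train(steps=2000, alpha=0.05, sigma=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0     # policy parameter: mean of a Gaussian stochastic policy
    baseline = 0.0  # running reward baseline, reduces gradient variance

    # Process 1: REINFORCE-style policy gradient ascent.
    for _ in range(steps):
        action = rng.gauss(theta, sigma)
        r = reward(action)
        grad_log = (action - theta) / sigma ** 2  # d/dtheta log N(action; theta, sigma)
        step = alpha * (r - baseline) * grad_log
        theta += max(-0.1, min(0.1, step))        # clip updates for stability
        baseline += 0.05 * (r - baseline)

    # Process 2: hill climbing around the learned parameter.
    def avg_reward(t):
        return sum(reward(rng.gauss(t, sigma)) for _ in range(50)) / 50

    best, best_r = theta, avg_reward(theta)
    for _ in range(200):
        cand = best + rng.gauss(0.0, 0.05)
        cand_r = avg_reward(cand)
        if cand_r > best_r:
            best, best_r = cand, cand_r
    return best

learned = train()
```

As in the abstract's description, the gradient phase climbs the expected reward through a stochastic policy, and the hill-climbing phase then perturbs the learned parameters and keeps only changes that improve average reward, refining the solution toward a local optimum.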