Investigation of disturbances on a novel reinforcement learning based control approach.
Albers, Albert; Sommer, Hermann; Frietsch, Markus et al.
Abstract: Automatic manipulator control poses a complex challenge
for control systems, involving various dynamic and nonlinear effects as
well as high dimensionality issues. The aim of this paper is to analyze
the behavior of a novel approach for reinforcement learning based motion
control of a 2-DOF robotic manipulator when it is subjected to a set of
different disturbances. The implementation of the presented approach
shows the performance of the manipulator when circumstances force it
into unexpected situations. The experimental results are presented.
Key words: machine learning, robotics, dynamics, control
1. INTRODUCTION
1.1 Reinforcement Learning in modern Robotics
One of the biggest challenges in current robotics research is
that robots "leave" their well-structured environments and are
confronted with new tasks in more complex surroundings. An example of
this new setting is the striving for autonomy and versatility in the
field of humanoid robotics, as explained in (Peters et al., 2003). In
such a setting, a robot can only be successful and useful if it is able
to adapt itself and to learn from its experiences. Reinforcement Learning
(RL), a branch of machine learning (Mitchell, 1997), is one possible
approach to this problem. However, the application of this learning
process is limited by its complexity. RL is a learning process which
uses reward and punishment signals from the interaction with the
agent's environment in order to learn a distinct policy for achieving
tasks. Various RL methods, e.g. Q-learning (Watkins, 1989) or the SARSA
algorithm, have been studied in (Sutton & Barto, 1998), where it is
shown that two problems must be considered (Park & Choi, 2002). The
first is the high computational effort: RL suffers from the "curse of
dimensionality" (Sutton & Barto, 1998), which refers to the tendency of
a state space to grow exponentially with its dimension.
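For illustration, a minimal tabular Q-learning update of the kind
mentioned above can be sketched as follows; the discretization sizes
and learning parameters are placeholder values and do not reflect the
implementation used in this work:

import numpy as np

# Minimal tabular Q-learning sketch (illustrative only; the paper's own
# algorithm and state/action encoding are not reproduced here).
n_states, n_actions = 100, 5            # assumed discretization sizes
Q = np.zeros((n_states, n_actions))     # value table grows with the state space
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # assumed learning parameters
rng = np.random.default_rng()

def select_action(state: int) -> int:
    """Epsilon-greedy action selection over the tabular value function."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state: int, action: int, reward: float, next_state: int) -> None:
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

The table Q already illustrates the dimensionality problem: with every
additional state variable, n_states and therefore the memory and
exploration effort grow exponentially.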
Secondly, the information for every different task is stored
separately. This results in both a storage and a computational effort
issue and reduces the usefulness of RL for practical applications.
Furthermore, it poses the question of how already acquired knowledge can
be reused. In (Martin & De Lope, 2007), an approach is presented in
which a distributed RL architecture serves as a pragmatic solution for
some common robotic manipulators with different DOF. RL-based approaches
have been applied to various robotic systems in the past, although
mostly to the learning of elemental tasks meant to serve as "building
blocks of movement generation", as in (Peters, 2008). Nevertheless, new
computations and additional storage space are required for performing
new tasks.
The adaptability of these approaches to new, unexpected situations
using already acquired knowledge has not been studied in detail. In this
paper, the behavior of the novel relative approach for positioning tasks
presented in (Albers et al., 2010) is investigated when the system is
exposed to a set of different disturbances, and the experimental results
are presented.
1.2 Model of the 2-DOF planar Robot
A 2-DOF manipulator system is chosen as a prototypical nonlinear
system. Fig. 1 shows the schematic drawing of the manipulator including
all relevant system parameters. The aim is to create a trajectory using
as few control commands as possible. The state of the manipulator is
described by:
s = [\theta_1, \dot{\theta}_1, \theta_2, \dot{\theta}_2]    (1)
The system is described in detail in (Denzinger & Laureyns, 2008)
and has been adapted for the purpose of this investigation.
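For reference, the state vector of Eq. (1) and the planar forward
kinematics of the TCP can be sketched as follows; the link lengths are
placeholder values, since the actual system parameters are given in the
omitted Fig. 1:

import numpy as np
from dataclasses import dataclass

# Minimal representation of the state s = [theta1, dtheta1, theta2, dtheta2]
# from Eq. (1). Link lengths are assumed values, not the paper's parameters.
L1, L2 = 0.5, 0.5  # assumed link lengths [m]

@dataclass
class ManipulatorState:
    theta1: float   # joint 1 angle [rad]
    dtheta1: float  # joint 1 angular velocity [rad/s]
    theta2: float   # joint 2 angle [rad], measured relative to link 1
    dtheta2: float  # joint 2 angular velocity [rad/s]

def tcp_position(s: ManipulatorState) -> np.ndarray:
    """Planar forward kinematics: position of the Tool Center Point (TCP)."""
    x = L1 * np.cos(s.theta1) + L2 * np.cos(s.theta1 + s.theta2)
    y = L1 * np.sin(s.theta1) + L2 * np.sin(s.theta1 + s.theta2)
    return np.array([x, y])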
2. CONTROL OF SYSTEM DISTURBANCES USING THE RELATIVE APPROACH
2.1 Overview of the relative Approach
Using the relative approach developed in (Yan et al., 2009), each
position can be calculated as a constant value plus a difference. In
this way, every positioning task can be remodeled as a simple offset
compensation, reducing all possible individual tasks to a single one.
When the knowledge about this task is properly implemented, as shown in
(Albers et al., 2010), no new learning is required to complete new
tasks. An exhaustive investigation of this approach suggests that the
RL algorithm shows a behavior similar to classical closed-loop
controllers. To test this behavior, a representative set of disturbances
was implemented and the manipulator's response was studied.
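The remodeling of positioning tasks as a single offset-compensation
task can be illustrated with the following sketch; the function is
hypothetical and only demonstrates the idea, not the implementation of
(Albers et al., 2010):

def to_relative_task(current_angles, target_angles):
    """Express an arbitrary positioning task as an offset-compensation task:
    instead of learning every absolute target separately, the agent always
    learns to drive the relative error (delta_theta1, delta_theta2) to zero."""
    return tuple(t - c for c, t in zip(current_angles, target_angles))

# Two different absolute targets reduce to the same kind of task, namely
# compensating an offset towards (0, 0), so the already learned policy can
# be reused without new training.
offset_a = to_relative_task(current_angles=(0.0, 0.0), target_angles=(0.3, -0.1))
offset_b = to_relative_task(current_angles=(1.2, 0.4), target_angles=(1.5, 0.3))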
2.2 Experimental Setup
In order to evaluate the response of the manipulator to possible
disturbances it was necessary to adjust the underlying dynamic system
employed for the computation of the agent's behavior.
[FIGURE 1 OMITTED]
[FIGURE 2 OMITTED]
For the purpose of our investigation, we confront the manipulator
control with two commonly encountered example disturbances. The first is
a force impulse with a magnitude of 5 N, which simulates the
manipulator's response to short-time disturbances such as collisions or
current peaks. The second is a force in the form of a step signal, which
represents a disturbance of longer duration. The magnitude of the second
disturbance is 0.5 N; the smaller magnitude was selected due to the
longer application time. The point of application of the disturbing
force is the Tool Center Point (TCP) of the robotic agent, located at
the end of its second link. The direction of application is
perpendicular to the second link at all times. The agent's task during
an experiment is to maintain the outstretched position
$(\Delta\theta_1, \Delta\theta_2) = (0, 0)$ for 250 time steps
(1 interval = 0.05 s) and to compensate the positioning offset inflicted
by the disturbances if necessary.
For both test cases, the disturbance is applied after 3 s. The
experiments end after 12.5 s and the response curves for both links are
plotted. The results for the first test case are depicted in Fig. 2, the
results for the second test case in Fig. 3. Experiments conducted with
varied magnitudes of the applied external disturbances, which are not
depicted here, showed similar behavior.
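The two disturbance profiles described above can be summarized in the
following sketch; the simulation interface in the commented-out loop is
hypothetical, and only the numerical values (5 N impulse, 0.5 N step,
0.05 s time step, 250 steps, onset at 3 s) are taken from the setup
described here:

DT = 0.05        # length of one time step [s]
N_STEPS = 250    # 250 steps * 0.05 s = 12.5 s experiment duration
T_ONSET = 3.0    # the disturbances are applied after 3 s

def impulse_force(t: float) -> float:
    """Test case 1: force impulse of 5 N at the TCP, perpendicular to the
    second link; assumed here to act for a single time step."""
    return 5.0 if abs(t - T_ONSET) < DT else 0.0

def step_force(t: float) -> float:
    """Test case 2: step disturbance of 0.5 N, acting from t = 3 s onward."""
    return 0.5 if t >= T_ONSET else 0.0

# Hypothetical experiment loop; `env.step` and `policy` stand in for the
# adapted dynamic system and the learned agent and are not part of the paper.
# for k in range(N_STEPS):
#     t = k * DT
#     state = env.step(policy(state), external_force=impulse_force(t))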
3. RESULTS
The response of the manipulator to the impulse-like disturbance is
depicted in Fig. 2. For the first 3 seconds of the experiment the
manipulator maintains the outstretched target position without
noticeable difficulties. At this point the force impulse is applied to
the second joint and the manipulator's first joint is clearly brought
out of position, while the second joint shows a smaller angular
deviation than the first as a consequence of the disturbance. After
approximately one oscillation around the target position, the first
joint stabilizes. After around 5 s the manipulator has reestablished the
outstretched configuration, which it then successfully maintains until
the end of the experimental task, as seen in Fig. 2.
In the second experimental setup the disturbing force in the form of
a step function is applied to the manipulator after 3 s. The first joint
angle oscillates briefly before it reaches a stable position around
-90 degrees away from the target position, thus failing to compensate
the angular offset. The second joint remains unaffected after the
disturbing force is applied and maintains its position for the rest of
the experimental task. Throughout all experiments the second joint
proved to be less influenced by external forces.
[FIGURE 3 OMITTED]
4. CONCLUSION
The manipulator is capable of reacting appropriately to short-term
disturbances and of reestablishing the target position after such a
disturbance. During the time a disturbance is in effect, the underlying
mapping of states and actions is no longer valid, since the system
dynamics are changed. For disturbances with a longer duration of effect,
or even time-invariant disturbances, a new learning procedure or a
remapping is necessary. Nevertheless, the approach presents a remarkable
advantage over a traditional open-loop control: having learnt the new
mapping, the manipulator could recognize the change in its environment
and adapt, or even load a previously computed mapping of its behavior
accordingly. One possibility would be to implement a model-based control
to recognize differing mappings and adjust the learning behavior. Future
research includes this topic, among others such as discretization
smoothness and partial recoupling of the manipulator axes. Furthermore,
additional experiments will be conducted on a three-DOF system.
5. REFERENCES
Albers, A.; Sommer, H. & Frietsch, M. (2010). A New Approach for
Solving Positioning Tasks of Robotic Systems Based on Reinforcement
Learning, Annals of DAAAM for 2010 & Proceedings of the 21st
International DAAAM Symposium
Denzinger, J.; Laureyns, I. et al. (2008). A Study of Reward Functions
in Reinforcement Learning on a Dynamic Model of a Two-Link Planar Robot,
The 2nd European DAAAM International Young Researchers' and
Scientists' Conference
Mitchell, T. M. (1997). Machine Learning, McGraw-Hill, New York,
ISBN 0-07-042807-7
Peters, J.; Vijayakumar, S. & Schaal, S. (2003). Reinforcement Learning
for Humanoid Robotics, Third IEEE-RAS International Conference on
Humanoid Robots, Karlsruhe, Germany
Peters, J. (2008). Machine Learning for Robotics, VDM Verlag Dr.
Müller, Saarbrücken, ISBN 978-3-639-02110-3
Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An
Introduction, The MIT Press, Cambridge, MA
Yan, W. et al. (2009). Application of Reinforcement Learning to a Two
DOF Robot Arm Control, Annals of DAAAM for 2009 & Proceedings of the
20th DAAAM International Symposium