
Hand tracking with a humanoid robot head.


Ambrosch, Roland; Malisa, Viktorio


Abstract: This cognitive machine vision system was built for the interaction of humans with a humanoid robot. It is able to track and follow hands via a vision system and a human-like head and neck. The control loop runs at a speed of 1 kHz and the vision algorithm at four frames per second. The actuation is realized via brushless motors. The vision algorithm and the follow-me behavior are implemented on a personal computer using OpenCV. Tracking works at distances between 0.5 m and 2 m. This system will be used to move the humanoid robot's hand and position it to grasp objects.

Key words: humanoid robot, cognitive systems, computer vision

1. INTRODUCTION

The development of the humanoid robot head CoSyNA (Cognitive System, Neat and Attractive) represents the beginning of the development of a humanoid robot system. The main aim of this head is to research possibilities of human-robot interaction. The first algorithm developed is a hand tracking algorithm, which enables the head to follow a human hand.

In this paper the structure of the robot's control system and its mechanical design are shown, and the vision algorithm used is presented (Ambrosch, 2007).

2. MECHANICAL DESIGN

Parts of humanoid robots should look like humans, and therefore the right balance between the appearance of a robot and that of a human must be found. With today's actuation it is difficult and sometimes impossible to design a humanoid robot that looks like a human (Shimada, 2006).

For CoSyNA, a humanoid head with two cameras, connected to the trunk via two degrees of freedom (DOF), was developed. The DOF make it possible to pitch and yaw like a human; no additional DOF are needed during interaction tasks. The dimensions of the head are adopted from a ten-year-old human with a height of 145 cm: head and neck together have a height of 26 cm and a depth of 18 cm, while the neck has a diameter of 9 cm. This size was chosen to obtain a robot which can interact with human environments (e.g. tables, door knobs and light switches). A sketch of the developed robot head can be seen in Figure 1. The yawing motor has to deliver a maximum torque of 2.8 Nm, whereas the pitching motor has to deliver only a quarter of that torque in mostly static movements. The examination of dynamic motion at a rotation velocity of 2.6 rad/s resulted in a very small additional torque, which can be disregarded.

[FIGURE 1 OMITTED]

The two motors are positioned in the neck so that the assembly fits the human shape. They are positioned symmetrically and drive the head via planetary gears, drive belts, bevel gears and cogwheels. The planetary gears have a ratio of 33:1; the subsequent bevel gears and cogwheels have a ratio of 3:1.

3. ELECTRICAL DESIGN

The actuation of the two DOF is realized via two brushless DC motors, which are controlled by a micro controller system. The micro controllers are connected to the higher-level vision system via CAN. They implement cascaded PID controllers: an inner controller for rotational speed and an outer position controller. The control loops run at 1 kHz.
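
As an illustration of such a cascade, the following minimal Python sketch shows an outer position loop feeding an inner speed loop at 1 kHz. The gains, limits and the current-command interface are illustrative assumptions, not values from the CoSyNA firmware.

```python
# Minimal sketch of a cascaded position/velocity PID loop as described
# above. Gains and the motor interface are illustrative placeholders,
# not values from the actual micro controller firmware.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

DT = 0.001  # 1 kHz control loop, as stated in the paper

position_loop = PID(kp=8.0, ki=0.0, kd=0.1, dt=DT)  # outer: angle error -> target speed
velocity_loop = PID(kp=0.5, ki=2.0, kd=0.0, dt=DT)  # inner: speed error -> current command

def control_step(target_angle, measured_angle, measured_speed):
    """One 1 ms iteration: position loop feeds the velocity loop,
    whose output is the current command sent to the motor driver."""
    target_speed = position_loop.update(target_angle - measured_angle)
    return velocity_loop.update(target_speed - measured_speed)
```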

The motors are driven by two driver ICs which handle the commutation and deliver information about the current position and velocity of the motor axes. At the end of both actuated axes there are Hall sensors which provide an absolute position reference. These sensors limit the accuracy of the positioning system to one tenth of a degree, which is sufficient for interaction tasks. In this early stage of the humanoid robot project, the computer vision algorithms run on a PC, which handles both the vision and the behavior part. After each iteration the PC transmits the desired positions to the micro controllers.

Additionally, the micro controller sends the actual head position to the higher-level control layer, which decides whether a head movement is sufficient or other positioning methods are needed. At the current application speeds the motors draw no more than 60 mA.

4. VISION ALGORITHM

The vision algorithm detects skin with the help of HSV color-space segmentation. The inputs to the threshold method used are the mean value and standard deviation of each channel (Sonka, 1999). Additionally, different methods of multi-channel segmentation were evaluated; segmenting the 3D color space with cubes and spheres delivered worse results than channel-independent methods.

The chosen mean values and standard deviations are calculated from 2000 manually segmented positive images and 1000 negative images. When the resulting distribution is applied to the input images of the robot's head, skin is detected at a rate of about 80%. The mean (standard deviation) values are 124 (22) for Hue, 138 (33) for Saturation and 140 (110) for Value.
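
A minimal OpenCV sketch of this kind of channel-independent threshold segmentation, using the mean and standard deviation values quoted above. Taking mean plus/minus one standard deviation as the accepted window, and mapping the paper's values directly onto OpenCV's 8-bit HSV channels, are illustrative assumptions.

```python
import cv2
import numpy as np

# Per-channel thresholds built from the paper's mean (std) values:
# Hue 124 (22), Saturation 138 (33), Value 140 (110). Using mean +/- 1 std
# as the accepted window is an assumption; note also that OpenCV stores
# Hue as 0-179, so mapping the paper's values directly is itself assumed.
MEAN = np.array([124, 138, 140], dtype=np.float32)
STD = np.array([22, 33, 110], dtype=np.float32)

def segment_skin(frame_bgr):
    """Return a binary mask of skin-colored pixels, thresholding each
    HSV channel independently after a simple noise-reduction step."""
    denoised = cv2.medianBlur(frame_bgr, 5)      # noise correction before segmentation
    hsv = cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV)
    lower = np.clip(MEAN - STD, 0, 255).astype(np.uint8)
    upper = np.clip(MEAN + STD, 0, 255).astype(np.uint8)
    return cv2.inRange(hsv, lower, upper)
```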

Before skin segmentation the images were corrected for noise. After noise reduction almost no holes appeared in skin areas.

If an arm covered by a long sleeve is positioned in front of the cameras, the hand and some spurious areas are detected. These spurious areas are eliminated with the help of contour detection: the length and size of every found contour are evaluated, such that only hands at distances between 0.5 m and 2 m are detected.

After the hand detection, the center of area of the hand is calculated and used for tracking.
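
A sketch of the described contour filtering and center-of-area computation; the area and perimeter bounds below are placeholders standing in for whatever values encode the 0.5 m to 2 m distance window.

```python
import cv2

# Illustrative bounds on contour area and perimeter; in the real system
# these would encode the 0.5 m - 2 m hand distance window described above.
MIN_AREA, MAX_AREA = 1500, 40000
MIN_PERIM, MAX_PERIM = 150, 1200

def find_hand_center(skin_mask):
    """Filter contours by size/length and return the hand's center of area
    (from image moments), or None if no plausible hand contour is found."""
    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        area = cv2.contourArea(contour)
        perimeter = cv2.arcLength(contour, closed=True)
        if MIN_AREA < area < MAX_AREA and MIN_PERIM < perimeter < MAX_PERIM:
            m = cv2.moments(contour)       # center of area = (m10/m00, m01/m00)
            if m["m00"] > 0:
                return m["m10"] / m["m00"], m["m01"] / m["m00"]
    return None
```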

Tracking is realized via the Camshift algorithm (Francois, 2004), which uses regions of interest (ROI) to reduce the execution time of future iterations: the tracking algorithm outputs the next ROI and thereby minimizes the processing time of subsequent iterations. The frame rate of the unoptimized algorithm is about 4 frames per second on a single-core 2.5 GHz 32-bit PC with 1 GB of RAM. Most of the processing time is spent on noise reduction.
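
OpenCV ships this algorithm as cv2.CamShift; a minimal usage sketch of one tracking iteration follows. The back-projection input is assumed to come from a histogram back-projection step (e.g. cv2.calcBackProject) that is not shown here.

```python
import cv2

# Minimal sketch of one Camshift iteration as used for ROI-based tracking.
# `back_projection` is assumed to be a histogram back projection of the
# current frame; producing it is not shown here.
TERM_CRITERIA = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

def track_step(back_projection, roi):
    """Run one Camshift iteration; `roi` is an (x, y, w, h) search window.
    The returned window seeds the next iteration, which is what keeps
    the per-frame execution time low."""
    rotated_box, next_roi = cv2.CamShift(back_projection, roi, TERM_CRITERIA)
    return rotated_box, next_roi
```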

Varying lighting is one of the biggest problems in human environments and received extra attention. The built-in auto-shutter of the cameras used (Point Grey Firefly MV) causes permanent overexposure when the cameras face sunlight. This problem was solved by implementing a shutter control that is coupled with the vision algorithm and shortens or lengthens the shutter time. The vision algorithm is implemented using OpenCV, an open source computer vision library (OpenCV, 2007).
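
The paper does not detail the coupling, but a simple proportional brightness feedback is one plausible reading; in the sketch below, `set_shutter_ms` is a hypothetical stand-in for the camera driver call, and the target brightness and gain are illustrative.

```python
import numpy as np

# Hedged sketch of a brightness-coupled shutter controller. The paper only
# says the shutter time is shortened/lengthened by the vision algorithm;
# this proportional scheme and all constants are assumptions, and
# `set_shutter_ms` is a hypothetical stand-in for the camera driver call.
TARGET_BRIGHTNESS = 128   # desired mean gray value (illustrative)
GAIN = 0.01               # ms of shutter change per gray level of error (illustrative)

def adjust_shutter(gray_frame, shutter_ms, set_shutter_ms,
                   min_ms=0.1, max_ms=30.0):
    """Shorten the shutter when the image is too bright, lengthen it when
    too dark, then push the new value to the camera."""
    error = TARGET_BRIGHTNESS - float(np.mean(gray_frame))
    shutter_ms = float(np.clip(shutter_ms + GAIN * error, min_ms, max_ms))
    set_shutter_ms(shutter_ms)
    return shutter_ms
```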

5. BEHAVIOR

The software framework is designed to transform the current hand position into a movement of the robot head, as shown in Figure 2. The vision algorithm runs on a PC, extracts the current center of area and transmits it to the behavior module, which sends the desired positions via CAN to the micro controllers. The behavior tries to hold the center of area of the hand within a circle: if the hand is outside this circle, the head is moved proportionally to the offset between the center of the field of view and the center of area of the hand. The circle is described in formulas (1) and (2).

$$\text{if } \sqrt{P_x^2 + P_y^2} > 50 \;\Rightarrow\; \alpha_x = \alpha_x + \frac{P_x}{25}, \quad \alpha_y = \alpha_y + \frac{P_y}{25} \qquad (1)$$

$$\text{if } \sqrt{P_x^2 + P_y^2} < 50 \;\Rightarrow\; \alpha_x = \alpha_x, \quad \alpha_y = \alpha_y \qquad (2)$$
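
In code, the two formulas amount to a circular dead zone of 50 pixels radius with proportional tracking outside it; the following minimal sketch implements them directly (variable names chosen here for illustration).

```python
import math

DEAD_ZONE_RADIUS = 50   # pixels, from formulas (1) and (2)
GAIN_DIVISOR = 25       # proportional gain P/25, from formula (1)

def update_head_angles(alpha_x, alpha_y, p_x, p_y):
    """Apply formulas (1) and (2): move the head proportionally to the
    hand's offset (p_x, p_y) from the image center, but only when the
    hand leaves the 50-pixel dead-zone circle."""
    if math.hypot(p_x, p_y) > DEAD_ZONE_RADIUS:   # formula (1)
        alpha_x += p_x / GAIN_DIVISOR
        alpha_y += p_y / GAIN_DIVISOR
    return alpha_x, alpha_y                       # formula (2): unchanged inside
```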

6. CONCLUSION AND OUTLOOK

The false detection rate of the hand tracking algorithm is nearly eliminated by the strict rules of hand detection and the limited working area. It has been shown that the influences of human environments can be restricted via noise filters and appropriate shutter algorithms.

Currently it is not possible to detect a hand if the arm is not covered with long sleeves. This problem can be avoided with the help of higher-level processing algorithms, which use shape detection to recognize e.g. the palm shape of a hand with spread fingers. Another way to improve hand detection is the implementation of a classifier that is trained with large numbers of positive and negative images (Ong, 2004). Any such higher-level image processing needs more computing power, so a cluster could be used. Moreover, the source code has to be optimized for parallel execution on multi-core CPUs.

The evaluation of the control algorithm showed that the mechanical part has to be more flexible and that a way must be found to decouple the influences of imperfectly fitting bevel gears. The use of couplers would be a possibility, but couplers in those sizes must be specially manufactured. Alternatively, the whole mechanical part has to be reworked and the head driven with other types of actuation.

If the head is to be more human-like, more DOF are needed to move the eyelids and perhaps also the neck.

Mechanical optimization studies using the finite element method showed that most of the head and neck can be constructed from synthetic materials, saving a lot of mass, which is important for an autonomous humanoid robot. The interaction between humanoid robots and humans is an important topic, and several different methods have to be researched in the future; in particular, vision and sound have to be combined. Future research will include the optimization of the mechanical part, the improvement of the vision algorithm and the combination of different techniques to achieve richer human-machine interaction.

[FIGURE 2 OMITTED: Hand Tracking and the Resulting Angles]

7. REFERENCES

Ambrosch, R. (2007). Kognitives Machine Vision System eines Humanoiden Roboters. Master Thesis. University of Applied Sciences Technikum Wien, Mechatronics/Robotics. Vienna/Austria.

Francois, A. R. (2004). CAMSHIFT tracker design experiments with Intel OpenCV and SAI. Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles/USA.

Ong, E. & Bowden, R. (2004). A boosted classifier tree for hand shape detection. Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Seoul/Korea.

OpenCV (2007). Intel Corporation [online]. Available from: http://www.intel.com/technology/computing/opencv/index.htm, Accessed: 2007-08-04.

Shimada, M.; Minato, T.; Itakura, S. & Ishiguro, H. (2006). Evaluation of Android Using Unconscious Recognition. 6th IEEE-RAS International Conference on Humanoid Robots. Genova/Italy.

Sonka, M.; Hlavac, V. & Boyle, R. (1999). Image processing, analysis, and machine vision. Brooks/Cole, ISBN 978-0534953935, USA.