Hand tracking with a humanoid robot head.
Ambrosch, Roland ; Malisa, Viktorio
Abstract: This cognitive machine vision system was built for the
interaction of humans with a humanoid robot. It is able to track and
follow hands via a vision system and a human-like head and neck. The
control loop runs at 1 kHz and the vision algorithm at four frames per
second. The actuation is realized via brushless motors. The vision
algorithm and the follow-me behavior are implemented on a personal
computer using OpenCV. Tracking works at distances between 0.5 m and 2
m. This system will be used to move the humanoid robot's hand and
position it to grasp objects.
Key words: humanoid robot, cognitive systems, computer vision
1. INTRODUCTION
The development of the humanoid robot head CoSyNA (Cognitive
System, Neat and Attractive) marks the beginning of the development of
a complete humanoid robot system. The main aim of this head is to
investigate possibilities for human-robot interaction. The first
algorithm developed is a hand tracking algorithm, which enables the
robot to follow a human hand.
In this paper, the structure of the robot's control system and its
mechanical design are described, and the vision algorithm used is
presented (Ambrosch, 2007).
2. MECHANICAL DESIGN
Parts of humanoid robots should look like humans, so the right
balance between robot-like and human-like appearance must be found.
With today's actuation technology it is difficult, and sometimes
impossible, to design a humanoid robot that truly resembles a human
(Shimada, 2006).
For CoSyNA, a humanoid head with two cameras was developed,
connected to the trunk via two degrees of freedom (DOF). These DOF
allow the head to pitch and yaw like a human head; no additional DOF
are needed during interaction tasks. The dimensions of the head are
based on those of a ten-year-old human with a body height of 145 cm:
the head and neck have a height of 26 cm and a depth of 18 cm, while
the neck has a diameter of 9 cm. This size was chosen so that the robot
can interact with human environments (e.g. tables, door knobs and light
switches). A sketch of the developed robot head can be seen in Figure
1. The yawing motor has to deliver a maximum torque of 2.8 Nm, whereas
the pitching motor has to deliver only a quarter of that torque in
mostly static movements. The examination of dynamic motion at a
rotational velocity of 2.6 rad/s resulted in very small torques, which
can be disregarded.
[FIGURE 1 OMITTED]
The two motors are positioned in the neck so that the design fits
the human shape. They are mounted symmetrically and drive the head via
planetary gears, drive belts, bevel gears and cogwheels. The planetary
gear ratio is 33:1; the subsequent bevel gears and cogwheels have a
ratio of 3:1.
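As a rough plausibility check, the overall reduction and the resulting
motor-side torque follow directly from these figures. The sketch below
assumes an ideal, lossless gear train in which the 2.8 Nm maximum acts
on the load side of the yaw axis; friction and gear efficiency are
ignored.

```python
# Plausibility check from the figures above, assuming an ideal
# (lossless) gear train; friction and efficiency are ignored.

PLANETARY_RATIO = 33.0  # planetary gear stage, 33:1
BEVEL_RATIO = 3.0       # subsequent bevel gears / cogwheels, 3:1
YAW_LOAD_TORQUE = 2.8   # Nm, maximum torque required at the yaw axis

total_ratio = PLANETARY_RATIO * BEVEL_RATIO   # 99:1 overall reduction
motor_torque = YAW_LOAD_TORQUE / total_ratio  # torque at the motor shaft

print(f"Total gear ratio: {total_ratio:.0f}:1")                 # 99:1
print(f"Required motor torque: {motor_torque * 1000:.1f} mNm")  # ~28.3 mNm
```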
3. ELECTRICAL DESIGN
The actuation of the two DOF is realized via two brushless DC
motors, which are controlled by a microcontroller system. The
microcontroller systems are connected to the higher-level vision system
via CAN. The microcontrollers implement cascaded PID controllers for
rotational speed with overlaid position controllers. The control loops
run at 1 kHz.
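A minimal sketch of such a cascaded loop is shown below: an outer
position controller whose output serves as the setpoint of an inner
speed PID, both evaluated at the 1 kHz rate stated above. The gains and
the interface to the driver ICs are illustrative assumptions, not
values from the actual firmware.

```python
# Cascaded control sketch: position loop feeding a speed PID at 1 kHz.
# Gains and the motor-command interface are illustrative assumptions.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

DT = 0.001  # 1 kHz control loop, as in the paper

position_ctrl = PID(kp=8.0, ki=0.0, kd=0.1, dt=DT)  # outer loop (assumed gains)
speed_ctrl = PID(kp=0.5, ki=20.0, kd=0.0, dt=DT)    # inner loop (assumed gains)

def control_step(target_angle, actual_angle, actual_speed):
    """One 1 ms iteration: position error -> speed setpoint -> motor command."""
    speed_setpoint = position_ctrl.update(target_angle, actual_angle)
    return speed_ctrl.update(speed_setpoint, actual_speed)  # e.g. current setpoint
```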
The motors are driven by two driver ICs which perform the
commutation and deliver information about the actual position and
velocity of the motor axes. At the ends of both actuated axes there are
Hall sensors which provide an absolute position reference. These
sensors limit the accuracy of the positioning system to one tenth of a
degree, which is sufficient for interaction tasks. In this early stage
of the humanoid robot project, the computer vision algorithms run on a
PC, which handles both the vision processing and the behavior. After
each iteration, the PC transmits the desired positions to the
microcontrollers.
Additionally, the microcontroller sends the actual head position to
the higher-level control layer, which decides whether a head movement
is sufficient or whether other positioning methods are needed. At the
current application speeds, the motors draw no more than 60 mA.
4. VISION ALGORITHM
The vision algorithm detects skin with the help of HSV color-space
segmentation. The thresholding method takes the mean value and standard
deviation of each channel as inputs (Sonka, 1999). Additionally,
different methods of multi-channel segmentation were evaluated;
segmenting the 3D color space with cubes and spheres did not deliver
results as good as those of channel-independent methods.
The mean values and standard deviations were calculated from
2000 manually segmented positive images and 1000 negative images. If
the resulting distribution is applied to the input images of the
robot's head, skin is detected at a rate of about 80%. The mean
(standard deviation) values are 124 (22) for hue, 138 (33) for
saturation and 140 (110) for value.
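A minimal sketch of this per-channel thresholding with OpenCV is given
below. The paper states neither its channel scaling nor the width of
its acceptance band, so the sketch assumes 8-bit channels on a 0-255
scale (hence the HSV_FULL conversion) and a band of one standard
deviation around each mean; both are assumptions.

```python
import cv2
import numpy as np

# Per-channel mean (standard deviation) from the text:
# H 124 (22), S 138 (33), V 140 (110).
# Assumptions: 0-255 scale on all channels and a +/-1 sigma band.
MEAN = np.array([124, 138, 140], dtype=np.float32)
STD = np.array([22, 33, 110], dtype=np.float32)
K = 1.0  # band width in standard deviations (assumed)

def segment_skin(frame_bgr):
    """Return a binary mask of skin-colored pixels."""
    smoothed = cv2.medianBlur(frame_bgr, 5)  # noise correction first
    hsv = cv2.cvtColor(smoothed, cv2.COLOR_BGR2HSV_FULL)
    lower = np.clip(MEAN - K * STD, 0, 255).astype(np.uint8)
    upper = np.clip(MEAN + K * STD, 0, 255).astype(np.uint8)
    return cv2.inRange(hsv, lower, upper)
```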
Before skin segmentation, the images were corrected for noise.
After noise reduction, almost no holes appeared in skin areas. If an
arm covered with long sleeves is positioned in front of the cameras,
the hand and some spurious areas are detected. These areas are
eliminated with the help of contour detection: the length and area of
every detected contour are evaluated so that only hands at distances
between 0.5 m and 2 m are detected.
After hand detection, the center of area of the hand is
calculated and used for tracking.
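The sketch below illustrates this contour filtering and centroid
computation. The pixel bounds encoding the 0.5 m to 2 m working range
are illustrative placeholders, since the paper gives no pixel values;
the OpenCV 4.x findContours signature is assumed.

```python
import cv2

# Contour-based rejection of spurious skin areas plus centroid
# computation. The bounds below are assumed placeholder values.
MIN_AREA, MAX_AREA = 500, 30000   # assumed contour-area bounds (pixels^2)
MIN_PERIM, MAX_PERIM = 100, 1500  # assumed contour-length bounds (pixels)

def find_hand_center(skin_mask):
    """Return the center of area of the first plausible hand contour."""
    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        area = cv2.contourArea(contour)
        perimeter = cv2.arcLength(contour, True)
        if MIN_AREA <= area <= MAX_AREA and MIN_PERIM <= perimeter <= MAX_PERIM:
            m = cv2.moments(contour)
            if m["m00"] > 0:
                return (m["m10"] / m["m00"], m["m01"] / m["m00"])
    return None  # no hand-like contour in this frame
```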
Tracking is realized via the CamShift algorithm (Francois, 2004),
which uses regions of interest (ROI) to reduce the execution time of
future iterations: the tracking algorithm outputs the next ROI and
thereby minimizes the iteration time of subsequent iterations. The
frame rate of the unoptimized algorithm is about 4 frames per second on
a single-core 2.5 GHz 32-bit PC with 1 GB of RAM. Most of the
processing time is spent on noise elimination.
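A sketch of one CamShift iteration with OpenCV follows. The
back-projection is computed from a hue histogram of a sample hand
region; the histogram size, the 0-255 hue range (matching the HSV_FULL
scale assumed earlier) and the termination criteria are assumptions.

```python
import cv2

# One CamShift iteration on the skin probability image, restricted to
# the ROI from the previous iteration as described in the text.
TERM_CRITERIA = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)

def make_hue_histogram(hsv_hand_sample):
    """Build a normalized hue histogram from a sample hand region."""
    hist = cv2.calcHist([hsv_hand_sample], [0], None, [32], [0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def track_hand(hsv_frame, hue_hist, roi):
    """Return the fitted box and the next ROI for the following iteration."""
    # Back-projection: per-pixel probability of belonging to the hand.
    backproj = cv2.calcBackProject([hsv_frame], [0], hue_hist, [0, 256], 1)
    rotated_box, next_roi = cv2.CamShift(backproj, roi, TERM_CRITERIA)
    return rotated_box, next_roi
```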
Lighting influences are one of the biggest problems in human
environments and therefore received special attention. The built-in
shutter control of the cameras used (Point Grey Firefly MV) permanently
overexposes when pointed toward sunlight. This problem was solved by
implementing a shutter control that is coupled with the vision
algorithm and shortens or lengthens the shutter speed as needed. The
vision algorithm is implemented using OpenCV, an open-source computer
vision library (OpenCV, 2007).
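A minimal sketch of such a vision-coupled shutter control is shown
below: the mean image brightness steers the exposure time up or down.
The brightness target band, the step size and the use of OpenCV's
generic CAP_PROP_EXPOSURE property are assumptions made purely for
illustration; the original system would have set the Firefly MV shutter
through the camera's own driver.

```python
import cv2

# Vision-coupled shutter control sketch. Target band, step size and
# the CAP_PROP_EXPOSURE interface are assumptions.
TARGET_LOW, TARGET_HIGH = 100, 150  # acceptable mean gray level (assumed)
STEP = 0.1                          # exposure adjustment step (assumed units)

def adjust_shutter(capture, frame_bgr):
    """Shorten or lengthen the shutter depending on image brightness."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    mean_brightness = float(gray.mean())
    exposure = capture.get(cv2.CAP_PROP_EXPOSURE)
    if mean_brightness > TARGET_HIGH:    # too bright: shorten the shutter
        capture.set(cv2.CAP_PROP_EXPOSURE, exposure - STEP)
    elif mean_brightness < TARGET_LOW:   # too dark: lengthen the shutter
        capture.set(cv2.CAP_PROP_EXPOSURE, exposure + STEP)
```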
5. BEHAVIOR
The software framework is designed to transform the actual hand
position into a movement of the robot head, as shown in Figure 2. The
vision algorithm runs on a PC and extracts the current center of area;
this is passed to the behavior, which sends the desired positions via
CAN to the microcontrollers. The behavior tries to hold the center of
area of the hand within a circle around the center of the field of
view. If the hand is outside this circle, the head is moved
proportionally to the offset between the center of the field of view
and the center of area of the hand. This behavior is described by
formulas (1) and (2).
\[
\sqrt{P_x^2 + P_y^2} > 50 \;\Rightarrow\; \alpha_x = \alpha_x + \frac{P_x}{25}, \quad \alpha_y = \alpha_y + \frac{P_y}{25} \tag{1}
\]
\[
\sqrt{P_x^2 + P_y^2} < 50 \;\Rightarrow\; \alpha_x = \alpha_x, \quad \alpha_y = \alpha_y \tag{2}
\]
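In code, formulas (1) and (2) reduce to a small update function; a
direct transcription is sketched below. The 50-pixel dead zone and the
gain of 1/25 come straight from the formulas, while the function and
variable names are illustrative.

```python
import math

# Direct transcription of formulas (1) and (2): the head angles are
# only corrected when the hand's center of area (Px, Py), measured
# relative to the image center, leaves a 50-pixel circle.
DEAD_ZONE_RADIUS = 50  # pixels, from formula (1)
GAIN = 1.0 / 25.0      # proportional gain, from formula (1)

def update_head_angles(alpha_x, alpha_y, px, py):
    """Return the new yaw/pitch setpoints for a hand offset (px, py)."""
    if math.hypot(px, py) > DEAD_ZONE_RADIUS:
        alpha_x += px * GAIN
        alpha_y += py * GAIN
    return alpha_x, alpha_y  # sent via CAN to the microcontrollers
```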
6. CONCLUSION AND OUTLOOK
The false detection rate of the hand tracking algorithm is nearly
eliminated by the strict rules of hand detection and the limited
working area. It has been shown that the influences of human
environments can be restricted via noise filters and appropriate
shutter algorithms.
Currently, it is not possible to detect a hand if the arm is not
covered with long sleeves. This problem can be avoided with the help of
higher-level processing algorithms that use shape detection, e.g. to
detect the palm shape of a hand with spread fingers. Another way to
improve hand detection is the implementation of a classifier trained
with large numbers of positive and negative images (Ong, 2004). Any
such higher-level image processing needs more computing power, so a
cluster could be used. Moreover, the source code has to be optimized
for parallel execution on multi-core CPUs.
The evaluation of the control algorithm showed that the mechanical
part has to be more flexible and that a way must be found to decouple
the influences of imperfectly fitting bevel gears. The use of couplings
would be a possibility, but couplings of those sizes must be specially
manufactured. Alternatively, the whole mechanical part has to be
reworked and the head driven with other types of actuation.
To make the head more human-like, additional DOF are needed to
move the eyelids and perhaps also the neck.
Mechanical optimization tests using the finite element method
showed that most of the head and neck can be constructed from synthetic
materials, saving a lot of mass, which is important for an autonomous
humanoid robot. The interaction between humanoid robots and humans is
an important topic, and several different methods have to be researched
in the future; in particular, vision and sound have to be combined.
Future research will include the optimization of the mechanical part,
the improvement of the vision algorithm and the combination of
different techniques to achieve richer human-machine interaction.
[FIGURE 2 OMITTED: Hand Tracking and the Resulting Angles]
7. REFERENCES
Ambrosch, R. (2007). Kognitives Machine Vision System eines
Humanoiden Roboters. Master thesis, University of Applied Sciences
Technikum Wien, Mechatronics/Robotics, Vienna, Austria.
Francois, A. R. (2004). CamShift tracker design experiments with
Intel OpenCV and SAI. Institute for Robotics and Intelligent Systems,
University of Southern California, Los Angeles, USA.
Ong, E. & Bowden, R. (2004). A boosted classifier tree for hand
shape detection. Proceedings of the Sixth IEEE International Conference
on Automatic Face and Gesture Recognition, Seoul, Korea.
OpenCV (2007). Intel Corporation [online]. Available from:
http://www.intel.com/technology/computing/opencv/index.htm, Accessed:
2007-08-04.
Shimada, M.; Minato, T.; Itakura, S. & Ishiguro, H. (2006).
Evaluation of Android Using Unconscious Recognition. 6th IEEE-RAS
International Conference on Humanoid Robots, Genova, Italy.
Sonka, M.; Hlavac, V. & Boyle, R. (1999). Image Processing,
Analysis, and Machine Vision. Brooks/Cole, ISBN 978-0534953935, USA.