文章基本信息

标题：The "Beta-Star Shape" algorithm for document clipping.
作者：Boiangiu, Costin Anton ; Bucur, Ion ; Dvornic, Andrei Iulian 等
期刊名称：Annals of DAAAM & Proceedings
印刷版ISSN：1726-9679
出版年度：2008
期号：January
语种：English
出版社：DAAAM International Vienna
摘要：There are several theories and techniques used to solve the tight-fitting polygonal contour generation for a set of points defined in a certain space (Edelsbrunner, 1987). One of the most efficient, and probably the most used, is the theory of "Alpha-Shapes". For a good understanding of what this concept means, let us consider a very simple and intuitive example: imagine that all input points from a certain plane are the heads of some nails fitted into a wooden panel. Consider now a ball of radius R that is rolling over the exterior of the "nailed surface" and must keep the same rolling direction at all times. Suppose the direction is clockwise-oriented. It is obvious that the ball will enter between nails that are positioned at a distance d >= R and will keep rolling over the ones situated at a distance d < R. After the ball has performed a full rotation and reached the initial position, the nails heads "touched" by the ball in the full-rotation process will represent a non-self-intersected polygon, clockwise oriented (as previously supposed). This polygon is the R-Radius "Alpha-Shape" for the given input set of points. Most of the algorithms that compute "alpha-shapes" for a set of input points are based on Delaunay (Aurenhammer, 1991) triangulation process, an elegant but not very economical approach. Among the disadvantages of this theory is the fact that points in the plane cannot be controlled with respect to the distances between them. For this reason, generating the "alphashape" using a small radius can yield a polygon that does not contain all the input points. On the other hand, if the radius is big enough to ensure only one output polygon, the "alphashape" may not reproduce the contour very accurate, a coarse polygon being practically unusable in a demanding multimedia application.
关键词：Algorithms

The "Beta-Star Shape" algorithm for document clipping.

Boiangiu, Costin Anton ; Bucur, Ion ; Dvornic, Andrei Iulian 等

1. INTRODUCTION

There are several theories and techniques used to solve the tight-fitting polygonal contour generation for a set of points defined in a certain space (Edelsbrunner, 1987). One of the most efficient, and probably the most used, is the theory of "Alpha-Shapes". For a good understanding of what this concept means, let us consider a very simple and intuitive example: imagine that all input points from a certain plane are the heads of some nails fitted into a wooden panel. Consider now a ball of radius R that is rolling over the exterior of the "nailed surface" and must keep the same rolling direction at all times. Suppose the direction is clockwise-oriented. It is obvious that the ball will enter between nails that are positioned at a distance d >= R and will keep rolling over the ones situated at a distance d < R. After the ball has performed a full rotation and reached the initial position, the nails heads "touched" by the ball in the full-rotation process will represent a non-self-intersected polygon, clockwise oriented (as previously supposed). This polygon is the R-Radius "Alpha-Shape" for the given input set of points. Most of the algorithms that compute "alpha-shapes" for a set of input points are based on Delaunay (Aurenhammer, 1991) triangulation process, an elegant but not very economical approach. Among the disadvantages of this theory is the fact that points in the plane cannot be controlled with respect to the distances between them. For this reason, generating the "alphashape" using a small radius can yield a polygon that does not contain all the input points. On the other hand, if the radius is big enough to ensure only one output polygon, the "alphashape" may not reproduce the contour very accurate, a coarse polygon being practically unusable in a demanding multimedia application.

2. MOTIVATION

Research aimed at constructing polygonal contours that approximate sets of points started with the development of an application that was separating various items belonging to an input written page (or scanned document). The goal was to detect elements like normal texts, headings, pictures, etc. and separate them using tight-fitting contours which don't overlap (in order to successfully execute selective clipping). Since the distance between adjacent items or elements that compose these items (like text characters) is not controllable and only the shape that bounds each item is needed, the usage of "alpha-shapes" was not an option. In this paper we focus on a new technique that has been developed: the "Beta-Star Shape"--a robust algorithm that has a controllable behaviour and creates only one shape for an input set of points.

3. ALGORITHM OUTLINE

An arbitrary collection of planar points, defined by their coordinates (x, y), is considered as the input for the algorithm. The proposed technique is based on refining the convex hull of the considered points in successive iterations. Hence, the computation of the convex hull is the first step of the proposed algorithm and can be performed using any well-known robust method (Avis & Bremner, 1995). At this stage the convex hull is the starting "Beta-Star Shape" polygon that will be successively refined. The points that are not part of the convex hull are called candidate points or simple candidates. The algorithm specifies the way to insert candidates into the "BetaStar Shape" in order to refine it (preserving the precondition that it contains all the input points). The algorithm is controllable both as tightness of the shape and execution time (it can be stopped whenever the result is refined enough for the application purposes). Choosing candidates to be inserted in the "Beta-Star Shape" is done such that it allows a uniform smoothing of the contour and avoids self-intersections. For this reasons every iteration step is in fact a usable output polygonal contour that verifies all the conditions needed by an application.

For constructing the "Beta-Star Shape", a distance parameter (named d) is needed; this is somehow equivalent with the R parameter used in the description of the "AlphaShape". If two adjacent points from the constructed shape in a certain moment are placed at a distance lower than d, then the segment will be finally integrated into the "Beta-Star Shape" and will suffer no further refinements. Otherwise, the process of successive refinements will continue to try splitting the segment by finding the best candidate for it and inserting it into the chain of points.

For a good search process the algorithm uses square search areas, because they do not encourage the usage of potentially better candidates for other edges and they restrict the search to a much lower subset of the input dataset. If one candidate is closer to an edge than any other candidate and the insertion of that candidate in the polygonal chain does not generate self-intersections then it is considered the best candidate for that edge. Having the lowest distance ensures the fact that all the other points in the input dataset remain inside the polygonal "Beta-Star Shape" contour. If no successful candidate is to be found in a certain area then all of the other remaining candidates are taken into account.

[FIGURE 1 OMITTED]

In order to check if a point is inside a certain search area, the (a, b, c) coefficients that define the support line of the edge to be refined are computed based on the coordinates of the two endpoints of the line [P.sub.1]([x.sub.1], [y.sub.1]) and [P.sub.2]([x.sub.2], [y.sub.2]):

a = [y.sub.2] - [y.sub.1]; b = [x.sub.1] - [x.sub.2]; c = -a x [x.sub.1] - b x [y.sub.1] (1)

The parameters for the edges that are perpendiculars onto [P.sub.1][P.sub.2] are:

[a.sub.1] = [a.sub.2] = b; [b.sub.1] = [b.sub.2] = -a [c.sub.1] = -b x [x.sub.1] + a x [y.sub.1]; [c.sub.2] = -b x [x.sub.2] + a x [y.sub.2] (2)

Taking into account the results from equations (1) and (2) and assuming a clockwise orientation of the convex hull resulted polygon (the orientation of the polygon cannot change after any refinement that has been applied) then the conditions for a valid candidate are:

b x x - a x y + [c.sub.1] >= 0; b x x - a x y + [c.sub.2] <= 0 a x x + b x y + c <= EdgeLength; a x x + b x y + c >= 0 (3)

While the first two conditions regard the position of the point (x, y) inside the two perpendicular axes, the third condition means that the "depth" of the point is lower than the edge length (square search area); the forth inequality is a precondition that must be verified for all points in the search dataset in order to ensure that the polygon has the correct orientation and that points are inside the "Beta-Star Shape" at any moment. This specific search mode inside a square area defined by the current edge allows a good trade between speed and quality of the output polygon. Apart from that, it makes local use of the input data, thus enabling fast-recovery of the search point domain by means of a simple two-dimensional hash-table. The algorithm searches the best candidate for every edge, until all edges have potential candidates or there are no more candidates to be considered. Using the previously computed distances, the proposed method takes them one by one in ascending order and merges the candidates (denoted A1, B1, C1, D1 and E1 in Fig. 1.) into the "Beta-Star Shape" chain. None of the chosen candidates generate self-intersections; hence they can be safely inserted.

The algorithm continues by considering new search areas until all remaining candidates are generating self-intersection (internal status of "deadlock") or all the edges have lengths lower than d. The refinement of the shape is not continued if there are no "obstacle" points contained in the current shape as opposite to the standard "Beta-Shape" (Boiangiu, 2003). This allows the time consumed for the refinement process to be reduced to the maximum acceptable. We call "obstacle" points all the points that should not be part of the current points set collection (that are not desired in the resulted shape).

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

4. CONCLUSIONS

The proposed algorithm has been implemented and tested in order to prove its robustness and make various performance measurements.

The "Beta-Star Shape" algorithm has a variable complexity of about O([n.sup.5/2]) for random input data and about O([n.sup.3/2]) for data sets generated by real images. Fig. 3. shows some statistics about how the input time is affected by the number of points presented in the input dataset. The results are plotted without performing an average in order to enable measuring of time deviations (1000 time-measure tests have been conducted on random point datasets).

To sum up, the "Beta-Star Shape" algorithm:

* can be controlled as time vs. refinement quality using the d parameter.

* is robust and generates a single output polygon for any input dataset.

* is fully valid at every step (the intermediate result); hence it is possible to trade between execution time and refinement quality and also stop whenever the output polygon satisfies application needs and/or the allocated time for refinement is consumed.

* allows adaptive refinement since only "obstacle" points are avoided during shape update comparing with the standard "Beta-Shape" (Boiangiu, 2003).

* does not need much auxiliary memory; practically all the needed memory space is O(n), since the final "Beta-Star Shape" is a subset of the input dataset.

* has a clear end since every step performed consumes at least one candidate from the input dataset.

* always generates an output polygon containing the convex hull points.

* can be easily adapted for the 3-dimensional (and even for the n-dimensional) space since the only concepts used are: the convex-hull (de Berg et al., 1987) and space separation equations, which in the 2-dimensional case are the line equations.

5. REFERENCES

Aurenhammer, F. (1991). Voronoi diagrams--A survey of a fundamental geometric data structure, ACM Computing Survey, Volume 23, Issue 3, September 1991, pp. 345-405

Avis, D. & Bremner, D. (1995). How good are convex hull algorithms? Proceedings 11th A.C.M. Symposium on Computational Geometry, June 1995, pp. 20-28, Vancouver, Canada

Boiangiu, C.A. (2003) The "Beta-Shape" Algorithm for polygonal contour reconstruction, The 14th International Conference on Control System and Computer Science, Vol. II, July 2003, pp. 29-33

De Berg, M.; Otfried, C.; van Kreveld, M. & Overmars, M. (2008). Computational Geometry: Algorithms and Applications, Springer-Verlag

Edelsbrunner, H. (1987). Algorithms in Combinatorial Geometry, Springer-Verlag, Berlin