Data structures for road condition AVI file video augmentation.
Mihic, Srdan ; Ivetic, Dragan
1. INTRODUCTION
The public institution Road Center of Vojvodina (Centar za puteve
Vojvodine, CPV) uses the ROad Measurement and Data Acquisition System
(ROMDAS) (Bennett et al., 2007) for road inspection and maintenance.
ROMDAS consists of several measuring devices (gyroscope, GPS
receivers, etc.), a video camera mounted on a vehicle, and software that
processes the collected discrete data. The video camera records video
in AVI format. The measuring devices capture discrete data about
the physical characteristics of the road-condition state, such as road
roughness, transverse profile, rut depths and traffic density. After
a survey run is completed, the measured data are processed and analyzed.
ROMDAS captures the visual road-condition state with the video
camera mounted on the vehicle. This video represents a valuable source
of information for road managers and road engineers since it provides
visual feedback of the collected discrete data. Unfortunately, the video
is stored separately from the discrete data acquired by the measuring
devices. Therefore, road engineers have to search the video manually
to find the details of interest identified by data analysis. This is a
tedious and error-prone task. Our approach is therefore to integrate
the data and the video in order to support more effective and
comfortable information retrieval. The integration is carried out by
encapsulating both the video and the data in one file in a way that
facilitates data storage and communication. The data used for video
augmentation are called augmented data.
The Augmented Video stream Framework (AVF) was designed to augment
ROMDAS road-condition video with the discrete data collected by
ROMDAS measurement devices. The AVF supports full search of the
augmented data by data properties. The implementation of the AVF is
based on Microsoft DirectShow for synchronized playback of the basic
video and augmented data (Mihic, 2007).
The proposed AVF was created to solve one specific
problem of road engineering. It was designed, however, to be extensible
and applicable to a wide range of information systems.
This paper focuses on data structures that encapsulate the
augmented data.
2. VIDEO AUGMENTATION
The problem we set out to solve belongs to the field of video
augmentation. Studies have shown that a video stream augmented with
data provides a deeper understanding of the captured reality and
promotes active watching (Correia & Chambel, 1999).
Additionally, the augmented data give consumers the ability to
acquire new knowledge and support content-oriented video access and
retrieval.
One of the frequently used approaches to video augmentation is
annotated video. An annotated video is a video augmented with
annotations. Schroeter et al. (2007) define annotations as
descriptions, notes, subjective comments and various observations
that can be attached to the video document without modifying the
document itself. The variety of annotated information is limited by the
annotator's knowledge, and it is subjective (Schroeter et al.,
2007).
Another popular approach for video augmentation is augmented video.
An augmented video is the result of augmenting a certain video with
non-perceivable data captured at the time of the recording. Usually, 3D
computer generated objects are rendered on top of the video and merged
into the video stream.
The ROMDAS system can be classified as an annotated video system
with characteristics of augmented video. The measured discrete ROMDAS
data are closely related to the video content and therefore inseparable
from the video.
Agosti and Ferro (2007) generalized the definition of annotation to
include all forms of video data augmentation. They divide annotations
by their correlation to the video content into content enrichment and
stand-alone documents. The former regards annotations as closely related
to the video content and therefore inseparable from the video. The
latter regards annotations as real documents and autonomous entities
that maintain some sort of connection with the video content. By their
definition, the ROMDAS system can be classified as a stand-alone
annotation system, whereas the nature of the augmented data implies
that the system should be a content enrichment annotation system.
3. AUGMENTED DATA STRUCTURES
Microsoft's Audio Video Interleave (AVI) was used as the multimedia
container format (MCF) for the implementation. AVI was chosen because
the ROMDAS video camera records video in AVI format. Additionally, our
comprehensive study of commercial MCFs (Mihic, 2007) showed that other
MCFs offer very similar features, and almost all of them are suitable
for augmented data encapsulation and storage. In the early stages of
development we considered creating a new, specifically
designed MCF, as in the Annodex system (Pfeiffer et al., 2003). Since our
study showed that existing MCFs are suitable, that approach was
abandoned.
For security reasons and to achieve efficient data storage, it was
required to restrict access to the augmented data and to enable their
compression while maintaining compatibility.
Two approaches to embedding augmented data in an MCF are common:
interleaving and non-interleaving. The former is suitable for
applications where video streaming is needed, and the latter is suitable
for applications where a high compression ratio and effective data
encryption are needed. The AVI MCF does not support interleaving of
augmented data, and therefore the latter option was chosen.
An AVI file is organized into small pieces called chunks. Chunks are
identified by a four-character code (FOURCC) and can be nested. The
specification provides a chunk named 'INFO' for additional descriptive
data. The AVI specification requires that AVI file parsers ignore any
unknown 'INFO' sub-chunk (Microsoft Corp., 1991).
Given this requirement, augmented data encapsulated in the
'INFO' chunk maintain compatibility and therefore still allow
unauthorized consumers to watch the basic video. Since several
sub-chunks are already defined by the specification, we created a new
'INFO' sub-chunk named 'AUGD'. All the augmented data
are encapsulated into this chunk. The augmented data are compressed and
encrypted before encapsulation. Arbitrary compression and encryption
algorithms can be used.
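The encapsulation described above can be illustrated with a minimal Python sketch. It assumes only the generic RIFF chunk layout (FOURCC, 32-bit little-endian size, data, pad byte to an even length). The internal layout of the 'AUGD' payload is not given in this paper, so the use of zlib, the identity encryption placeholder and all function names are assumptions made for illustration, not the AVF implementation.

```python
import struct
import zlib


def riff_chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Serialize one RIFF chunk: FOURCC, little-endian size, data, pad byte."""
    if len(fourcc) != 4:
        raise ValueError("a FOURCC must be exactly four bytes")
    pad = b"\x00" if len(payload) % 2 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad


def info_list(subchunks: bytes) -> bytes:
    """Wrap sub-chunks in a 'LIST' chunk whose list type is 'INFO'."""
    return riff_chunk(b"LIST", b"INFO" + subchunks)


def pack_augd(data: bytes, encrypt=lambda b: b) -> bytes:
    """Compress, then encrypt (identity placeholder), then wrap as 'AUGD'."""
    return riff_chunk(b"AUGD", encrypt(zlib.compress(data)))


def read_subchunks(body: bytes) -> dict:
    """Walk the sub-chunks that follow the 'INFO' list type tag."""
    found, pos = {}, 0
    while pos + 8 <= len(body):
        fourcc = body[pos:pos + 4]
        (size,) = struct.unpack("<I", body[pos + 4:pos + 8])
        found[fourcc] = body[pos + 8:pos + 8 + size]
        pos += 8 + size + (size % 2)  # skip the pad byte after odd sizes
    return found


def unpack_augd(info_body: bytes, decrypt=lambda b: b):
    """Return the augmented data, or None when no 'AUGD' sub-chunk exists."""
    raw = read_subchunks(info_body).get(b"AUGD")
    return None if raw is None else zlib.decompress(decrypt(raw))


# Hypothetical payload; the real AVF serialization is not specified here.
sample = b'{"roughness": 2.7}'
info = info_list(riff_chunk(b"INAM", b"survey run\x00") + pack_augd(sample))
info_body = info[12:]  # skip 'LIST', the size field and the 'INFO' tag
restored = unpack_augd(info_body)
```

A reader that does not know 'AUGD' can still step over it, because the size field tells it how many bytes to skip. This is what allows the basic video to remain playable.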
The augmented data are structured in an object-oriented way. The
data are described using a type system similar to those of the C++ and
Java programming languages (Fig. 1). The AVF defines bool, int, double,
string, image, audio stream and video stream as atomic types, the same
as in (Romero & Correia, 2003). All the data are encapsulated into
classes. The AVF type system supports class inheritance. The concrete
value of the measured discrete ROMDAS data in a certain time interval is
represented by the Object class. We chose time intervals instead
of frames or frame intervals because of the nature of the discrete
ROMDAS data. Namely, real-time video capturing constrains the achievable
frame rate, whereas the time resolution of the measurement devices goes
as low as $10^{-3}$ s. Had we used frames or frame intervals, several
measurements falling within a single frame could not have been
distinguished, and valuable measured data could not have been described
using our system.
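The mismatch between frame-based and time-based addressing can be made concrete with a short calculation. The frame rate of 25 frames per second below is an assumption chosen for illustration (the ROMDAS camera frame rate is not stated in this paper), and the function name is hypothetical.

```python
from collections import Counter

FRAME_RATE = 25        # frames per second (assumed for illustration)
SAMPLE_PERIOD_MS = 1   # 10^-3 s measurement resolution, as stated in the text


def frame_index(t_ms: int, fps: int = FRAME_RATE) -> int:
    """Index of the video frame displayed at time t_ms (in milliseconds)."""
    return t_ms * fps // 1000


# One second of measurements taken every millisecond.
samples = range(0, 1000, SAMPLE_PERIOD_MS)
per_frame = Counter(frame_index(t) for t in samples)

print(len(per_frame))  # distinct frames covered by the samples
print(per_frame[0])    # samples that map to the very first frame
```

Under these assumptions each frame lasts 40 ms, so 40 consecutive measurements map to the same frame index, whereas time intervals keep every measurement individually addressable.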
The implemented time resolution of the AVF was set to
$10^{-8}$ s, the lowest value usable in commercial multimedia
presentation frameworks.
Although the AVF type system was designed to describe discrete
ROMDAS data, it can also describe arbitrary data structured in an
object-oriented manner.
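A type system of this kind can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the AVF implementation: the names AvfClass, AvfObject and search, the example classes Measurement and Roughness, their attributes, and the use of integer ticks for interval bounds are all hypothetical. Only the list of atomic types, class inheritance, time-interval objects and search by data properties come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Atomic type names as listed in the text.
ATOMIC_TYPES = {"bool", "int", "double", "string",
                "image", "audio stream", "video stream"}


@dataclass
class AvfClass:
    """A class: named attributes of atomic type, with optional parent."""
    name: str
    attributes: dict  # attribute name -> atomic type name
    parent: Optional["AvfClass"] = None

    def __post_init__(self):
        unknown = set(self.attributes.values()) - ATOMIC_TYPES
        if unknown:
            raise ValueError(f"not atomic types: {sorted(unknown)}")

    def all_attributes(self) -> dict:
        """Own attributes plus everything inherited from ancestors."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.attributes}

    def is_a(self, other: "AvfClass") -> bool:
        """True if this class is `other` or derives from it."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False


@dataclass
class AvfObject:
    """Concrete values of one class, valid over a time interval (in ticks)."""
    cls: AvfClass
    start: int
    end: int
    values: dict

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("interval start must not exceed its end")
        if set(self.values) != set(self.cls.all_attributes()):
            raise ValueError("values must match the class attributes")


def search(objects, cls: AvfClass, predicate: Callable[[dict], bool]):
    """Objects of `cls` (or a subclass) whose values satisfy `predicate`."""
    return [o for o in objects if o.cls.is_a(cls) and predicate(o.values)]


# Hypothetical classes and values, with interval bounds in milliseconds.
measurement = AvfClass("Measurement", {"chainage_m": "double"})
roughness = AvfClass("Roughness", {"iri": "double"}, parent=measurement)
objects = [
    AvfObject(roughness, 0, 1000, {"chainage_m": 0.0, "iri": 2.1}),
    AvfObject(roughness, 1000, 2000, {"chainage_m": 10.0, "iri": 4.8}),
]
rough = search(objects, measurement, lambda v: v["iri"] > 4.0)
```

Each hit carries its time interval, so a player could seek the video directly to the matching segment instead of requiring a manual search.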
4. CONCLUSION
The ROad Measurement and Data Acquisition System (ROMDAS) collects
and analyses the road-condition state through videos and the discrete
data acquired by specific measurement devices. Separation of video and
data storage forces road engineers to search the video manually in order
to find details of interest.
[FIGURE 1 OMITTED]
We concluded that the augmented video should be a self-contained
entity allowing the full data search according to data properties. A
hybrid video augmentation system was designed: Augmented Video stream
Framework (AVF). The AVF enables creation, search and playback of
self-contained augmented AVI files for effective road surveying. The AVF
approach is valuable far beyond the application area of road
maintenance.
This paper introduced the AVF data structures used for video data
augmentation. The AVF uses a type system similar to those of the C++ and
Java programming languages and offers encapsulation of arbitrary data in
an object-oriented manner. The supported AVF atomic types are bool, int,
double, string, image, audio stream and video stream. Time intervals
were used as synchronization units between the video and the augmented
data.
The ROMDAS video augmentation was carried out using only discrete
atomic types because ROMDAS measurement devices capture only
discrete data. In the future we plan to enhance ROMDAS videos with
continuous media (e.g. supplemental video of the road roughness state).
Further research will be directed towards the creation of augmented
videos based on ontologies of different applications.
Acknowledgements. This research was supported by IT Project No.
13013, financed by the Government of the Republic of Serbia.
5. REFERENCES
Agosti, M. & Ferro, N. (2007). A formal model of annotations of
digital content. ACM Transactions on Information Systems (TOIS), Vol.
26, No. 1, (November 2007), Article No. 3, 57 pages, ISSN: 1046-8188
Bennett, C. R.; Chamorro, A.; Chen, C.; de Solminihac, H. &
Flintsch, G. W. (2007). Data Collection Technologies for Road
Management, The World Bank, East Asia Pacific Transport Unit,
Washington, D.C.
Microsoft Corp. (1991). Microsoft Windows multimedia
programmer's reference, Microsoft Press, ISBN: 1-55615-389-9,
Redmond, WA, USA
Correia, N. & Chambel, T. (1999). Active video watching using
annotation. Proceedings of Seventh ACM international Conference on
Multimedia (Part 2), pp. 151-154, ISBN: 1-58113-239-5, Orlando, Florida,
USA, October 1999, ACM, New York, NY, USA
Mihic, S. (2007). Augmented Video stream Framework. M.Sc. thesis
(in Serbian), Faculty of Technical Sciences, Novi Sad, Serbia
Pfeiffer, S.; Parker, C. & Schremmer, C. (2003). Annodex: A
Simple Architecture to Enable Hyperlinking. Proceedings of the 5th ACM
SIGMM international Workshop on Multimedia information Retrieval. pp.
87-93, ISBN: 1-58113-778-8, Berkeley, California, November 2003, ACM,
New York, NY, USA
Romero, L. & Correia, N. (2003). Mixed reality hypermedia:
HyperReal: a hypermedia model for mixed reality. Proceedings of the
Fourteenth ACM Conference on Hypertext and Hypermedia, pp. 2-9, ISBN:
1-58113-704-4, Nottingham, UK, August 2003, ACM, New York, NY, USA
Schroeter, R.; Hunter, J. & Newman, A. (2007). Annotating
Relationships between Multiple Mixed-Media Digital Objects by Extending
Annotea. Lecture Notes in Computer Science, Vol. 4519/2007, (June 2007)
pp. 533-548, ISSN: 0302-9743