
Article Information

  • Title: MULTI-MODAL SEMANTIC MESH SEGMENTATION IN URBAN SCENES
  • Authors: D. Laupheimer; N. Haala
  • Journal: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • Print ISSN: 2194-9042
  • Electronic ISSN: 2194-9050
  • Year: 2022
  • Volume: V-2-2022
  • Pages: 267-274
  • DOI: 10.5194/isprs-annals-V-2-2022-267-2022
  • Language: English
  • Publisher: Copernicus Publications
  • Abstract: The semantic segmentation of the huge amounts of acquired 3D data has become an important task in recent years. Meshes have evolved into a standard representation next to Point Clouds (PCs) – not least because of their great visualization possibilities. Compared to PCs, meshes commonly have smaller memory footprints while jointly providing geometric and high-resolution textural information. For this reason, we opt for semantic mesh segmentation, which is still a widely overlooked topic in photogrammetry and remote sensing. In this work, we perform an extensive ablation study on multi-modal handcrafted features, adapting the Point Cloud Mesh Association (PCMA) (Laupheimer et al., 2020), which establishes explicit connections between faces and points. The multi-modal connections are used in a two-fold manner: (i) to extend per-face descriptors with features engineered on the PC and (ii) to annotate meshes semi-automatically by propagating the manually assigned labels from the PCs. In this way, we derive annotated meshes from the ISPRS benchmark data sets Vaihingen 3D (V3D) and Hessigheim 3D (H3D). To demonstrate the effectiveness of the multi-modal approach, we use well-established and fast Random Forest (RF) models with various feature vector compositions and analyze their performance for semantic mesh segmentation. The feature vector compositions consider features derived from the mesh, the PC, or both. The results indicate that the combination of radiometric and geometric features outperforms feature sets consisting of only a single feature type. In addition, we observe that relative height is the most crucial feature. The main finding is that the multi-modal feature vector integrates the complementary strengths of the underlying modalities: whereas the mesh provides outstanding textural information, the dense PCs are superior in geometry. The multi-modal feature descriptor achieves the best performance on both data sets. It significantly outperforms feature sets that incorporate only mesh-derived features: by +7.37 pp mF1 and +2.38 pp Overall Accuracy (OA) on V3D, and by +9.23 pp mF1 and +4.33 pp OA on H3D. (Illustrative sketches of the label-propagation and classification steps follow the metadata list below.)
  • Keywords: Urban Scene Understanding; Semantic Segmentation; Multi-Modality; Textured Mesh; Point Cloud
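
The semi-automatic annotation described in step (ii) of the abstract hinges on explicit point-face links. The snippet below is a minimal Python sketch of that idea, assuming a simple k-nearest-neighbor association with majority voting in place of the full PCMA of Laupheimer et al. (2020); all function and variable names (propagate_labels, face_centroids, pc_points, pc_labels) are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_labels(face_centroids, pc_points, pc_labels, k=5):
    """Propagate per-point class labels to mesh faces.

    Simplified stand-in for the PCMA: each face collects the labels
    of its k nearest manually labeled points and takes a majority vote.

    face_centroids: (n_faces, 3) face centroid coordinates
    pc_points:      (n_points, 3) labeled point cloud coordinates
    pc_labels:      (n_points,) integer class ids per point
    """
    tree = cKDTree(pc_points)                 # index the labeled point cloud
    _, idx = tree.query(face_centroids, k=k)  # k nearest points per face
    neighbor_labels = pc_labels[idx]          # (n_faces, k) label matrix
    # Majority vote per face (ties resolved toward the lowest class id)
    face_labels = np.array(
        [np.bincount(row).argmax() for row in neighbor_labels]
    )
    return face_labels
```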
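Step (i), extending per-face descriptors with PC-derived features and classifying them with a Random Forest, could look as follows. This is a hedged scikit-learn sketch, not the paper's exact configuration: feature contents, array shapes, and hyperparameters (e.g. n_estimators=100) are assumptions. OA and mean F1 (mF1) correspond to the metrics reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

def train_multimodal_rf(mesh_features, pc_features, labels, train_mask):
    """Train an RF on a concatenated multi-modal per-face descriptor.

    mesh_features: (n_faces, d_mesh) features engineered on the textured
                   mesh (e.g. radiometric statistics of the face texture)
    pc_features:   (n_faces, d_pc) features transferred from the associated
                   point cloud (e.g. geometric features, relative height)
    labels:        (n_faces,) integer class ids per face
    train_mask:    boolean mask selecting the training faces
    """
    X = np.hstack([mesh_features, pc_features])  # multi-modal descriptor
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    clf.fit(X[train_mask], labels[train_mask])

    # Evaluate on the held-out faces with the metrics used in the paper
    pred = clf.predict(X[~train_mask])
    oa = accuracy_score(labels[~train_mask], pred)              # Overall Accuracy
    mf1 = f1_score(labels[~train_mask], pred, average="macro")  # mean F1
    return clf, oa, mf1
```

Dropping either mesh_features or pc_features from the np.hstack call reproduces the single-modality baselines the ablation study compares against.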