摘要:Object detection for vehicles and pedestrians is extremely difficult to achieve in autopilot applications for the Internet of vehicles, and it is a task that requires the ability to locate and identify smaller targets even in complex environments. This paper proposes a single-stage object detection network (YOLOv3-promote) for the detection of vehicles and pedestrians in complex environments in cities, which improves on the traditional You Only Look Once version 3 (YOLOv3). First, spatial pyramid pooling is used to fuse local and global features in an image to better enrich the expression ability of the feature map and to more effectively detect targets with large size differences in the image; second, an attention mechanism is added to the feature map to weight each channel, thereby enhancing key features and removing redundant features, which allows for strengthening the ability of the feature network to discriminate between target objects and backgrounds; lastly, the anchor box derived from the K-means clustering algorithm is fitted to the final prediction box to complete the positioning and identification of target vehicles and pedestrians. The experimental results show that the proposed method achieved 91.4 mAP (mean average precision), 83.2 F1 score, and 43.7 frames per second (FPS) on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, and the detection performance was superior to the conventional YOLOv3 algorithm in terms of both accuracy and speed.
关键词:Internet of vehicles; autonomous driving; object detection; attention mechanisms; spatial pyramid pooling Internet of vehicles ; autonomous driving ; object detection ; attention mechanisms ; spatial pyramid pooling