Basic article information

Title: Three‐stream network with context convolution module for human–object interaction detection
Authors: Thomhert S. Siadari; Mikyong Han; Hyunjin Yoon; et al.
Journal: ETRI Journal
Print ISSN: 1225-6463
Electronic ISSN: 2233-7326
Publication year: 2020
Volume: 42
Issue: 2
Pages: 230-238
DOI: 10.4218/etrij.2019-0230
Language: English
Publisher: Electronics and Telecommunications Research Institute
Abstract: Human–object interaction (HOI) detection is a popular computer vision task that detects interactions between humans and objects. This task can be useful in many applications that require a deeper understanding of semantic scenes. Current HOI detection networks typically consist of a feature extractor followed by detection layers comprising small filters (e.g., 1 × 1 or 3 × 3). Although small filters can capture local spatial features with a few parameters, they fail to capture larger context information relevant for recognizing interactions between humans and distant objects owing to their small receptive regions. Hence, we herein propose a three‐stream HOI detection network that employs a context convolution module (CCM) in each stream branch. The CCM can capture larger contexts from input feature maps by adopting combinations of large separable convolution layers and residual‐based convolution layers without increasing the number of parameters by using fewer large separable filters. We evaluate our HOI detection method using two benchmark datasets, V‐COCO and HICO‐DET, and demonstrate its state‐of‐the‐art performance.
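
Note: the abstract's claim that large separable filters can widen the receptive region without adding parameters can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes "separable" means a k × 1 convolution followed by a 1 × k convolution, and the kernel size (15), channel count (256), and reduced intermediate width (64) are hypothetical values chosen for the example.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of one 2-D convolution layer (bias terms ignored)."""
    return k_h * k_w * c_in * c_out


def separable_params(k, c_in, c_mid, c_out):
    """Weight count of a k x 1 convolution followed by a 1 x k convolution.

    Assumption: this spatial factorization is what "separable" refers to.
    c_mid is the number of filters in the first (k x 1) layer.
    """
    return conv_params(k, 1, c_in, c_mid) + conv_params(1, k, c_mid, c_out)


# Hypothetical sizes, not values reported in the paper.
channels = 256
kernel = 15
mid_channels = 64  # "fewer" filters in the separable pair

small_dense = conv_params(3, 3, channels, channels)
large_dense = conv_params(kernel, kernel, channels, channels)
large_separable = separable_params(kernel, channels, mid_channels, channels)

print("3 x 3 dense:     ", small_dense)
print("15 x 15 dense:   ", large_dense)
print("15 x 15 separable:", large_separable)
```

Under these assumed sizes, the separable pair spans a 15 × 15 window while using fewer weights than a single dense 3 × 3 layer, whereas a dense 15 × 15 layer would need 25 times the weights of the 3 × 3 layer.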