Abstract: Capsule endoscopy (CE) is a widely used, minimally invasive alternative to traditional endoscopy that allows visualisation of the entire small intestine. Patient preparation can help to obtain a cleaner intestine and thus better visibility in the resulting videos. However, studies on the most effective preparation method are conflicting due to the absence of objective, automatic cleanliness evaluation methods. In this work, we aim to provide such a method capable of presenting results on an intuitive scale, with a relatively light-weight novel convolutional neural network architecture at its core. We trained our model using 5-fold cross-validation on an extensive data set of over 50,000 image patches, collected from 35 different CE procedures, and compared it with state-of-the-art classification methods. From the patch classification results, we developed a method to automatically estimate pixel-level probabilities and deduce cleanliness evaluation scores through automatically learnt thresholds. We then validated our method in a clinical setting on 30 newly collected CE videos, comparing the resulting scores to those independently assigned by human specialists. We obtained the highest classification accuracy for the proposed method (95.23%), with significantly lower average prediction times than for the second-best method. In the validation of our method, we found acceptable agreement with two human specialists compared to interhuman agreement, showing its validity as an objective evaluation method.
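To illustrate how the patch-to-score pipeline summarised above might be assembled, the following Python sketch aggregates patch-level "clean" probabilities into pixel-level estimates and maps the frame-level mean to a discrete cleanliness score via thresholds. The function names, the grid-averaging scheme, and the example threshold values are illustrative assumptions only; the paper's network architecture and automatically learnt thresholds are not reproduced here.

```python
import numpy as np

def pixel_probabilities(patch_probs, patch_coords, patch_size, frame_shape):
    """Spread patch-level 'clean' probabilities onto a pixel grid by
    averaging the predictions of all patches covering each pixel.
    patch_coords holds the (row, col) top-left corner of each patch."""
    acc = np.zeros(frame_shape, dtype=np.float64)   # summed probabilities
    cnt = np.zeros(frame_shape, dtype=np.float64)   # coverage count
    for p, (y, x) in zip(patch_probs, patch_coords):
        acc[y:y + patch_size, x:x + patch_size] += p
        cnt[y:y + patch_size, x:x + patch_size] += 1.0
    # Pixels covered by no patch keep probability 0.
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)

def cleanliness_score(pixel_probs, thresholds=(0.25, 0.50, 0.75)):
    """Map the mean pixel-level probability to a discrete score (here 1-4)
    using monotonically increasing thresholds; the values shown are
    placeholders, not the learnt thresholds from the paper."""
    mean_clean = float(pixel_probs.mean())
    return 1 + int(np.searchsorted(thresholds, mean_clean))
```

Averaging overlapping patch predictions is one simple way to obtain smooth pixel-level probabilities from a patch classifier without a dedicated segmentation network; other interpolation or weighting schemes would fit the same interface.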