In this paper, two novel and practical regularization methods are proposed to improve existing neural network architectures for monocular optical flow estimation. The proposed methods exploit contextual information to alleviate deficiencies of current approaches, such as flow leakage across objects and motion inconsistency within rigid objects. The first regularization method uses semantic information during training to explicitly regularize the produced optical flow field. Its novelty lies in the use of semantic segmentation masks to teach the network to implicitly identify the semantic edges of an object and to reason better about the local motion flow. A novel loss function is introduced that takes into account the object boundaries derived from the semantic segmentation mask in order to selectively penalize motion inconsistency within an object. The method is architecture agnostic and can be integrated into any neural network without modifying the network or adding complexity at inference. The second regularization method adds spatial awareness to the input data of the network in order to improve training stability and efficiency. The coordinates of each pixel are used as an additional feature, breaking the translation invariance of the neural network architecture. The additional features are shown to implicitly regularize the optical flow estimation, enforcing a consistent flow while improving both the performance and the convergence time. Finally, the two regularization methods are complementary: their combination further improves the performance of existing state-of-the-art architectures, both quantitatively and qualitatively, on popular flow estimation benchmark datasets.
Keywords: coordconv; motion consistency; optical flow; regularization; semantic segmentation.
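As a rough illustration of the two ideas summarized above, the following NumPy sketch shows (i) pixel coordinates appended to the input as extra channels and (ii) a smoothness penalty applied only between neighboring pixels that share a semantic label, so that flow discontinuities at object boundaries are not penalized. It is a minimal sketch under stated assumptions, not the formulation used in the paper: the function names `add_coord_channels` and `semantic_smoothness_loss`, the [-1, 1] coordinate scaling, and the L1 neighbor penalty are all hypothetical choices made for this example.

```python
import numpy as np


def add_coord_channels(image):
    """Append normalized x and y pixel coordinates as two extra channels.

    image: array of shape (H, W, C). Returns an array of shape (H, W, C + 2).
    The [-1, 1] scaling is an assumption made for this sketch.
    """
    image = np.asarray(image, dtype=np.float32)
    h, w = image.shape[:2]
    ys = np.linspace(-1.0, 1.0, h, dtype=np.float32)
    xs = np.linspace(-1.0, 1.0, w, dtype=np.float32)
    # yy varies along rows, xx varies along columns.
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    coords = np.stack([xx, yy], axis=-1)
    return np.concatenate([image, coords], axis=-1)


def semantic_smoothness_loss(flow, seg):
    """Mean absolute flow difference between neighbors with the same label.

    flow: (H, W, 2) flow field; seg: (H, W) integer semantic labels.
    Neighbor pairs that straddle a semantic boundary are ignored
    (a hypothetical stand-in for a boundary-aware loss).
    """
    flow = np.asarray(flow, dtype=np.float32)
    seg = np.asarray(seg)
    # Horizontal neighbor pairs: column j + 1 versus column j.
    dx = np.abs(flow[:, 1:] - flow[:, :-1]).sum(axis=-1)
    same_x = seg[:, 1:] == seg[:, :-1]
    # Vertical neighbor pairs: row i + 1 versus row i.
    dy = np.abs(flow[1:, :] - flow[:-1, :]).sum(axis=-1)
    same_y = seg[1:, :] == seg[:-1, :]
    total = (dx * same_x).sum() + (dy * same_y).sum()
    count = same_x.sum() + same_y.sum()
    return float(total / max(count, 1))
```

In this sketch the coordinate channels are added only to the input, and the semantic mask is needed only when computing the loss, which is in line with the statement that no complexity is added at inference.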