This paper was accepted at the Image Matching: Local Features & Beyond workshop at CVPR 2024.
Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformers. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cross-view deformations. Secondly, we present selective fusion to merge local and global messages from cross attention. Apart from network structure, we also identify the importance of enforcing spatial smoothness in loss design, which has been omitted by previous works. Based on these augmentations, our network demonstrates strong matching capacity under different settings. The full version of our network achieves state-of-the-art performance among semi-dense matching methods at a similar cost to LoFTR, while the slim version reaches the LoFTR baseline's performance with only 15% of the computation cost and 18% of the parameters.