DADAN: dual-path attention with distribution analysis network for text-image matching

dc.contributor.author Li, Wenhao
dc.contributor.author Zhu, Hongqing
dc.contributor.author Yang, Suyi
dc.contributor.author Zhang, Han
dc.date.accessioned 2022-01-29T02:41:53Z
dc.date.available 2022-01-29T02:41:53Z
dc.date.issued 2022-01-17
dc.description The bidirectional visual-text retrieval task has attracted the interest of many researchers in computer vision. In this paper, an end-to-end trainable model incorporating the proposed dual-path attention with distribution analysis network (DADAN) is established to minimize misalignment caused by irrelevant matching. The architecture uses distribution analysis to split processing into two paths, so that targeted attention mechanisms can be designed to capture the text-region pairs that truly contribute to a match. Specifically, the proposed row-wise attention and column-wise attention perform relative similarity analysis in the query modality and the retrieval modality simultaneously. In each retrieval direction, the significance of relevance can be comprehensively evaluated along with latent alignment inference. The method not only filters out irrelevant retrievals, which is the main aim of current studies, but also provides a more reasonable ordering of retrieval results. Experimental results on public benchmarks show a noticeable improvement in text-image matching, especially in the text retrieval direction.
dc.identifier.other 10.1007/s11760-021-02020-2
dc.identifier.uri https://data.tickbase.net/handle/20.500.13086/3938
dc.title DADAN: dual-path attention with distribution analysis network for text-image matching