
Abstract

Image-text matching is a fundamental and crucial problem in multi-modal information retrieval. Although much progress has been made in bridging vision and language, the task remains challenging because it requires both intra-modal reasoning and cross-modal alignment. Various modality interaction patterns have been explored, yet many potentially effective ones have not been considered. Moreover, existing methods rely heavily on expert experience to design interaction patterns and therefore lack flexibility.
   To address these issues, we develop a novel modality interaction modeling network based on dynamic routing, which is the first unified and dynamic multimodal interaction framework for image-text matching. In particular, we first design four types of cells to explore different levels of modality interaction, and then densely connect them to construct a routing space. To endow the model with path decision capability, we integrate a dynamic router into each cell for pattern exploration. Because the routers are conditioned on the inputs, our model can dynamically learn different activated paths for different data. Extensive experiments on two benchmark datasets, Flickr30K and MS-COCO, demonstrate the effectiveness and superiority of our model compared with state-of-the-art methods.
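The abstract describes densely connected cells whose routers, conditioned on the input, decide how strongly each outgoing path is used. The following is a minimal pure-Python sketch of that idea under our own assumptions, not the DIME implementation: each router mean-pools its cell's input and applies a linear layer followed by a sigmoid to produce one soft weight per outgoing path, and every cell in the next layer receives the path-weighted sum of all previous-layer cell outputs. The names `DynamicRouter`, `route_layer`, and `STAND_IN_CELLS` are hypothetical, and the four stand-in cells are simple placeholders rather than the four interaction cells defined in the paper.

```python
import math
import random
from typing import Callable, List

Tokens = List[List[float]]  # a sequence of token feature vectors


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class DynamicRouter:
    """Hypothetical router: one soft weight in (0, 1) per outgoing path.

    Assumption: mean-pool the cell input, then apply a linear layer + sigmoid.
    """

    def __init__(self, dim: int, num_paths: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                        for _ in range(num_paths)]
        self.bias = [0.0] * num_paths

    def __call__(self, features: Tokens) -> List[float]:
        dim = len(features[0])
        # Mean-pool token features so the routing decision depends on the input.
        pooled = [sum(tok[d] for tok in features) / len(features) for d in range(dim)]
        logits = [sum(w * p for w, p in zip(row, pooled)) + b
                  for row, b in zip(self.weights, self.bias)]
        return [sigmoid(z) for z in logits]


def route_layer(cell_inputs: List[Tokens],
                cells: List[Callable[[Tokens], Tokens]],
                routers: List[DynamicRouter]):
    """One densely connected layer: cell i's output reaches every cell j of
    the next layer, scaled by the weight its router assigns to path i -> j."""
    outputs = [cell(x) for cell, x in zip(cells, cell_inputs)]
    # path_weights[i][j] is the weight of the path from cell i to next-layer cell j.
    path_weights = [router(x) for router, x in zip(routers, cell_inputs)]
    next_inputs = []
    for j in range(len(cells)):
        combined = [[0.0] * len(tok) for tok in outputs[0]]
        for i, out in enumerate(outputs):
            w = path_weights[i][j]
            for t, tok in enumerate(out):
                for d, v in enumerate(tok):
                    combined[t][d] += w * v
        next_inputs.append(combined)
    return next_inputs, path_weights


# Four stand-in cells (placeholders, NOT the paper's interaction cells).
STAND_IN_CELLS = [
    lambda x: x,
    lambda x: [[2.0 * v for v in tok] for tok in x],
    lambda x: [[max(0.0, v) for v in tok] for tok in x],
    lambda x: [[math.tanh(v) for v in tok] for tok in x],
]

if __name__ == "__main__":
    tokens = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]
    routers = [DynamicRouter(dim=3, num_paths=4, seed=k) for k in range(4)]
    next_inputs, weights = route_layer([tokens] * 4, STAND_IN_CELLS, routers)
    for i, row in enumerate(weights):
        print(f"cell {i} path weights:", [round(w, 3) for w in row])
```

Sigmoid gates are used here so that several paths can be active at once; a softmax over paths would instead make them compete. Both are plausible design choices for a soft router, and this sketch does not claim which one the model uses.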

Framework

[Figure: overall framework of the model]


Codes & Data

  • Codes
    DIME
  • Datasets
    Flickr30K
    MS-COCO
    Precomputed Features
  • Pretrained Models
    Pretrained BERT

Copyright (C) 2020 Shandong University

This program is licensed under the GNU General Public License 3.0 (https://www.gnu.org/licenses/gpl-3.0.html). Any derivative work obtained under this license must be licensed under the GNU General Public License as published by the Free Software Foundation, either Version 3 of the License, or (at your option) any later version, if this derivative work is distributed to a third party.

The copyright for the program is owned by Shandong University. For commercial projects that require the ability to distribute the code of this program as part of a program that cannot be distributed under the GNU General Public License, please contact <leigangqu@gmail.com> to purchase a commercial license.
