We present a new dataset for object tracking using both audio and video data. The dataset is composed of three audio-video sequences, collected with the DualCam device in both indoor and outdoor scenarios: (1) Drone Sequence; (2) Voice Sequence; and (3) Motorbike Sequence.
The aim is to show the potential of acoustic images for target tracking in three challenging scenarios. In particular, the audio-based approach proposed in the paper can outperform, often dramatically, visual tracking with state-of-the-art algorithms, dealing efficiently with occlusions, abrupt variations in visual appearance, and camouflage. These results pave the way to widespread use of acoustic imaging in application scenarios such as security and surveillance.