PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov, Anurag Arnab, Krzysztof Marcin Choromanski et al.