Meta AI recently open-sourced data2vec, a unified framework for self-supervised deep learning on images, text, and speech audio data. When evaluated on common benchmarks, models trained using data2vec perform as well as or better than state-of-the-art models trained with modality-specific objectives, InfoQ noted.

Data2vec uses the same learning method for speech, NLP, and computer vision. The core idea, as described in the arXiv paper, is to predict latent representations of the full input data based on a masked view of the input, in a self-distillation setup using a standard Transformer architecture.
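To make the mechanism concrete, here is a minimal PyTorch sketch of that training step, not Meta's implementation: a "teacher" copy of the model sees the full input and provides latent targets, while the student sees a masked view and learns to predict them. The model class, mask-token handling, and hyperparameters (depth, number of averaged layers, EMA decay) are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    """A small stand-in encoder; data2vec uses a standard Transformer."""
    def __init__(self, dim=64, depth=4, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.layers = nn.ModuleList([copy.deepcopy(layer) for _ in range(depth)])

    def forward(self, x):
        hidden = []
        for layer in self.layers:
            x = layer(x)
            hidden.append(x)  # keep per-layer outputs so targets can average them
        return x, hidden

student = TinyTransformer()
teacher = copy.deepcopy(student)      # teacher is an EMA copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

mask_token = nn.Parameter(torch.zeros(64))
opt = torch.optim.Adam(list(student.parameters()) + [mask_token], lr=1e-4)

def training_step(x, mask, tau=0.999):
    """x: (batch, seq, dim) embedded input; mask: (batch, seq) bool, True = masked."""
    with torch.no_grad():
        _, teacher_hidden = teacher(x)            # teacher sees the FULL input
        # target = average of the top teacher layers (top-2 here, an arbitrary choice)
        target = torch.stack(teacher_hidden[-2:]).mean(0)
    x_masked = torch.where(mask.unsqueeze(-1), mask_token, x)
    pred, _ = student(x_masked)                   # student sees the masked view
    loss = nn.functional.smooth_l1_loss(pred[mask], target[mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                         # slowly track the student (EMA)
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(tau).add_(ps, alpha=1 - tau)
    return loss.item()

# Example: one update on random embeddings with ~15% of positions masked.
loss = training_step(torch.randn(2, 16, 64), torch.rand(2, 16) < 0.15)
```

Because the targets are latent representations rather than words, pixels, or audio units, the same loop applies unchanged to any modality once the raw input has been embedded.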

According to a post on Meta’s blog, data2vec simplifies learning across modalities by training models to predict their own representations of the input data, regardless of the modality. A single algorithm can thus work with completely different types of input, removing the dependence on modality-specific targets in the learning task. Directly predicting representations is not straightforward, however: it requires defining a robust normalization of the features that remains reliable across different modalities.
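That normalization point can be sketched as well. The helper below is hypothetical, but it follows the idea of normalizing each teacher layer's output before averaging so that target scales stay comparable across layers and modalities; instance normalization over the time dimension is one plausible choice, and `top_k` is an illustrative parameter.

```python
import torch
import torch.nn.functional as F

def build_targets(teacher_layer_outputs, top_k=8):
    """teacher_layer_outputs: list of (batch, seq, dim) tensors, one per layer."""
    normalized = []
    for h in teacher_layer_outputs[-top_k:]:
        # instance-normalize each feature channel over the sequence dimension
        h = F.instance_norm(h.transpose(1, 2)).transpose(1, 2)
        normalized.append(h)
    return torch.stack(normalized).mean(0)  # layer-averaged, normalized targets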
