In computer vision, which statement best describes the idea behind the paper "Masked Autoencoders Are Scalable Vision Learners"?
- Masked autoencoders are a type of generative model that excels in creating images from textual descriptions.
- Masked autoencoders hide a large portion of the input image patches, learn to reconstruct the missing content, and are designed to scale efficiently to large vision models and datasets.
- Masked autoencoders are primarily used for speech recognition tasks and are not suitable for visual data.
- Masked autoencoders are a type of reinforcement learning model that learns through interactions with an environment.
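
For reference, the "hide portions of the input" idea mentioned in the options can be illustrated with a minimal sketch. It assumes an image already split into 16 patches and a 75% mask ratio; the function name `random_mask` and the placeholder patch labels are hypothetical, and no actual encoder or decoder is included.

```python
import random

def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Return (visible_idx, hidden_idx) for a random patch mask.

    Hypothetical helper for illustration only; not taken from any library.
    """
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    # Shuffle all patch indices, then keep the first `num_keep` as visible.
    order = list(range(num_patches))
    rng.shuffle(order)
    visible_idx = sorted(order[:num_keep])
    hidden_idx = sorted(order[num_keep:])
    return visible_idx, hidden_idx

# Toy image: 16 patches, each represented by a placeholder label.
patches = [f"patch_{i}" for i in range(16)]
visible_idx, hidden_idx = random_mask(len(patches))

# An encoder would process only the visible patches; the hidden patches
# are what a decoder would be trained to reconstruct.
encoder_input = [patches[i] for i in visible_idx]
reconstruction_targets = [patches[i] for i in hidden_idx]
print(len(encoder_input), "visible,", len(reconstruction_targets), "hidden")
```

Because only the visible quarter of the patches would be passed to an encoder in this setup, the amount of input processed per image shrinks accordingly, which is one plausible reading of why the approach is described as scalable.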