High-quality data is essential for training foundation models. Which type of data, one that has not been tagged with characteristics, properties, or classifications, is needed in large amounts for training?