Submitted by Huu Nguyen
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Organization: ontocord (Ontocord.AI)