This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Reward (RLVR) based on Llama 3.1 405B 🔥
> Mistral AI is back to open-source with their "small" 24B models (base & SFT), with Apache 2.0 license
> Alibaba Qwen released their 1M context length models Qwen2.5-Instruct-1M, great for agentic use, with Apache 2.0 license 🔥
> Arcee AI released Virtuoso-medium, a 32.8B LLM distilled from DeepSeek V3 with a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct on the OpenThoughts dataset
VLMs & vision
> Alibaba Qwen is back with Qwen2.5VL, with amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released a new series of Eagle2 models in 1B and 9B sizes
> DeepSeek released Janus-Pro, a new any-to-any model (image-text generation from image-text input) with MIT license
> BEN2 is a new background removal model with MIT license!
Audio 🗣️
> YuE is a new open-source music generation foundation model, lyrics-to-song generation
Apple released AIMv2, a family of state-of-the-art open-set vision encoders apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but adds a decoder and trains with an autoregressive objective 🤯
> 19 open models come in 300M, 600M, 1.2B, 2.7B with resolutions of 224, 336, 448
> Load and use with 🤗 transformers
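If you want to try it, here is a minimal sketch of loading one of the checkpoints with transformers. The checkpoint name `apple/aimv2-large-patch14-224`, the `trust_remote_code=True` flag, and the `embed_image` helper are assumptions for illustration; check the model card of the checkpoint you pick for the exact usage.

```python
# Assumed checkpoint name; other sizes and resolutions are listed in the collection.
DEFAULT_MODEL_ID = "apple/aimv2-large-patch14-224"


def embed_image(image, model_id=DEFAULT_MODEL_ID):
    """Return the encoder's last hidden state for a PIL image (hypothetical helper)."""
    # Imports are kept inside the function so the module loads without
    # torch/transformers installed and nothing is downloaded until it is called.
    import torch
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_id)
    # trust_remote_code is assumed to be needed on transformers versions
    # without built-in support for this architecture.
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state


# Example usage (downloads the weights on first call; "cat.jpg" is a placeholder):
#   from PIL import Image
#   features = embed_image(Image.open("cat.jpg"))
#   print(features.shape)
```

The features come back per patch, so you would typically pool them (for example, a mean over the patch dimension) before using them for retrieval or classification.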
For anyone who struggles with NER or information extraction with LLMs.
We showed an efficient workflow for token classification, including zero-shot suggestions and model fine-tuning, with Argilla, GLiNER, the NuMind NuExtract LLM and SpanMarker. @argilla
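To give a feel for the zero-shot suggestion step, here is a rough sketch, not the exact workflow we showed. It assumes the `gliner` package and the `urchade/gliner_base` checkpoint; `suggest_entities` and `spans_to_bio` are hypothetical helpers, and the whitespace tokenization is a simplification. The BIO tags it produces are the kind of token-level labels you could review in an annotation tool and then use for fine-tuning.

```python
import re


def suggest_entities(text, labels, model_id="urchade/gliner_base", threshold=0.5):
    """Zero-shot entity suggestions with GLiNER (model_id is an assumed checkpoint)."""
    from gliner import GLiNER  # imported lazily; requires `pip install gliner`

    model = GLiNER.from_pretrained(model_id)
    # Returns a list of dicts with "start", "end", "text", "label" and "score".
    return model.predict_entities(text, labels, threshold=threshold)


def spans_to_bio(text, spans):
    """Convert character-level spans into whitespace-token BIO tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for span in spans:
            # A token belongs to a span if the two character ranges overlap.
            if tok_start < span["end"] and tok_end > span["start"]:
                # The token that covers the span's first character opens it.
                prefix = "B" if tok_start <= span["start"] else "I"
                tag = f"{prefix}-{span['label']}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Example with hand-written spans in the same shape GLiNER returns:
example_text = "Ada Lovelace worked in London."
example_spans = [
    {"start": 0, "end": 12, "label": "person"},
    {"start": 23, "end": 29, "label": "location"},
]
example_tokens, example_tags = spans_to_bio(example_text, example_spans)
```

In practice you would call `suggest_entities(text, ["person", "location"])` to get the spans, and let annotators accept or correct them before training a token classifier.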