---
base_model: simplescaling/s1-32B
pipeline_tag: text-generation
inference: true
language:
- en
license: apache-2.0
model_creator: simplescaling
model_name: s1-32B
model_type: qwen2
datasets:
- simplescaling/s1K
quantized_by: brittlewis12
---

# s1 32B GGUF

**Original model**: [s1 32B](https://huggingface.co/simplescaling/s1-32B)

**Model creator**: [simplescaling](https://huggingface.co/simplescaling)

> s1 is a reasoning model finetuned from Qwen2.5-32B-Instruct on just 1,000 examples. It matches o1-preview & exhibits test-time scaling via budget forcing.

This repo contains GGUF format model files for simplescaling’s s1 32B, an open reproduction of OpenAI’s o1-preview trained on 1,000 reasoning traces, released with model weights, source code, and data (see [s1K](https://huggingface.co/datasets/simplescaling/s1K)).

Learn more on simplescaling’s [s1 GitHub repo](https://github.com/simplescaling/s1) & [arXiv preprint](https://arxiv.org/abs/2501.19393).

### What is GGUF?

GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023. It replaces GGML, which is no longer supported by llama.cpp.

Converted with llama.cpp build 4628 (revision [cde3833](https://github.com/ggerganov/llama.cpp/commits/cde383323959544abe10a4d79e1d3e1ee479933c)), using [autogguf-rs](https://github.com/brittlewis12/autogguf-rs).

### Prompt template: ChatML

```
<|im_start|>system
{{system_message}}<|im_end|>
<|im_start|>user
{{prompt}}<|im_end|>
<|im_start|>assistant
```

---

## Download & run with [cnvrs](https://twitter.com/cnvrsai) on iPhone, iPad, and Mac!
![cnvrs.ai](https://pbs.twimg.com/profile_images/1744049151241797632/0mIP-P9e_400x400.jpg)

[cnvrs](https://testflight.apple.com/join/sFWReS7K) is the best app for private, local AI on your device:
- create & save **Characters** with custom system prompts & temperature settings
- download and experiment with any **GGUF model** you can [find on HuggingFace](https://huggingface.co/models?library=gguf)!
- make it your own with custom **Theme colors**
- powered by Metal ⚡️ & [Llama.cpp](https://github.com/ggerganov/llama.cpp), with **haptics** during response streaming!
- **try it out** yourself today, on [TestFlight](https://testflight.apple.com/join/sFWReS7K)!
- follow [cnvrs on twitter](https://twitter.com/cnvrsai) to stay up to date

---

## Original Model Evaluation

> **Table 1**:
> s1-32B is an open and sample-efficient reasoning model.
> We evaluate s1-32B, Qwen, and Gemini (some entries are unknown (N.A.), see §4).
> Other results are from the respective reports (Qwen et al., 2024; Team, 2024b; OpenAI, 2024; DeepSeek-AI et al., 2025; Labs, 2025; Team, 2025).
>
> \# ex. = number of examples used for reasoning finetuning; BF = budget forcing.

via [s1: Simple test-time scaling (4.1 Results)](https://arxiv.org/html/2501.19393v2#S4:~:text=Table%201%3A,BF%20%3D%20budget%20forcing.)

| Model | # ex. | AIME 2024 | MATH 500 | GPQA Diamond |
|-------|-------|-----------|----------|--------------|
| **API only** | | | | |
| o1-preview | N.A. | 44.6 | 85.5 | 73.3 |
| o1-mini | N.A. | 70.0 | 90.0 | 60.0 |
| o1 | N.A. | **74.4** | **94.8** | **77.3** |
| Gemini 2.0 Flash Think. | N.A. | 60.0 | N.A. | N.A. |
| **Open Weights** | | | | |
| Qwen2.5-32B-Instruct | N.A. | 26.7 | 84.0 | 49.0 |
| QwQ-32B | N.A. | 50.0 | 90.6 | 65.2 |
| r1 | >>800K | **79.8** | **97.3** | **71.5** |
| r1-distill | 800K | 72.6 | 94.3 | 62.1 |
| **Open Weights and Open Data** | | | | |
| Sky-T1 | 17K | 43.3 | 82.4 | 56.8 |
| Bespoke-32B | 17K | **63.3** | **93.0** | 58.1 |
| s1 w/o BF | 1K | 50.0 | 92.6 | 56.6 |
| s1-32B | 1K | **56.7** | **93.0** | **59.6** |
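---

## Usage note: building a ChatML prompt

When running these GGUF files with a raw completion endpoint (rather than a chat API that applies the template for you), the ChatML prompt template shown earlier must be rendered manually. Below is a minimal, hedged sketch in Python of that rendering; the helper name `build_prompt` and the default system message are illustrative assumptions, not part of the original model card.

```python
# Minimal sketch: render the ChatML template from this card as a single
# prompt string for completion-style inference (e.g. llama.cpp's raw mode).
# The card writes placeholders as {{system_message}} / {{prompt}}; in
# Python's str.format, single braces serve the same role.

CHATML_TEMPLATE = (
    "<|im_start|>system\n"
    "{system_message}<|im_end|>\n"
    "<|im_start|>user\n"
    "{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

def build_prompt(prompt: str, system_message: str = "You are a helpful assistant.") -> str:
    """Fill the ChatML template for a single-turn request.

    The trailing "<|im_start|>assistant\n" leaves the prompt open for the
    model to generate the assistant turn.
    """
    return CHATML_TEMPLATE.format(system_message=system_message, prompt=prompt)

print(build_prompt("How many r's are in 'raspberry'?"))
```

The assistant tag is deliberately left unclosed: generation should stop when the model emits `<|im_end|>`, which is typically configured as a stop token in the inference runtime.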