---
base_model: simplescaling/s1-32B
pipeline_tag: text-generation
inference: true
language:
- en
license: apache-2.0
model_creator: simplescaling
model_name: s1-32B
model_type: qwen2
datasets:
- simplescaling/s1K
quantized_by: brittlewis12
---

# s1 32B GGUF

**Original model**: [s1 32B](https://huggingface.co/simplescaling/s1-32B)

**Model creator**: [simplescaling](https://huggingface.co/simplescaling)

> s1 is a reasoning model finetuned from Qwen2.5-32B-Instruct on just 1,000 examples. It matches o1-preview & exhibits test-time scaling via budget forcing.

This repo contains GGUF format model files for simplescaling’s s1 32B, an open reproduction of OpenAI’s o1-preview trained on just 1,000 reasoning traces, with the model, source code, and data all released (see [s1K](https://huggingface.co/datasets/simplescaling/s1K)).

Learn more on simplescaling’s [s1 GitHub repo](https://github.com/simplescaling/s1) & [arXiv preprint](https://arxiv.org/abs/2501.19393).

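Budget forcing, as described in the paper, controls thinking length at decode time: if the model tries to end its thinking phase before a minimum token budget is spent, a continuation string such as "Wait" is appended to encourage further reasoning; once a maximum budget is reached, the end-of-thinking delimiter is forced so the model moves on to its answer. A minimal, model-agnostic sketch — `generate_step`, the `Wait` string, and the delimiter below are illustrative assumptions, not the paper’s exact implementation:

```python
def budget_force(generate_step, prompt, min_tokens=0, max_tokens=512,
                 end_think="<|im_start|>answer", wait="Wait"):
    """Decode-time budget forcing (sketch).

    generate_step(text) -> next token string; an illustrative stand-in
    for a real decoding loop over a language model.
    """
    text = prompt
    spent = 0
    while spent < max_tokens:
        tok = generate_step(text)
        if tok == end_think:
            if spent >= min_tokens:
                break          # budget satisfied: let the answer phase begin
            tok = wait         # ended too early: force more thinking instead
        text += tok
        spent += 1
    return text + end_think    # force the end-of-thinking delimiter
```

In a real stack the same effect is achieved by suppressing the end-of-thinking delimiter in the sampler until the budget is met, rather than string matching.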
### What is GGUF?

GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023. It replaces GGML, which is no longer supported by llama.cpp.

Converted with llama.cpp build 4628 (revision [cde3833](https://github.com/ggerganov/llama.cpp/commits/cde383323959544abe10a4d79e1d3e1ee479933c)), using [autogguf-rs](https://github.com/brittlewis12/autogguf-rs).

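For illustration, a GGUF file opens with a small fixed header (the 4-byte magic `GGUF`, then a little-endian version, tensor count, and metadata key/value count), which can be inspected in a few lines of Python. Field names follow the GGUF specification in the ggml repository:

```python
import struct

def read_gguf_header(path):
    """Read the fixed GGUF header: 4-byte magic, uint32 version,
    uint64 tensor_count, uint64 metadata_kv_count (little-endian)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

The metadata key/value section that follows the header carries the model architecture, tokenizer, and quantization details that llama.cpp reads at load time.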
### Prompt template: ChatML

```
<|im_start|>system
{{system_message}}<|im_end|>
<|im_start|>user
{{prompt}}<|im_end|>
<|im_start|>assistant

```

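The template above can be filled in programmatically; a small helper (the function name is illustrative) that renders a single ChatML turn, leaving the assistant tag open for the model to complete:

```python
def chatml_prompt(system_message, prompt):
    """Render a single-turn ChatML prompt; the assistant tag is left
    open so the model generates its response from there."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

Most llama.cpp frontends apply this template automatically when the GGUF metadata declares a chat template, so manual formatting is only needed for raw completion endpoints.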
---

## Download & run with [cnvrs](https://twitter.com/cnvrsai) on iPhone, iPad, and Mac!

![cnvrs.ai](https://pbs.twimg.com/profile_images/1744049151241797632/0mIP-P9e_400x400.jpg)

[cnvrs](https://testflight.apple.com/join/sFWReS7K) is the best app for private, local AI on your device:
- create & save **Characters** with custom system prompts & temperature settings
- download and experiment with any **GGUF model** you can [find on HuggingFace](https://huggingface.co/models?library=gguf)!
- make it your own with custom **Theme colors**
- powered by Metal ⚡️ & [Llama.cpp](https://github.com/ggerganov/llama.cpp), with **haptics** during response streaming!
- **try it out** yourself today, on [TestFlight](https://testflight.apple.com/join/sFWReS7K)!
- follow [cnvrs on twitter](https://twitter.com/cnvrsai) to stay up to date

---

## Original Model Evaluation

> **Table 1**:
> s1-32B is an open and sample-efficient reasoning model.
> We evaluate s1-32B, Qwen, and Gemini (some entries are unknown (N.A.), see §4).
> Other results are from the respective reports (Qwen et al., 2024; Team, 2024b; OpenAI, 2024; DeepSeek-AI et al., 2025; Labs, 2025; Team, 2025).
>
> \# ex. = number of examples used for reasoning finetuning; BF = budget forcing.

via [s1: Simple test-time scaling (4.1 Results)](https://arxiv.org/html/2501.19393v2#S4:~:text=Table%201%3A,BF%20%3D%20budget%20forcing.)

| Model | # ex. | AIME 2024 | MATH 500 | GPQA Diamond |
|-------|-------|-----------|----------|--------------|
| **API only** | | | | |
| o1-preview | N.A. | 44.6 | 85.5 | 73.3 |
| o1-mini | N.A. | 70.0 | 90.0 | 60.0 |
| o1 | N.A. | **74.4** | **94.8** | **77.3** |
| Gemini 2.0 Flash Think. | N.A. | 60.0 | N.A. | N.A. |
| **Open Weights** | | | | |
| Qwen2.5-32B-Instruct | N.A. | 26.7 | 84.0 | 49.0 |
| QwQ-32B | N.A. | 50.0 | 90.6 | 65.2 |
| r1 | >>800K | **79.8** | **97.3** | **71.5** |
| r1-distill | 800K | 72.6 | 94.3 | 62.1 |
| **Open Weights and Open Data** | | | | |
| Sky-T1 | 17K | 43.3 | 82.4 | 56.8 |
| Bespoke-32B | 17K | **63.3** | **93.0** | 58.1 |
| s1 w/o BF | 1K | 50.0 | 92.6 | 56.6 |
| s1-32B | 1K | **56.7** | **93.0** | **59.6** |