Model Details

Model Description

This is a 32B reasoning model preference optimized on top of Sky-T1-32B-Preview to significantly reduce generation lengths while maintaining accuracy. The performance is on par with the o1-preview model in both math and coding, while generation lengths are reduced by up to 57% relative to Sky-T1-32B-Preview. Please see our blog post for more details.

  • Developed by: NovaSky Team from Sky Computing Lab at UC Berkeley.

Training Details

Training Data

10K preference pairs in math and coding domains, generated by Sky-T1-32B-Preview.
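To illustrate the data, here is a purely hypothetical sketch of what a single preference pair might look like. The field names and the pairing of a concise correct response (chosen) against a longer one (rejected) are illustrative assumptions drawn from the stated goal of shortening generations, not the dataset's actual schema.

```python
# Hypothetical example of one preference record (illustrative only).
example_pair = {
    "prompt": "What is the sum of the first 100 positive integers?",
    # Preferred: a correct answer with a concise reasoning trace.
    "chosen": "Using n(n+1)/2 with n = 100: 100 * 101 / 2 = 5050.",
    # Dispreferred: a correct but much longer, repetitive reasoning trace.
    "rejected": "Let me add them one by one... (many tokens later) ... so the answer is 5050.",
}
```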

Training Procedure

We perform Simple Preference Optimization (SimPO) with a batch size of 96, a learning rate of 5e-7, a gamma of 0.3, and a beta of 2.0.
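For reference, below is a minimal PyTorch sketch of the reference-free, length-normalized SimPO objective with these hyperparameters (beta = 2.0, gamma = 0.3). The function name, tensor shapes, and dummy values are illustrative only; the actual training uses Llama-Factory (see below).

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=0.3):
    """Length-normalized SimPO loss (no reference model).

    chosen_logps / rejected_logps: summed token log-probs of each response, shape [B].
    chosen_lens / rejected_lens: number of response tokens, shape [B].
    """
    # SimPO's implicit reward is the average per-token log-probability, scaled by beta.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Bradley-Terry objective with a target reward margin gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()

# Dummy example: a shorter response with a higher average log-prob is preferred.
loss = simpo_loss(
    chosen_logps=torch.tensor([-30.0]), rejected_logps=torch.tensor([-180.0]),
    chosen_lens=torch.tensor([50.0]), rejected_lens=torch.tensor([220.0]),
)
print(loss)
```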

Speeds

We use Llama-Factory for training. On 8xH100, the SimPO training takes ~2.5 hours with DeepSpeed Zero-3 Offload.

Evaluation

| Benchmark | Metric | Sky-T1-32B-Preview | Sky-T1-32B-Flash | Qwen2.5-32B-Instruct | QwQ-32B-Preview | DeepSeek-R1-Distill-Qwen-32B |
| --- | --- | --- | --- | --- | --- | --- |
| Math500 | Acc | 88.6 | 88.6 | 76.2 | 89.2 | 90.8 |
| Math500 | Avg Len | 2124 | 1417 (-33%) | 522 | 2089 | 2010 |
| AIME24 | Acc | 43.3 | 43.3 | 16.7 | 50 | 66.7 |
| AIME24 | Avg Len | 6881 | 4365 (-37%) | 970 | 7379 | 9173 |
| LCB Easy | Acc | 87.4 | 89 | 84.6 | 90.7 | 91.2 |
| LCB Easy | Avg Len | 3415 | 2265 (-34%) | 414 | 3255 | 2775 |
| LCB Medium | Acc | 56.8 | 56.3 | 40.8 | 56.3 | 76.7 |
| LCB Medium | Avg Len | 8263 | 4389 (-47%) | 535 | 6742 | 6324 |
| LCB Hard | Acc | 17.9 | 17.9 | 9.8 | 17.1 | 38.2 |
| LCB Hard | Avg Len | 14564 | 6199 (-57%) | 618 | 10450 | 10448 |
| MMLU | Acc | 82.4 | 81.7 | 80.1 | 85.2 | 82.1 |
| MMLU | Avg Len | 1087 | 799 (-17%) | 312 | 1041 | 774 |
| GPQA Diamond | Acc | 56.8 | 56.6 | 45.5 | 52.5 | 62.6 |
| GPQA Diamond | Avg Len | 3503 | 2148 (-39%) | 600 | 3302 | 5108 |

Acknowledgement

We would like to thank Lambda Lab and AnyScale for providing the compute resources.

License

Apache-2.0

Citation

Please consider citing our blog post if you find it useful for your research. Thank you!

@misc{reduce_overthinking_2025,
  author       = {NovaSky Team},
  title        = {Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy},
  howpublished = {https://novasky-ai.github.io/posts/reduce-overthinking},
  note         = {Accessed: 2025-01-23},
  year         = {2025}
}