Overview
This document presents the evaluation results of Llama-3.3-70B-Instruct, quantized to 4-bit with GPTQ, evaluated with the Language Model Evaluation Harness (lm-evaluation-harness) on the ARC-Challenge benchmark.
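The exact invocation used for this run is not documented here; the sketch below shows how a comparable ARC-Challenge evaluation could be launched through lm-evaluation-harness's Python API, using the backend, precision, and batch size listed in the configuration section. Treat the argument values as assumptions, not the recorded command.

```python
# Sketch of reproducing the ARC-Challenge run with lm-evaluation-harness.
# Settings mirror this card's configuration (hf backend, float16, batch size 1);
# the actual command used by the authors is not stated in the card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=empirischtech/Llama-3.3-70B-gptq-4bit,dtype=float16",
    tasks=["arc_challenge"],
    batch_size=1,
    device="cuda:0",
)

print(results["results"]["arc_challenge"])
```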
⚙️ Model Configuration
- Model: Llama-3.3-70B-Instruct
- Parameters: 70 billion
- Quantization: 4-bit GPTQ
- Source: Hugging Face (hf)
- Precision: torch.float16
- Hardware: NVIDIA A100 80GB PCIe
- CUDA Version: 12.4
- PyTorch Version: 2.6.0+cu124
- Batch Size: 1
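For reference, a minimal loading sketch with Hugging Face transformers is shown below. It assumes the repository ships a GPTQ quantization_config so that transformers can dispatch to a GPTQ backend (e.g. auto-gptq/gptqmodel via optimum); this has not been verified against the repository, and the prompt is purely illustrative.

```python
# Sketch: loading the quantized checkpoint with transformers.
# Assumes a GPTQ quantization_config is present in the repo and a GPTQ backend
# plus accelerate are installed; not a verified recipe for this exact model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "empirischtech/Llama-3.3-70B-gptq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the precision listed above
    device_map="auto",          # place layers on the available GPU(s) automatically
)

prompt = "Question: Which gas do plants absorb from the atmosphere?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```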
📌 Interpretation:
- The evaluation was performed on a high-performance GPU (A100 80GB).
- The model is significantly larger than the previously evaluated 8B model; GPTQ 4-bit quantization reduces its memory footprint (see the estimate below).
- A batch size of 1 was used, which can slow evaluation throughput.
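A rough back-of-the-envelope estimate illustrates why the 4-bit weights fit on a single 80 GB A100. It counts weight storage only and ignores GPTQ group-size overhead, activations, and the KV cache.

```python
# Back-of-the-envelope weight-memory estimate (weights only; ignores GPTQ
# group-size overhead, activations, and KV cache).
params = 70e9
fp16_gb = params * 2 / 1e9    # ~140 GB: does not fit on one 80 GB A100
int4_gb = params * 0.5 / 1e9  # ~35 GB: fits with room to spare
print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB")
```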
📌 Let us know if you need further analysis or model tuning! 🚀
Model tree for empirischtech/Llama-3.3-70B-gptq-4bit
- Base model: meta-llama/Llama-3.1-70B
- Fine-tuned: meta-llama/Llama-3.3-70B-Instruct