Overview

This document presents the evaluation results of empirischtech/Llama-3.3-70B-gptq-4bit, a 4-bit GPTQ quantization of Llama-3.3-70B-Instruct, measured with the Language Model Evaluation Harness (lm-evaluation-harness) on the ARC-Challenge benchmark.
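
The exact harness invocation is not published with this card; the following is a minimal sketch of an equivalent run using the harness's Python API (assuming lm_eval v0.4+), with arguments mirroring the configuration listed below.

```python
# Minimal sketch of an equivalent evaluation run via the
# lm-evaluation-harness Python API (assumes lm_eval v0.4+ is installed).
# The exact command used for this card is not published; the arguments
# below mirror the "Model Configuration" section.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=empirischtech/Llama-3.3-70B-gptq-4bit,dtype=float16",
    tasks=["arc_challenge"],
    batch_size=1,
    device="cuda:0",
)
print(results["results"]["arc_challenge"])
```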

⚙️ Model Configuration

  • Model: Llama-3.3-70B-Instruct
  • Parameters: 70 billion
  • Quantization: 4-bit GPTQ (see the loading sketch after this list)
  • Source: Hugging Face (hf)
  • Precision: torch.float16
  • Hardware: NVIDIA A100 80GB PCIe
  • CUDA Version: 12.4
  • PyTorch Version: 2.6.0+cu124
  • Batch Size: 1
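
For reference, a minimal loading sketch under the settings above. This assumes transformers with GPTQ support installed (optimum plus a GPTQ kernel backend such as auto-gptq or gptqmodel); the quantization parameters are read from the checkpoint itself, so no extra quantization arguments are needed.

```python
# Minimal loading sketch (assumes transformers + optimum + a GPTQ
# backend such as auto-gptq are installed). The GPTQ quantization
# config is stored in the checkpoint and applied automatically.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "empirischtech/Llama-3.3-70B-gptq-4bit"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,  # matches the precision listed above
    device_map="auto",          # place weights on the available GPU(s)
)

prompt = "Question: Which gas do plants absorb during photosynthesis?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```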

📌 Interpretation:

  • The evaluation was performed on a single high-memory GPU (NVIDIA A100 80GB PCIe).
  • At 70 billion parameters, the model is significantly larger than the previous 8B model; GPTQ 4-bit quantization is what brings its memory footprint within a single GPU (see the estimate after this list).
  • A batch size of 1 was used, which may slow evaluation compared with larger batches.
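
A back-of-the-envelope estimate (not a measured figure) of why 4-bit quantization lets a 70B model fit on one A100 80GB:

```python
# Rough weight-memory estimate; ignores activations, the KV cache,
# and GPTQ scale/zero-point overhead, so real usage is somewhat higher.
params = 70e9
print(f"fp16 : {params * 2 / 1e9:.0f} GB")   # ~140 GB: exceeds one A100 80GB
print(f"4-bit: {params * 0.5 / 1e9:.0f} GB") # ~35 GB: fits comfortably
```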

📌 Let us know if you need further analysis or model tuning! 🚀
