---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
base_model: meta-llama/Meta-Llama-3.1-70B
tags:
- pytorch
- llama
- llama-3
- vultr
---
# Model Information
The `vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32` model is a quantized version of Meta-Llama-3.1-70B-Instruct. It was dequantized to FP32 from the AWQ INT4 checkpoint on Hugging Face, then requantized and optimized to run on AMD GPUs. It is a drop-in replacement for `hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4`.

Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s
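Since the model is a drop-in replacement for the original AWQ INT4 checkpoint, it can be loaded the same way. A minimal offline-inference sketch using vLLM follows; it assumes a ROCm-capable GPU with enough memory for the 70B weights, and the sampling settings are illustrative only:

```python
from vllm import LLM, SamplingParams

# Load the model by its Hugging Face repo id; swap in this repo wherever
# hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 was used before.
llm = LLM(
    model="vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32",
    tensor_parallel_size=1,  # adjust to the number of available GPUs
)

# Illustrative sampling parameters; tune for your workload.
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["What is the capital of France?"], sampling)
print(outputs[0].outputs[0].text)
```

The same repo id also works with `transformers` via `AutoModelForCausalLM.from_pretrained`, as indicated by the `library_name` tag above.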
## Model Details

### Model Description
- Developed by: Meta
- Model type: Quantized Large Language Model
- Language(s) (NLP): English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- License: Llama 3.1
- Dequantized From: hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
## Compute Infrastructure

- Vultr

### Hardware

- AMD MI300X

### Software

- ROCm