Llama3-Taiwan-70B-Instruct-128K-AWQ-4bits stores its weights in 4-bit quantized form, produced with AutoAWQ. This cuts the GPU memory needed for the weights to roughly a quarter of what an FP16 checkpoint of the same model requires.
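
As a rough illustration of the saving, the sketch below estimates weight-only memory for FP16 versus 4-bit storage. It is back-of-the-envelope arithmetic, not a measurement of this checkpoint: the 70e9 parameter count is the nominal figure implied by the model name, the group size of 128 with one FP16 scale and one 4-bit zero point per group is an assumption about the quantization settings, and the function name `weight_memory_gib` is hypothetical. Real usage will be higher once the KV cache, activations, and any layers left unquantized are included.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Nominal parameter count implied by "70B" in the model name (assumption).
N_PARAMS = 70e9

# Assumed quantization layout: groups of 128 weights, each group carrying
# one FP16 scale (16 bits) and one 4-bit zero point.
GROUP_SIZE = 128
awq_bits = 4 + (16 + 4) / GROUP_SIZE  # effective bits per weight

fp16_gib = weight_memory_gib(N_PARAMS, 16)
awq4_gib = weight_memory_gib(N_PARAMS, awq_bits)

print(f"FP16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {awq4_gib:.1f} GiB")
print(f"Reduction:     {fp16_gib / awq4_gib:.2f}x")
```

Under these assumptions the per-group scales and zero points add only a small overhead, so the reduction stays close to the ideal 4x.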

References

Llama3-Taiwan-70B

For more information and detailed documentation, please refer to the links provided.
