# ChatGLM3-6B-int4

## Introduction

ChatGLM3-6B is the latest-generation open-source model in the ChatGLM series (THUDM/chatglm3-6b).

This repository stores Q4_0 and Q4_1 weights produced with ChatGLM.cpp using GGML quantization.

## Performance

| Model | GGML quantize method | File size | Latency* |
|---|---|---|---|
| chatglm3-ggml-q4_0.bin | q4_0 | 3.51 GB | 74 ms/token |
| chatglm3-ggml-q4_1.bin | q4_1 | 3.9 GB | 77 ms/token |

\* ms per token on CPU (Intel Xeon Platinum 8260), taken from the reference benchmark.
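The file sizes follow directly from GGML's block layout: q4_0 packs 32 weights into 18 bytes (a 16-bit scale plus 32 four-bit values), while q4_1 adds a 16-bit minimum for 20 bytes per block. A back-of-the-envelope sketch, assuming roughly 6.24 billion quantized parameters (an approximation; the real files also hold metadata and a few unquantized tensors):

```python
# Estimate GGML quantized file sizes from the per-block layout.
PARAMS = 6.24e9  # approximate ChatGLM3-6B parameter count (assumption)
BLOCK_SIZE = 32  # GGML quantizes weights in blocks of 32 values

def estimated_gb(block_bytes: int) -> float:
    """File size in GB given the byte cost of one 32-weight block."""
    return PARAMS * block_bytes / BLOCK_SIZE / 1e9

# q4_0: 2-byte fp16 scale + 16 bytes of packed 4-bit values = 18 bytes/block
# q4_1: adds a 2-byte fp16 minimum                          = 20 bytes/block
for name, block_bytes in [("q4_0", 18), ("q4_1", 20)]:
    print(f"{name}: ~{estimated_gb(block_bytes):.2f} GB")
# prints roughly 3.51 GB for q4_0 and 3.90 GB for q4_1,
# matching the table above
```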

## Getting Started

1. Install the dependencies:

```shell
pip install chatglm-cpp transformers
```
2. Download the weights:

```shell
wget https://huggingface.co/npc0/chatglm3-6b-int4/resolve/main/chatglm3-ggml-q4_0.bin
```
3. Run the model:

```python
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("./chatglm3-ggml-q4_0.bin")
pipeline.chat([chatglm_cpp.ChatMessage(role="user", content="你好")])
# Output: ChatMessage(role="assistant", content="你好👋!我是人工智能助手 ChatGLM-6B,很高兴见到你,欢迎问我任何问题。", tool_calls=[])
```
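Beyond a single turn, multi-turn chat just means appending each assistant reply back onto the message list before the next call. A minimal sketch, assuming the same weight file as above (the user prompts are illustrative, and `Pipeline.chat` behavior may vary slightly across chatglm-cpp versions):

```python
import chatglm_cpp

# Multi-turn sketch: the conversation history is a plain list of
# ChatMessage objects; each assistant reply is appended back onto it.
pipeline = chatglm_cpp.Pipeline("./chatglm3-ggml-q4_0.bin")
history = []
for user_text in ["你好", "用一句话介绍你自己"]:  # example prompts (assumption)
    history.append(chatglm_cpp.ChatMessage(role="user", content=user_text))
    reply = pipeline.chat(history)  # returns a ChatMessage(role="assistant", ...)
    history.append(reply)
    print(f"user: {user_text}\nassistant: {reply.content}")
```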