eaddario committed · Commit 278f691 · verified · 1 Parent(s): 10b95e2

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED

@@ -29,7 +29,7 @@ The process to produce the quantized [GGUF](https://huggingface.co/docs/hub/en/g
 2. Estimate the Perplexity score for the F16 model (base) using [wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1), and record the [logits](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/logits)
 3. Generate the [imatrix](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/imatrix) for each calibration dataset
 4. Create quantized versions of the base model using each imatrix per quant type
-5. Calculate the Perplexity and KL Divergence scores for each quantized model [(logs)](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/scores)
+5. Calculate the Perplexity and KL Divergence scores for each quantized model [(scores)](https://huggingface.co/eaddario/Watt-Tool-8B-GGUF/tree/main/scores)
 6. For each quant type, keep the version with the best (usually the lowest) scores

 *[BF16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) would be preferred, but Apple's GPUs don't support it yet, and therefore any operations are executed in the CPU, making it unacceptably slow. This is expected to change in the near term but until then, if you are using Apple kit avoid using any models tagged BF16
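The Perplexity and KL Divergence scores referenced in the steps above are produced by llama.cpp's own tooling against the recorded F16 logits. As a rough, self-contained sketch of the statistics involved (plain-Python helpers written for illustration, not the project's actual GGUF pipeline), both metrics reduce to simple functions of the per-token logits:

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities for one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def perplexity(logits_per_pos, target_ids):
    """exp(mean negative log-likelihood of the true next token at each position)."""
    nll = 0.0
    for row, tok in zip(logits_per_pos, target_ids):
        nll -= log_softmax(row)[tok]
    return math.exp(nll / len(target_ids))

def kl_divergence(base_logits, quant_logits):
    """Mean KL(base || quantized) over positions; 0 means identical output distributions."""
    total = 0.0
    for b_row, q_row in zip(base_logits, quant_logits):
        b_lp, q_lp = log_softmax(b_row), log_softmax(q_row)
        total += sum(math.exp(bl) * (bl - ql) for bl, ql in zip(b_lp, q_lp))
    return total / len(base_logits)
```

A quantized model whose logits match the F16 base exactly would score a KL divergence of 0; lower values, like lower perplexity, indicate less quality loss, which is why step 6 keeps the variant with the lowest scores.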