Michael Goin PRO
mgoin
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
- upvoted a paper 1 day ago: QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
- updated a model 5 days ago: nm-testing/pixtral-12b-FP8-dynamic-all
- updated a model 5 days ago: neuralmagic/pixtral-12b-FP8-dynamic
mgoin's activity
- compressed-tensors MLA support requires fp8 activations and weights in group 'group_0' (2) · #1 opened 8 days ago by samos123
- How to load this model? (2) · #1 opened 7 months ago by Frz614
- Model does not run with VLLM (2) · #3 opened about 2 months ago by aswad546
- Nice model, any info on scripts used to quantize? (1) · #1 opened 2 months ago by RonanMcGovern
- Add config_format and load_format to vLLM args · #5 opened 3 months ago by mgoin
- Update config.json to use null for sliding_window · #4 opened 3 months ago by mgoin
- Adding `safetensors` variant of this model · #1 opened 3 months ago by SFconvertbot
- Is this the standard GPTQ quantization? (1) · #5 opened 3 months ago by molereddy
- Model weights are not loaded (4) · #3 opened 6 months ago by MarvelousMouse
- Update model card · #1 opened 3 months ago by nm-research
- Add chat_template to tokenizer_config.json · #1 opened 3 months ago by nm-research
- Why is the Pixtral activation function "gelu" when the reference code uses "silu"? (2) · #10 opened 4 months ago by mgoin
- Update tokenizer_config.json with chat_template (3) · #11 opened 4 months ago by mgoin
- Any chance your team is working on a 4-bit Llama-3.2-90B-Vision-Instruct-quantized.w4a16 version? (1) · #1 opened 5 months ago by mrhendrey
- Oom with 24g vram (3) · #1 opened 5 months ago by Klopez
- latest vllm docker (v0.6.2) fail to load (2) · #1 opened 4 months ago by choronz333
- Issue with loading model (1) · #1 opened 5 months ago by xSumukhax