Mayank Mishra

mayank-mishra

https://mayank31398.github.io/

AI & ML interests

Large Language Models, Distributed Training and Inference

Recent Activity

new activity 29 days ago

ibm-granite/granite-3.1-2b-instruct:RE-ADD float32 please.

upvoted a collection about 2 months ago

Granite 3.1 Language Models

new activity about 2 months ago

ibm-granite/granite-3.1-8b-instruct:Exceptional creative writer

mayank-mishra's activity

New activity in ibm-granite/granite-3.1-2b-instruct 29 days ago

RE-ADD float32 please.

#3 opened about 1 month ago by

ctranslate2-4you

New activity in ibm-granite/granite-3.1-8b-instruct about 2 months ago

Exceptional creative writer

#1 opened about 2 months ago by

SubtleOne

New activity in ibm-granite/granite-3.0-2b-instruct 4 months ago

add base model metadata

#3 opened 4 months ago by

davanstrien

New activity in ibm-granite/granite-3.0-8b-instruct 4 months ago

add base model metadata

#5 opened 4 months ago by

davanstrien

New activity in ibm-granite/granite-3.0-1b-a400m-instruct 4 months ago

Add base model metadata

#2 opened 4 months ago by

davanstrien

New activity in ibm-research/PowerMoE-3b 5 months ago

torch and llama.cpp integration

#1 opened 5 months ago by

TobDeBer

New activity in cfahlgren1/model-release-heatmap 6 months ago

Add IBM

#5 opened 6 months ago by

mayank-mishra

New activity in ibm-granite/granite-8b-code-instruct-128k 6 months ago

Fix: link to 128k paper

#1 opened 6 months ago by

timrbula

New activity in meta-llama/Llama-3.1-405B 7 months ago

405B or 410B ?

#8 opened 7 months ago by

alielfilali01

New activity in ibm-granite/granite-3b-code-instruct-2k 7 months ago

Onnx Model Produces Different Output

#2 opened 9 months ago by

runski

New activity in ibm-granite/granite-8b-code-instruct-4k 8 months ago

[AUTOMATED] Model Memory Requirements

#4 opened 9 months ago by

model-sizer-bot

Function calling and Streaming support

#5 opened 8 months ago by

skumarai

New activity in mayank-mishra/glaive-code-assisstant-v3-20k 8 months ago

[bot] Conversion to Parquet

#1 opened 8 months ago by

parquet-converter

New activity in ibm-granite/granite-8b-code-instruct-4k 8 months ago

Input context length

#6 opened 8 months ago by

dyoung

Official quants?

#2 opened 9 months ago by

joshuaturner

New activity in ibm-granite/granite-20b-code-instruct-8k 8 months ago

Is it fine to do granite-20b model's inference with bfloat16 dtype?

#1 opened 8 months ago by

lkm1

New activity in ibm-granite/granite-3b-code-instruct-2k-GGUF 8 months ago

Issue to run the model on Ollama.

#1 opened 8 months ago by

vperrinfr

New activity in ibm-granite/granite-3b-code-base-2k 8 months ago

Release GGUF models?

#5 opened 9 months ago by

CosmicSound

commented a paper 9 months ago

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024

New activity in ibm-granite/granite-20b-code-base-8k-GGUF 9 months ago

3b, 8b, and 34b versions of GGUF?

#1 opened 9 months ago by

tombenninger