[FEEDBACK] Inference Providers
Any inference provider you love, and that you'd like to be able to access directly from the Hub?
Love that I can call DeepSeek R1 directly from the Hub 🔥
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)
Is it possible to set a monthly payment budget or rate limits for all the external providers? I don't see such options in the billing tab. In case a key or session token is stolen, it could be quite dangerous for my thin wallet :(
@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future
Thanks for your quick reply, good to know!
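Until server-side spending limits ship, a client-side guard can be sketched. This is a minimal, hypothetical helper (the `BudgetGuard` class and the per-request costs are illustrative assumptions, not part of huggingface_hub or the Hub's billing):

```python
class BudgetExceededError(RuntimeError):
    """Raised when a request would push spend past the self-imposed budget."""


class BudgetGuard:
    """Client-side spend tracker (hypothetical helper, not a Hub feature)."""

    def __init__(self, monthly_budget_usd: float):
        self.monthly_budget_usd = monthly_budget_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a request's estimated cost, refusing it if it blows the budget."""
        if self.spent_usd + cost_usd > self.monthly_budget_usd:
            raise BudgetExceededError(
                f"spent ${self.spent_usd:.2f}; a ${cost_usd:.2f} request "
                f"would exceed the ${self.monthly_budget_usd:.2f} budget"
            )
        self.spent_usd += cost_usd


# Usage: call guard.charge(...) before each provider call with an
# estimated cost, and stop issuing requests once it raises.
guard = BudgetGuard(monthly_budget_usd=2.00)
guard.charge(0.01)
print(f"spent so far: ${guard.spent_usd:.2f}")
```

This only limits damage from your own code, of course; a stolen token used elsewhere still needs server-side limits.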
Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...
Could be good to add featherless.ai
TitanML !!
OpenRouter!
Hi everyone, and first of all, thank you to the Hugging Face team for releasing this feature.
One more suggestion: expose more granular deployment parameters so users can share a deployment configuration file (e.g. a Terraform HCL configuration), define a more optimal inference infrastructure, and choose the inference model themselves.
requesty.ai
groq
TypeError: InferenceClient.__init__() got an unexpected keyword argument 'provider'
Add RunPod as an inference provider!
Hyperbolic!
Runpod!
@llamameta
please, make sure you use the latest version of huggingface_hub
pip install --upgrade huggingface_hub
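If the upgrade doesn't seem to take effect (e.g. multiple environments), a quick check can be sketched like this. The minimum version `0.28.0` is an assumption based on when the `provider` argument was announced; check the huggingface_hub release notes for the exact release:

```python
from importlib.metadata import PackageNotFoundError, version


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare dotted version strings numerically (pre-release tags ignored)."""
    def parts(v: str) -> list[int]:
        return [int(p) for p in v.split(".") if p.isdigit()]
    return parts(installed) >= parts(minimum)


MIN_VERSION = "0.28.0"  # assumed first release supporting `provider`

try:
    installed = version("huggingface_hub")
    if is_at_least(installed, MIN_VERSION):
        print(f"huggingface_hub {installed} should support `provider`")
    else:
        print(f"huggingface_hub {installed} is too old; "
              "run: pip install --upgrade huggingface_hub")
except PackageNotFoundError:
    print("huggingface_hub is not installed in this environment")
```

Running this inside the same interpreter that raised the TypeError also catches the common case where pip upgraded a different Python environment.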
Hello. Congratulations on a great feature. How can we proceed with adding deepinfra.com to the providers list?
That's great news! And if you provided more measurable details on what the free inference tier with a small quota includes for signed-in free users, you'd build more trust in the community, and hence more PRO subscribers!
requesty.ai !
Runpod is excellent.
Let's add it to inference provider list
runpod!
Runpod would be very welcome!
Yes, Runpod.io would be great indeed
Is there a way to add https://nineteen.ai/ as a provider? It allows free access to top models like DeepSeek R1 and the Llama family.
There is also https://chutes.ai/, which allows users to deploy any model on demand.
Can we add https://avian.io as a provider, @julien-c? They currently have the fastest inference on Nvidia hardware, as well as the highest throughput, allowing users to deploy any model on demand.
ai/ml api - dudes have a lot of models from HF!
try Runware.
Runware plz - it's ~5x cheaper than other providers and still one of the fastest
Try Scaleway.
Runware is best kept secret in the industry
I can't use DeepSeek R1 with the huggingface inference npm package; I got this error:
ChatButton.tsx:120 Error: Error: Model deepseek-ai/DeepSeek-R1 is not supported for task conversational and provider together
at mapModel (@huggingface_inference.js?v=2d418915:235:11)
at makeRequestOptions (@huggingface_inference.js?v=2d418915:158:13)
at streamingRequest (@huggingface_inference.js?v=2d418915:434:31)
at streamingRequest.next (<anonymous>)
at chatCompletionStream (@huggingface_inference.js?v=2d418915:933:10)
at chatCompletionStream.next (<anonymous>)
at handleSendMessage (ChatButton.tsx:94:21)
from this code:

const client = new HfInference(hf_token);

for await (const chunk of client.chatCompletionStream({
  model: "deepseek-ai/DeepSeek-R1",
  provider: "together",
  messages: [{ role: "user", content: input }],
  temperature: 0.5,
  stream: true,
})) {
  ...
}
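When one provider rejects a model/task combination like this, a generic client-side fallback over a list of providers can help. A language-agnostic sketch in Python (the provider names and the `request_fn` stub are illustrative assumptions, not Hub behavior):

```python
def call_with_fallback(request_fn, providers):
    """Try each provider in turn; return (provider, response) on first success.

    `request_fn(provider)` is any callable that issues the inference call
    and raises an exception when the provider rejects the model.
    """
    errors = {}
    for provider in providers:
        try:
            return provider, request_fn(provider)
        except Exception as exc:  # e.g. "model not supported for provider"
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {errors}")


# Usage with a stub request function (a real one would call the API):
def fake_request(provider):
    if provider != "hf-inference":
        raise ValueError("Model is not supported for this provider")
    return "ok"


provider, result = call_with_fallback(fake_request, ["together", "hf-inference"])
print(provider, result)  # → hf-inference ok
```

The same pattern works in TypeScript around `chatCompletionStream` by catching the per-provider error and retrying with the next entry.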
Runware would be great. I can run all of my models with them.
runware currently costs about the same as just the electricity to run a 4090 in Germany xD
Add Novita AI pls!!
I prefer Novita AI. Deepseek-r1 is currently more stable on Novita than on the official platform.
Novita AI!!
fireworks, they are super-fast
segmind.com is the best, cheap and fastest
Would love to see beam.cloud!
AI/ML api is fast and cheap! It could be a great choice to add as a provider.
STOP LYING!!! THE FCKIN LAW SAYS THAT I HAVE 1000 REQUESTS PER DAY AS A FREE USER, AND 20'000 PER DAY AS A PAID ONE... I JUST BECAME A PRO MEMBER, I PAID 9 FCKIN DOLLARS AS A POOR GUY, AND AFTER 400 REQUESTS OUT OF TWENTY THOUSAND(!!!), USING ONLY HF INT, THE SYSTEM LIES THAT I EXCEEDED MY MONTHLY QUOTA!!! (LITTLE MATH: THE MONTHLY QUOTA IS 600 THOUSAND REQUESTS(!!!), AND I'VE BARELY USED HALF OF THE FREE MONTHLY QUOTA THIS MONTH (10'000)...) I SEARCHED GOOGLE, ASKED GPT, IT SAYS NOTHING(!!!) ABOUT A QUOTA CHANGE FOR THE HF API... IT'S MY LIVING(!!!): I GENERATE PICS WITH SD 3.5, POST THEM ON DEVIANTART, AND THEN SELL THEM AS NFTS TO BUYERS... YOU'RE RISKING MY LIVING WITH YOUR SILLY LITTLE GAME, HUGGINGFACE!!! I LOVE THIS PLATFORM, IT'S THE MOST LIBERAL OF ALL, AND FOR MONTHS I HAD NO PROBLEMS... REALLY, WHY DO YOU WANT AN HF-GATE?!?! CAN I TRUST YOU??? OR IS MONEY NOW YOUR GOD TOO, LIKE ALL THOSE MONEY-HUNGRY SH*T SITES??? PLEASE, END THIS NIGHTMARE NOW!!!
Is the use of the new Inference Providers through the HF proxy covered by the same privacy policy as Hugging Face directly? No logging beyond short-term technical reasons, and no use of input for training?
Yesseerr, I'm happy that freedom is more important for you than money :)
Add RunPod
Not working with an enterprise token in routed mode:
"You have exceeded your monthly included credits for Inference Endpoints. Subscribe to PRO to get 20x more monthly allowance."
Is there a page where I can filter models by supported provider? Currently one needs to open every model's card to find out.
HuggingFace, please provide documentation that tells us exactly how much inference costs. (Example: Stable Diffusion HF Inference: $0.001/image, etc.)
I have been using this new feature with different providers to run Flux and I really like it a lot! But today, February 4, it changed from 20,000 daily inferences to $2 of monthly credits (about 200 inferences). I know it's fair, and I don't actually use 20,000 inferences, not even 20 daily. But the change from 20,000 daily inferences to about 6 daily is shocking; you could have made it gradual, or never offered 20,000 daily in the first place. It's hard to assimilate such a sudden change 😭
HuggingFace, please provide documentation that tells us exactly how much inference costs. (Example: Stable Diffusion HF Inference: $0.001/image, etc.)
this is in progress, currently working on it
I have been using this new feature with different providers to run Flux and I really like it a lot! But today, February 4, it changed from 20,000 daily inferences to $2 of monthly credits (about 200 inferences). I know it's fair, and I don't actually use 20,000 inferences, not even 20 daily. But the change from 20,000 daily inferences to about 6 daily is shocking; you could have made it gradual, or never offered 20,000 daily in the first place. It's hard to assimilate such a sudden change 😭
Right, it was a bit shocking. I would always like to use HF-inference as my first choice. Initially I thought that models with HF-inference would consume from the existing quota (of 20,000), and only the 3rd-party providers would draw on the $2 for PRO members. I agree that 20,000 is a bit too much, but the change is rather drastic.
let me run the numbers, but for HF-inference the change should not be that drastic (most HF-inference requests are priced very cheaply, especially CPU-based models, of course)
The 20,000-daily figure was a bit unrealistic given it was "best-effort", meaning the rate of failures was quite high. Note that from now on, we exclude failing requests from those counts (we didn't until now)
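The arithmetic behind the "about 6 per day" figure can be sketched as follows. The $0.01 per-image price is a hypothetical assumption, inferred only from the "~200 inferences per $2" mentioned above; real prices vary by model and provider:

```python
# Assumed numbers from the thread: $2.00 monthly credits, ~$0.01 per image.
monthly_credits_usd = 2.00
price_per_request_usd = 0.01  # hypothetical per-image price
days_per_month = 30

monthly_requests = monthly_credits_usd / price_per_request_usd
daily_requests = monthly_requests / days_per_month

print(f"{monthly_requests:.0f} requests/month ≈ {daily_requests:.1f}/day")
# → 200 requests/month ≈ 6.7/day
```

Cheaper requests (e.g. CPU-based models) would stretch the same credits over many more calls, which is why the change is less drastic for HF-inference text tasks than for image generation.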
Any inference provider you love, and that you'd like to be able to access directly from the Hub?
I tested it and I love it, super easy!
The first question from my company was:
With inference providers can we setup something like private endpoints?
@Moibe do you use HF-inference, or external providers?
@julien-c yes, I was using hf-inference and usage was deducted from the 20,000. As for other providers, it allowed me to use them totally free (without subtracting from the 20,000) and with a limited quota, meaning I was able to make some use of fal-ai until it said the quota was exceeded, then I used together, sambanova, etc. It was great because it allowed me to test every provider even without having an account. I understand the change, and I think prices are still reasonable. It was just a shocking change, but I'm fine. I even got an account on fal.ai after the free test, and I'm satisfied with the service and the ease of use, all from the HF interface.
Let's add nineteen.ai PLEAAASE
With inference providers can we setup something like private endpoints?
@levalencia i don't think so but have you looked into Inference Endpoints? (it's dedicated instances)
THE HF INT API IS DENYING MY SERVICE, BARKIN' "FAILED TO FETCH" (ILLEGAL EXCUSE!!!) (USED $0.20 OUT OF $2 AS PRO)!!! SUPPORT, END THIS MADNESS NOW!!! TOMORROW, COURT AND PRESS, IF YOU DON'T!!!
Subject: Add a new provider to the Inference Providers list.
I work at https://docs.predictionguard.com/home/getting-started/welcome and would like to explore whether it is possible to add Prediction Guard to the list of inference providers on Hugging Face. Let me know how to go about this.