[FEEDBACK] Inference Providers
Any inference provider you love, and that you'd like to be able to access directly from the Hub?
Love that I can call DeepSeek R1 directly from the Hub 🔥
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)
Is it possible to set a monthly payment budget or rate limits for all the external providers? I don't see such options in the billing tab. In case a key or session token is stolen, it could be quite dangerous for my thin wallet :(
@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future
Thanks for your quick reply, good to know!
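Until server-side spending limits ship, a client-side guard can be sketched. This is a minimal, hypothetical helper (the `BudgetGuard` class and the per-request costs are illustrative assumptions, not part of huggingface_hub or the Hub's billing):

```python
class BudgetExceededError(RuntimeError):
    """Raised when a request would push spend past the self-imposed budget."""


class BudgetGuard:
    """Client-side spend tracker (hypothetical helper, not a Hub feature)."""

    def __init__(self, monthly_budget_usd: float):
        self.monthly_budget_usd = monthly_budget_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a request's estimated cost, refusing it if it blows the budget."""
        if self.spent_usd + cost_usd > self.monthly_budget_usd:
            raise BudgetExceededError(
                f"spent ${self.spent_usd:.2f}; a ${cost_usd:.2f} request "
                f"would exceed the ${self.monthly_budget_usd:.2f} budget"
            )
        self.spent_usd += cost_usd


# Usage: call guard.charge(...) before each provider call with an
# estimated cost, and stop issuing requests once it raises.
guard = BudgetGuard(monthly_budget_usd=2.00)
guard.charge(0.01)
print(f"spent so far: ${guard.spent_usd:.2f}")
```

This only limits damage from your own code, of course; a stolen token used elsewhere still needs server-side limits.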
Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...
Could be good to add featherless.ai
TitanML !!
OpenRouter!
Hi everyone, and first of all, thank you to the Hugging Face team for releasing this feature.
One more suggestion: expose more granular deployment parameters so users can share a deployment configuration file (e.g. a Terraform HCL configuration), define a more optimal inference infrastructure, and choose the inference model themselves.
requesty.ai
groq
TypeError: InferenceClient.__init__() got an unexpected keyword argument 'provider'
Add RunPod as an inference provider!
Hyperbolic!
Runpod!
@llamameta
please, make sure you use the latest version of huggingface_hub
pip install --upgrade huggingface_hub
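If the upgrade doesn't seem to take effect (e.g. multiple environments), a quick check can be sketched like this. The minimum version `0.28.0` is an assumption based on when the `provider` argument was announced; check the huggingface_hub release notes for the exact release:

```python
from importlib.metadata import PackageNotFoundError, version


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare dotted version strings numerically (pre-release tags ignored)."""
    def parts(v: str) -> list[int]:
        return [int(p) for p in v.split(".") if p.isdigit()]
    return parts(installed) >= parts(minimum)


MIN_VERSION = "0.28.0"  # assumed first release supporting `provider`

try:
    installed = version("huggingface_hub")
    if is_at_least(installed, MIN_VERSION):
        print(f"huggingface_hub {installed} should support `provider`")
    else:
        print(f"huggingface_hub {installed} is too old; "
              "run: pip install --upgrade huggingface_hub")
except PackageNotFoundError:
    print("huggingface_hub is not installed in this environment")
```

Running this inside the same interpreter that raised the TypeError also catches the common case where pip upgraded a different Python environment.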
Hello. Congratulations on a great feature. How can we proceed with adding deepinfra.com to the providers list?
That's great news! And if you provided more measurable details on what the free inference tier with a small quota includes for signed-in free users, you'd build more trust in the community, and hence more PRO subscribers!
requesty.ai !
Runpod is excellent.
Let's add it to inference provider list
runpod!
Runpod would be very welcome!
Yes, Runpod.io would be great indeed
Is there a way to add https://nineteen.ai/ as a provider? It allows free access to top models like DeepSeek R1 and the Llama family.
There is also https://chutes.ai/, which allows users to deploy any model on demand.
Can we add https://avian.io as a provider, @julien-c? They currently have the fastest inference on Nvidia hardware, as well as the highest throughput, allowing users to deploy any model on demand.
ai/ml api - dudes have a lot of models from HF!
try Runware.
Runware plz - it's ~5x cheaper than other providers and still one of the fastest
Try Scaleway.
Runware is best kept secret in the industry
I can't use DeepSeek R1 with the huggingface inference npm package; I got this error:
ChatButton.tsx:120 Error: Error: Model deepseek-ai/DeepSeek-R1 is not supported for task conversational and provider together
at mapModel (@huggingface_inference.js?v=2d418915:235:11)
at makeRequestOptions (@huggingface_inference.js?v=2d418915:158:13)
at streamingRequest (@huggingface_inference.js?v=2d418915:434:31)
at streamingRequest.next (<anonymous>)
at chatCompletionStream (@huggingface_inference.js?v=2d418915:933:10)
at chatCompletionStream.next (<anonymous>)
at handleSendMessage (ChatButton.tsx:94:21)
from this code:

const client = new HfInference(hf_token);

for await (const chunk of client.chatCompletionStream({
  model: "deepseek-ai/DeepSeek-R1",
  provider: "together",
  messages: [{ role: "user", content: input }],
  temperature: 0.5,
  stream: true,
})) {
  ...
}
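When one provider rejects a model/task combination like this, a generic client-side fallback over a list of providers can help. A language-agnostic sketch in Python (the provider names and the `request_fn` stub are illustrative assumptions, not Hub behavior):

```python
def call_with_fallback(request_fn, providers):
    """Try each provider in turn; return (provider, response) on first success.

    `request_fn(provider)` is any callable that issues the inference call
    and raises an exception when the provider rejects the model.
    """
    errors = {}
    for provider in providers:
        try:
            return provider, request_fn(provider)
        except Exception as exc:  # e.g. "model not supported for provider"
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {errors}")


# Usage with a stub request function (a real one would call the API):
def fake_request(provider):
    if provider != "hf-inference":
        raise ValueError("Model is not supported for this provider")
    return "ok"


provider, result = call_with_fallback(fake_request, ["together", "hf-inference"])
print(provider, result)  # → hf-inference ok
```

The same pattern works in TypeScript around `chatCompletionStream` by catching the per-provider error and retrying with the next entry.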
Runware would be great. I can run all of my models with them.
runware currently costs about the same as just the electricity to run a 4090 in Germany xD
Add Novita AI pls!!
I prefer Novita AI. Deepseek-r1 is currently more stable on Novita than on the official platform.
Novita AI!!
fireworks, they are super-fast
segmind.com is the best, cheap and fastest
Would love to see beam.cloud!
AI/ML api is fast and cheap! It could be a great choice to add as a provider.
STOP LYING!!! THE FCKIN LAW SAYS THAT I HAVE 1000 REQUESTS PER DAY AS A FREE USER, AND 20'000 PER DAY AS A PAID ONE... I JUST BECAME A PRO MEMBER, I PAID 9 FCKIN DOLLARS AS A POOR GUY, AND AFTER 400 REQUESTS OUT OF TWENTY THOUSAND(!!!), USING ONLY HF INT, THE SYSTEM LIES THAT I EXCEEDED MY MONTHLY QUOTA!!! (LITTLE MATH: THE MONTHLY QUOTA IS 600 THOUSAND REQUESTS(!!!), AND I'VE BARELY USED HALF OF THE FREE MONTHLY QUOTA THIS MONTH (10'000)...) I SEARCHED GOOGLE, ASKED GPT, IT SAYS NOTHING(!!!) ABOUT A QUOTA CHANGE FOR THE HF API... IT'S MY LIVING(!!!): I GENERATE PICS WITH SD 3.5, POST THEM ON DEVIANTART, AND THEN SELL THEM AS NFTS TO BUYERS... YOU'RE RISKING MY LIVING WITH YOUR SILLY LITTLE GAME, HUGGINGFACE!!! I LOVE THIS PLATFORM, IT'S THE MOST LIBERAL OF ALL, AND FOR MONTHS I HAD NO PROBLEMS... REALLY, WHY DO YOU WANT AN HF-GATE?!?! CAN I TRUST YOU??? OR IS MONEY NOW YOUR GOD TOO, LIKE ALL THOSE MONEY-HUNGRY SH*T SITES??? PLEASE, END THIS NIGHTMARE NOW!!!
Is the use of the new Inference Providers through the HF proxy covered by the same privacy policy as Hugging Face directly? No logging beyond short-term technical reasons, and no use of input for training?
Yesseerr, I'm happy that freedom is more important for you than money :)
Add RunPod
Not working with an enterprise token in routed mode:
"You have exceeded your monthly included credits for Inference Endpoints. Subscribe to PRO to get 20x more monthly allowance."
Is there a page where I can filter models by supported provider? Currently one needs to open every model's card to find out.
HuggingFace, please provide documentation that tells us exactly how much inference costs. (Example: Stable Diffusion HF Inference: $0.001/image, etc.)
I have been using this new feature with different providers to run Flux and I really like it a lot! But today, February 4, it changed from 20,000 daily inferences to $2 of monthly credits (about 200 inferences). I know it's fair, and I don't actually use 20,000 inferences, not even 20 daily. But the change from 20,000 daily inferences to about 6 daily is shocking; you could have made it gradual, or never offered 20,000 daily in the first place. It's hard to assimilate such a sudden change 😭
HuggingFace, please provide documentation that tells us exactly how much inference costs. (Example: Stable Diffusion HF Inference: $0.001/image, etc.)
this is in progress, currently working on it
I have been using this new feature with different providers to run Flux and I really like it a lot! But today, February 4, it changed from 20,000 daily inferences to $2 of monthly credits (about 200 inferences). I know it's fair, and I don't actually use 20,000 inferences, not even 20 daily. But the change from 20,000 daily inferences to about 6 daily is shocking; you could have made it gradual, or never offered 20,000 daily in the first place. It's hard to assimilate such a sudden change 😭
Right, it was a bit shocking. I would always like to use HF-inference as my first choice. Initially I thought that models with HF-inference would consume from the existing quota (of 20,000), and only the 3rd-party providers would draw on the $2 for PRO members. I agree that 20,000 is a bit too much, but the change is rather drastic.
let me run the numbers, but for HF-inference the change should not be that drastic (most HF-inference requests are priced very cheaply, especially CPU-based models, of course)
The 20,000-daily figure was a bit unrealistic given it was "best-effort", meaning the rate of failures was quite high. Note that from now on, we exclude failing requests from those counts (we didn't until now)
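The arithmetic behind the "about 6 per day" figure can be sketched as follows. The $0.01 per-image price is a hypothetical assumption, inferred only from the "~200 inferences per $2" mentioned above; real prices vary by model and provider:

```python
# Assumed numbers from the thread: $2.00 monthly credits, ~$0.01 per image.
monthly_credits_usd = 2.00
price_per_request_usd = 0.01  # hypothetical per-image price
days_per_month = 30

monthly_requests = monthly_credits_usd / price_per_request_usd
daily_requests = monthly_requests / days_per_month

print(f"{monthly_requests:.0f} requests/month ≈ {daily_requests:.1f}/day")
# → 200 requests/month ≈ 6.7/day
```

Cheaper requests (e.g. CPU-based models) would stretch the same credits over many more calls, which is why the change is less drastic for HF-inference text tasks than for image generation.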
Any inference provider you love, and that you'd like to be able to access directly from the Hub?
I tested it and I love it, super easy!
The first question from my company was:
With inference providers can we setup something like private endpoints?
@Moibe do you use HF-inference, or external providers?
@julien-c yes, I was using hf-inference and usage was deducted from the 20,000. As for other providers, it allowed me to use them totally free (without subtracting from the 20,000) and with a limited quota, meaning I was able to make some use of fal-ai until it said the quota was exceeded, then I used together, sambanova, etc. It was great because it allowed me to test every provider even without having an account. I understand the change, and I think prices are still reasonable. It was just a shocking change, but I'm fine. I even got an account on fal.ai after the free test, and I'm satisfied with the service and the ease of use, all from the HF interface.
Let's add nineteen.ai PLEAAASE
With inference providers can we setup something like private endpoints?
@levalencia i don't think so but have you looked into Inference Endpoints? (it's dedicated instances)
THE HF INT API IS DENYING MY SERVICE, BARKIN' "FAILED TO FETCH" (ILLEGAL EXCUSE!!!) (USED $0.20 OUT OF $2 AS PRO)!!! SUPPORT, END THIS MADNESS NOW!!! TOMORROW, COURT AND PRESS, IF YOU DON'T!!!
Subject: Add a new provider to the Inference Providers list.
I work at https://docs.predictionguard.com/home/getting-started/welcome and would like to explore whether it is possible to add Prediction Guard to the list of inference providers on Hugging Face. Let me know how to go about this.