AI & ML interests

None defined yet.

Recent Activity

argilla-warehouse's activity

davidberenstein1957
posted an update about 5 hours ago
davidberenstein1957
posted an update 1 day ago
davidberenstein1957
posted an update 2 days ago
davidberenstein1957
posted an update 7 days ago
TL;DR: Parquet is awesome, and so is DuckDB!

Datasets on the Hugging Face Hub rely on Parquet files. We can query these files with DuckDB, a fast in-memory database system. One of DuckDB's features is vector similarity search, which can be used with or without an index.

blog:
https://huggingface.co/learn/cookbook/vector_search_with_hub_as_backend
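
To make the "without an index" case concrete, here is a minimal sketch of brute-force cosine-similarity search in plain Python. It is not DuckDB's implementation or the code from the blog; the sample rows, embeddings, and function names are made up for illustration. In DuckDB the same idea would roughly correspond to ordering rows by a similarity function such as `array_cosine_similarity`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Score every row against the query (no index) and keep the best k."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical (text, embedding) rows standing in for a Parquet-backed table.
ROWS = [
    ("parquet files", [1.0, 0.0, 0.0]),
    ("duckdb queries", [0.9, 0.1, 0.0]),
    ("cooking recipes", [0.0, 0.0, 1.0]),
]

results = top_k([1.0, 0.0, 0.0], ROWS, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Scanning every row like this is exact but scales linearly with the table size. An index trades some of that exactness or build time for faster lookups.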
davidberenstein1957
posted an update 10 days ago
burtenshaw
posted an update 10 days ago
Manic few days in open source AI, with game-changing developments all over the place. Here's a roundup of the resources:

- The science team at @huggingface reproduced and open-sourced DeepSeek-R1. https://github.com/huggingface/open-r1
- @qwen released a series of models with a 1 million token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller, with completely new variants at 256M and 500M. https://huggingface.co/blog/smolervlm

There's so much you could do with these developments, especially by combining them into agentic applications or fine-tuning them on your use case.
burtenshaw
posted an update 13 days ago
Hey 👋

I'm helping out on some community research to learn about the AI community. If you want to join in the conversation, head over here where I started a community discussion on the most influential model since BERT.

OSAIResearchCommunity/README#2
burtenshaw
posted an update 13 days ago
📣 Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TL;DR: it's a quiz that uses a dataset to generate questions and save answers.

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the Space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset
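
The steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the code behind burtenshaw/dataset_quiz: the question schema (`question`, `choices`, `answer`), the sample questions, and the function names are all assumptions.

```python
# Hypothetical multiple-choice questions; the real dataset schema may differ.
QUESTIONS = [
    {
        "question": "Which file format do Hub datasets rely on?",
        "choices": ["CSV", "Parquet", "XML"],
        "answer": "Parquet",
    },
    {
        "question": "Which database is described as in-memory?",
        "choices": ["DuckDB", "SQLite", "Postgres"],
        "answer": "DuckDB",
    },
]


def run_quiz(questions, selected):
    """Pair each question with the chosen option and mark it right or wrong."""
    records = []
    for item, choice in zip(questions, selected):
        records.append(
            {
                "question": item["question"],
                "selected": choice,
                "correct": choice == item["answer"],
            }
        )
    return records


def score(records):
    """Fraction of answers that were correct."""
    return sum(r["correct"] for r in records) / len(records)


# The resulting records are what would be saved as a new dataset.
records = run_quiz(QUESTIONS, ["Parquet", "SQLite"])
print(records)
print(f"score: {score(records):.0%}")
```

In the real app, the questions would be loaded from the configured dataset repo and the records pushed back to the Hub after login, rather than kept in local lists.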

I made this to get ready for the agents course, but I hope it's useful for your projects too!

quiz app burtenshaw/dataset_quiz

dataset with questions burtenshaw/exam_questions

agents course we're working on https://huggingface.co/agents-course
burtenshaw
posted an update 14 days ago
AI was built on side projects!
andito
posted an update 14 days ago
๐—œ๐—ป๐˜๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐—ถ๐—ป๐—ด ๐˜๐—ต๐—ฒ ๐˜„๐—ผ๐—ฟ๐—น๐—ฑ'๐˜€ ๐˜€๐—บ๐—ฎ๐—น๐—น๐—ฒ๐˜€๐˜ ๐˜ƒ๐—ถ๐˜€๐—ถ๐—ผ๐—ป ๐—น๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น!

Weโ€™re thrilled to share ๐—ฆ๐—บ๐—ผ๐—น๐—ฉ๐—Ÿ๐—  (256M & 500M)โ€”the smallest Visual Language Models ever built. Think: running on <1GB of GPU memoryโ€”you can fine-tune it on your laptop and run it on your toaster!

Why Itโ€™s Game-Changing:
- ๐—ข๐˜‚๐˜๐—ฝ๐—ฒ๐—ฟ๐—ณ๐—ผ๐—ฟ๐—บ๐˜€ ๐—Ÿ๐—ฎ๐—ฟ๐—ด๐—ฒ๐—ฟ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€: Even the 256M model surpasses our SOTA 80B-parameter model from just 17 months ago. Over 300x reduction!
๐— ๐—ถ๐—ด๐—ต๐˜๐˜† ๐—˜๐—ณ๐—ณ๐—ถ๐—ฐ๐—ถ๐—ฒ๐—ป๐—ฐ๐˜†: The 256M version delivers 80% of our 2.2B modelโ€™s performance, and the 500M version hits 90%
๐—Ÿ๐—ถ๐—ด๐—ต๐˜๐—ป๐—ถ๐—ป๐—ด-๐—™๐—ฎ๐˜€๐˜ ๐—ฆ๐—ฒ๐—ฎ๐—ฟ๐—ฐ๐—ต: SmolVLM integrates with ColiPali for state-of-the-art retrieval speedsโ€”on par with models 10x bigger. That means cheaper, faster indexing and real-world impact.

What's New Under the Hood:
- New Vision Encoder: Smaller overall size (400M -> 93M), but with higher resolution.
- Higher Pixels/Token: 4096 vs. 1820, for more efficient image processing.
- Smart Tokenization: Faster training and a performance boost.

Check our blog: https://huggingface.co/blog/smolervlm
The models: HuggingFaceTB/smolvlm-256m-and-500m-6791fafc5bb0ab8acc960fb0
The demo: HuggingFaceTB/SmolVLM-256M-Demo
burtenshaw
posted an update 15 days ago
🚧 Work in Progress! 🚧

👷‍♀️ We're working hard on getting the official agents course ready for the 50,000 students who have signed up.

If you want to contribute to the discussion, I started these community posts. Looking forward to hearing from you:

- smolagents unit in the agents course - agents-course/README#7
- LlamaIndex Unit in the agents course - agents-course/README#6
- LangChain and LangGraph unit in the agents course - agents-course/README#5
- Real world use cases in the agents course - agents-course/README#8


davidberenstein1957
posted an update 16 days ago