distilabel-internal-testing

community

AI & ML interests

None defined yet.

Recent Activity


davidberenstein1957 
posted an update about 6 hours ago
davidberenstein1957 
posted an update 1 day ago
davidberenstein1957 
posted an update 2 days ago
davidberenstein1957 
posted an update 7 days ago
TL;DR: Parquet is awesome, and so is DuckDB!

Datasets on the Hugging Face Hub rely on Parquet files. We can query these files with DuckDB, a fast in-process database system. One of DuckDB's features is vector similarity search, which can be used with or without an index.

Blog: https://huggingface.co/learn/cookbook/vector_search_with_hub_as_backend
davidberenstein1957 
posted an update 10 days ago
burtenshaw 
posted an update 10 days ago
It's been a manic few days in open-source AI, with game-changing developments all over the place. Here's a roundup of the resources:

- The science team at @huggingface is reproducing and open-sourcing DeepSeek-R1. https://github.com/huggingface/open-r1
- @qwen released a series of models with a 1-million-token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller, with completely new variants at 256M and 500M parameters. https://huggingface.co/blog/smolervlm

There's so much you could do with these developments, especially combining them into agentic applications or fine-tuning them for your use case.
burtenshaw 
posted an update 13 days ago
Hey 👋

I'm helping out with some community research to learn about the AI community. If you want to join the conversation, head over to the community discussion I started on the most influential model since BERT:

OSAIResearchCommunity/README#2
burtenshaw 
posted an update 13 days ago
📣 Teachers and students! Here's a handy quiz app if you're preparing your own study material.

TL;DR: it's a quiz that uses a dataset to make questions and saves the answers.

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the Space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset

I made this to get ready for the agents course, but I hope it's useful for your projects too!

Quiz app: burtenshaw/dataset_quiz

Dataset with questions: burtenshaw/exam_questions

Agents course we're working on: https://huggingface.co/agents-course
burtenshaw 
posted an update 14 days ago
AI was built on side projects!