
Ilyas Moutawwakil

IlyasMoutawwakil

AI & ML interests

Optimization, LLMs, Hardware, Backends, ...

Recent Activity

updated a dataset 6 days ago
optimum-benchmark/cuda
updated a dataset 6 days ago
optimum-benchmark/cpu
updated a dataset 6 days ago
optimum-benchmark/misc-windows-latest-3.8

Organizations

Hugging Face, Training Transformers Together, OpenVINO Toolkit, HugGAN Community, ONNXConfig for all, Hugging Face Optimum, HuggingFaceM4, Hugging Face H4, Optimum AMD, That Time I got Reincarnated as a Hugging Face Organization, AI Energy Score, Optimum-Benchmark, Social Post Explorers, Dev Mode Explorers, Optimum-Intel, Hugging Face Machine Learning Optimization, Optimum Internal Testing

IlyasMoutawwakil's activity

reacted to clem's post with 🚀 8 months ago
Who said you couldn't build a big business based on open-source AI? Congrats Mistral team: https://huggingface.co/mistralai
reacted to merve's post with 🤗 8 months ago
Fine-tune Florence-2 on any task 🔥

Today we release a notebook and a walkthrough blog on fine-tuning Florence-2 on the DocVQA dataset @andito @SkalskiP

Blog: https://huggingface.co/blog 📕
Notebook: https://colab.research.google.com/drive/1hKDrJ5AH_o7I95PtZ9__VlCTNAo1Gjpf?usp=sharing 📖
Florence-2 is a great vision-language model thanks to its massive dataset and small size!

This model requires conditioning through task prefixes, and it is not as generalist out of the box, so it needs fine-tuning to pick up a new task such as DocVQA 📝

We fine-tuned the model on an A100 (a smaller GPU with a smaller batch size also works) and saw that the model picks up new tasks 🥹

See below what it looks like before and after fine-tuning 🤩
Play with the demo here: andito/Florence-2-DocVQA 🏄‍♀️
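
To make the task-prefix conditioning concrete, here is a minimal inference sketch, assuming the trust_remote_code path that the Florence-2 checkpoints use with transformers; the base checkpoint, image path, and question below are illustrative placeholders (the fine-tuned demo checkpoint is andito/Florence-2-DocVQA):

from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base-ft"  # illustrative; swap in a fine-tuned checkpoint such as andito/Florence-2-DocVQA
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("document_page.png")  # placeholder path to any document image

# Florence-2 is conditioned by prepending a task prefix to the text input;
# "<DocVQA>" is used here as the document-QA prefix (illustrative)
prompt = "<DocVQA>" + "What is the invoice date?"
inputs = processor(text=prompt, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=128,
    num_beams=3,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])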
reacted to their post with 🧠🚀 8 months ago
Last week, Intel's new Xeon CPUs, Sapphire Rapids (SPR), landed on Inference Endpoints, and I think they have the potential to reduce the cost of your RAG pipelines 💸

Why? Because they come with Intel® AMX support, a set of instructions that accelerates BF16 and INT8 matrix multiplications on CPU ⚡

I went ahead and built a Space to showcase how to efficiently deploy embedding models on SPR for both retrieving and ranking documents, with Haystack-compatible components: https://huggingface.co/spaces/optimum-intel/haystack-e2e

Here's how it works:

- Document Store: A FAISS document store containing the seven-wonders dataset, embedded, indexed and stored on the Space's persistent storage to avoid unnecessary re-computation of embeddings.

- Retriever: It embeds the query at runtime and retrieves from the dataset the N documents that are most semantically similar to the query's embedding.
We use the small variant of the BGE family here because we want a model that's fast to run on the entire dataset and has a small embedding space for fast similarity search. Specifically, we use an INT8-quantized bge-small-en-v1.5, deployed on an Intel Sapphire Rapids CPU instance.

- Ranker: It re-embeds the retrieved documents at runtime and re-ranks them based on semantic similarity to the query's embedding. We use the large variant of the BGE family here because it is optimized for accuracy, allowing us to filter down to the k most relevant documents that we'll use in the LLM prompt. Specifically, we use an INT8-quantized bge-large-en-v1.5, deployed on an Intel Sapphire Rapids CPU instance. (A minimal sketch of this retrieve-then-rerank flow follows after the links below.)

Space: https://huggingface.co/spaces/optimum-intel/haystack-e2e
Retriever IE: optimum-intel/fastrag-retriever
Ranker IE: optimum-intel/fastrag-ranker
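
For illustration, here is a minimal local sketch of the retrieve-then-rerank flow above, using sentence-transformers with the FP32 BGE checkpoints; the Space itself runs the INT8-quantized variants behind Inference Endpoints on Sapphire Rapids, and the corpus and query below are toy placeholders standing in for the seven-wonders dataset:

from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("BAAI/bge-small-en-v1.5")  # small: fast enough to embed the whole corpus
ranker = SentenceTransformer("BAAI/bge-large-en-v1.5")     # large: more accurate, used only on the candidates

# Toy corpus standing in for the seven-wonders dataset
documents = [
    "The Great Pyramid of Giza is the oldest of the Seven Wonders of the Ancient World.",
    "The Hanging Gardens of Babylon may never have existed at all.",
    "The Colossus of Rhodes was a giant statue of the sun god Helios.",
]
query = "Which ancient wonder is the oldest?"

# Retriever: embed corpus and query with the small model, keep the top-N candidates
doc_emb = retriever.encode(documents, convert_to_tensor=True, normalize_embeddings=True)
query_emb = retriever.encode(query, convert_to_tensor=True, normalize_embeddings=True)
candidates = [documents[hit["corpus_id"]] for hit in util.semantic_search(query_emb, doc_emb, top_k=2)[0]]

# Ranker: re-embed only the candidates with the large model, keep the top-k for the LLM prompt
cand_emb = ranker.encode(candidates, convert_to_tensor=True, normalize_embeddings=True)
query_emb = ranker.encode(query, convert_to_tensor=True, normalize_embeddings=True)
context = [candidates[hit["corpus_id"]] for hit in util.semantic_search(query_emb, cand_emb, top_k=1)[0]]
print(context)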
reacted to Molbap's post with 🤯🤗🚀🔥 10 months ago
🚀🚀 Exciting times for the document AI community!

We're thrilled to announce the release of some of the largest OCR datasets available to the public.
🔥 With over 26 million pages, 18 billion text tokens, and 6TB of data, these resources are a significant leap forward for document AI research.

Here's how to access these datasets quickly:

from datasets import load_dataset

# Stream both datasets instead of downloading the full 6TB up front
pdfa_dataset = load_dataset('pixparse/pdfa-eng-wds', streaming=True)
IDL_dataset = load_dataset('pixparse/idl-wds', streaming=True)

This enables you to stream them directly, integrating seamlessly with your projects using the Hugging Face datasets library. On the hub, you can find them here:

pixparse/pdfa-eng-wds
pixparse/idl-wds
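
Continuing from the snippet above, a quick way to sanity-check the stream is to pull a single record; this assumes a 'train' split, and the exact field names are best confirmed against the dataset cards:

sample = next(iter(pdfa_dataset['train']))  # first record from the stream, no full download needed
print(sample.keys())                        # inspect the available fields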

For lean data loading, the new [chug](https://github.com/huggingface/chug) library offers a solution with PDF decoding:


import chug

task_cfg = chug.DataTaskDocReadCfg(
    page_sampling='all',   # read every page of each document rather than sampling a subset
)
data_cfg = chug.DataCfg(
    source='pixparse/pdfa-eng-wds',
    split='train',
    batch_size=None,       # yield individual samples instead of batches
    format='hfids',        # stream through the Hugging Face datasets backend
    num_workers=0,         # decode in the main process
)
data_loader = chug.create_loader(
    data_cfg,
    task_cfg,
)
sample = next(iter(data_loader))  # one decoded sample, ready to inspect



We owe a huge thank you to Peter Wyatt, Kate Tasker, Rachel Taketa, Ali Furkan Biten, Ruben Tito, and their colleagues for their contributions. Their work putting these datasets together has been invaluable. 🤗

Looking Ahead:

We're on a mission to enhance document AI capabilities, and these datasets are just the beginning. With your engagement and innovation, we're confident in the community's ability to develop robust OCR solutions. We encourage you to explore these datasets, experiment with the code, and contribute to the collective progress in document AI.

For detailed information on usage and licensing, please refer to the dataset cards on the Hugging Face hub.
replied to Molbap's post 10 months ago

This is so cool, and the kind of AI many industries are in need of!

reacted to akhaliq's post with 👍 about 1 year ago
Here is my selection of papers for today (9 Jan)

https://huggingface.co/papers

AGG: Amortized Generative 3D Gaussians for Single Image to 3D

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

TeleChat Technical Report

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

AST-T5: Structure-Aware Pretraining for Code Generation and Understanding

Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Mixtral of Experts