Tien Dung

tiendung

AI & ML interests

None yet

Recent Activity

liked a Space 18 days ago
Qwen/QVQ-72B-preview
updated a Space about 2 months ago
Symato/tomtat
updated a Space about 2 months ago
Symato/tomtat

Organizations

Symato Team, Tiny Monsters, Vietnamese Mistral

tiendung's activity

reacted to singhsidhukuldeep's post with 👀 3 months ago
Exciting Research Alert: Revolutionizing Dense Passage Retrieval with Entailment Tuning!

The good folks at HKUST have developed a novel approach that significantly improves information retrieval by leveraging natural language inference.

The entailment tuning approach consists of several key steps to enhance dense passage retrieval performance.

Data Preparation
- Convert questions into existence claims using rule-based transformations (a toy example follows this list).
- Combine retrieval data with NLI data from SNLI and MNLI datasets.
- Unify the format of both data types using a consistent prompting framework.
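
As a toy illustration of the rule-based question-to-claim conversion above (the patterns and function name are hypothetical, not the paper's actual rules):

import re

# Hypothetical rule set: rewrite a wh-question as an existence claim.
def question_to_existence_claim(question: str) -> str:
    q = question.strip().rstrip("?")
    q = re.sub(r"^[Ww]ho\b", "Someone", q)     # "Who wrote Hamlet" -> "Someone wrote Hamlet"
    q = re.sub(r"^[Ww]hat\b", "Something", q)  # "What causes tides" -> "Something causes tides"
    return q + "."

print(question_to_existence_claim("Who wrote Hamlet?"))  # -> "Someone wrote Hamlet."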

Entailment Tuning Process
- Initialize the model using pre-trained language models like BERT or RoBERTa.
- Apply aggressive masking (β=0.8) specifically to the hypothesis components while preserving premise information (a sketch follows this list).
- Train the model to predict the masked hypothesis tokens from the premise content.
- Run the training for 10 epochs using 8 GPUs, taking approximately 1.5-3.5 hours.
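
Roughly, the masking step could look like the sketch below (assuming a BERT-style tokenizer; the β=0.8 rate is the one mentioned above, everything else is illustrative):

import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def mask_hypothesis(premise: str, hypothesis: str, beta: float = 0.8):
    # Premise tokens are kept intact; hypothesis tokens are masked with probability beta.
    premise_ids = tokenizer(premise, add_special_tokens=False)["input_ids"]
    hyp_ids = tokenizer(hypothesis, add_special_tokens=False)["input_ids"]

    input_ids = [tokenizer.cls_token_id] + premise_ids + [tokenizer.sep_token_id]
    labels = [-100] * len(input_ids)  # no loss on the premise or special tokens

    for tok in hyp_ids:
        if random.random() < beta:
            input_ids.append(tokenizer.mask_token_id)
            labels.append(tok)        # the model must recover the masked hypothesis token
        else:
            input_ids.append(tok)
            labels.append(-100)

    input_ids.append(tokenizer.sep_token_id)
    labels.append(-100)
    return input_ids, labels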

Training Arguments for Entailment Tuning (Yes! They Shared Them)
- Use a learning rate of 2e-5 with 100 warmup steps.
- Set batch size to 128.
- Apply weight decay of 0.01.
- Utilize the Adam optimizer with beta values (0.9, 0.999).
- Maintain maximum gradient norm at 1.0.
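
In Hugging Face transformers terms, those hyperparameters map to something like the sketch below (standard TrainingArguments fields, not the authors' actual training script):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="entailment-tuned-encoder",
    num_train_epochs=10,
    learning_rate=2e-5,
    warmup_steps=100,
    per_device_train_batch_size=16,  # 16 per device x 8 GPUs = global batch size 128
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    max_grad_norm=1.0,
)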

Deployment
- Index passages using FAISS for efficient retrieval (see the sketch after this list).
- Shard vector store across multiple GPUs.
- Enable sub-millisecond retrieval of the top-100 passages per query.
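
A minimal FAISS sketch of the indexing step (a single exact index with placeholder vectors; sharding across GPUs can be layered on top, e.g. with faiss.index_cpu_to_all_gpus):

import faiss
import numpy as np

d = 768                                                      # encoder embedding dimension
passage_vecs = np.random.rand(100_000, d).astype("float32")  # placeholder passage embeddings
query_vecs = np.random.rand(4, d).astype("float32")          # placeholder query embeddings

index = faiss.IndexFlatIP(d)                         # exact inner-product search
index.add(passage_vecs)
scores, passage_ids = index.search(query_vecs, 100)  # top-100 passages per query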

Integration with Existing Systems
- Insert entailment tuning between pre-training and fine-tuning stages.
- Maintain compatibility with current dense retrieval methods.
- Preserve existing contrastive learning approaches during fine-tuning.

Simple, intuitive, and effective!

This advancement significantly improves the quality of retrieved passages for question-answering systems and retrieval-augmented generation tasks.
reacted to anakin87's post with 👀 4 months ago
Ok, you're finally convinced that synthetic data works... ⚗️

๐๐จ๐ฐ ๐ฒ๐จ๐ฎ ๐ฐ๐š๐ง๐ญ ๐ญ๐จ ๐ ๐ž๐ง๐ž๐ซ๐š๐ญ๐ž ๐š๐ง ๐ข๐ง๐ฌ๐ญ๐ซ๐ฎ๐œ๐ญ๐ข๐จ๐ง ๐๐š๐ญ๐š๐ฌ๐ž๐ญ ๐Ÿ๐จ๐ซ ๐Ÿ๐ข๐ง๐ž-๐ญ๐ฎ๐ง๐ข๐ง๐  ๐ข๐ง ๐š ๐ฅ๐š๐ง๐ ๐ฎ๐š๐ ๐ž ๐จ๐ญ๐ก๐ž๐ซ ๐ญ๐ก๐š๐ง ๐„๐ง๐ ๐ฅ๐ข๐ฌ๐ก.
But how do you get started?

I explore how to do this with Magpie in my new article
https://huggingface.co/blog/anakin87/multilingual-magpie

---

๐Ÿฆโ€โฌ› ๐–๐ก๐š๐ญ ๐ข๐ฌ ๐Œ๐š๐ ๐ฉ๐ข๐ž?

It's a recent technique for creating synthetic instruction datasets.

Magpie is based on a simple but ingenious idea 👇
if you prompt an instruction-tuned model with a pre-query template, you can make it generate a plausible user query/instruction

Here's an example:
model: Llama-3-8B-Instruct
pre-query template: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
generated user instruction: "What are some of the responsibilities of a commercial pilot?"

You can then feed this instruction back into the same model to get the assistant response.

By repeating this process, it's possible to generate large synthetic datasets with relatively little effort.
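
In code, the trick is simply to start generation right after the user header and let the model "complete" a query. A minimal sketch with transformers (adding the trailing newlines the Llama 3 chat format expects):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Pre-query template: generation begins where the user's message would start.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
inputs = tokenizer(pre_query, return_tensors="pt", add_special_tokens=False).to(model.device)

out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=1.0)
instruction = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(instruction)  # a plausible synthetic user instruction

For the non-English variant discussed below, the only change is appending the target language to pre_query (e.g. ending it with "spanish:").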

🪄 The authors demonstrate that using these datasets for Supervised Fine Tuning (SFT) can yield strong performance, even competitive with the original instruct model.


🧗 Generating non-English data

Most Language Models are primarily trained on English texts, so they tend to produce data in English.

How can we overcome this?

Earlier approaches were complex or costly.

Then @mrm8488 found a simple solution: add the target language to the pre-query template.
For Spanish, the template becomes "<|begin_of_text|><|start_header_id|>user<|end_header_id|>spanish:".

This method works for Spanish and German!

โŒ Unfortunately, it does not work well for other languages (๐Ÿ‡ฎ๐Ÿ‡น, ๐Ÿ‡ณ๐Ÿ‡ฑ, ...)

reacted to ImranzamanML's post with 👀 4 months ago
Last Thursday at KaggleX, organized by Google, I presented a workshop on "Unlocking the Power of Large Language Models (LLMs) for Business Applications", where I explained how we can reduce the size of LLMs to make them more suitable for business use and to address common resource limitations.
https://drive.google.com/file/d/1p5sT4_DeyBuwCqmYt4dCJKZOgLMpESzR/view
reacted to davidberenstein1957's post with ➕❤️ 4 months ago
You can now build a custom text classifier without days of human labeling!

๐Ÿ‘ LLMs work reasonably well as text classifiers.
๐Ÿ‘Ž They are expensive to run at scale and their performance drops in specialized domains.

๐Ÿ‘ Purpose-built classifiers have low latency and can potentially run on CPU.
๐Ÿ‘Ž They require labeled training data.

Combine the best of both worlds: the automatic labeling capabilities of LLMs and the high-quality annotations from human experts to train and deploy a specialized model.

Blog: https://huggingface.co/blog/sdiazlor/custom-text-classifier-ai-human-feedback
reacted to m-ric's post with 😎👍 4 months ago
๐—”๐—ฑ๐—ฑ ๐˜€๐—ผ๐˜‚๐—ฟ๐—ฐ๐—ฒ ๐—ต๐—ถ๐—ด๐—ต๐—น๐—ถ๐—ด๐—ต๐˜๐—ถ๐—ป๐—ด ๐˜๐—ผ ๐˜†๐—ผ๐˜‚๐—ฟ ๐—ฅ๐—”๐—š ๐˜€๐˜†๐˜€๐˜๐—ฒ๐—บ! ๐Ÿ“„๐Ÿ’ก

RAG systems are supposed to make your LLM's answers more trustworthy by inserting supporting documents from a knowledge base into the prompt: we say that we're "adding some context".

👎 But if you don't know which parts of the answer were generated from which input tokens, it's hard to tell whether it was effectively grounded in the context knowledge or not!

🤔 I've been working on the question: is it possible to add notes to the answer linking to the parts of the context they were generated from?

And I've found a great solution: Layer-wise Relevance Propagation (LRP), a technique showcased in an ICML '24 paper by Reduan Achtibat et al., which lets you precisely score how important each input token was in generating your output! They've made it into a library called LXT.

📊 For each generated output token, LXT gives you attribution scores for each input token.

โš™๏ธ So I've worked a bit more on aggregating these scores into meaningful spans between successive input and output tokens, and I finally obtained my desired result: RAG with source highlighting!

Try the demo here 👉 m-ric/rag_highlights

Caveats:
- It slows down generation (quite a lot for now; this could hopefully be reduced)
- For now it supports only specific models: Llama models and Mixtral

If there's enough interest in this solution, I can improve it further and spin it off into a specific library for RAG! 🚀
posted an update 4 months ago
ICML 2024 Tutorial: Physics of Language Models
https://www.youtube.com/watch?v=yBL7J0kgldU
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction (2309.14316)

A series of talks about understanding how LLMs work. Very interesting: they ran experiments with 100% control over how the model is trained and found that if the pretraining data contains no extraction-style data (QA instructions, or what the authors call knowledge augmentation), then even after instruction fine-tuning the LLM still cannot learn the knowledge-extraction skill. => This raises the question of whether the current recipe of pretraining first and only then doing SFT is really the right one.

They ran several hundred experiments across model architectures, model sizes, etc., and all of them gave the same result.

KNOWLEDGE AUGMENTATION (data augmentation)
If you don't mix instruction data into the pre-training data (mix training), the next best thing is knowledge augmentation: expressing the same statement in many different ways.

KNOWLEDGE MANIPULATION
For example, suppose the model already knows (was trained on) A's biography, including A's date of birth, and is asked whether A was born in an even or odd month (a 50% chance of answering correctly by guessing). Without CoT (recalling the knowledge, i.e. which month A was born in), the model cannot do it. => CoT (recalling learned knowledge) is crucial for knowledge manipulation (classification, comparison, ranking, ...).
reacted to alielfilali01's post with ❤️ 12 months ago
Hi friends, I'm happy to share with you all a tool I built a week or so ago: the "LLM Training Cost Calculator", a handy tool now available on Hugging Face Spaces! This interactive Gradio app provides an easy-to-use interface for estimating the training costs of large language models (LLMs).

(I've been asked to provide a report about the cost of fine-tuning each model, etc., so I decided to do the lazy job and build a tool for it; the Prof can later choose whatever config he likes 😆)

๐Ÿ” But Why this is important?
As LLMs continue to grow in size and complexity, understanding the computational and financial requirements is crucial for planning and managing AI projects. I believe this tool simplifies this process, giving you insights into potential expenses based on the number of parameters and tokens in your dataset.

🌟 Features:
- Input the number of parameters (in billions) and tokens (in trillions).
- Adjust for GPU utilization rates and overhead costs.
- Get an instant estimate of your training costs.
- Choose your GPU (A100 80GB PCIe, A100 80GB SXM, V100, H100 SXM, H100 PCIe)
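
For context, such an estimate typically follows the standard "training compute ≈ 6 × parameters × tokens" rule of thumb. A back-of-the-envelope sketch (the throughput, utilization and price figures are illustrative assumptions, not necessarily the Space's exact values):

def training_cost_usd(params_b, tokens_t, gpu_tflops=312, utilization=0.4,
                      gpu_hourly_usd=1.8, overhead=1.1):
    # Rough cost estimate from the 6 * N * D FLOPs approximation.
    flops = 6 * (params_b * 1e9) * (tokens_t * 1e12)         # total training FLOPs
    effective_flops_per_s = gpu_tflops * 1e12 * utilization  # per GPU, after utilization
    gpu_hours = flops / effective_flops_per_s / 3600
    return gpu_hours * gpu_hourly_usd * overhead

# e.g. a 7B-parameter model trained on 1T tokens on A100-class GPUs
print(f"${training_cost_usd(7, 1.0):,.0f}")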

📈 Coming Soon:
Plans are in place to expand the calculator's capabilities to include fine-tuning costs for models using LoRA or QLoRA. You'll be able to input a model ID from the Hugging Face Hub, select your fine-tuning strategy, and specify quantization details if using QLoRA.

I believe this tool will be a valuable asset to the AI community, helping to plan and allocate resources more effectively 🤗.

Should you have any suggestions or feedback, please don't hesitate to contribute your thoughts in the comments below. Together, we can refine and enhance this resource for all.

🔗 Try it here: https://huggingface.co/spaces/Ali-C137/LLM-Training-Cost-Calculator

PS: All thanks to Gradio, Hugging Face and the community ofc 🔥 😉
reacted to macadeliccc's post with ❤️ 12 months ago
Reducing perplexity in LLMs through layer-selective rank reduction

Layer-Selective Rank Reduction (LASER) is a denoising method that improves reasoning by strategically removing higher-order components from weight matrices in the multi-layer perceptron (MLP) layers, without the need for additional parameters or training data. This process leverages singular value decomposition to identify and eliminate these components. This simple yet effective method has been shown to improve question-answering performance by up to 27.4 percentage points.

LaserRMT implements this by calculating a signal-to-noise ratio (SNR) for each layer and selectively reducing the rank of those layers. The SNR is computed with singular value decomposition (SVD), which separates the signal (higher-order components) from the noise (lower-order components) within the weight matrices of the model's layers. This SNR calculation determines which layers would benefit from rank reduction without compromising the model's integrity.

If a layer is identified as a candidate for rank reduction, it enters an incremental process where the weight matrices are reduced and reconstructed by retaining only the singular values that surpass a threshold. In the case of laserRMT, the threshold is calculated using the Marchenko-Pastur law.
# Excerpt from the laserRMT class (numpy imported as np):
@staticmethod
def marchenko_pastur_threshold(sigma, n, m):
    # Singular values below this Marchenko-Pastur edge are treated as noise.
    beta = n / m if n < m else m / n
    threshold = sigma * np.sqrt((1 + np.sqrt(beta))**2)
    return threshold
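
The reduce-and-reconstruct step described above is essentially SVD truncation; a minimal sketch of that idea (illustrative, not laserRMT's exact code):

import numpy as np

def reduce_rank(weight: np.ndarray, threshold: float) -> np.ndarray:
    # Keep only singular values above the threshold and rebuild the matrix.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    keep = s > threshold
    return (u[:, keep] * s[keep]) @ vt[keep]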

The two primary benefits of applying this method are reducing the computational overhead of large language models and simultaneously improving output quality.

Credit to @ehartford @fernandofernandes @DavidGF for laserRMT

Resources:
โ˜„๏ธ AutoLaser: https://colab.research.google.com/drive/11j0e-w6BfvqeFN1gUrpOqdW0vcKqfVqP?usp=sharing
laserRMT: https://github.com/cognitivecomputations/laserRMT
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction (2312.13558)