jpacifico/Chocolatine-Cook-3B-combined-SFT-DPO-v0.1

jpacifico committed on Nov 24, 2024

Commit 7f79223 (verified) · 1 parent: 27a963c

Update README.md

Files changed (1)

README.md +38 -6

@@ -29,21 +29,53 @@ The recommended usage is by loading the low-rank adapter using unsloth:
 ```python
 from unsloth import FastLanguageModel
-model_name = "jpacifico/Chocolatine-Cook-3B-combined-SFT-DPO-v0.1"
+from transformers import TextStreamer
+import torch
+model_name = "jpacifico/final_model_combined_sft_dpo"
 model, tokenizer = FastLanguageModel.from_pretrained(
-  model_name = model_name,
-  max_seq_length = 2048,
-  dtype = None,
-  load_in_4bit = True,
+    model_name,
+    max_seq_length=2048,
+    dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+    load_in_4bit=False
 )
 FastLanguageModel.for_inference(model)
+model.eval()
+def generate_response(user_question: str):
+    messages = [
+        {"role": "system", "content": "Tu es un assistant IA spécialisé dans le langage culinaire français. Une question te sera posée. Tu dois générer une réponse précise et concise."},
+        {"role": "user", "content": "En cuisine "+user_question},
+    ]
+    inputs = tokenizer.apply_chat_template(
+        messages,
+        tokenize=True,
+        add_generation_prompt=True,
+        return_tensors="pt",
+    ).to("cuda")
+    attention_mask = (inputs != tokenizer.pad_token_id).long()
+    text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+    with torch.no_grad():
+        _ = model.generate(
+            input_ids=inputs,
+            attention_mask=attention_mask,
+            max_new_tokens=128,
+            use_cache=True,
+            streamer=text_streamer,
+            do_sample=False,
+            temperature=0.7,
+        )
 ```
 ### Limitations
-The Chocolatine model is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
+The Chocolatine model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
 It does not have any moderation mechanism.
 - **Developed by:** Jonathan Pacifico, 2024
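
The prompt construction in the added `generate_response` can be looked at on its own, without loading the model. The following is a minimal sketch, not part of the commit: `build_messages` and `generation_kwargs` are hypothetical helper names introduced here for illustration. The second helper reflects the fact that in `transformers`, `temperature` only takes effect when `do_sample=True`, so the `temperature=0.7` passed alongside `do_sample=False` in the diff has no effect on greedy decoding.

```python
# Hypothetical helpers (not in the commit) that mirror the prompt and
# generation settings used by generate_response in the README diff.

SYSTEM_PROMPT = (
    "Tu es un assistant IA spécialisé dans le langage culinaire français. "
    "Une question te sera posée. "
    "Tu dois générer une réponse précise et concise."
)


def build_messages(user_question: str) -> list[dict[str, str]]:
    """Return the chat messages in the same shape the README passes to
    tokenizer.apply_chat_template: one system turn, one user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        # The README prefixes every question with "En cuisine ".
        {"role": "user", "content": "En cuisine " + user_question},
    ]


def generation_kwargs(sample: bool = False) -> dict:
    """Build keyword arguments for model.generate.

    temperature is included only when sampling is enabled, because
    greedy decoding (do_sample=False) ignores it.
    """
    kwargs = {"max_new_tokens": 128, "use_cache": True, "do_sample": sample}
    if sample:
        kwargs["temperature"] = 0.7
    return kwargs


if __name__ == "__main__":
    for message in build_messages("qu'est-ce qu'un roux ?"):
        print(message["role"], "->", message["content"])
    print(generation_kwargs(sample=False))
```

Under these assumptions, the dictionary returned by `generation_kwargs` could be unpacked into the `model.generate(...)` call shown in the diff, together with `input_ids`, `attention_mask`, and `streamer`.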