Sara Han and davidberenstein1957 committed
Commit 3b7b628 · unverified · 1 parent: b2669f7

feat: different model completion (#31)

* feat: use different models for instruction and completion

* refactor to support specific completion model

* improve comments

* add tokenizer_id

* make improvements and fix structured generation for other providers

* fix temperature issue

* Update src/synthetic_dataset_generator/constants.py

Co-authored-by: David Berenstein <[email protected]>

* apply feedback

* merging fix

---------

Co-authored-by: David Berenstein <[email protected]>

README.md CHANGED
@@ -86,12 +86,14 @@ You can set the following environment variables to customize the generation proc
 Optionally, you can use different API providers and models.
 
 - `MODEL`: The model to use for generating the dataset, e.g. `meta-llama/Meta-Llama-3.1-8B-Instruct`, `gpt-4o`, `llama3.1`.
-- `API_KEY`: The API key to use for the generation API, e.g. `hf_...`, `sk-...`. If not provided, it will default to the provided `HF_TOKEN` environment variable.
+- `API_KEY`: The API key to use for the generation API, e.g. `hf_...`, `sk-...`. If not provided, it will default to the `HF_TOKEN` environment variable.
 - `OPENAI_BASE_URL`: The base URL for any OpenAI compatible API, e.g. `https://api.openai.com/v1/`.
 - `OLLAMA_BASE_URL`: The base URL for any Ollama compatible API, e.g. `http://127.0.0.1:11434/`.
 - `HUGGINGFACE_BASE_URL`: The base URL for any Hugging Face compatible API, e.g. TGI server or Dedicated Inference Endpoints. If you want to use serverless inference, only set the `MODEL`.
 - `VLLM_BASE_URL`: The base URL for any VLLM compatible API, e.g. `http://localhost:8000/`.
 
+To use a specific model exclusively for generating completions, set the corresponding environment variables by appending `_COMPLETION` to the ones mentioned earlier. For example, you can use `MODEL_COMPLETION` and `OPENAI_BASE_URL_COMPLETION`.
+
 SFT and Chat Data generation is not supported with OpenAI Endpoints. Additionally, you need to configure it per model family based on their prompt templates using the right `TOKENIZER_ID` and `MAGPIE_PRE_QUERY_TEMPLATE` environment variables.
 
 - `TOKENIZER_ID`: The tokenizer ID to use for the magpie pipeline, e.g. `meta-llama/Meta-Llama-3.1-8B-Instruct`.
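
To point both generators at an OpenAI-compatible endpoint, the same `_COMPLETION` convention applies; each `_COMPLETION` variable falls back to its base counterpart, so only the values that differ need to be set. A minimal sketch in the spirit of the examples below, for tasks supported with OpenAI endpoints (the model names here are placeholders, not part of this commit):

import os

from synthetic_dataset_generator import launch

os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1/" # any OpenAI compatible API
os.environ["API_KEY"] = "sk-..." # key for the generation API
os.environ["MODEL"] = "gpt-4o-mini" # placeholder model for instruction generation
os.environ["MODEL_COMPLETION"] = "gpt-4o" # placeholder model for completion generation

launch()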
examples/hf-serverless-deployment.py CHANGED
@@ -9,7 +9,7 @@ import os
 from synthetic_dataset_generator import launch
 
 os.environ["HF_TOKEN"] = "hf_..." # push the data to huggingface
-os.environ["MODEL"] = "meta-llama/Llama-3.1-8B-Instruct" # use instruct model
+os.environ["MODEL"] = "meta-llama/Llama-3.1-8B-Instruct" # use model for generation
 os.environ["MAGPIE_PRE_QUERY_TEMPLATE"] = "llama3" # use the template for the model
 
 launch()
examples/hf-serverless-different-model-for-completion.py ADDED
@@ -0,0 +1,16 @@
+# /// script
+# requires-python = ">=3.11,<3.12"
+# dependencies = [
+# "synthetic-dataset-generator",
+# ]
+# ///
+import os
+
+from synthetic_dataset_generator import launch
+
+os.environ["HF_TOKEN"] = "hf_..." # push the data to huggingface
+os.environ["MODEL"] = "meta-llama/Llama-3.1-8B-Instruct" # use model for instruction generation
+os.environ["MODEL_COMPLETION"] = "meta-llama/Llama-3.1-70B-Instruct" # use model for completion generation
+os.environ["MAGPIE_PRE_QUERY_TEMPLATE"] = "llama3" # use the template for the model
+
+launch()
examples/ollama-different-model-for-completion.py ADDED
@@ -0,0 +1,26 @@
+# /// script
+# requires-python = ">=3.11,<3.12"
+# dependencies = [
+# "synthetic-dataset-generator",
+# ]
+# ///
+# ollama serve
+# ollama run llama3.2
+# ollama run llama3.2:1b
+import os
+
+from synthetic_dataset_generator import launch
+
+os.environ["OLLAMA_BASE_URL"] = (
+    "http://127.0.0.1:11434/" # in this case, the same base url for both models
+)
+
+os.environ["MODEL"] = "llama3.2" # model for instruction generation
+os.environ["MODEL_COMPLETION"] = "llama3.2:1b" # model for completion generation
+
+os.environ["TOKENIZER_ID"] = "meta-llama/Llama-3.2-1B-Instruct" # tokenizer for instruction generation
+os.environ["TOKENIZER_ID_COMPLETION"] = "meta-llama/Llama-3.2-3B-Instruct" # tokenizer for completion generation
+
+os.environ["MAGPIE_PRE_QUERY_TEMPLATE"] = "llama3" # magpie template required for instruction generation
+
+launch()
src/synthetic_dataset_generator/apps/chat.py CHANGED
@@ -28,6 +28,7 @@ from synthetic_dataset_generator.constants import (
     BASE_URL,
     DEFAULT_BATCH_SIZE,
     MODEL,
+    MODEL_COMPLETION,
     SFT_AVAILABLE,
 )
 from synthetic_dataset_generator.pipelines.base import get_rewritten_prompts
@@ -148,6 +149,7 @@ def generate_dataset_from_prompt(
     num_turns: int = 1,
     num_rows: int = 10,
     temperature: float = 0.9,
+    temperature_completion: Union[float, None] = None,
     is_sample: bool = False,
     progress=gr.Progress(),
 ) -> pd.DataFrame:
@@ -155,7 +157,10 @@
     progress(0.0, desc="(1/2) Generating instructions")
     magpie_generator = get_magpie_generator(num_turns, temperature, is_sample)
     response_generator = get_response_generator(
-        system_prompt, num_turns, temperature, is_sample
+        system_prompt=system_prompt,
+        num_turns=num_turns,
+        temperature=temperature or temperature_completion,
+        is_sample=is_sample,
     )
     total_steps: int = num_rows * 2
     batch_size = DEFAULT_BATCH_SIZE
@@ -266,6 +271,7 @@ def generate_dataset_from_seed(
     num_turns: int = 1,
     num_rows: int = 10,
     temperature: float = 0.9,
+    temperature_completion: Union[float, None] = None,
     is_sample: bool = False,
     progress=gr.Progress(),
 ) -> pd.DataFrame:
@@ -278,13 +284,18 @@
         temperature=temperature, is_sample=is_sample
     )
     response_generator = get_response_generator(
-        system_prompt=None, num_turns=1, temperature=temperature, is_sample=is_sample
+        system_prompt=None,
+        num_turns=1,
+        temperature=temperature or temperature_completion,
+        is_sample=is_sample,
     )
     follow_up_generator_instruction = get_follow_up_generator(
         type="instruction", temperature=temperature, is_sample=is_sample
     )
     follow_up_generator_response = get_follow_up_generator(
-        type="response", temperature=temperature, is_sample=is_sample
+        type="response",
+        temperature=temperature or temperature_completion,
+        is_sample=is_sample,
     )
     steps = 2 * num_turns
     total_steps: int = num_rows * steps
@@ -402,6 +413,7 @@ def generate_dataset(
     num_turns: int = 1,
     num_rows: int = 10,
     temperature: float = 0.9,
+    temperature_completion: Union[float, None] = None,
     is_sample: bool = False,
     progress=gr.Progress(),
 ) -> pd.DataFrame:
@@ -411,6 +423,7 @@
             num_turns=num_turns,
             num_rows=num_rows,
             temperature=temperature,
+            temperature_completion=temperature_completion,
             is_sample=is_sample,
         )
     else:
@@ -420,6 +433,7 @@
             num_turns=num_turns,
             num_rows=num_rows,
             temperature=temperature,
+            temperature_completion=temperature_completion,
             is_sample=is_sample,
         )
     return dataframe
@@ -468,6 +482,7 @@ def push_dataset(
     num_turns: int = 1,
     num_rows: int = 10,
     temperature: float = 0.9,
+    temperature_completion: Union[float, None] = None,
     pipeline_code: str = "",
     oauth_token: Union[gr.OAuthToken, None] = None,
     progress=gr.Progress(),
@@ -491,6 +506,7 @@
         num_turns=num_turns,
         num_rows=num_rows,
         temperature=temperature,
+        temperature_completion=temperature_completion
     )
     push_dataset_to_hub(
         dataframe=dataframe,
@@ -651,6 +667,11 @@ def hide_pipeline_code_visibility():
     return {pipeline_code_ui: gr.Accordion(visible=False)}
 
 
+def show_temperature_completion():
+    if MODEL != MODEL_COMPLETION:
+        return {temperature_completion: gr.Slider(value=0.9, visible=True)}
+
+
 ######################
 # Gradio UI
 ######################
@@ -808,11 +829,20 @@ with gr.Blocks() as app:
                 temperature = gr.Slider(
                     label="Temperature",
                     minimum=0.1,
-                    maximum=1,
+                    maximum=1.5,
                     value=0.9,
                     step=0.1,
                     interactive=True,
                 )
+                temperature_completion = gr.Slider(
+                    label="Temperature for completion",
+                    minimum=0.1,
+                    maximum=1.5,
+                    value=None,
+                    step=0.1,
+                    interactive=True,
+                    visible=False,
+                )
                 private = gr.Checkbox(
                     label="Private dataset",
                     value=False,
@@ -944,6 +974,7 @@ with gr.Blocks() as app:
             num_turns,
             num_rows,
             temperature,
+            temperature_completion,
            pipeline_code,
         ],
         outputs=[success_message],
@@ -976,7 +1007,7 @@ with gr.Blocks() as app:
         inputs=[dataframe],
         outputs=[system_prompt, document_column, num_turns, dataframe],
     )
-
    app.load(fn=swap_visibility, outputs=main_ui)
    app.load(fn=get_org_dropdown, outputs=[org_name])
    app.load(fn=get_random_repo_name, outputs=[repo_name])
+   app.load(fn=show_temperature_completion, outputs=[temperature_completion])
src/synthetic_dataset_generator/apps/rag.py CHANGED
@@ -24,7 +24,7 @@ from synthetic_dataset_generator.apps.base import (
     validate_argilla_user_workspace_dataset,
     validate_push_to_hub,
 )
-from synthetic_dataset_generator.constants import DEFAULT_BATCH_SIZE
+from synthetic_dataset_generator.constants import DEFAULT_BATCH_SIZE, MODEL, MODEL_COMPLETION
 from synthetic_dataset_generator.pipelines.base import get_rewritten_prompts
 from synthetic_dataset_generator.pipelines.embeddings import (
     get_embeddings,
@@ -132,6 +132,7 @@ def generate_dataset(
     reranking: bool = False,
     num_rows: int = 10,
     temperature: float = 0.7,
+    temperature_completion: Union[float, None] = None,
     is_sample: bool = False,
     progress=gr.Progress(),
 ):
@@ -155,7 +156,7 @@
         is_sample=is_sample,
     )
     response_generator = get_response_generator(
-        temperature=temperature, is_sample=is_sample
+        temperature = temperature_completion or temperature , is_sample=is_sample
     )
     if reranking:
         reranking_generator = get_sentence_pair_generator(
@@ -320,6 +321,7 @@ def push_dataset(
     retrieval_reranking: list[str],
     num_rows: int,
     temperature: float,
+    temperature_completion: float,
     pipeline_code: str,
     oauth_token: Union[gr.OAuthToken, None] = None,
     progress=gr.Progress(),
@@ -347,6 +349,8 @@
         reranking=reranking,
         num_rows=num_rows,
         temperature=temperature,
+        temperature_completion=temperature_completion,
+        is_sample=True,
     )
     push_dataset_to_hub(
         dataframe, org_name, repo_name, oauth_token, private, pipeline_code
@@ -512,6 +516,11 @@ def hide_pipeline_code_visibility():
     return {pipeline_code_ui: gr.Accordion(visible=False)}
 
 
+def show_temperature_completion():
+    if MODEL != MODEL_COMPLETION:
+        return {temperature_completion: gr.Slider(value=0.9, visible=True)}
+
+
 ######################
 # Gradio UI
 ######################
@@ -645,11 +654,20 @@ with gr.Blocks() as app:
                 temperature = gr.Slider(
                     label="Temperature",
                     minimum=0.1,
-                    maximum=1,
+                    maximum=1.5,
                     value=0.7,
                     step=0.1,
                     interactive=True,
                 )
+                temperature_completion = gr.Slider(
+                    label="Temperature for completion",
+                    minimum=0.1,
+                    maximum=1.5,
+                    value=None,
+                    step=0.1,
+                    interactive=True,
+                    visible=False,
+                )
                 private = gr.Checkbox(
                     label="Private dataset",
                     value=False,
@@ -779,6 +797,7 @@
             retrieval_reranking,
             num_rows,
             temperature,
+            temperature_completion,
            pipeline_code,
         ],
         outputs=[success_message],
@@ -815,3 +834,4 @@
    app.load(fn=swap_visibility, outputs=main_ui)
    app.load(fn=get_org_dropdown, outputs=[org_name])
    app.load(fn=get_random_repo_name, outputs=[repo_name])
+   app.load(fn=show_temperature_completion, outputs=[temperature_completion])
src/synthetic_dataset_generator/apps/textcat.py CHANGED
@@ -532,7 +532,7 @@ with gr.Blocks() as app:
                 temperature = gr.Slider(
                     label="Temperature",
                     minimum=0.1,
-                    maximum=1,
+                    maximum=1.5,
                     value=0.8,
                     step=0.1,
                     interactive=True,
src/synthetic_dataset_generator/constants.py CHANGED
@@ -3,10 +3,6 @@ import warnings
 
 import argilla as rg
 
-# Tasks
-TEXTCAT_TASK = "text_classification"
-SFT_TASK = "supervised_fine_tuning"
-
 # Inference
 MAX_NUM_TOKENS = int(os.getenv("MAX_NUM_TOKENS", 2048))
 MAX_NUM_ROWS = int(os.getenv("MAX_NUM_ROWS", 1000))
@@ -20,28 +16,56 @@ OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL")
 HUGGINGFACE_BASE_URL = os.getenv("HUGGINGFACE_BASE_URL")
 VLLM_BASE_URL = os.getenv("VLLM_BASE_URL")
 
-# check if model is set correctly
-if HUGGINGFACE_BASE_URL and MODEL:
-    raise ValueError(
-        "`HUGGINGFACE_BASE_URL` and `MODEL` cannot be set at the same time. Use a model id for serverless inference and a base URL dedicated to Hugging Face Inference Endpoints."
-    )
-if not MODEL:
-    if OPENAI_BASE_URL or OLLAMA_BASE_URL or VLLM_BASE_URL:
-        raise ValueError("`MODEL` is not set. Please provide a model id for inference.")
-
-# Check if multiple base URLs are provided
-base_urls = [
-    url
-    for url in [OPENAI_BASE_URL, OLLAMA_BASE_URL, HUGGINGFACE_BASE_URL, VLLM_BASE_URL]
-    if url
+# Just used in case of selecting a different model for completions
+MODEL_COMPLETION = os.getenv("MODEL_COMPLETION", MODEL)
+TOKENIZER_ID_COMPLETION = os.getenv("TOKENIZER_ID_COMPLETION", TOKENIZER_ID)
+OPENAI_BASE_URL_COMPLETION = os.getenv("OPENAI_BASE_URL_COMPLETION", OPENAI_BASE_URL)
+OLLAMA_BASE_URL_COMPLETION = os.getenv("OLLAMA_BASE_URL_COMPLETION", OLLAMA_BASE_URL)
+HUGGINGFACE_BASE_URL_COMPLETION = os.getenv(
+    "HUGGINGFACE_BASE_URL_COMPLETION", HUGGINGFACE_BASE_URL
+)
+VLLM_BASE_URL_COMPLETION = os.getenv("VLLM_BASE_URL_COMPLETION", VLLM_BASE_URL)
+
+base_urls = [OPENAI_BASE_URL, OLLAMA_BASE_URL, HUGGINGFACE_BASE_URL, VLLM_BASE_URL]
+base_urls_completion = [
+    OPENAI_BASE_URL_COMPLETION,
+    OLLAMA_BASE_URL_COMPLETION,
+    HUGGINGFACE_BASE_URL_COMPLETION,
+    VLLM_BASE_URL_COMPLETION,
 ]
-if len(base_urls) > 1:
-    raise ValueError(
-        f"Multiple base URLs provided: {', '.join(base_urls)}. Only one base URL can be set at a time."
-    )
-BASE_URL = OPENAI_BASE_URL or OLLAMA_BASE_URL or HUGGINGFACE_BASE_URL or VLLM_BASE_URL
 
 
+# Validate the configuration of the model and base URLs.
+def validate_configuration(base_urls, model, env_context=""):
+    huggingface_url = base_urls[2]
+    if huggingface_url and model:
+        raise ValueError(
+            f"`HUGGINGFACE_BASE_URL{env_context}` and `MODEL{env_context}` cannot be set at the same time. "
+            "Use a model id for serverless inference and a base URL dedicated to Hugging Face Inference Endpoints."
+        )
+
+    if not model and any(base_urls):
+        raise ValueError(
+            f"`MODEL{env_context}` is not set. Please provide a model id for inference."
+        )
+
+    active_urls = [url for url in base_urls if url]
+    if len(active_urls) > 1:
+        raise ValueError(
+            f"Multiple base URLs are provided: {', '.join(active_urls)}. "
+            "Only one base URL can be set at a time."
+        )
+validate_configuration(base_urls, MODEL)
+validate_configuration(base_urls_completion, MODEL_COMPLETION, "_COMPLETION")
+
+BASE_URL = OPENAI_BASE_URL or OLLAMA_BASE_URL or HUGGINGFACE_BASE_URL or VLLM_BASE_URL
+BASE_URL_COMPLETION = (
+    OPENAI_BASE_URL_COMPLETION
+    or OLLAMA_BASE_URL_COMPLETION
+    or HUGGINGFACE_BASE_URL_COMPLETION
+    or VLLM_BASE_URL_COMPLETION
+)
+
 # API Keys
 HF_TOKEN = os.getenv("HF_TOKEN")
 if not HF_TOKEN:
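
Because every `_COMPLETION` constant above is read with `os.getenv(name, default)` and defaults to its base counterpart, exporting only the variables that differ is enough. A small standalone illustration of that fallback, using hypothetical values rather than anything taken from this commit:

import os

os.environ["MODEL"] = "llama3.2" # instruction model
os.environ["OLLAMA_BASE_URL"] = "http://127.0.0.1:11434/"
os.environ["MODEL_COMPLETION"] = "llama3.2:1b" # only the completion model is overridden

MODEL = os.getenv("MODEL")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL")
MODEL_COMPLETION = os.getenv("MODEL_COMPLETION", MODEL)
OLLAMA_BASE_URL_COMPLETION = os.getenv("OLLAMA_BASE_URL_COMPLETION", OLLAMA_BASE_URL)

print(MODEL_COMPLETION) # llama3.2:1b
print(OLLAMA_BASE_URL_COMPLETION) # http://127.0.0.1:11434/ (inherited from OLLAMA_BASE_URL)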
src/synthetic_dataset_generator/pipelines/base.py CHANGED
@@ -8,11 +8,17 @@ from synthetic_dataset_generator.constants import (
     API_KEYS,
     DEFAULT_BATCH_SIZE,
     HUGGINGFACE_BASE_URL,
+    HUGGINGFACE_BASE_URL_COMPLETION,
     MODEL,
+    MODEL_COMPLETION,
     OLLAMA_BASE_URL,
+    OLLAMA_BASE_URL_COMPLETION,
     OPENAI_BASE_URL,
+    OPENAI_BASE_URL_COMPLETION,
     TOKENIZER_ID,
+    TOKENIZER_ID_COMPLETION,
     VLLM_BASE_URL,
+    VLLM_BASE_URL_COMPLETION,
 )
 
 TOKEN_INDEX = 0
@@ -73,12 +79,20 @@ def _get_llm_class() -> str:
     return "InferenceEndpointsLLM"
 
 
-def _get_llm(use_magpie_template=False, **kwargs):
+def _get_llm(
+    structured_output: dict = None,
+    use_magpie_template: str = False,
+    is_completion: bool = False,
+    **kwargs,
+):
+    model = MODEL_COMPLETION if is_completion else MODEL
+    tokenizer_id = TOKENIZER_ID_COMPLETION if is_completion else TOKENIZER_ID or model
     if OPENAI_BASE_URL:
         llm = OpenAILLM(
-            model=MODEL,
-            base_url=OPENAI_BASE_URL,
+            model=model,
+            base_url=OPENAI_BASE_URL_COMPLETION if is_completion else OPENAI_BASE_URL,
             api_key=_get_next_api_key(),
+            structured_output=structured_output,
             **kwargs,
         )
         if "generation_kwargs" in kwargs:
@@ -108,19 +122,25 @@ def _get_llm(use_magpie_template=False, **kwargs):
             kwargs["generation_kwargs"] = {}
         kwargs["generation_kwargs"]["options"] = options
         llm = OllamaLLM(
-            model=MODEL,
-            host=OLLAMA_BASE_URL,
-            tokenizer_id=TOKENIZER_ID or MODEL,
+            model=model,
+            host=OLLAMA_BASE_URL_COMPLETION if is_completion else OLLAMA_BASE_URL,
+            tokenizer_id=tokenizer_id,
             use_magpie_template=use_magpie_template,
+            structured_output=structured_output,
             **kwargs,
         )
     elif HUGGINGFACE_BASE_URL:
         kwargs["generation_kwargs"]["do_sample"] = True
         llm = InferenceEndpointsLLM(
             api_key=_get_next_api_key(),
-            base_url=HUGGINGFACE_BASE_URL,
-            tokenizer_id=TOKENIZER_ID or MODEL,
+            base_url=(
+                HUGGINGFACE_BASE_URL_COMPLETION
+                if is_completion
+                else HUGGINGFACE_BASE_URL
+            ),
+            tokenizer_id=tokenizer_id,
             use_magpie_template=use_magpie_template,
+            structured_output=structured_output,
             **kwargs,
         )
     elif VLLM_BASE_URL:
@@ -128,19 +148,21 @@ def _get_llm(use_magpie_template=False, **kwargs):
         if "do_sample" in kwargs["generation_kwargs"]:
             del kwargs["generation_kwargs"]["do_sample"]
         llm = ClientvLLM(
-            base_url=VLLM_BASE_URL,
-            model=MODEL,
-            tokenizer=TOKENIZER_ID or MODEL,
+            base_url=VLLM_BASE_URL_COMPLETION if is_completion else VLLM_BASE_URL,
+            model=model,
+            tokenizer=tokenizer_id,
             api_key=_get_next_api_key(),
             use_magpie_template=use_magpie_template,
+            structured_output=structured_output,
             **kwargs,
         )
     else:
         llm = InferenceEndpointsLLM(
             api_key=_get_next_api_key(),
-            tokenizer_id=TOKENIZER_ID or MODEL,
-            model_id=MODEL,
+            tokenizer_id=tokenizer_id,
+            model_id=model,
             use_magpie_template=use_magpie_template,
+            structured_output=structured_output,
             **kwargs,
         )
 
src/synthetic_dataset_generator/pipelines/chat.py CHANGED
@@ -245,7 +245,7 @@ def get_response_generator(
         "max_new_tokens": 256 if is_sample else int(MAX_NUM_TOKENS * 0.5),
     }
     response_generator = TextGeneration(
-        llm=_get_llm(generation_kwargs=generation_kwargs),
+        llm=_get_llm(is_completion=True, generation_kwargs=generation_kwargs),
         system_prompt=system_prompt,
         output_mappings={"generation": "completion"},
         input_mappings={"instruction": "prompt"},
@@ -256,7 +256,7 @@
         "max_new_tokens": MAX_NUM_TOKENS,
     }
     response_generator = ChatGeneration(
-        llm=_get_llm(generation_kwargs=generation_kwargs),
+        llm=_get_llm(is_completion=True, generation_kwargs=generation_kwargs),
         output_mappings={"generation": "completion"},
         input_mappings={"conversation": "messages"},
     )
@@ -281,7 +281,7 @@ def get_follow_up_generator(type: str, temperature: float, is_sample: bool):
         "max_new_tokens": MAX_NUM_TOKENS,
     }
     follow_up_generator = ChatGeneration(
-        llm=_get_llm(generation_kwargs=generation_kwargs),
+        llm=_get_llm(is_completion=True, generation_kwargs=generation_kwargs),
     )
     follow_up_generator.load()
     return follow_up_generator
@@ -336,7 +336,7 @@ def generate_pipeline_code_seed(
 # Requirements: `pip install distilabel[hf-inference-endpoints]`
 from distilabel.models import {_get_llm_class()}
 from distilabel.pipeline import Pipeline
-from distilabel.steps import KeepColumns{", LoadDataFromDicts" if input_type != "dataset-input" else ""}{", LoadDataFromHub" if input_type == "dataset-input" else ""}
+from distilabel.steps import KeepColumns{", LoadDataFromDicts" if input_type != "dataset-input" else ""}{", LoadDataFromHub" if input_type == "dataset-input" else ""}{", StepInput, step" if num_turns > 1 else ""}
 from distilabel.steps.tasks import GenerateSentencePair, TextGeneration {", ChatGeneration" if num_turns > 1 else ""}
 """
@@ -455,10 +455,10 @@
     keep_columns = KeepColumns(columns=["messages"])
     """
     code += "load_the_dataset >> instruction_generator >> response_generator >> prepare_messages"
-
+
     for i in range(1, num_turns + 1):
         code += f" >> follow_up_instruction_{i} >> format_instruction_{i} >> follow_up_response_{i} >> format_response_{i}"
-
+
     code += " >> keep_columns"
 
     code += """
src/synthetic_dataset_generator/pipelines/rag.py CHANGED
@@ -121,7 +121,7 @@ def get_response_generator(temperature: float, is_sample: bool):
         "max_new_tokens": MAX_NUM_TOKENS if is_sample else 256,
     }
     text_generator = TextGeneration(
-        llm=_get_llm(generation_kwargs=generation_kwargs),
+        llm=_get_llm(is_completion=True, generation_kwargs=generation_kwargs),
         system_prompt=SYSTEM_PROMPT_RAG,
         template=RAG_TEMPLATE,
         columns=["context", "question"],
src/synthetic_dataset_generator/pipelines/textcat.py CHANGED
@@ -109,7 +109,7 @@ def get_labeller_generator(system_prompt: str, labels: List[str], multi_label: b
         "temperature": 0.01,
         "max_new_tokens": MAX_NUM_TOKENS,
     }
-    llm = _get_llm(generation_kwargs=generation_kwargs)
+    llm = _get_llm(is_completion=True, generation_kwargs=generation_kwargs)
     labeller_generator = TextClassification(
         llm=llm,
         context=system_prompt,