config.json might be improperly configured, or the model is not accessible from HF

#3
by dipyamanroy - opened

When I try to use the model on Colab with the example snippet, I get the following error trace.

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-12-18e69c6481b8> in <cell line: 0>()
      2 
      3 model_name = "smirki/UIGEN-T1-Qwen-7b"
----> 4 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
      5 model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to("cuda")
      6 
...
ValueError: Unrecognized model in smirki/UIGEN-T1-Qwen-7b. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, aria, aria_text, audio-spectrogram-transformer, autoformer, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, colpali, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, diffllama, dinat, dinov2, dinov2_with_registers, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, emu3, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, idefics3_vision, ijepa, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llav...

Is it because the config.json is improperly configured or does the model only work locally?
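If the missing model_type key is the cause, one possible workaround (just a sketch, and it assumes the fine-tune is based on Qwen2-7B) would be to skip the Auto classes and load with the Qwen2 classes directly:

from transformers import Qwen2ForCausalLM, Qwen2TokenizerFast

model_name = "smirki/UIGEN-T1-Qwen-7b"

# Use the Qwen2 classes directly so nothing has to be inferred from a
# missing "model_type" (or tokenizer_class) entry in the repo's config files
tokenizer = Qwen2TokenizerFast.from_pretrained(model_name)
model = Qwen2ForCausalLM.from_pretrained(model_name).to("cuda")

If that still fails, the repo's config.json probably needs a "model_type": "qwen2" entry added.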

Got it running using llama_cpp. However, the output only shows the thinking process; the actual code never gets generated. This is the code snippet:

from llama_cpp import Llama

# Initialize the model (model_path points to the local GGUF file)
llm = Llama(model_path=model_path, n_gpu_layers=0)  # Use n_gpu_layers=0 if not using a GPU

# Define the prompt with a clear request for the full response
prompt = """<|im_start|>user
Make a dark-themed dashboard for an oil rig. Provide the HTML, CSS, and JavaScript implementation including animations and Tailwind CSS classes. Make sure it is production ready with responsive design, accessibility features, and proper error handling. Also include Google Fonts and Font Awesome for the icons.
<|im_end|>
<|im_start|>assistant
<|im_start|>think
"""

# Generate the response (note: create_chat_completion wraps the content in the model's
# chat template again, on top of the <|im_start|> markers already in the prompt)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=12012,  # Adjust max tokens as needed
    temperature=0.7    # Adjust the creativity level of the model
)

# Print the response
print(response["choices"][0]["message"]["content"])

Response:


Llama.generate: 24 prefix-match hit, remaining 26 prompt tokens to eval
llama_perf_context_print:        load time =    9924.23 ms
llama_perf_context_print: prompt eval time =    6580.40 ms /    26 tokens (  253.09 ms per token,     3.95 tokens per second)
llama_perf_context_print:        eval time =  390671.81 ms /   461 runs   (  847.44 ms per token,     1.18 tokens per second)
llama_perf_context_print:       total time =  398127.16 ms /   487 tokens

think
The user wants a dark-themed dashboard for an oil rig. This dashboard needs to be: Top 0.01% UI (award-winning quality) Structurally sound Absolutely amazing Production ready Robust Coolest thing ever With animations and functions HTML, CSS, JS Tailwind CSS Google Fonts Font Awesome for icons No comments in code No full SVG No SHA for libraries I need to consider the following aspects: Dark Theme:  Implement a dark color palette. Use light text on dark backgrounds. Focus on contrast and readability. Dashboard Structure:   Plan the layout of the dashboard. Common sections for an oil rig dashboard might include: Status Overview:  Current production, pressure, temperature, etc. Sensor Data:  Real-time data from sensors. Control Panel:  Options to adjust rig parameters. Maintenance Schedule:  Upcoming tasks. Weather Forecast:   (optional but good for context) Navigation:   Side or top navigation for different sections. UI/UX Principles:  Intuitive, clear, fast, accessible, responsive, visually appealing, consistent, minimal clicks, feedback, error handling, loading, animations, accessibility (WCAG), performance, code quality, mobile, future-proof, social proof, conversion. Technical Implementation:  Tailwind CSS, Google Fonts, Font Awesome, HTML5, CSS3, JS, animations (CSS transitions/animations, JS animations). Production Ready:  Robust, secure, maintainable, scalable, version control, build process, testing, deployment, monitoring, A/B testing, roll-back, split testing. Cool & Animated:  Engaging animations, smooth transitions, interactive elements. Content:  Placeholder data and text. Images:  Use picsum.photos. Constraints:  No comments, no full SVG, no SHA for libraries. High-level plan: Setup basic HTML structure with header, main content (dashboard), and footer. Define the dark theme using Tailwind CSS configuration and custom CSS for any specific dark theme elements. Structure the dashboard content with sections for overview, sensor data, control panel, etc., using Tailwind CSS classes for layout and styling. Implement animations using Tailwind CSS transitions and transforms, and potentially JS for more complex animations. Add Font Awesome icons for visual cues and navigation. Integrate Google Fonts for typography. Ensure responsiveness and accessibility

Is it because it runs out of tokens while thinking? How do I solve this?

Hey! Sorry, it was my first time doing this. The model does not run out of tokens while thinking; it actually emits an end-of-sequence signal after the think block. Since I used budget forcing (https://arxiv.org/html/2501.19393v1), you need to keep generating "think" continuations until you are happy with the reasoning, and then manually append an 'answer' cue. Try this:

from transformers import TextStreamer
from unsloth import FastLanguageModel
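# --- 1. Load Model and Tokenizer (assumed step, not in the original snippet) ---
# A sketch: adjust model_name, max_seq_length, and quantization to your setup.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="smirki/UIGEN-T1-Qwen-7b",
    max_seq_length=16384,
    load_in_4bit=True,
)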
FastLanguageModel.for_inference(model)

# --- 2. Define Inference Prompts (Two Variations) ---
s1_inference_prompt_think_only = """<|im_start|>user
{question}<|im_end|>
<|im_start|>assistant
<|im_start|>think"""

s1_inference_prompt_with_answer_cue = """<|im_start|>user
{question}<|im_end|>
<|im_start|>assistant
<|im_start|>think
{reasoning}<|im_end|>
<|im_start|>answer"""

# --- 3. Initialize TextStreamer ---
text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True) # skip_prompt=True to avoid streaming the prompt itself

def generate_ui_code_streaming(question, inference_prompt, temperature_val, max_tokens, streamer, reasoning=""):
    """Generates UI code with reasoning using streaming output."""

    # For the answer-cue prompt, pass the "think" text from an earlier run via `reasoning`;
    # the think-only prompt simply ignores this argument.
    inference_prompt_formatted = inference_prompt.format(question=question, reasoning=reasoning)

    inputs = tokenizer(
        [inference_prompt_formatted],
        return_tensors="pt"
    ).to("cuda")

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        use_cache=True,
        do_sample=True,
        temperature=temperature_val,
        top_p=0.9,
        streamer=streamer # Pass the streamer here
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=False)[0] # Return full output for parsing later


def parse_and_print_output_streaming(full_generated_output, config_name):
    """Parses and prints structured output after streaming is complete."""
    print(f"\n--- Parsed Output (Config: {config_name}) ---") # Indicate parsed output after streaming
    question_marker = "<|im_start|>user\n"
    assistant_marker = "<|im_start|>assistant\n<|im_start|>think\n"
    answer_marker = "<|im_start|>answer\n"
    im_end_marker = "<|im_end|>\n"

    # find() returns -1 on a miss, so capture the raw positions before adding marker lengths
    question_pos = full_generated_output.find(question_marker)
    question_start_index = question_pos + len(question_marker)
    question_end_index = full_generated_output.find("<|im_end|>", question_start_index)
    reasoning_pos = full_generated_output.find(assistant_marker)
    reasoning_start_index = reasoning_pos + len(assistant_marker)
    reasoning_end_index = full_generated_output.find(answer_marker, reasoning_start_index)
    answer_pos = full_generated_output.find(answer_marker, reasoning_start_index)
    answer_start_index = answer_pos + len(answer_marker)
    answer_end_index = full_generated_output.rfind("<|im_end|>")

    extracted_question = full_generated_output[question_start_index:question_end_index].strip() if question_pos != -1 and question_end_index != -1 else "Question parsing failed"
    extracted_reasoning = full_generated_output[reasoning_start_index:reasoning_end_index].strip() if reasoning_pos != -1 and reasoning_end_index != -1 else "Reasoning parsing failed"
    extracted_answer = full_generated_output[answer_start_index:answer_end_index].strip() if answer_pos != -1 and answer_end_index != -1 else "Answer parsing failed"

    print(f"Question:\n{extracted_question}\n")
    print(f"Reasoning:\n{extracted_reasoning}\n")
    print(f"Answer:\n{extracted_answer}\n")


# --- 4. Prepare UI Design Question ---
ui_design_question = "Make me a modern website that sells fireworks."

# --- 5. Run Inference with Streaming and Different Configurations ---

# --- Configuration 1: Think-Only Prompt, Temperature 0.7, High max_new_tokens, Streaming ---
config1_name = "Think-Only Prompt, Temp 0.7, Streaming"
print(f"\n--- Generating with Config: {config1_name} (Streaming Output) ---") # Indicate streaming start
output_config1_streaming = generate_ui_code_streaming(ui_design_question, s1_inference_prompt_think_only, temperature_val=0.7, max_tokens=14024, streamer=text_streamer)  # Large token budget so the full reasoning fits; adjust as needed
parse_and_print_output_streaming(output_config1_streaming, config1_name) # Parse AFTER streaming


# --- Configuration 2: With Answer Cue Prompt, Temperature 0.7, Streaming ---
config2_name = "With Answer Cue Prompt, Temp 0.7, Streaming"
print(f"\n--- Generating with Config: {config2_name} (Streaming Output) ---")

# Paste the "think" text from Config 1 into reasoning= so the answer cue continues from it
output_config2_streaming = generate_ui_code_streaming(ui_design_question, s1_inference_prompt_with_answer_cue, temperature_val=0.7, max_tokens=14024, streamer=text_streamer, reasoning="")
parse_and_print_output_streaming(output_config2_streaming, config2_name)


# # --- Configuration 3: Think-Only Prompt, Temperature 0.1 (Deterministic), Streaming ---
# config3_name = "Think-Only Prompt, Temp 0.1, Streaming"
# print(f"\n--- Generating with Config: {config3_name} (Streaming Output) ---")

# output_config3_streaming = generate_ui_code_streaming(ui_design_question, s1_inference_prompt_think_only, temperature_val=0.1, max_tokens=14024, streamer=text_streamer)
# parse_and_print_output_streaming(output_config3_streaming, config3_name)


# print("\n--- Streaming Inference Completed. Check real-time output and parsed outputs. ---")
