Llama-3.2-11b-4bit-mmm-r

A multi-modal reasoning model specializing in mathematical problem solving. Llama-3.2-11b-4bit-mmm-r integrates vision and language understanding to tackle challenging math problems, from symbolic algebra to visual geometry, and is intended as a tool for educators, students, and researchers.

Model Details

Model Description

miike-ai/Llama-3.2-11b-4bit-mmm-r is designed to combine multimodal inputs—both text and images—to deliver nuanced reasoning on math problems. Fine-tuned on a curated mix of textual math datasets and annotated problem diagrams, this model excels at step-by-step reasoning, interpretable chain-of-thought explanations, and generating detailed solutions to complex mathematical queries.

  • Developed by: Your Team / Organization Name
  • Funded by: [If applicable, insert funding source or “Self-funded”]
  • Shared by: [Your Organization or individual name]
  • Model type: Multi-modal reasoning model (Math specialization)
  • Language(s) (NLP): English (primarily); additional languages may be supported
  • License: [Specify your license, e.g., Apache-2.0, MIT, etc.]
  • Finetuned from model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

Model Sources

Uses

miike-ai/Llama-3.2-11b-4bit-mmm-r is designed to assist in a variety of math-related tasks and research applications.

Direct Use

  • Interactive Problem Solving: Ask the model to solve complex math problems step-by-step (see the sketch after this list).
  • Educational Support: Use the model as a teaching aid to illustrate mathematical concepts and problem-solving strategies.
  • Research & Prototyping: Rapidly test mathematical hypotheses or develop AI-driven math tutoring systems.
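
For example, the visual problem-solving use case can be exercised with a single image-and-question query using the same unsloth API as the quick-start further down. This is a minimal sketch; the image path and the question text are placeholders for your own problem diagram.

from PIL import Image
from unsloth import FastVisionModel
from transformers import TextStreamer

# Load the 4-bit model and its processor (called "tokenizer" throughout this card).
model, tokenizer = FastVisionModel.from_pretrained(
    "miike-ai/Llama-3.2-11b-4bit-mmm-r",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# "triangle.png" is a placeholder; substitute your own problem diagram.
image = Image.open("triangle.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Find the area of the shaded triangle. Show every step."},
        ],
    }
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, prompt, add_special_tokens=False, return_tensors="pt").to("cuda")

# Stream the step-by-step solution to stdout.
model.generate(
    **inputs,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
    max_new_tokens=512,
    use_cache=True,
)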

Downstream Use

  • Integration into Learning Platforms: Embed the model into educational apps for dynamic math tutoring.
  • Automated Grading & Feedback: Utilize the model to provide detailed solution feedback for math assignments (see the sketch after this list).
  • Data Annotation: Leverage the model to assist with annotating math datasets for further research.
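
The automated grading use case can be wrapped in a small helper that asks the model to critique a student's work. The sketch below assumes the model and tokenizer have already been loaded as in the quick-start further down, and that the processor accepts text-only input; the grade_solution name and the prompt wording are illustrative, not part of the model's API.

def grade_solution(model, tokenizer, problem: str, student_answer: str) -> str:
    """Return written feedback on a student's solution (hypothetical helper)."""
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        f"Problem: {problem}\n"
                        f"Student answer: {student_answer}\n"
                        "Check the work step by step, point out any mistakes, "
                        "and state whether the final answer is correct."
                    ),
                }
            ],
        }
    ]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    inputs = tokenizer(text=prompt, add_special_tokens=False, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=400, use_cache=True)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)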

Out-of-Scope Use

  • Medical, Legal, or Financial Advice: The model is specialized for mathematical reasoning and should not be used to provide professional advice in these domains.
  • General-purpose language generation: While capable, the model’s strengths lie in math-specific tasks and reasoning rather than open-domain dialogue.

Bias, Risks, and Limitations

  • Bias: The model’s training data is math-centric and may underperform on non-mathematical content or culturally diverse problem representations.
  • Risks: Users should be cautious when deploying the model for educational assessment, as errors in reasoning or calculation may mislead learners.
  • Limitations: Because the weights are quantized to 4 bits, the model trades some accuracy for memory and compute efficiency. Multi-modal inputs also require careful formatting (see the quick-start code below) for optimal performance.

Recommendations

Users should validate the model’s outputs, especially in high-stakes or educational settings. Continuous monitoring, additional fine-tuning, and human oversight are recommended to mitigate risks and biases.
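
One lightweight form of validation is to compare the model's final numeric answer against an answer key before surfacing the response to learners. The helper below is a minimal sketch; the final_answer_matches name and the regex-based extraction are illustrative, only cover single numeric answers, and are no substitute for human review of the reasoning steps.

import re

def final_answer_matches(model_output: str, expected: float, tol: float = 1e-6) -> bool:
    """Check whether the last number in the model's output matches a reference value."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return False
    return abs(float(numbers[-1]) - expected) <= tol

# Example: flag a response for review if its final answer disagrees with the answer key.
response = "The triangle's area is (1/2) * 6 * 4 = 12 square units."
if not final_answer_matches(response, expected=12.0):
    print("Flag this response for human review.")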

How to Get Started with the Model

Below is a quick-start code snippet to load and use the model:

import time
import requests
from io import BytesIO
from PIL import Image
import readline  # Improves interactive input handling (command history, line editing)

import torch
from transformers import TextStreamer
from unsloth import FastVisionModel

def load_image_from_url(url):
    """Load an image from a URL."""
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()
        return Image.open(BytesIO(response.content))
    except Exception as e:
        raise Exception(f"Failed to load image from URL: {str(e)}")

def run_chat_inference(model, tokenizer):
    """Run an interactive multimodal chat inference session."""
    print("\nMultimodal Chat Interface")
    print("-------------------------")
    print("Commands:")
    print("  - Type 'image: <path>' to load a local image")
    print("  - Type 'url: <url>' to load an image from URL")
    print("  - Type 'quit' to exit")
    print("  - Otherwise, type your query")
    print("-------------------------\n")
    
    current_image = None

    while True:
        try:
            user_input = input("You: ").strip()
            if user_input.lower() == 'quit':
                print("Exiting chat...")
                break

            # Handle image loading commands
            if user_input.lower().startswith(('image:', 'url:')):
                command, path = user_input.split(":", 1)
                path = path.strip()
                try:
                    if command.lower() == 'image':
                        current_image = Image.open(path)
                        print(f"✓ Loaded image from local path: {path}")
                    elif command.lower() == 'url':
                        current_image = load_image_from_url(path)
                        print(f"✓ Loaded image from URL: {path}")
                except Exception as e:
                    print(f"Error loading image: {str(e)}")
                continue

            # Ensure an image is loaded before processing a chat query
            if current_image is None:
                print("Please load an image first using 'image:' or 'url:'")
                continue

            # Prepare the multimodal message
            messages = [
                {
                    "role": "user",
                    "content": [
                        {"type": "image", "image": current_image},
                        {"type": "text", "text": f"Analyze this image and {user_input}. Provide clear, natural responses focusing on what's visible."}
                    ]
                }
            ]
            # Convert the messages into a prompt string (the exact method depends on your tokenizer's API)
            input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

            # Start timing inference
            start_time = time.time()

            # Prepare inputs for the model
            # Note: Depending on your tokenizer, you may need to pass only the text input
            # and handle images separately. Adjust accordingly.
            inputs = tokenizer(
                current_image,    # Pass image input if supported by your tokenizer
                input_text,
                add_special_tokens=False,
                return_tensors="pt",
            ).to("cuda")

            print("\nAssistant: ", end='', flush=True)
            text_streamer = TextStreamer(tokenizer, skip_prompt=True)
            _ = model.generate(
                **inputs,
                streamer=text_streamer,
                max_new_tokens=256,
                use_cache=True,
                temperature=0.7,
                min_p=0.1
            )

            # Compute and display inference time
            inference_time = time.time() - start_time
            print(f"\n[Inference time: {inference_time:.2f}s]\n")

        except KeyboardInterrupt:
            print("\nExiting chat...")
            break
        except Exception as e:
            print(f"Error: {str(e)}")
            continue

def main():
    # Load the model and tokenizer for inference.
    print("Loading model for inference...")
    model, tokenizer = FastVisionModel.from_pretrained(
        "miike-ai/Llama-3.2-11b-4bit-mmm-r",  # Replace with your model's repository or identifier if needed
        load_in_4bit=True,
        use_gradient_checkpointing="unsloth"
    )

    # Set model to inference mode and convert to float32 for generation
    FastVisionModel.for_inference(model)
    model = model.float()  # Ensure computations use float32

    # Start the interactive multimodal chat interface
    run_chat_inference(model, tokenizer)

if __name__ == "__main__":
    main()