Qwen2.5-3B-Tamil-Exp

Qwen2.5-3B-Tamil-Exp is built on the robust Qwen2.5 architecture and specifically adapted to excel at Tamil language tasks. Fine-tuned on the prithivMLmods/Deepthink-Reasoning-Tamil dataset and building on the proven reasoning framework of the Qwen models, this 3B-parameter variant achieves enhanced chain-of-thought reasoning and logical problem solving tailored for Tamil. Its improvements extend to context understanding, structured data processing, and long-context comprehension, making it well suited to complex reasoning, instruction following, and text generation in Tamil and other languages.

Key Improvements

  1. Advanced Reasoning & Logic:
    Optimized for multi-step problem solving and logical deduction. Fine-tuning on the Deepthink-Reasoning-Tamil entries further refines its reasoning capabilities in Tamil contexts.

  2. Fine-Tuned Instruction Following:
    Generates precise responses and structured outputs (such as JSON), making it well-suited for dialog-based applications and code generation tasks that require strict adherence to Tamil language instructions.

  3. Greater Adaptability:
    Excels in role-playing scenarios, multi-turn dialogues, and diverse system prompts with a focus on culturally nuanced Tamil content while maintaining support for multiple languages.

  4. Long-Context Support:
    Capable of handling extended inputs (up to 64K tokens) and generating outputs of up to 4K tokens, enabling the processing of detailed and lengthy Tamil texts.

  5. Multilingual Proficiency with Tamil Focus:
    While supporting over 20 languages, the model’s training emphasis on Tamil ensures superior performance on tasks involving Tamil language understanding and generation.
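The structured-output capability described above can be exercised by asking the model for strict JSON and validating its reply. The prompt wording and the `parse_json_reply` helper below are illustrative assumptions, not part of the model's API:

```python
import json

# Hypothetical system prompt requesting strict JSON output (illustrative).
messages = [
    {"role": "system",
     "content": 'Reply only with a JSON object: {"answer": ..., "reasoning": ...}'},
    {"role": "user", "content": "2 + 2 = ?"},
]

def parse_json_reply(reply: str) -> dict:
    """Validate that a model reply is a JSON object with the expected keys."""
    obj = json.loads(reply.strip())
    missing = {"answer", "reasoning"} - obj.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return obj

# Simulated model reply, for illustration only:
reply = '{"answer": "4", "reasoning": "Adding 2 and 2 gives 4."}'
print(parse_json_reply(reply)["answer"])  # 4
```

Validating replies this way lets an application retry or re-prompt when the model's output drifts from the requested schema.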

Intended Use

  • Advanced Logical & Analytical Reasoning:
    Ideal for solving multi-step problems and deductive reasoning tasks, especially those presented in Tamil.
  • Mathematical & Scientific Computation:
    Supports theorem proving, complex calculations, and retrieval of scientific knowledge with an emphasis on Tamil terminology.
  • Code Generation & Debugging:
    Generates optimized code, detects errors, and enhances programming workflows with support for Tamil documentation or comments.
  • Structured Data Analysis:
    Processes tables, JSON, and other structured formats, which is particularly useful for localized applications requiring Tamil language outputs.
  • Multilingual Reasoning & Translation:
    While excelling in Tamil, it is also proficient in other languages for international applications.
  • Extended Text Generation:
    Capable of producing research papers, instructional guides, and in-depth reports in Tamil.
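Several of the uses above involve multi-turn dialogue. A minimal sketch of maintaining a chat history in the message format that `apply_chat_template` expects; the helper names are illustrative, not part of the model's API:

```python
# Minimal sketch of keeping a multi-turn chat history in the role/content
# message format used by chat templates; helper names are illustrative.
def make_history(system_prompt: str) -> list[dict]:
    return [{"role": "system", "content": system_prompt}]

def add_turn(history: list[dict], user_msg: str, assistant_msg: str) -> list[dict]:
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_msg})
    return history

history = make_history("You are a helpful Tamil-language assistant.")
add_turn(history, "Vanakkam!", "Vanakkam! How can I help you today?")
print(len(history))  # 3 messages: system, user, assistant
```

Appending each completed exchange to the history before the next call is what gives the model its multi-turn context.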

Quickstart with Transformers

Below is an example of how to load and use the model with the Hugging Face Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Qwen2.5-3B-Tamil-Exp"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "தமிழில் தர்க்கரீதியான எண்ணத்தை விளக்குங்கள்."  # "Explain the concept of logical reasoning in Tamil."
messages = [
    {"role": "system", "content": "நீங்கள் ஒரு தமிழில் சிறந்த தர்க்கரீதியான உதவியாளர்."},  # "You are an excellent logical assistant in Tamil."
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
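The list comprehension above keeps only the newly generated tokens, since `model.generate` returns the prompt tokens followed by the continuation. A self-contained illustration of the same slicing with toy token ids:

```python
# model.generate returns prompt tokens + new tokens for each sequence;
# slicing off len(input_ids) keeps only the continuation.
# Toy token ids, for illustration only:
input_ids = [[101, 7592, 102]]                  # prompt
generated_ids = [[101, 7592, 102, 2054, 2003]]  # prompt + continuation

new_tokens = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(new_tokens)  # [[2054, 2003]]
```

Without this step, decoding would echo the prompt back at the start of every response.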

Limitations

  1. Moderate Computational Requirements:
    Responsive inference requires a mid-range consumer GPU; CPU-only inference is possible but slow for a 3B-parameter model.
  2. Language-Specific Variability:
    While performance is strong for Tamil, results may vary for other supported languages.
  3. Potential Error Accumulation:
    Extended outputs may sometimes introduce inconsistencies.
  4. Limited Real-World Awareness:
    The model’s knowledge is based on its training data and may not include recent events.
  5. Prompt Sensitivity:
    High-quality responses depend on the clarity and specificity of the input prompt.
Model Details

  • Model size: 3.09B parameters
  • Tensor type: FP16 (Safetensors)

Model tree for prithivMLmods/Qwen2.5-3B-Tamil-Exp

  • Base model: Qwen/Qwen2.5-3B (this model is a fine-tune of it)
  • Quantizations: 1 model
  • Training dataset: prithivMLmods/Deepthink-Reasoning-Tamil