Banglish-to-Bangla Transliteration Model

Model Details

Model Description

This model is designed to transliterate Banglish (Bengali written in Roman script) into Bengali script. It is fine-tuned from the facebook/mbart-large-50-many-to-many-mmt model using the SKNahin/bengali-transliteration-data dataset.

  • Developed by: Md. Farhan Masud Shohag
  • Model type: Sequence-to-Sequence (Translation)
  • Language(s): Banglish → Bengali (bn_BD)
  • License: Apache 2.0
  • Fine-tuned from: facebook/mbart-large-50-many-to-many-mmt

Model Sources


Uses

Direct Use

  • Transliteration of Banglish text to Bengali script for social media, messaging, and formal communication.

Downstream Use

  • Fine-tuning for translation tasks between Bengali and other languages.
  • Integration into chatbots or virtual assistants (see the fine-tuning sketch after this list).
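
As a rough illustration of the first downstream use, the sketch below continues training the checkpoint with the Hugging Face Seq2SeqTrainer. The column names "banglish" and "bengali", the "train" split, the hyperparameters, and the checkpoint name are assumptions; adjust them to the actual dataset schema and training setup.

from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "your-username/banglish-to-bangla-mbart"  # placeholder repository name
model = MBartForConditionalGeneration.from_pretrained(checkpoint)
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint)

# "banglish" and "bengali" are hypothetical column names; rename to match the dataset.
dataset = load_dataset("SKNahin/bengali-transliteration-data")

def preprocess(batch):
    model_inputs = tokenizer(batch["banglish"], max_length=64, truncation=True)
    labels = tokenizer(text_target=batch["bengali"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="banglish-finetune",
        per_device_train_batch_size=16,
        learning_rate=2e-5,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()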

Out-of-Scope Use

  • General-purpose language translation between unrelated languages.
  • Handling code-mixed languages (e.g., Banglish + English combinations).

Bias, Risks, and Limitations

Biases

  • The dataset may include informal phrases, potentially reducing performance on formal language.
  • Performance may degrade for long or complex sentences.

Limitations

  • Model performance may vary for rare phrases or slang.
  • Does not handle code-mixed inputs (e.g., Banglish mixed with English) effectively.

Recommendations

Users should evaluate outputs for their specific use cases, especially in formal contexts. Additional filtering or pre-processing may be required.
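
For example, light normalization of the Banglish input before transliteration can help with noisy social-media text. The function below is a minimal, hypothetical pre-processing sketch, not part of the released model.

import re

def normalize_banglish(text: str) -> str:
    # Illustrative pre-processing (assumed, not shipped with the model).
    text = text.strip().lower()                     # Romanized Bengali is effectively case-insensitive
    text = re.sub(r"https?://\S+", " ", text)       # drop URLs the model was not trained on
    text = re.sub(r"[^a-z0-9\s.,!?']", " ", text)   # strip emoji and stray symbols
    text = re.sub(r"\s+", " ", text)                # collapse repeated whitespace
    return text.strip()

print(normalize_banglish("  Ami   TOMAKE  valobashi!!  "))
# -> "ami tomake valobashi!!"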


How to Use

Example Code

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned checkpoint (replace with the actual repository name).
model = MBartForConditionalGeneration.from_pretrained("your-username/banglish-to-bangla-mbart")
tokenizer = MBart50TokenizerFast.from_pretrained("your-username/banglish-to-bangla-mbart")

def transliterate(text):
    # Tokenize the Banglish input; anything beyond 64 tokens is truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
    # Beam search gives more stable outputs than greedy decoding. Depending on how the
    # checkpoint was fine-tuned, tokenizer.src_lang and forced_bos_token_id may also need to be set.
    outputs = model.generate(**inputs, max_length=64, num_beams=5, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(transliterate("ami tomake valobashi"))
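
For batched inputs, the same call can be reused as sketched below; the example sentences are illustrative.

# Illustrative batched usage with the model and tokenizer loaded above.
sentences = ["kemon acho", "ami bhalo achi"]
batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=64)
generated = model.generate(**batch, max_length=64, num_beams=5, early_stopping=True)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))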
Model size: 611M parameters (F32 tensors, Safetensors format).