---
pipeline_tag: image-text-to-text
library_name: transformers
language:
- multilingual
tags:
- got
- vision-language
- ocr2.0
- custom_code
license: apache-2.0
---

# Nayana OCR (Alpha)

Nayana OCR is a model finetuned for document-level Optical Character Recognition (OCR) across **10 Indian languages**:  
**Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu**  
while maintaining OCR capabilities in **English** and **Chinese**.

This model is built on the **GOT OCR** base and offers multilingual OCR, document rendering, and GPU inference.

We are training an improved model on much more data. Follow us to stay updated.

For more information, see [Cognitivelab](https://cognitivelab.in).

---

## Key Features

- **Multilingual OCR**: Supports OCR for 10 Indian languages alongside English and Chinese.
- **Document-Level OCR**: Designed for extracting text from complex document layouts.
- **Streamlined Deployment**: Optimized for GPU usage with support for safetensors.
- **Customizable OCR Type**: Switch between OCR modes and enable rendering.

---

## Installation

To use Nayana OCR, ensure you have the following prerequisites installed:

1. Python 3.8+
2. PyTorch (with GPU support)
3. Transformers library
4. PEFT library

Install the required libraries using:

```bash
pip install torch transformers peft
```
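
Before loading the model, it can help to confirm that the prerequisites listed above are importable. The following is a minimal sketch using only the standard library; the helper names `missing_packages` and `python_ok` are illustrative and not part of Nayana OCR.

```python
import importlib.util
import sys

# Packages named in the prerequisites list above.
REQUIRED = ("torch", "transformers", "peft")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


print("Python 3.8+:", python_ok())
print("Missing packages:", missing_packages() or "none")

# Only query CUDA when torch is actually installed.
if not missing_packages(("torch",)):
    import torch

    print("CUDA available:", torch.cuda.is_available())
```

If `Missing packages` lists anything, rerun the `pip install` command above. If CUDA is reported as unavailable, the usage example below will likely fail at `device_map='cuda'`.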

---

## Usage Example

Here's a quick example of how to use Nayana OCR for extracting text from an image:

```python
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel
import torch

# Load tokenizer and model
# (torch_dtype applies to model weights, so it is only passed to the model below)
tokenizer = AutoTokenizer.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR', 
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR', 
    trust_remote_code=True, 
    low_cpu_mem_usage=True, 
    device_map='cuda', 
    use_safetensors=True, 
    pad_token_id=tokenizer.eos_token_id, 
    torch_dtype=torch.float16
)

# Prepare the model for inference
model = model.eval().cuda()

# Perform OCR on an image
image_file = 'hindi.png'
result = model.chat(
    tokenizer, 
    image_file, 
    ocr_type='ocr', 
    render=True, 
    stream_flag=True
)

print(result)
```
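
To process several images, the single call above can be wrapped in a small loop. This is a sketch under stated assumptions: `ocr_images` and `IMAGE_SUFFIXES` are hypothetical names, and `run_ocr` stands for any callable that maps an image path to text, such as a wrapper around the `model.chat` call from the example.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumed set of image extensions to accept; adjust to your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def ocr_images(run_ocr: Callable[[str], str], paths: Iterable[str]) -> Dict[str, str]:
    """Apply `run_ocr` to each image path and return a {path: text} mapping.

    Paths whose suffix is not in IMAGE_SUFFIXES are skipped.
    """
    results: Dict[str, str] = {}
    for path in paths:
        if Path(path).suffix.lower() not in IMAGE_SUFFIXES:
            continue
        results[str(path)] = run_ocr(str(path))
    return results


# With the model and tokenizer loaded as in the example above, one could pass:
#   run_ocr = lambda p: model.chat(tokenizer, p, ocr_type='ocr',
#                                  render=True, stream_flag=True)
#   texts = ocr_images(run_ocr, ['hindi.png', 'tamil.png'])
```

Taking the OCR call as a parameter keeps the loop independent of how the model is loaded, so it can be tried out with a stub function before a GPU is available.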

---

## Parameters

| Parameter    | Description                                                                 | Default  |
|--------------|-----------------------------------------------------------------------------|----------|
| `ocr_type`   | Specify the type of OCR to use (`'ocr'`)                                    | `'ocr'`  |
| `render`     | Enable rendering of the extracted text on the image.                        | `True`   |
| `stream_flag`| Stream results for larger or multi-page documents.                          | `True`   |
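
The three parameters in the table can be kept in one place so that call sites do not repeat them. The sketch below is illustrative only: `ChatOptions` is a hypothetical helper, and its defaults simply mirror the values listed in the table.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChatOptions:
    """Keyword arguments for `model.chat`, with defaults taken from the table above."""

    ocr_type: str = "ocr"
    render: bool = True
    stream_flag: bool = True

    def as_kwargs(self) -> dict:
        """Return the options as a plain dict suitable for `**` unpacking."""
        return asdict(self)


# Example: turn rendering off while keeping the other values from the table.
options = ChatOptions(render=False)
print(options.as_kwargs())

# With a loaded model and tokenizer, the options would be passed as:
#   result = model.chat(tokenizer, 'hindi.png', **options.as_kwargs())
```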

---

## Base Model

This model is finetuned from the **GOT OCR** base and builds on its vision-language capabilities for OCR.

---

## License

This project is licensed under the **Apache 2.0 License**. See the [LICENSE](LICENSE) file for details.

---