---
language:
- ar
- fr
- es
- it
- pt
- ja
- zh
- ru
- de
- en
metrics:
- accuracy
- code_eval
pipeline_tag: text-classification
---
# Language Detection Model

![image](https://github.com/jasserchtourou/Language_detection/assets/124272855/399d807a-0b58-4a21-baf7-9ad97b5271f1)


This repository contains a PyTorch-based model for language identification using multiple language detection methods. It combines predictions from various language detection libraries and models to determine the most probable language for a given text input.

## Overview
The LanguageIdentificationModel integrates scores from several language detection methods:

LangDetect: Language detection using the langdetect library.
LangID: Language identification with the langid library.
Hugging Face: Language classification using the papluca/xlm-roberta-base-language-detection model from Hugging Face Transformers.
FastText: Language prediction using the lid.176.bin model from FastText.
These methods are integrated into a PyTorch neural network model, enabling accurate language identification across various text inputs.

---
language:
- en
- ar
- fr
- es
- pt
- ja
- it
- de
- ru
- zh
metrics:
- accuracy
- code_eval
library_name: transformers
pipeline_tag: text-classification

## Installation
Clone this repository:
      git clone https://github.com/jasserchtourou/Language_detection.git
      cd Language_detection
      
Install the required dependencies:
      pip install -r requirements.txt

Download the necessary models:
      Download the papluca/xlm-roberta-base-language-detection model from Hugging Face Transformers.
      Download the lid.176.bin model from FastText and place it in the repository.

## Usage
Training the Model
    To train the model, customize the input_size, hidden_size, and output_size in train.py based on your data and run:
          python train.py

## Using the Model
After training, you can use the model for language identification:
    from language_identification_model import LanguageIdentificationModel

    model = LanguageIdentificationModel(input_size, hidden_size, output_size)
    model.load_state_dict(torch.load('model_weights.bin'))
    model.eval()
    text = "Bonjour tout le monde"
    language = model.predict(text)
    print(f"The identified language is: {language}")


## Contributing
Contributions are welcome! If you have any suggestions, improvements, or bug fixes, please submit a pull request or open an issue.

## License
This project is licensed under the MIT License - see the LICENSE file for details.

![image](https://github.com/jasserchtourou/Language_detection/assets/124272855/35c1d9b0-5a8a-43d8-b5c7-f5a082722812)