Jasser-Chtourou
/

Language-Identification-Model

Text Classification

LanguageIdentificationModel

Inference Endpoints

Model card Files Files and versions Community

Language-Identification-Model / README.md

Jasser-Chtourou's picture

Jasser-Chtourou

Update README.md

580b8d6 verified 8 months ago

|

history blame contribute delete

2.67 kB

	---
	language:
	- ar
	- fr
	- es
	- it
	- pt
	- ja
	- zh
	- ru
	- de
	- en
	metrics:
	- accuracy
	- code_eval
	pipeline_tag: text-classification
	---
	# Language Detection Model

	![image](https://github.com/jasserchtourou/Language_detection/assets/124272855/399d807a-0b58-4a21-baf7-9ad97b5271f1)


	This repository contains a PyTorch-based model for language identification using multiple language detection methods. It combines predictions from various language detection libraries and models to determine the most probable language for a given text input.

	## Overview
	The LanguageIdentificationModel integrates scores from several language detection methods:

	LangDetect: Language detection using the langdetect library.
	LangID: Language identification with the langid library.
	Hugging Face: Language classification using the papluca/xlm-roberta-base-language-detection model from Hugging Face Transformers.
	FastText: Language prediction using the lid.176.bin model from FastText.
	These methods are integrated into a PyTorch neural network model, enabling accurate language identification across various text inputs.

	---
	language:
	- en
	- ar
	- fr
	- es
	- pt
	- ja
	- it
	- de
	- ru
	- zh
	metrics:
	- accuracy
	- code_eval
	library_name: transformers
	pipeline_tag: text-classification

	## Installation
	Clone this repository:
	git clone https://github.com/jasserchtourou/Language_detection.git
	cd Language_detection

	Install the required dependencies:
	pip install -r requirements.txt

	Download the necessary models:
	Download the papluca/xlm-roberta-base-language-detection model from Hugging Face Transformers.
	Download the lid.176.bin model from FastText and place it in the repository.

	## Usage
	Training the Model
	To train the model, customize the input_size, hidden_size, and output_size in train.py based on your data and run:
	python train.py

	## Using the Model
	After training, you can use the model for language identification:
	from language_identification_model import LanguageIdentificationModel

	model = LanguageIdentificationModel(input_size, hidden_size, output_size)
	model.load_state_dict(torch.load('model_weights.bin'))
	model.eval()
	text = "Bonjour tout le monde"
	language = model.predict(text)
	print(f"The identified language is: {language}")



	## Contributing
	Contributions are welcome! If you have any suggestions, improvements, or bug fixes, please submit a pull request or open an issue.

	## License
	This project is licensed under the MIT License - see the LICENSE file for details.

	![image](https://github.com/jasserchtourou/Language_detection/assets/124272855/35c1d9b0-5a8a-43d8-b5c7-f5a082722812)