Spam Detection System

Lite Model

Introduction

The Lite model is a streamlined variant with tuned parameters and additional hand-crafted text features, designed for fast, lightweight spam detection.

Features

  • Text Preprocessing: Lemmatization, removal of stop words and punctuation.
  • Feature Extraction: Text length, word count, unique word count, uppercase count, special character count.
  • Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier.
  • Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
  • Metrics Saving: Accuracy, precision, and F1 score.
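
The hand-crafted features listed above could be computed roughly as follows. This is a minimal sketch, not the code in training/train_model_lite.py: the function name `extract_features`, the dictionary keys, and the exact definitions (uppercase count as uppercase characters, special characters as ASCII punctuation, unique words compared case-insensitively) are all assumptions made for illustration.

```python
import string


def extract_features(text: str) -> dict:
    """Compute simple numeric features for one message (illustrative definitions)."""
    words = text.split()  # whitespace tokenization; punctuation stays attached to words
    return {
        "text_length": len(text),                                  # characters, including spaces
        "word_count": len(words),
        "unique_word_count": len({w.lower() for w in words}),      # case-insensitive (assumption)
        "uppercase_count": sum(1 for ch in text if ch.isupper()),  # uppercase characters (assumption)
        "special_char_count": sum(1 for ch in text if ch in string.punctuation),
    }


if __name__ == "__main__":
    for name, value in extract_features("WIN a FREE prize!!! Call now.").items():
        print(f"{name}: {value}")
```

Features like these are typically appended to the vectorized text so the classifier can pick up on surface cues such as shouting in capitals or heavy punctuation.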

How to Run

  1. Train the Model:
    python training/train_model_lite.py
    
  2. Use the Model:
    import joblib

    # Load the trained classifier and the fitted text vectorizer produced in step 1
    model = joblib.load('models/model.pkl')
    vectorizer = joblib.load('models/vectorizer.pkl')
    

Legacy Model

Introduction

The Legacy model keeps the original, unoptimized spam detection logic but restructures the code and adds visualizations.

Features

  • Text Preprocessing: Porter Stemming, removal of stop words and punctuation.
  • Model Creation: Ensemble model using SVC, MultinomialNB, and ExtraTreesClassifier with original parameters.
  • Visualization: Generates graphs for dataset insights, word clouds, and performance metrics.
  • Metrics Saving: Accuracy and precision.
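
For reference, the two saved metrics can be computed from true and predicted labels as sketched below. The helper name `accuracy_and_precision` and the label encoding (1 for spam, 0 for ham) are assumptions; the training script may well use a library routine instead.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for binary labels.

    `positive` marks the spam class; 1 is an assumed encoding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    pred_pos = sum(1 for p in y_pred if p == positive)

    accuracy = correct / len(y_true)
    # Precision: share of messages flagged as spam that really are spam.
    # Defined as 0.0 here when nothing was flagged.
    precision = true_pos / pred_pos if pred_pos else 0.0
    return accuracy, precision
```

Precision is worth tracking alongside accuracy for spam filtering because a false positive, a legitimate message flagged as spam, is usually the more costly error.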

How to Run

  1. Train the Model:
    python training/train_model_legacy.py
    
  2. Use the Model:
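    import joblib

    # Load the trained classifier and the fitted text vectorizer produced in step 1.
    # These are the same paths the Lite model uses, so the files presumably hold
    # whichever model was trained most recently.
    model = joblib.load('models/model.pkl')
    vectorizer = joblib.load('models/vectorizer.pkl')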
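
Once loaded, the two objects might be used to classify new messages along these lines. This is a sketch under assumptions: the helper names `classify_messages` and `load_and_classify` are hypothetical, and it assumes the model accepts the vectorizer output directly. If the training script stems or otherwise preprocesses text before vectorizing, the same preprocessing would need to be applied here first, and the meaning of the returned labels depends on how spam.csv was encoded during training.

```python
def classify_messages(messages, model, vectorizer):
    """Vectorize raw messages and return one predicted label per message."""
    features = vectorizer.transform(messages)
    return list(model.predict(features))


def load_and_classify(messages,
                      model_path="models/model.pkl",
                      vectorizer_path="models/vectorizer.pkl"):
    """Load the saved artifacts and classify the given messages."""
    import joblib  # third-party; imported lazily so classify_messages has no hard dependency

    model = joblib.load(model_path)
    vectorizer = joblib.load(vectorizer_path)
    return classify_messages(messages, model, vectorizer)
```

After training, calling `load_and_classify(["Congratulations, you won a free prize!"])` from the project root would return a one-element list containing the predicted label.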
    

Additional Information

  • Dependencies: Python 3.6 or higher, pip, and the packages listed in requirements.txt.
  • Dataset: The dataset used for training is spam.csv.
  • Contact and Support: For questions or support, please contact the project maintainers.

For more details, see README.md and models.md.
