---
library_name: transformers
tags:
  - phishing-detection
  - binary-classification
  - bert
  - nlp
---

# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification

## Model Details

### Model Description

This model is a fine-tuned version of [BERT-Base-Uncased](https://huggingface.co/google-bert/bert-base-uncased) for phishing site classification. It labels a website as "Safe" or "Not Safe" based on textual input.

- **Developed by:** [shogun-the-great](https://huggingface.co/shogun-the-great)
- **Model type:** Binary Classification (Safe vs Not Safe)
- **Language(s):** English
- **License:** Apache-2.0
- **Finetuned from model:** `google-bert/bert-base-uncased`

### Model Sources

- **Dataset:** [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification)

## Uses

### Direct Use

This model can be directly used for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:

- Integrating with browser extensions for real-time website classification.
- Analyzing textual data for phishing indicators.
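
For use cases like these, several texts usually need to be scored at once. The following is a minimal sketch, assuming the `transformers` text-classification pipeline is used to wrap the model. The helper names `build_classifier` and `classify_batch` are hypothetical, and the label strings the pipeline returns depend on the model's configuration (they may be generic names such as `LABEL_0` and `LABEL_1`), so check them before mapping to "Safe" and "Not Safe".

```python
from typing import Callable, Dict, List

MODEL_NAME = "shogun-the-great/finetuned-bert-phishing-site-classification"


def build_classifier(model_name: str = MODEL_NAME):
    """Create a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so classify_batch can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_name)


def classify_batch(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier on several texts and pair each text with its result.

    `classifier` is any callable that behaves like a transformers
    text-classification pipeline: it takes a list of strings and returns
    one {"label": ..., "score": ...} dict per input.
    """
    results = classifier(texts, truncation=True)
    return [
        {"text": text, "label": res["label"], "score": res["score"]}
        for text, res in zip(texts, results)
    ]
```

A typical call would be `classify_batch(page_texts, build_classifier())`, where `page_texts` is a list of strings extracted from the pages to check.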

### Downstream Use

Users can fine-tune the model further for specific binary classification tasks or on datasets from similar domains.
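
Further fine-tuning could look roughly like the sketch below, which uses the `transformers` `Trainer` API. It is not the procedure used to train this model. The function names are hypothetical, the hyperparameters are illustrative placeholders, and the code assumes the new dataset is a `datasets.Dataset` with a `text` column and a `labels` (or `label`) column.

```python
from typing import Callable

MODEL_NAME = "shogun-the-great/finetuned-bert-phishing-site-classification"


def make_tokenize_fn(tokenizer: Callable, text_column: str = "text", max_length: int = 512):
    """Return a function suitable for Dataset.map(..., batched=True)."""

    def tokenize(batch):
        # Truncate long inputs; 512 tokens is the BERT-base input limit.
        return tokenizer(batch[text_column], truncation=True, max_length=max_length)

    return tokenize


def fine_tune(train_dataset, eval_dataset, output_dir: str = "bert-phishing-finetuned"):
    """Continue training the classifier on a new labeled dataset."""
    # Imported lazily so make_tokenize_fn stays usable without transformers.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

    tokenize = make_tokenize_fn(tokenizer)
    train_tok = train_dataset.map(tokenize, batched=True)
    eval_tok = eval_dataset.map(tokenize, batched=True)

    # Illustrative values only, not the settings used for this model.
    args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_tok,
        eval_dataset=eval_tok,
        # Pads each batch to the length of its longest example.
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    )
    trainer.train()
    return trainer
```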

### Out-of-Scope Use

This model might not perform well for:
- Non-English text.
- Adversarial phishing attacks or heavily obfuscated text.
- Tasks unrelated to text-based classification.

## Bias, Risks, and Limitations

### Bias

The model's predictions are influenced by the dataset used during fine-tuning. If the training data contains biases, they may be reflected in the predictions.

### Risks

- False positives: Legitimate websites flagged as phishing.
- False negatives: Some phishing sites might not be detected.
- Potential vulnerabilities to adversarial examples.
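
One way to trade false positives against false negatives is to apply a decision threshold to the "Not Safe" probability instead of taking the argmax. The sketch below is an illustration rather than part of the released model: the function names and the default threshold are assumptions, and class index 1 is taken to mean "Not Safe", as in the usage example in this card.

```python
import math
from typing import Sequence


def not_safe_probability(logits: Sequence[float]) -> float:
    """Softmax probability of class index 1 for a two-logit output."""
    # Subtract the max logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def decide(logits: Sequence[float], threshold: float = 0.5) -> str:
    """Label the input "Not Safe" when its probability reaches the threshold.

    Raising the threshold reduces false positives at the cost of more
    false negatives; lowering it does the opposite.
    """
    return "Not Safe" if not_safe_probability(logits) >= threshold else "Safe"
```

With the model's output, the logits for a single input can be passed as `outputs.logits[0].tolist()`. A suitable threshold should be chosen on held-out validation data.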

### Recommendations

- Regularly update the dataset and model to stay aligned with emerging phishing patterns.
- Use in combination with other security measures for robust phishing detection.

## How to Get Started with the Model

You can load the fine-tuned model directly from the Hugging Face Hub:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model from the Hugging Face Hub
model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Example usage
text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run the forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted label (class index 1 is "Not Safe")
logits = outputs.logits
prediction = logits.argmax(dim=-1).item()
print("Prediction:", "Not Safe" if prediction == 1 else "Safe")