Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification

Model Details

Model Description

This model is a fine-tuned version of BERT-Base-Uncased for phishing site classification. The model predicts whether a website is classified as "Safe" or "Not Safe" based on textual input.

  • Developed by: shogun-the-great
  • Model type: Binary Classification (Safe vs Not Safe)
  • Language(s): English
  • License: Apache-2.0
  • Finetuned from model: google-bert/bert-base-uncased

Uses

Direct Use

This model can be directly used for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:

  • Integrating with browser extensions for real-time website classification.
  • Analyzing textual data for phishing indicators.
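For quick integrations such as the use cases above, the standard transformers `pipeline` API is the shortest path. The sketch below is illustrative, not part of this card's official usage: the raw label names ("LABEL_0"/"LABEL_1") and the `verdict` helper are assumptions about this checkpoint's config.

```python
from transformers import pipeline

# Model id from the "How to Get Started" section below; the raw label
# names ("LABEL_0"/"LABEL_1") are an assumption about this checkpoint.
MODEL_ID = "shogun-the-great/finetuned-bert-phishing-site-classification"

def verdict(label: str) -> str:
    """Map a raw classifier label to the card's Safe / Not Safe vocabulary."""
    return "Not Safe" if label.endswith("1") else "Safe"

if __name__ == "__main__":
    classifier = pipeline("text-classification", model=MODEL_ID)
    result = classifier("Verify your account now to avoid suspension!")[0]
    print(verdict(result["label"]), f"score={result['score']:.2f}")
```

A browser extension would typically call this behind an HTTP endpoint rather than loading the model client-side.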

Downstream Use

Users can fine-tune the model further for specific binary classification tasks or for datasets with similar domains.
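A minimal further fine-tuning sketch using the transformers `Trainer` is shown below. The toy dataset, hyperparameters, and `TextDataset` wrapper are all illustrative assumptions; substitute your own labeled corpus and tuned settings.

```python
import numpy as np
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "shogun-the-great/finetuned-bert-phishing-site-classification"

class TextDataset(torch.utils.data.Dataset):
    """Wrap raw texts and integer labels as a torch Dataset for Trainer."""
    def __init__(self, texts, labels, tokenizer):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

def compute_metrics(eval_pred):
    # Trainer passes (logits, labels); report plain accuracy.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
    # Toy data for illustration only; use your own labeled corpus.
    train_ds = TextDataset(
        ["Claim your prize now!", "Quarterly report attached."], [1, 0], tokenizer)
    args = TrainingArguments(output_dir="phishing-finetune",
                             num_train_epochs=1,
                             per_device_train_batch_size=2)
    trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                      compute_metrics=compute_metrics)
    trainer.train()
```

Keeping the label convention (0 = Safe, 1 = Not Safe) consistent with the base checkpoint avoids confusing downstream consumers of the model.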

Out-of-Scope Use

This model might not perform well for:

  • Non-English text.
  • Adversarial phishing attacks or heavily obfuscated text.
  • Tasks unrelated to text-based classification.

Bias, Risks, and Limitations

Bias

The model's predictions are shaped by the dataset used during fine-tuning. If the training data contains biases, these may be reflected in the model's predictions.

Risks

  • False positives: Legitimate websites flagged as phishing.
  • False negatives: Some phishing sites might not be detected.
  • Potential vulnerabilities to adversarial examples.

Recommendations

  • Regularly update the dataset and model to stay aligned with emerging phishing patterns.
  • Use in combination with other security measures for robust phishing detection.

How to Get Started with the Model

You can load the fine-tuned model directly from the Hugging Face Hub:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model from the Hugging Face Hub
model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted label and its probability
probs = outputs.logits.softmax(dim=-1)
prediction = probs.argmax(dim=-1).item()
confidence = probs[0, prediction].item()
print(f"Prediction: {'Not Safe' if prediction == 1 else 'Safe'} ({confidence:.2f})")
Model size: 109M parameters (F32, Safetensors)