# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification
## Model Details

### Model Description
This model is a fine-tuned version of BERT-Base-Uncased for phishing site classification. Given textual input, it predicts whether a website is "Safe" or "Not Safe".
- Developed by: shogun-the-great
- Model type: Binary Classification (Safe vs Not Safe)
- Language(s): English
- License: Apache-2.0
- Finetuned from model: google/bert-base-uncased
### Model Sources
- Dataset: shawhin/phishing-site-classification
## Uses

### Direct Use
This model can be directly used for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:
- Integrating with browser extensions for real-time website classification.
- Analyzing textual data for phishing indicators.
### Downstream Use
Users can further fine-tune the model for related binary classification tasks or for datasets in similar domains.
### Out-of-Scope Use
This model is not expected to perform well on:
- Non-English text.
- Adversarial phishing attacks or heavily obfuscated text.
- Tasks unrelated to text-based classification.
## Bias, Risks, and Limitations

### Bias
The model's predictions reflect the dataset used during fine-tuning; any biases in the training data may carry over into its predictions.
### Risks
- False positives: Legitimate websites flagged as phishing.
- False negatives: Some phishing sites might not be detected.
- Adversarial examples: deliberately crafted inputs may evade detection.
### Recommendations
- Regularly update the dataset and model to stay aligned with emerging phishing patterns.
- Use in combination with other security measures for robust phishing detection.
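As a rough illustration of the second recommendation, the sketch below blends the model's "Not Safe" probability with a simple lexical URL heuristic. The function names, keyword list, weight, and threshold are all hypothetical, not part of this model or its training setup:

```python
# Hypothetical sketch: combine the classifier's "Not Safe" probability with a
# crude URL heuristic. All names and thresholds here are illustrative only.

SUSPICIOUS_KEYWORDS = ("login", "verify", "reward", "account-update")

def url_heuristic_score(url: str) -> float:
    """Fraction of suspicious keywords that appear in the URL."""
    url = url.lower()
    hits = sum(1 for kw in SUSPICIOUS_KEYWORDS if kw in url)
    return hits / len(SUSPICIOUS_KEYWORDS)

def combined_verdict(model_not_safe_prob: float, url: str,
                     model_weight: float = 0.7, threshold: float = 0.5) -> str:
    """Weighted average of the model probability and the URL heuristic."""
    score = (model_weight * model_not_safe_prob
             + (1 - model_weight) * url_heuristic_score(url))
    return "Not Safe" if score >= threshold else "Safe"

print(combined_verdict(0.9, "http://example.com/verify-login"))  # prints "Not Safe"
print(combined_verdict(0.1, "http://example.com/home"))          # prints "Safe"
```

A weighted average is only one way to fuse signals; in practice an allow/deny list or a second classifier over URL features may be more robust.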
## How to Get Started with the Model
You can load the fine-tuned model directly from the Hugging Face Hub:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and model from the Hugging Face Hub
model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Get the predicted label
logits = outputs.logits
prediction = logits.argmax(dim=-1).item()
print("Prediction:", "Not Safe" if prediction == 1 else "Safe")
```
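If a confidence score is needed rather than just the argmax label, applying softmax to the logits yields per-class probabilities. The sketch below uses made-up logit values in place of the model's output (and assumes, as in the example above, that index 1 corresponds to "Not Safe"), so it runs without downloading the model:

```python
import math

# Dummy logits standing in for outputs.logits[0] (order: [Safe, Not Safe]);
# the values are illustrative only.
logits = [-1.2, 2.3]

# Numerically stable softmax: subtract the max before exponentiating
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(f"P(Safe) = {probs[0]:.3f}, P(Not Safe) = {probs[1]:.3f}")
```

With real model output, the same computation can be done with `torch.softmax(logits, dim=-1)`.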