---
library_name: transformers
tags:
- phishing-detection
- binary-classification
- bert
- nlp
---

# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification

## Model Details

### Model Description

This model is a fine-tuned version of [BERT-Base-Uncased](https://huggingface.co/google-bert/bert-base-uncased) for phishing site classification. It classifies a website as "Safe" or "Not Safe" based on textual input.

- **Developed by:** [shogun-the-great](https://huggingface.co/shogun-the-great)
- **Model type:** Binary Classification (Safe vs Not Safe)
- **Language(s):** English
- **License:** Apache-2.0
- **Finetuned from model:** `google-bert/bert-base-uncased`

### Model Sources

- **Dataset:** [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification)

## Uses

### Direct Use

This model can be used directly for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:

- Integrating with browser extensions for real-time website classification.
- Analyzing textual data for phishing indicators.

### Downstream Use

Users can fine-tune the model further for specific binary classification tasks or for datasets from similar domains.

### Out-of-Scope Use

This model might not perform well for:

- Non-English text.
- Adversarial phishing attacks or heavily obfuscated text.
- Tasks unrelated to text-based classification.

## Bias, Risks, and Limitations

### Bias

The model's predictions are influenced by the dataset used during fine-tuning. If the training data contains biases, they may be reflected in the predictions.

### Risks

- False positives: Legitimate websites may be flagged as phishing.
- False negatives: Some phishing sites might not be detected.
- Potential vulnerability to adversarial examples.

### Recommendations

- Regularly update the dataset and model to stay aligned with emerging phishing patterns.
- Use in combination with other security measures for robust phishing detection.

## How to Get Started with the Model

You can load the fine-tuned model directly from the Hugging Face Hub:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model from the Hugging Face Hub
model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted label (1 = "Not Safe", 0 = "Safe")
logits = outputs.logits
prediction = logits.argmax(dim=-1).item()
print("Prediction:", "Not Safe" if prediction == 1 else "Safe")
```
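
Because the Risks section lists both false positives and false negatives, it can help to look at class probabilities rather than only the argmax label. The following is a minimal sketch, not part of the released model: it assumes index 1 means "Not Safe" (as in the snippet above), and the `classify` helper, its `threshold` parameter, and the example logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, p_not_safe) for one example's two logits.

    Index 1 is assumed to be "Not Safe", matching the usage snippet.
    `threshold` is a hypothetical tuning knob, not a model setting.
    """
    p_not_safe = softmax(logits)[1]
    label = "Not Safe" if p_not_safe >= threshold else "Safe"
    return label, p_not_safe


# Made-up logits for illustration; with the real model you could pass
# outputs.logits[0].tolist() instead.
example_logits = [0.2, 1.1]
label, p = classify(example_logits, threshold=0.5)
print(f"{label} (P(Not Safe) = {p:.3f})")
```

Raising `threshold` flags fewer sites (fewer false positives, more false negatives), and lowering it does the opposite. A suitable value would need to be chosen on held-out data for the intended deployment.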