---
license: mit
library_name: sklearn
tags:
- tabular-classification
- sklearn
- phishing
- onnx
model_format: pickle
model_file: model.pkl
widget:
- structuredData:
    domain_age:
    - 11039.0
    - -1.0
    - 5636.0
    domain_registration_length:
    - 3571.0
    - 0.0
    - 208.0
    google_index:
    - 0.0
    - 0.0
    - 0.0
    nb_hyperlinks:
    - 97.0
    - 168.0
    - 52.0
    page_rank:
    - 5.0
    - 2.0
    - 10.0
    ratio_extHyperlinks:
    - 0.030927835
    - 0.220238095
    - 0.442307692
    ratio_extRedirection:
    - 0.0
    - 0.378378378
    - 0.0
    ratio_intHyperlinks:
    - 0.969072165
    - 0.779761905
    - 0.557692308
    safe_anchor:
    - 25.0
    - 24.32432432
    - 0.0
    status:
    - legitimate
    - legitimate
    - legitimate
    web_traffic:
    - 178542.0
    - 0.0
    - 2.0
inference: false
pipeline_tag: tabular-classification
---

# Model description

## Training Procedure

### Hyperparameters


| Hyperparameter                      | Value                    |
|-------------------------------------|--------------------------|
| base_estimator                      | deprecated               |
| cv                                  | 5                        |
| ensemble                            | True                     |
| estimator__bootstrap                | True                     |
| estimator__ccp_alpha                | 0.0                      |
| estimator__class_weight             |                          |
| estimator__criterion                | gini                     |
| estimator__max_depth                |                          |
| estimator__max_features             | sqrt                     |
| estimator__max_leaf_nodes           |                          |
| estimator__max_samples              |                          |
| estimator__min_impurity_decrease    | 0.0                      |
| estimator__min_samples_leaf         | 1                        |
| estimator__min_samples_split        | 2                        |
| estimator__min_weight_fraction_leaf | 0.0                      |
| estimator__n_estimators             | 100                      |
| estimator__n_jobs                   |                          |
| estimator__oob_score                | False                    |
| estimator__random_state             |                          |
| estimator__verbose                  | 0                        |
| estimator__warm_start               | False                    |
| estimator                           | RandomForestClassifier() |
| method                              | isotonic                 |
| n_jobs                              |                          |

### Model Plot

This is the architecture of the model loaded by joblib.

`CalibratedClassifierCV(cv=5, estimator=RandomForestClassifier(), method='isotonic')`


## Evaluation Results

| Metric    | Value    |
|-----------|----------|
| accuracy  | 0.945652 |
| f1-score  | 0.945114 |
| precision | 0.951996 |
| recall    | 0.938331 |

# How to Get Started with the Model

Below are some code snippets to load the model.

## With ONNX (recommended)

### Python

```python
import onnxruntime
import pandas as pd
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.onnx"

# Initializing the ONNX Runtime session with the pre-trained model
sess = onnxruntime.InferenceSession(
    hf_hub_download(repo_id=REPO_ID, filename=FILENAME),
    providers=["CPUExecutionProvider"],
)

# Defining a list of URLs with characteristics
data = [
    {
        "url": "https://www.rga.com/about/workplace",
        "nb_hyperlinks": 97,
        "ratio_intHyperlinks": 0.969072165,
        "ratio_extHyperlinks": 0.030927835,
        "ratio_extRedirection": 0,
        "safe_anchor": 25,
        "domain_registration_length": 3571,
        "domain_age": 11039,
        "web_traffic": 178542,
        "google_index": 0,
        "page_rank": 5,
    },
]

# Converting data to a float32 NumPy array
df = pd.DataFrame(data).set_index("url")
inputs = df.to_numpy(dtype="float32")

# Using the ONNX model to make predictions on the input data
probas = sess.run(None, {"X": inputs})[1]

# Displaying the results
for row, proba in zip(data, probas):
    print(f"URL: {row['url']}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f}%")
    print("----")

# Output:
# URL: https://www.rga.com/about/workplace
# Likelihood of being a phishing site: 0.89%
# ----
```
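
The snippet above prints a probability rather than a class label. If a hard label is needed, a decision threshold has to be chosen. The sketch below is one way to do that: the helper `label_from_proba`, the 0.5 default cutoff, and the label name `phishing` are assumptions for illustration, not values documented by this model card (only `legitimate` appears in the widget data).

```python
# Hypothetical helper: the 0.5 cutoff and the label names are assumptions,
# not values documented by the model card.
def label_from_proba(phishing_proba: float, threshold: float = 0.5) -> str:
    """Map a phishing probability to a class label."""
    return "phishing" if phishing_proba >= threshold else "legitimate"


# 0.0089 corresponds to the 0.89% shown in the sample output above.
print(label_from_proba(0.0089))                   # legitimate
print(label_from_proba(0.0089, threshold=0.005))  # phishing
```

Lowering the threshold flags more URLs as phishing, which trades precision for recall, so the cutoff is best picked on a validation set for the intended use.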