---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# cnn_dailymail_108_3000_1500_train
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that produces easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```
pip install -U bertopic
```
You can use the model as follows:
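```python
from bertopic import BERTopic

# Download the trained model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_train")

# DataFrame with one row per topic: its ID, document count and label
topic_model.get_topic_info()
```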
## Topic overview
* Number of topics: 51 (including the outlier topic -1)
* Number of training documents: 3000
<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - people - year - would | 10 | -1_said_one_people_year |
| 0 | league - player - cup - club - game | 954 | 0_league_player_cup_club |
| 1 | police - said - court - told - murder | 308 | 1_police_said_court_told |
| 2 | dog - animal - cat - elephant - zoo | 290 | 2_dog_animal_cat_elephant |
| 3 | mr - minister - labour - cameron - prime | 113 | 3_mr_minister_labour_cameron |
| 4 | obama - clinton - president - republican - campaign | 104 | 4_obama_clinton_president_republican |
| 5 | school - teacher - student - nfl - said | 84 | 5_school_teacher_student_nfl |
| 6 | food - milk - drink - wine - bottle | 72 | 6_food_milk_drink_wine |
| 7 | flight - plane - passenger - pilot - aircraft | 49 | 7_flight_plane_passenger_pilot |
| 8 | user - facebook - google - ipad - device | 48 | 8_user_facebook_google_ipad |
| 9 | olympic - gold - race - games - medal | 46 | 9_olympic_gold_race_games |
| 10 | doll - dress - fashion - look - style | 44 | 10_doll_dress_fashion_look |
| 11 | afghan - afghanistan - taliban - military - pakistan | 43 | 11_afghan_afghanistan_taliban_military |
| 12 | transplant - patient - heart - hospital - cancer | 42 | 12_transplant_patient_heart_hospital |
| 13 | iran - syrian - said - president - egypt | 42 | 13_iran_syrian_said_president |
| 14 | show - film - million - like - movie | 39 | 14_show_film_million_like |
| 15 | property - house - price - home - apartment | 38 | 15_property_house_price_home |
| 16 | earth - asteroid - moon - volcano - planet | 34 | 16_earth_asteroid_moon_volcano |
| 17 | federer - djokovic - match - murray - seed | 33 | 17_federer_djokovic_match_murray |
| 18 | jackson - jacksons - album - song - music | 31 | 18_jackson_jacksons_album_song |
| 19 | ship - boat - coast - said - vessel | 30 | 19_ship_boat_coast_said |
| 20 | russia - russian - putin - ukraine - moscow | 30 | 20_russia_russian_putin_ukraine |
| 21 | snow - weather - temperature - climate - water | 29 | 21_snow_weather_temperature_climate |
| 22 | police - station - mr - man - gang | 28 | 22_police_station_mr_man |
| 23 | ebola - disease - vaccine - virus - health | 28 | 23_ebola_disease_vaccine_virus |
| 24 | weight - fat - diet - burn - exercise | 28 | 24_weight_fat_diet_burn |
| 25 | syria - isis - islamic - muslims - alqudsi | 23 | 25_syria_isis_islamic_muslims |
| 26 | boko - haram - nigeria - nigerian - turkana | 23 | 26_boko_haram_nigeria_nigerian |
| 27 | korea - north - korean - kim - pyongyang | 22 | 27_korea_north_korean_kim |
| 28 | driver - driving - road - car - speed | 22 | 28_driver_driving_road_car |
| 29 | school - child - education - internet - english | 21 | 29_school_child_education_internet |
| 30 | mcilroy - woods - pga - tournament - round | 20 | 30_mcilroy_woods_pga_tournament |
| 31 | race - car - driver - team - f1 | 19 | 31_race_car_driver_team |
| 32 | princess - prince - diana - royal - palace | 18 | 32_princess_prince_diana_royal |
| 33 | climbing - climb - mountain - everest - ang | 18 | 33_climbing_climb_mountain_everest |
| 34 | wedding - bieber - couple - together - love | 18 | 34_wedding_bieber_couple_together |
| 35 | nhs - care - patient - hospital - health | 17 | 35_nhs_care_patient_hospital |
| 36 | iraq - iraqi - isis - baghdad - kurdish | 16 | 36_iraq_iraqi_isis_baghdad |
| 37 | cartel - drug - mexican - mexico - crack | 15 | 37_cartel_drug_mexican_mexico |
| 38 | painting - picasso - art - artist - gogh | 15 | 38_painting_picasso_art_artist |
| 39 | castro - zelaya - fidel - micheletti - president | 14 | 39_castro_zelaya_fidel_micheletti |
| 40 | french - ford - traveller - southampton - taxi | 14 | 40_french_ford_traveller_southampton |
| 41 | fire - florissant - bell - firefighter - burned | 14 | 41_fire_florissant_bell_firefighter |
| 42 | fight - ali - heavyweight - pacquiao - title | 13 | 42_fight_ali_heavyweight_pacquiao |
| 43 | fish - sea - jellyfish - manta - swell | 13 | 43_fish_sea_jellyfish_manta |
| 44 | pope - francis - vatican - falkland - islands | 12 | 44_pope_francis_vatican_falkland |
| 45 | gay - samesex - lgbt - marriage - state | 12 | 45_gay_samesex_lgbt_marriage |
| 46 | castle - tower - building - brent - lego | 12 | 46_castle_tower_building_brent |
| 47 | chinese - china - xinhua - chinas - communist | 12 | 47_chinese_china_xinhua_chinas |
| 48 | delivery - customer - market - vacuum - coin | 10 | 48_delivery_customer_market_vacuum |
| 49 | water - rain - storm - flooding - methane | 10 | 49_water_rain_storm_flooding |

</details>

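
The table can be checked directly. Every topic ID from -1 to 49 appears exactly once, which gives 51 rows with the outlier topic -1 included. The frequencies add up to the 3,000 training documents, and each label is the topic ID followed by its first four keywords. The sketch below copies the frequencies from the table. The `topic_label` helper is hypothetical: it only reproduces the pattern visible in the Label column and is not BERTopic's own code.

```python
# Document counts per topic, copied from the "Topic Frequency" column of the table.
TOPIC_FREQUENCIES = {
    -1: 10, 0: 954, 1: 308, 2: 290, 3: 113, 4: 104, 5: 84, 6: 72, 7: 49, 8: 48,
    9: 46, 10: 44, 11: 43, 12: 42, 13: 42, 14: 39, 15: 38, 16: 34, 17: 33, 18: 31,
    19: 30, 20: 30, 21: 29, 22: 28, 23: 28, 24: 28, 25: 23, 26: 23, 27: 22, 28: 22,
    29: 21, 30: 20, 31: 19, 32: 18, 33: 18, 34: 18, 35: 17, 36: 16, 37: 15, 38: 15,
    39: 14, 40: 14, 41: 14, 42: 13, 43: 13, 44: 12, 45: 12, 46: 12, 47: 12, 48: 10,
    49: 10,
}


def topic_label(topic_id: int, keywords: list, n_words: int = 4) -> str:
    """Hypothetical helper: join the topic ID and its first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


if __name__ == "__main__":
    # 51 topics including -1; every training document is assigned to exactly one row.
    print(len(TOPIC_FREQUENCIES), sum(TOPIC_FREQUENCIES.values()))
    print(topic_label(0, ["league", "player", "cup", "club", "game"]))
```

Run as a script, it prints `51 3000` and then `0_league_player_cup_club`, which matches the label in the row for topic 0.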
## Training hyperparameters
* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
## Framework versions
* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
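
Topic assignments may differ under other library versions, UMAP and HDBSCAN in particular. The standard-library sketch below compares a local environment against the versions listed above. It is not part of BERTopic. `find_mismatches` is a hypothetical helper name, and mapping "UMAP" to the `umap-learn` distribution on PyPI is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions", keyed by PyPI distribution name.
# Assumption: "UMAP" is the umap-learn distribution. Python (3.10.6) is not a
# PyPI package and is left out.
CARD_VERSIONS: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def find_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, installed)} for packages that differ or are missing (installed=None)."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(CARD_VERSIONS).items():
        print(f"{name}: card lists {wanted}, installed {installed or 'nothing'}")
```

An empty result means every listed package matches the card exactly.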