---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# cnn_dailymail_108_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

## Usage 

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_train")

topic_model.get_topic_info()
```
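Once the model is loaded, individual topics can be inspected as well. The following is a minimal sketch, not an official recipe: `top_keywords` and `demo` are hypothetical helper names introduced here for illustration, while `get_topic` is the BERTopic method that returns `(word, score)` pairs for a topic (or `False` when the topic ID is unknown). `demo` is defined but not called, since it downloads the model.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for one topic, highest-weighted first.

    Hypothetical helper: works with any object exposing BERTopic's
    get_topic(topic_id) method.
    """
    words = topic_model.get_topic(topic_id)  # list of (word, score) pairs, or False
    if not words:
        return []
    return [word for word, _score in words[:n]]


def demo():
    """Load the published model and print the keywords of topic 0."""
    # Imported here so the helper above can be used without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_train")
    print(top_keywords(topic_model, 0))
```

The keywords printed by `demo()` should correspond to the Topic Keywords column in the overview table of this card.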

## Topic overview

* Number of topics: 51
* Number of training documents: 3000

<details>
  <summary>Click here for an overview of all topics.</summary>
  
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - people - year - would | 10 | -1_said_one_people_year | 
| 0 | league - player - cup - club - game | 954 | 0_league_player_cup_club | 
| 1 | police - said - court - told - murder | 308 | 1_police_said_court_told | 
| 2 | dog - animal - cat - elephant - zoo | 290 | 2_dog_animal_cat_elephant | 
| 3 | mr - minister - labour - cameron - prime | 113 | 3_mr_minister_labour_cameron | 
| 4 | obama - clinton - president - republican - campaign | 104 | 4_obama_clinton_president_republican | 
| 5 | school - teacher - student - nfl - said | 84 | 5_school_teacher_student_nfl | 
| 6 | food - milk - drink - wine - bottle | 72 | 6_food_milk_drink_wine | 
| 7 | flight - plane - passenger - pilot - aircraft | 49 | 7_flight_plane_passenger_pilot | 
| 8 | user - facebook - google - ipad - device | 48 | 8_user_facebook_google_ipad | 
| 9 | olympic - gold - race - games - medal | 46 | 9_olympic_gold_race_games | 
| 10 | doll - dress - fashion - look - style | 44 | 10_doll_dress_fashion_look | 
| 11 | afghan - afghanistan - taliban - military - pakistan | 43 | 11_afghan_afghanistan_taliban_military | 
| 12 | transplant - patient - heart - hospital - cancer | 42 | 12_transplant_patient_heart_hospital | 
| 13 | iran - syrian - said - president - egypt | 42 | 13_iran_syrian_said_president | 
| 14 | show - film - million - like - movie | 39 | 14_show_film_million_like | 
| 15 | property - house - price - home - apartment | 38 | 15_property_house_price_home | 
| 16 | earth - asteroid - moon - volcano - planet | 34 | 16_earth_asteroid_moon_volcano | 
| 17 | federer - djokovic - match - murray - seed | 33 | 17_federer_djokovic_match_murray | 
| 18 | jackson - jacksons - album - song - music | 31 | 18_jackson_jacksons_album_song | 
| 19 | ship - boat - coast - said - vessel | 30 | 19_ship_boat_coast_said | 
| 20 | russia - russian - putin - ukraine - moscow | 30 | 20_russia_russian_putin_ukraine | 
| 21 | snow - weather - temperature - climate - water | 29 | 21_snow_weather_temperature_climate | 
| 22 | police - station - mr - man - gang | 28 | 22_police_station_mr_man | 
| 23 | ebola - disease - vaccine - virus - health | 28 | 23_ebola_disease_vaccine_virus | 
| 24 | weight - fat - diet - burn - exercise | 28 | 24_weight_fat_diet_burn | 
| 25 | syria - isis - islamic - muslims - alqudsi | 23 | 25_syria_isis_islamic_muslims | 
| 26 | boko - haram - nigeria - nigerian - turkana | 23 | 26_boko_haram_nigeria_nigerian | 
| 27 | korea - north - korean - kim - pyongyang | 22 | 27_korea_north_korean_kim | 
| 28 | driver - driving - road - car - speed | 22 | 28_driver_driving_road_car | 
| 29 | school - child - education - internet - english | 21 | 29_school_child_education_internet | 
| 30 | mcilroy - woods - pga - tournament - round | 20 | 30_mcilroy_woods_pga_tournament | 
| 31 | race - car - driver - team - f1 | 19 | 31_race_car_driver_team | 
| 32 | princess - prince - diana - royal - palace | 18 | 32_princess_prince_diana_royal | 
| 33 | climbing - climb - mountain - everest - ang | 18 | 33_climbing_climb_mountain_everest | 
| 34 | wedding - bieber - couple - together - love | 18 | 34_wedding_bieber_couple_together | 
| 35 | nhs - care - patient - hospital - health | 17 | 35_nhs_care_patient_hospital | 
| 36 | iraq - iraqi - isis - baghdad - kurdish | 16 | 36_iraq_iraqi_isis_baghdad | 
| 37 | cartel - drug - mexican - mexico - crack | 15 | 37_cartel_drug_mexican_mexico | 
| 38 | painting - picasso - art - artist - gogh | 15 | 38_painting_picasso_art_artist | 
| 39 | castro - zelaya - fidel - micheletti - president | 14 | 39_castro_zelaya_fidel_micheletti | 
| 40 | french - ford - traveller - southampton - taxi | 14 | 40_french_ford_traveller_southampton | 
| 41 | fire - florissant - bell - firefighter - burned | 14 | 41_fire_florissant_bell_firefighter | 
| 42 | fight - ali - heavyweight - pacquiao - title | 13 | 42_fight_ali_heavyweight_pacquiao | 
| 43 | fish - sea - jellyfish - manta - swell | 13 | 43_fish_sea_jellyfish_manta | 
| 44 | pope - francis - vatican - falkland - islands | 12 | 44_pope_francis_vatican_falkland | 
| 45 | gay - samesex - lgbt - marriage - state | 12 | 45_gay_samesex_lgbt_marriage | 
| 46 | castle - tower - building - brent - lego | 12 | 46_castle_tower_building_brent | 
| 47 | chinese - china - xinhua - chinas - communist | 12 | 47_chinese_china_xinhua_chinas | 
| 48 | delivery - customer - market - vacuum - coin | 10 | 48_delivery_customer_market_vacuum | 
| 49 | water - rain - storm - flooding - methane | 10 | 49_water_rain_storm_flooding |
  
</details>
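
The frequencies in the table are document counts, so dividing by the 3000 training documents gives each topic's share of the corpus. In BERTopic, topic `-1` collects documents that were not assigned to any cluster (outliers). The sketch below is illustrative only: it copies four rows from the table by hand, and `share_percent` is a hypothetical helper name.

```python
TRAINING_DOCS = 3000  # "Number of training documents" from this card

# Topic ID -> Topic Frequency, copied from the first four table rows.
frequencies = {-1: 10, 0: 954, 1: 308, 2: 290}


def share_percent(count, total=TRAINING_DOCS):
    """Share of the training documents, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {topic_id: share_percent(count) for topic_id, count in frequencies.items()}
print(shares)  # {-1: 0.3, 0: 31.8, 1: 10.3, 2: 9.7}
```

Topic 0 alone covers roughly a third of the training documents, while only 10 documents were left as outliers.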

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
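
The settings above map onto keyword arguments of the `BERTopic` constructor. The following is a hedged sketch of how a model with the same listed settings could be configured. It is not the author's training script: the embedding model and the UMAP and HDBSCAN settings are not stated in this card, so the sketch falls back to BERTopic's defaults for them and will not necessarily reproduce these topics. `HYPERPARAMETERS` and `build_model` are hypothetical names.

```python
# Values copied from the "Training hyperparameters" list in this card.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic instance with the listed settings."""
    # Imported here so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    # Assumption: everything not listed in the card is left at library defaults.
    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_model().fit_transform(docs)` on a list of document strings would then train a new model with these settings.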

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
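
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The sketch below uses only the standard library. The keys are the PyPI distribution names for the listed packages, and `version_mismatches` is a hypothetical helper name.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list, keyed by PyPI name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(expected):
    """Map package name -> (listed, installed) for every package that differs.

    A package that is not installed is reported with installed = None.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only highlights where the environment differs from the one used for training.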