stecas committed on
Commit 5083808 · verified · 1 parent: f2acc1a

Update README.md

Files changed (1)
  1. README.md +4 -2
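The `+4 -2` summary counts the lines added to and removed from README.md in this commit. As a rough sketch, assuming a plain line-based unified diff (the diff implementation behind this page is not shown), Python's standard `difflib` reproduces those counts from the changed regions. `diff_stats`, `README_OLD`, and `README_NEW` are hypothetical names. The two lists are excerpts transcribed from this commit's diff, not the full file.

```python
import difflib

# Excerpts of README.md transcribed from this commit: the title/author region
# and the benchmark list, joined together for brevity (not the full file).
AUTHORS_AFTER = [
    "Zora Che*, Stephen Casper*,",
    # Adjacent string literals are concatenated into a single list item.
    "Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, "
    "Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai,",
    "Yarin Gal, Furong Huang, Dylan Hadfield-Menell",
]
# Before the commit, the same names sat on a single line.
AUTHORS_BEFORE = " ".join(AUTHORS_AFTER)

# Title copied verbatim from the README, including its spelling.
TITLE = "# Model Tampering Attacks Enable More Rigorous Evlauations of LLM Capabilities"

README_OLD = [TITLE, "", AUTHORS_BEFORE, "", "Paper: COMING SOON", "",
              "* **AGIEval** (General capabilities)",
              "* **T-Bench** (General capabilities)"]
README_NEW = [TITLE, "", *AUTHORS_AFTER, "", "Paper: COMING SOON", "",
              "* **AGIEval** (General capabilities)",
              "* **MT-Bench** (General capabilities)"]


def diff_stats(old_lines: list[str], new_lines: list[str]) -> tuple[int, int]:
    """Return (added, removed) line counts from a line-based unified diff."""
    added = removed = 0
    in_hunk = False
    for line in difflib.unified_diff(old_lines, new_lines, lineterm=""):
        if line.startswith("@@"):
            in_hunk = True  # hunk header; content lines follow
            continue
        if not in_hunk:
            continue  # skip the leading "---" / "+++" file-header lines
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


print(diff_stats(README_OLD, README_NEW))  # → (4, 2)
```

Splitting one author line into three accounts for one removal and three additions; renaming the benchmark accounts for the remaining pair.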
README.md CHANGED
@@ -11,7 +11,9 @@ thumbnail: >-
 
 # Model Tampering Attacks Enable More Rigorous Evlauations of LLM Capabilities
 
-Zora Che*, Stephen Casper*, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell
+Zora Che*, Stephen Casper*,
+Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai,
+Yarin Gal, Furong Huang, Dylan Hadfield-Menell
 
 Paper: COMING SOON
 
@@ -57,7 +59,7 @@ So we evaluated models using multiple benchmarks.
 * **WMDP-Bio** (Bio capabilities)
 * **MMLU** (General capabilities)
 * **AGIEval** (General capabilities)
-* **T-Bench** (General capabilities)
+* **MT-Bench** (General capabilities)
 
 We then calculated the unlearning score which gives a normalized measure of how much WMDP-bio capabilities go down disproportionately compared to general capabilities.
 
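The README context in this diff describes the unlearning score only in words and does not give its formula. The sketch below is one hypothetical way to make "how much WMDP-Bio capabilities go down disproportionately compared to general capabilities" concrete. It is not the authors' definition. It takes the fraction of above-floor headroom lost on WMDP-Bio and subtracts the average fraction lost on the general-capability benchmarks (MMLU, AGIEval, MT-Bench). `BenchmarkScore`, `unlearning_score`, the per-benchmark `floor` (for example 0.25 for four-option multiple choice), and the unweighted averaging are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkScore:
    """One benchmark measured before and after unlearning (hypothetical helper)."""
    base: float       # score of the original model
    unlearned: float  # score after unlearning
    floor: float      # ASSUMED trivial-model score, e.g. 0.25 for 4-option multiple choice

    def normalized_drop(self) -> float:
        """Fraction of above-floor headroom lost: 0 = no drop, 1 = fell to the floor."""
        headroom = self.base - self.floor
        if headroom <= 0:
            raise ValueError("base score must exceed the floor")
        return (self.base - self.unlearned) / headroom


def unlearning_score(target: BenchmarkScore, general: list[BenchmarkScore]) -> float:
    """ASSUMED definition, not the paper's: normalized drop on the target
    benchmark (WMDP-Bio) minus the mean normalized drop on general benchmarks."""
    if not general:
        raise ValueError("need at least one general-capability benchmark")
    general_drop = sum(b.normalized_drop() for b in general) / len(general)
    return target.normalized_drop() - general_drop
```

Under this assumed definition, a score near 1 means WMDP-Bio fell to its floor while general capabilities were untouched, 0 means both dropped by the same normalized amount (or neither dropped), and a negative score means general capabilities suffered more than WMDP-Bio. Normalizing by headroom above a floor lets benchmarks on different scales, such as accuracy and a 1 to 10 judge score, be averaged together.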