Update README.md
README.md
CHANGED
@@ -11,7 +11,9 @@ thumbnail: >-
 
 # Model Tampering Attacks Enable More Rigorous Evlauations of LLM Capabilities
 
-Zora Che*, Stephen Casper*,
+Zora Che*, Stephen Casper*,
+Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai,
+Yarin Gal, Furong Huang, Dylan Hadfield-Menell
 
 Paper: COMING SOON
 
@@ -57,7 +59,7 @@ So we evaluated models using multiple benchmarks.
 * **WMDP-Bio** (Bio capabilities)
 * **MMLU** (General capabilities)
 * **AGIEval** (General capabilities)
-* **
+* **MT-Bench** (General capabilities)
 
 We then calculated the unlearning score which gives a normalized measure of how much WMDP-bio capabilities go down disproportionately compared to general capabilities.
 
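
The README text in this diff describes the unlearning score only in words and does not give its formula. As a rough illustration of what "normalized" and "disproportionately" could mean here, the sketch below subtracts the average relative drop on the general benchmarks from the fraction of above-chance WMDP-Bio accuracy that was removed. The function name, the dictionary keys, the 0.25 chance level (WMDP is four-way multiple choice), and the subtraction itself are all assumptions made for illustration; this is not the authors' definition.

```python
from statistics import mean


def unlearning_score_sketch(base, unlearned, forget_key="wmdp_bio", forget_chance=0.25):
    """Hypothetical score for illustration only; NOT the formula used by the authors.

    `base` and `unlearned` map benchmark names to scores for the original
    and the unlearned model. Every key other than `forget_key` is treated
    as a general-capability benchmark.
    """
    # Fraction of the above-chance forget-set accuracy that was removed:
    # 0.0 means nothing was forgotten, 1.0 means accuracy fell to chance.
    forget_drop = (base[forget_key] - unlearned[forget_key]) / (
        base[forget_key] - forget_chance
    )

    # Average relative loss on the general benchmarks. A relative drop is
    # scale-free, so accuracies and 1-10 ratings can be mixed.
    general_keys = [k for k in base if k != forget_key]
    general_drop = mean((base[k] - unlearned[k]) / base[k] for k in general_keys)

    # Positive when the forget set dropped more than general capabilities did.
    return forget_drop - general_drop
```

Under these assumptions, a model whose WMDP-Bio accuracy falls to chance while its general scores are unchanged gets a score of 1, and a model that is unchanged everywhere gets 0.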