- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 6
- Diverse Weight Averaging for Out-of-Distribution Generalization
  Paper • 2205.09739 • Published • 1
- Fusing finetuned models for better pretraining
  Paper • 2204.03044 • Published • 5
- Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
  Paper • 2309.07311 • Published • 3
Niels Horn
nilq
AI & ML interests
Natural language understanding, synthetic emotional speech, mechanistic interpretability.
Recent Activity
upvoted a collection 30 days ago
Cosmos
liked a dataset about 2 months ago
google/MusicCaps
updated a collection 4 months ago
Dynamics of Transformer Language Model Features
Organizations
Collections 4
Papers 1
Models 16
![](https://cdn-avatars.huggingface.co/v1/production/uploads/63ca9515145e6c9716fbf376/BXt9rJ5nIisijzFVLymai.png)
nilq/baby-python-mistral-1L-tiny-TinyStories-ft • Text Generation • Updated • 146 • 1
nilq/baby-python-mistral-1L-tiny-lua-ft • Text Generation • Updated • 138
nilq/baby-python-1L-mistral-lua-stories-slerp • Text Generation • Updated • 168
nilq/baby-python-mistral-1L-tiny-base • Text Generation • Updated • 138
nilq/lua-stories-slerp-mistral-1L-tiny • Text Generation • Updated • 92
nilq/lua-stories-slerp-mistral-2L-tiny • Text Generation • Updated • 93
nilq/mistral-2L-tiny • Text Generation • Updated • 10
nilq/lua-stories-linear-mistral-1L-tiny • Text Generation • Updated • 3
nilq/python-mistral-1L-mini • Text Generation • Updated • 119
nilq/mistral-1L-tiny • Text Generation • Updated • 258 • 5
Datasets 9
nilq/baby-python-and-tiny-stories-and-lua • Viewer • Updated • 12.3M • 38
nilq/baby-python-and-lua • Viewer • Updated • 12.3M • 46 • 1
nilq/baby-python-and-tiny-stories • Viewer • Updated • 13.9M • 60
nilq/python-and-tiny-stories • Updated • 7
nilq/baby-python • Viewer • Updated • 11.7M • 52 • 1
nilq/small-lua-stack • Viewer • Updated • 559k • 117 • 2
nilq/small-python-stack • Viewer • Updated • 2.59M • 72
nilq/babylm-100M • Viewer • Updated • 12.7M • 44
nilq/babylm-10M • Viewer • Updated • 3.14M • 71