Thanks for your work on energy efficiency. You piqued my curiosity!
Why do SmolLM-135M and SmolLM-1.7B get nearly the same score despite a roughly tenfold difference in model size? Is that mostly caused by the identical context size?
Could you please enable encoder-decoder models? In theory they should be more efficient, because the input only has to be encoded once and the encoder output can be reused at every decoding step.
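To illustrate the reuse I mean, here is a minimal sketch with the Hugging Face transformers API (the flan-t5-small checkpoint is just an illustrative choice): the encoder runs a single time, and its hidden states are handed to generate() for every decoding step.

```python
# Sketch: the encoder of a seq2seq model runs once; its output is
# reused by generate() at every decoding step.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/flan-t5-small"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("summarize: The input is encoded a single time.",
                   return_tensors="pt")

# One forward pass through the encoder ...
encoder_outputs = model.get_encoder()(**inputs)

# ... whose hidden states are reused for all decoder steps.
ids = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=inputs["attention_mask"],
    max_new_tokens=20,
)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

A decoder-only model, by contrast, has to run its full stack over the prompt tokens and carry them in the KV cache through every generation step.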
Kalle Hilsenbek
Bachstelze
AI & ML interests
Combining BERT with instructions for explainable AI: gitlab.com/Bachstelze/instructionbert
Recent Activity
commented on the article "Announcing AI Energy Score Ratings" (3 days ago)
commented on the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training" (16 days ago)
commented on the article "Is Attention Interpretable in Transformer-Based Large Language Models? Let’s Unpack the Hype" (16 days ago)
Organizations
None yet
Bachstelze's activity
commented on "Announcing AI Energy Score Ratings" (3 days ago)
commented on "Is Attention Interpretable in Transformer-Based Large Language Models? Let’s Unpack the Hype" (16 days ago):
Good write-up, though it is missing the dominant attention sink in current decoder-only models:
https://colab.research.google.com/drive/1Fcgug4a6rv9F-Wej0rNveiM_SMNZOtrr?usp=sharing
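For readers who want to see the sink directly, a minimal sketch (assuming a small decoder-only checkpoint; gpt2 here is chosen only for illustration) that measures how much attention mass each layer puts on the very first token:

```python
# Sketch: measure the attention mass that later tokens pay to the
# first token (the "attention sink") in a decoder-only model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The dominant attention sink sits on the first token.",
                   return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, query, key) tensor per layer.
for layer, attn in enumerate(out.attentions):
    # Average attention from all later query positions to key position 0.
    sink = attn[0, :, 1:, 0].mean().item()
    print(f"layer {layer:2d}: {sink:.0%} of attention goes to token 0")
```

In most layers a disproportionate share of the mass lands on token 0, which is exactly what makes raw attention maps hard to read as explanations.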
upvoted the article "Is Attention Interpretable in Transformer-Based Large Language Models? Let’s Unpack the Hype" (16 days ago)
ModernBART wen? (#38, opened about 1 month ago by Fizzarolli)
Goldfish model (#5, opened about 2 months ago by Bachstelze)
Adding Evaluation Results (#6, opened 3 months ago by leaderboard-pr-bot)
upvoted a paper (4 months ago)
Benchmark results (#17, opened 4 months ago by Bachstelze)
upvoted a paper (5 months ago)
upvoted a paper (5 months ago)
Readme (#1, opened 5 months ago by Bachstelze)