---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---
|
# Model Card for SwarmFormer-Small

SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.
|
|
|
## Model Details

### Model Description

SwarmFormer-Small is a compact version of SwarmFormer with:

- Token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling and a classification head
- Optimized for shorter sequences

- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Not applicable (trained from scratch)
|
|
|
### Model Sources

- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: Takara.ai Research
- **Demo**: Not available
|
|
|
## Uses

### Direct Use

- Text classification
- Sentiment analysis
- Resource-constrained environments

### Out-of-Scope Use

- Text generation
- Machine translation
- Inputs longer than 256 tokens (the model's maximum sequence length)
- Tasks requiring high precision
|
|
|
## Training Details

### Training Data

- Dataset: IMDB movie reviews (stanfordnlp/imdb)
- Size: 50,000 samples
- Data augmentation techniques applied
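
To reproduce the data setup, the IMDB dataset named above can be loaded with the Hugging Face `datasets` library; the sketch below only covers loading, since the augmentation pipeline itself is not documented in this card:

```python
from datasets import load_dataset

# Load the stanfordnlp/imdb dataset referenced in this card
# (25,000 labeled train reviews + 25,000 labeled test reviews).
imdb = load_dataset("stanfordnlp/imdb")
print(imdb["train"][0]["text"][:80], imdb["train"][0]["label"])
```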
|
|
|
### Training Procedure

#### Model Architecture Details

1. **Token Embedding Layer**:
   - Embedding layer (vocab_size → 128)
   - Dropout rate: 0.3

2. **Local Swarm Aggregator**:
   - Input dropout: 0.3
   - Local MLP:
     - Linear(128 → 128)
     - GELU
     - Dropout(0.3)
     - Linear(128 → 128)
   - Gate network with GELU

3. **Clustering Mechanism**:
   - Cluster size: 8 tokens
   - Mean pooling per cluster

4. **Global Cluster Attention**:
   - Q/K/V projections: Linear(128 → 128)
   - Attention dropout: 0.3
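
To make the description above concrete, here is a minimal PyTorch sketch of these components under the listed dimensions (d_model = 128, dropout 0.3, cluster size 8). The class names, the gate wiring, and the residual update are illustrative assumptions, not the reference SwarmFormer implementation:

```python
import torch
import torch.nn as nn

# 1. Token embedding with dropout (vocab_size → 128, dropout 0.3).
token_embedding = nn.Sequential(nn.Embedding(30000, 128), nn.Dropout(0.3))

class LocalSwarmAggregatorSketch(nn.Module):
    """2. Local aggregator: input dropout, a small MLP, and a GELU gate."""
    def __init__(self, d_model=128, dropout=0.3):
        super().__init__()
        self.input_dropout = nn.Dropout(dropout)
        self.local_mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_model, d_model),
        )
        # Gate network with GELU; the exact wiring in SwarmFormer may differ.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU())

    def forward(self, x):  # x: (batch, seq_len, d_model)
        h = self.local_mlp(self.input_dropout(x))
        g = self.gate(torch.cat([x, h], dim=-1))
        return x + g * h  # gated residual update (illustrative)

def mean_pool_clusters(x, cluster_size=8):
    """3. Group consecutive tokens into clusters of 8 and mean-pool each."""
    b, n, d = x.shape  # assumes n is divisible by cluster_size
    return x.view(b, n // cluster_size, cluster_size, d).mean(dim=2)

class GlobalClusterAttentionSketch(nn.Module):
    """4. Attention over cluster representations with dropout on the weights."""
    def __init__(self, d_model=128, dropout=0.3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.attn_dropout = nn.Dropout(dropout)

    def forward(self, clusters):  # clusters: (batch, num_clusters, d_model)
        q, k, v = self.q(clusters), self.k(clusters), self.v(clusters)
        scores = q @ k.transpose(-2, -1) / (clusters.size(-1) ** 0.5)
        attn = self.attn_dropout(torch.softmax(scores, dim=-1))
        return attn @ v
```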
|
|
|
#### Training Hyperparameters

- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30
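
As a sketch of how these values could be wired into an optimizer; the optimizer family (AdamW) is an assumption, since the card does not state it:

```python
import torch
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,  # vocabulary size used in "How to Get Started" below
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3,
)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=4.76e-4,           # learning rate from the table above
    weight_decay=0.0541,  # weight decay from the table above
)
```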
|
|
|
## Evaluation

### Results

- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36 s (25,000 samples)
- Mean batch latency: 3.67 ms
- Throughput: 45,000 samples/s
- Peak memory: 8 GB
|
|
|
## Technical Specifications

### Compute Infrastructure

- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8 GB minimum
- Training time: 3.6 minutes
|
|
|
### How to Get Started

```python
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3
)
```
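
Once constructed, inference presumably follows the usual PyTorch pattern. The call below assumes the model maps a batch of token IDs of shape (batch, seq_len) to class logits; this is an assumption rather than documented API behavior:

```python
import torch

model.eval()
# Dummy batch of token IDs; in practice these come from your tokenizer.
token_ids = torch.randint(0, 30000, (1, 256))
with torch.no_grad():
    logits = model(token_ids)           # assumed shape: (batch, num_classes)
    prediction = logits.argmax(dim=-1)  # assumed: 0 = negative, 1 = positive
print(prediction.item())
```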
|
|
|
## Citation

```bibtex
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```
|
|
|
## Model Card Authors

Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

## Model Card Contact

[email protected]