---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---
# Model Card for SwarmFormer-Small
SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.
## Model Details
### Model Description
A compact variant of SwarmFormer consisting of:
- A token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling followed by a classification head
- Optimized for shorter sequences (up to 256 tokens)
- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Trained from scratch
### Model Sources
- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: Takara.ai Research
- **Demo**: Not available
## Uses
### Direct Use
- Text classification
- Sentiment analysis
- Resource-constrained environments
### Out-of-Scope Use
- Text generation
- Machine translation
- Tasks requiring >256 tokens
- Tasks requiring high precision
## Training Details
### Training Data
- Dataset: IMDB Movie Reviews (`stanfordnlp/imdb`)
- Size: 50,000 samples
- Augmentation techniques applied
### Training Procedure
#### Model Architecture Details
A minimal PyTorch sketch of these components follows the list.
1. **Token Embedding Layer**:
   - Embedding layer (vocab_size → 128)
   - Dropout: 0.3
2. **Local Swarm Aggregator**:
   - Input dropout: 0.3
   - Local MLP: Linear(128 → 128) → GELU → Dropout(0.3) → Linear(128 → 128)
   - Gate network with GELU activation
3. **Clustering Mechanism**:
   - Cluster size: 8 tokens
   - Mean pooling per cluster
4. **Global Cluster Attention**:
   - Q/K/V projections: Linear(128 → 128)
   - Attention dropout: 0.3
#### Training Hyperparameters
- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30
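The card lists the hyperparameters above but does not name the optimizer. The snippet below is a sketch assuming AdamW with the stated learning rate and weight decay, wired to the constructor shown in the How to Get Started section; treat the optimizer choice as an assumption.

```python
import torch
from swarmformer import SwarmFormerModel  # see "How to Get Started" below

model = SwarmFormerModel(
    vocab_size=30000, d_model=128, seq_len=256,
    cluster_size=8, num_layers=2, T_local=3,
)

# Assumed optimizer: AdamW with the learning rate and weight decay listed
# above (the card does not specify which optimizer was actually used).
optimizer = torch.optim.AdamW(model.parameters(), lr=4.76e-4, weight_decay=0.0541)
```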
## Evaluation
### Results
- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36 s (25,000 samples)
- Mean batch latency: 3.67 ms
- Throughput: 45,000 samples/s
- Peak memory: 8 GB
## Technical Specifications
### Compute Infrastructure
- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8GB minimum
- Training time: 3.6 minutes
### How to Get Started
```python
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,   # tokenizer vocabulary size
    d_model=128,        # embedding dimension
    seq_len=256,        # maximum sequence length
    cluster_size=8,     # tokens per cluster
    num_layers=2,       # number of SwarmFormer layers
    T_local=3           # local update steps per layer
)
```
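The tokenization and inference interface is not documented in this card. The following is a hypothetical sketch assuming the model behaves like a standard PyTorch module mapping a batch of token IDs to classification logits; the random token IDs stand in for real tokenized input.

```python
import torch

# Hypothetical inference sketch: random token IDs stand in for real
# tokenized text; the actual call signature may differ.
token_ids = torch.randint(0, 30000, (1, 256))  # (batch, seq_len)

model.eval()
with torch.no_grad():
    logits = model(token_ids)              # assumed to return class logits
    predicted_class = logits.argmax(dim=-1)

print(predicted_class)  # label-to-index mapping is not specified in this card
```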
## Citation
```bibtex
@article{legg2025swarmformer,
title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
journal={Takara.ai Research},
year={2025},
url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```
## Model Card Authors
Jordan Legg, Mikus Sturmanis, Takara.ai Research Team
## Model Card Contact
[email protected]