---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---
|
# Model Card for SwarmFormer-Small

SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.
|
|
|
## Model Details

### Model Description

SwarmFormer-Small is a compact version of SwarmFormer with:

- Token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling and a classification head
- Optimized for shorter sequences

- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Not applicable (trained from scratch)
|
|
|
### Model Sources

- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: Takara.ai Research
- **Demo**: Not available
|
|
|
## Uses

### Direct Use

- Text classification
- Sentiment analysis
- Resource-constrained environments

### Out-of-Scope Use

- Text generation
- Machine translation
- Inputs longer than 256 tokens (the model's maximum sequence length)
- Tasks requiring high precision
|
|
|
## Training Details

### Training Data

- Dataset: IMDB movie reviews (stanfordnlp/imdb)
- Size: 50,000 samples
- Data augmentation techniques applied
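
To reproduce the data setup, the IMDB dataset named above can be loaded with the Hugging Face `datasets` library; the sketch below only covers loading, since the augmentation pipeline itself is not documented in this card:

```python
from datasets import load_dataset

# Load the stanfordnlp/imdb dataset referenced in this card
# (25,000 labeled train reviews + 25,000 labeled test reviews).
imdb = load_dataset("stanfordnlp/imdb")
print(imdb["train"][0]["text"][:80], imdb["train"][0]["label"])
```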
|
|
|
### Training Procedure

#### Model Architecture Details

1. **Token Embedding Layer**:
   - Embedding layer (vocab_size → 128)
   - Dropout rate: 0.3

2. **Local Swarm Aggregator**:
   - Input dropout: 0.3
   - Local MLP:
     - Linear(128 → 128)
     - GELU
     - Dropout(0.3)
     - Linear(128 → 128)
   - Gate network with GELU

3. **Clustering Mechanism**:
   - Cluster size: 8 tokens
   - Mean pooling per cluster

4. **Global Cluster Attention**:
   - Q/K/V projections: Linear(128 → 128)
   - Attention dropout: 0.3
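
To make the description above concrete, here is a minimal PyTorch sketch of these components under the listed dimensions (d_model = 128, dropout 0.3, cluster size 8). The class names, the gate wiring, and the residual update are illustrative assumptions, not the reference SwarmFormer implementation:

```python
import torch
import torch.nn as nn

# 1. Token embedding with dropout (vocab_size → 128, dropout 0.3).
token_embedding = nn.Sequential(nn.Embedding(30000, 128), nn.Dropout(0.3))

class LocalSwarmAggregatorSketch(nn.Module):
    """2. Local aggregator: input dropout, a small MLP, and a GELU gate."""
    def __init__(self, d_model=128, dropout=0.3):
        super().__init__()
        self.input_dropout = nn.Dropout(dropout)
        self.local_mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_model, d_model),
        )
        # Gate network with GELU; the exact wiring in SwarmFormer may differ.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU())

    def forward(self, x):  # x: (batch, seq_len, d_model)
        h = self.local_mlp(self.input_dropout(x))
        g = self.gate(torch.cat([x, h], dim=-1))
        return x + g * h  # gated residual update (illustrative)

def mean_pool_clusters(x, cluster_size=8):
    """3. Group consecutive tokens into clusters of 8 and mean-pool each."""
    b, n, d = x.shape  # assumes n is divisible by cluster_size
    return x.view(b, n // cluster_size, cluster_size, d).mean(dim=2)

class GlobalClusterAttentionSketch(nn.Module):
    """4. Attention over cluster representations with dropout on the weights."""
    def __init__(self, d_model=128, dropout=0.3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.attn_dropout = nn.Dropout(dropout)

    def forward(self, clusters):  # clusters: (batch, num_clusters, d_model)
        q, k, v = self.q(clusters), self.k(clusters), self.v(clusters)
        scores = q @ k.transpose(-2, -1) / (clusters.size(-1) ** 0.5)
        attn = self.attn_dropout(torch.softmax(scores, dim=-1))
        return attn @ v
```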
|
|
|
#### Training Hyperparameters

- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30
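
As a sketch of how these values could be wired into an optimizer; the optimizer family (AdamW) is an assumption, since the card does not state it:

```python
import torch
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,  # vocabulary size used in "How to Get Started" below
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3,
)
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=4.76e-4,           # learning rate from the table above
    weight_decay=0.0541,  # weight decay from the table above
)
```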
|
|
|
## Evaluation

### Results

- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36 s (25,000 samples)
- Mean batch latency: 3.67 ms
- Throughput: 45,000 samples/s
- Peak memory: 8 GB
|
|
|
## Technical Specifications

### Compute Infrastructure

- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8 GB minimum
- Training time: 3.6 minutes
|
|
|
### How to Get Started

```python
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3
)
```
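
Once constructed, inference presumably follows the usual PyTorch pattern. The call below assumes the model maps a batch of token IDs of shape (batch, seq_len) to class logits; this is an assumption rather than documented API behavior:

```python
import torch

model.eval()
# Dummy batch of token IDs; in practice these come from your tokenizer.
token_ids = torch.randint(0, 30000, (1, 256))
with torch.no_grad():
    logits = model(token_ids)           # assumed shape: (batch, num_classes)
    prediction = logits.argmax(dim=-1)  # assumed: 0 = negative, 1 = positive
print(prediction.item())
```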
|
|
|
## Citation

```bibtex
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```
|
|
|
## Model Card Authors

Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

## Model Card Contact

[email protected]