	---

	datasets:
	- homebrewltd/instruction-speech-whispervq-v2
	language:
	- en
	license: apache-2.0
	tags:
	- sound language model

	---

	![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

	# QuantFactory/llama3.1-s-base-v0.2-GGUF
	This is a quantized version of [homebrewltd/llama3.1-s-base-v0.2](https://huggingface.co/homebrewltd/llama3.1-s-base-v0.2) created using llama.cpp.

	# Original Model Card


	## Model Details

	We have developed and released the [llama3s](https://huggingface.co/collections/homebrew-research/llama3-s-669df2139f0576abc6eb7405) model family. Models in this family natively understand both audio and text input.

	This model was produced by continual pretraining of the expanded-vocabulary checkpoint [homebrewltd/llama3.1-s-whispervq-init](https://huggingface.co/homebrewltd/llama3.1-s-whispervq-init) on 900M tokens from the [homebrewltd/raw-speech-whispervq-v1](https://huggingface.co/datasets/homebrewltd/raw-speech-whispervq-v1) dataset.

	Model developers: Homebrew Research.

	Input: Text and sound.

	Output: Text.

	Model Architecture: Llama-3.

	Language(s): English.

	## Intended Use

	Intended Use Cases: This family is primarily intended for research applications. This version aims to further improve the LLM's sound understanding capabilities.

	Out-of-scope: The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.

	## Training process
	Training Metrics Image: Below is a snapshot of the training loss curve.

	![train_log](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/iAbaP7SCoyZ8tz2hyK8k0.png)

	### Hardware

	GPU Configuration: Cluster of 10x NVIDIA A6000-48GB.

	GPU Usage:
	- Continual Training: 30 hours.
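The hardware figures above imply a rough compute budget and throughput. The sketch below is back-of-the-envelope arithmetic only: it assumes the 30 hours cover exactly the 900M continual-pretraining tokens stated in Model Details and that all 10 GPUs were busy for the whole run. The variable and function names are illustrative, not taken from the training code.

```python
# Figures quoted in this model card.
NUM_GPUS = 10                 # "Cluster of 10x NVIDIA A6000-48GB"
WALL_CLOCK_HOURS = 30         # "Continual Training: 30 hours"
TRAIN_TOKENS = 900_000_000    # "900M tokens" (Model Details)


def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total GPU-hours, assuming every GPU runs for the full duration."""
    return num_gpus * hours


def tokens_per_second(total_tokens: int, hours: float) -> float:
    """Average cluster-wide throughput over the whole run."""
    return total_tokens / (hours * 3600)


total_gpu_hours = gpu_hours(NUM_GPUS, WALL_CLOCK_HOURS)
cluster_tps = tokens_per_second(TRAIN_TOKENS, WALL_CLOCK_HOURS)
per_gpu_tps = cluster_tps / NUM_GPUS

print(f"GPU-hours:            {total_gpu_hours:.0f}")
print(f"Cluster tokens/sec:   {cluster_tps:.0f}")
print(f"Per-GPU tokens/sec:   {per_gpu_tps:.0f}")
```

Under these assumptions the run costs about 300 GPU-hours. The throughput numbers are averages and ignore any time spent on checkpointing or evaluation.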

	### Training Arguments

	We use the [torchtune](https://github.com/pytorch/torchtune) library for its up-to-date FSDP2 training implementation.

	| Parameter           | Continual Training |
	|---------------------|--------------------|
	| Epoch               | 1                  |
	| Global batch size   | 480                |
	| Learning Rate       | 2e-4               |
	| Learning Scheduler  | Cosine with warmup |
	| Optimizer           | AdamW fused        |
	| Warmup Steps        | 50                 |
	| Weight Decay        | 0.01               |
	| Max Sequence Length | 512                |
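To make the table concrete, the following sketch estimates the number of optimizer steps and evaluates a cosine-with-warmup learning rate at a given step. It is an illustration, not the torchtune scheduler itself. It assumes that every sequence is packed to the full 512 tokens, that the single epoch covers the 900M tokens mentioned in Model Details, that warmup is linear from zero, and that the cosine decays to zero at the last step. The name `lr_at` is hypothetical.

```python
import math

# Values from the training-arguments table.
PEAK_LR = 2e-4
WARMUP_STEPS = 50
GLOBAL_BATCH_SIZE = 480
MAX_SEQ_LEN = 512

# Assumption: one epoch over the 900M tokens stated in Model Details,
# with every sequence filled to MAX_SEQ_LEN.
TRAIN_TOKENS = 900_000_000

tokens_per_step = GLOBAL_BATCH_SIZE * MAX_SEQ_LEN
total_steps = math.ceil(TRAIN_TOKENS / tokens_per_step)


def lr_at(step: int,
          peak_lr: float = PEAK_LR,
          warmup_steps: int = WARMUP_STEPS,
          num_steps: int = total_steps) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, num_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"tokens per step: {tokens_per_step}")
print(f"estimated steps: {total_steps}")
for s in (0, 25, 50, total_steps // 2, total_steps):
    print(f"step {s:>5}: lr = {lr_at(s):.2e}")
```

With these assumptions each step consumes 245,760 tokens, so the 50 warmup steps cover only a small fraction of the run and the schedule is dominated by the cosine decay. If the real data contains shorter, unpacked sequences, the actual step count would be higher.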


	## Citation Information

	BibTeX:

	```
	@article{llama3s2024,
	  title={Llama3-S: Sound Instruction Language Model},
	  author={Homebrew Research},
	  year={2024},
	  month={August},
	  url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}
	}
	```

	## Acknowledgement

	- [WhisperSpeech](https://github.com/collabora/WhisperSpeech)

	- [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)