# 🔥 Gilgamesh 72B 🔥
Gilgamesh (GGM) 72B is a finetune of Alibaba's Qwen 2.5 72B Instruct model.
## Model Details
- Developed by: Ruben Roy
- Funded by: The Ovantage Society
- License: Qwen
- Base Model: Qwen/Qwen2.5-72B-Instruct
- Type: Causal Language Model
- Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias
- Number of Parameters: 72.7B
- Number of Parameters (Non-Embedding): 70.0B
- Number of Layers: 80
- Number of Attention Heads (GQA): 64 for Q and 8 for KV
Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
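If you want to verify the architecture details above yourself, they can be read straight from the model's configuration. A minimal sketch; this only downloads the small config file from the Hugging Face Hub, not the full weights:

```python
from transformers import AutoConfig

# Fetch just the config (a small JSON file), not the 72B weights
config = AutoConfig.from_pretrained("rubenroy/Gilgamesh-72B")

print(config.num_hidden_layers)    # expected: 80
print(config.num_attention_heads)  # expected: 64 (query heads)
print(config.num_key_value_heads)  # expected: 8 (KV heads, GQA)
```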
## Datasets used
Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capability, and reasoning. The datasets used include:
- GammaCorpus-v2-5m: A large general-purpose dataset of 5 million lines covering a wide range of topics, used to broaden general knowledge and conversational ability.
- GammaCorpus-CoT-Math-170k: A dataset focused on Chain-of-Thought (CoT) mathematical reasoning, intended to improve step-by-step problem solving.
- GammaCorpus-Fact-QA-450k: A dataset of factual question-answer pairs for reinforcing important current knowledge.
These datasets were all built and curated by me; I thank my fellow team members at Ovantage Labs for assisting in their creation and curation.
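If you want to inspect the training mixture yourself, the GammaCorpus datasets can be streamed with the `datasets` library. The repository id below is an assumption based on the model's namespace; adjust it if the dataset is published elsewhere:

```python
from datasets import load_dataset

# Assumed dataset repo id (same namespace as the model); not confirmed in this card
ds = load_dataset("rubenroy/GammaCorpus-v2-5m", split="train", streaming=True)

# Peek at the first example without downloading the whole dataset
print(next(iter(ds)))
```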
## Usage
You can try out Gilgamesh 72B with the following example using the Transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# Load the model across available GPUs with an automatically chosen dtype
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"
messages = [
    {"role": "user", "content": prompt}
]

# Format the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
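For interactive use you may prefer to stream tokens as they are generated rather than waiting for the full completion. A minimal sketch using Transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `model_inputs` from the example above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are produced; skip echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **model_inputs,
    max_new_tokens=2048,
    streamer=streamer
)
```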
## License
This model is released under the Qwen License Agreement by Alibaba Cloud. See the LICENSE file for more information.
## Special Thanks
A huge thanks to my fellow team members at Ovantage Labs for providing the H100s that made this training possible.