---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
  - rubenroy/GammaCorpus-v2-5m
  - rubenroy/GammaCorpus-CoT-Math-170k
  - rubenroy/GammaCorpus-Fact-QA-450k
language:
  - en
base_model:
  - Qwen/Qwen2.5-72B-Instruct
pipeline_tag: text-generation
tags:
  - qwen2
  - chat
  - conversational
  - gilgamesh
  - gammacorpus
---

# 🔥 Gilgamesh 72B 🔥

Gilgamesh (GGM) 72B is a finetune of Alibaba's Qwen 2.5 72B Instruct model.

*Gilgamesh AI art*

## Model Details

  • Developed by: Ruben Roy
  • Funded by: The Ovantage Society
  • License: Qwen
  • Base Model: Qwen/Qwen2.5-72B-Instruct
  • Type: Causal Language Model
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 72.7B
  • Number of Parameters (Non-Embedding): 70.0B
  • Number of Layers: 80
  • Number of Attention Heads (GQA): 64 for Q and 8 for KV

Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

## Datasets used

Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capabilities and reasoning. The datasets used include:

  • GammaCorpus-v2-5m: A large 5 million line general-purpose dataset covering many topics to enhance broad knowledge and conversational abilities.
  • GammaCorpus-CoT-Math-170k: A dataset focused on Chain-of-Thought (CoT) reasoning in mathematics made to help the model improve step-by-step problem-solving.
  • GammaCorpus-Fact-QA-450k: A dataset containing factual question-answer pairs for reinforcing important current knowledge.

These datasets were all built and curated by me; I thank my fellow team members at Ovantage Labs for assisting in their creation and curation.

## Usage

You can try Gilgamesh 72B with the following example using the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# Load the model and tokenizer; device_map="auto" shards the weights
# across all available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"

messages = [
    {"role": "user", "content": prompt}
]

# Format the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## License

This model follows the Qwen License Agreement by Alibaba Cloud. See the LICENSE file for more information.

## Special Thanks

A huge thanks to my fellow team members at Ovantage Labs for providing the H100s that made this training possible.