🔥 Gilgamesh 72B 🔥

Gilgamesh (GGM) 72B is a fine-tune of Alibaba's Qwen 2.5 72B Instruct model.

(Image: Gilgamesh AI art)

Model Details

  • Developed by: Ruben Roy
  • Funded by: The Ovantage Society
  • License: Qwen
  • Base Model: Qwen/Qwen2.5-72B-Instruct
  • Type: Causal Language Model
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 72.7B
  • Number of Parameters (Non-Embedding): 70.0B
  • Number of Layers: 80
  • Number of Attention Heads (GQA): 64 for Q and 8 for KV

Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
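
If you want to verify these numbers yourself, a minimal sketch is to inspect the checkpoint's configuration with Transformers. This downloads only the config file, not the 72B weights; the attribute names are the standard Qwen2 config fields:

from transformers import AutoConfig

# Fetch only the configuration, not the model weights
config = AutoConfig.from_pretrained("rubenroy/Gilgamesh-72B")

print(config.num_hidden_layers)     # 80 layers
print(config.num_attention_heads)   # 64 query heads
print(config.num_key_value_heads)   # 8 key/value heads (GQA)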

Datasets Used

Gilgamesh 72B was trained on a mixture of specialised datasets designed to improve factual accuracy, mathematical capability and reasoning. The datasets used include:

  • GammaCorpus-v2-5m: A large 5 million line general-purpose dataset covering many topics to enhance broad knowledge and conversational abilities.
  • GammaCorpus-CoT-Math-170k: A dataset focused on Chain-of-Thought (CoT) mathematical reasoning, designed to improve the model's step-by-step problem-solving.
  • GammaCorpus-Fact-QA-450k: A dataset of factual question-answer pairs used to reinforce up-to-date factual knowledge.

These datasets were all built and curated by me; I thank my fellow team members at Ovantage Labs for assisting in their creation and curation.
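
If you would like to inspect these datasets yourself, a minimal sketch using the 🤗 datasets library is below; the repository id is an assumption based on the GammaCorpus naming above and the author's Hub namespace:

from datasets import load_dataset

# NOTE: repository id is assumed from the dataset name and author namespace
ds = load_dataset("rubenroy/GammaCorpus-v2-5m", split="train")

print(ds[0])  # inspect the first training example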

Usage

You can try Gilgamesh 72B with the following example using the Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# Load the model weights and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint (BF16)
    device_map="auto"    # spread the 72B weights across available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"

messages = [
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the
# generation prompt so the model responds as the assistant
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Strip the prompt tokens so only the newly generated answer remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
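
For more varied outputs, you can also pass sampling parameters to generate. The values below are illustrative only, not recommended settings from the author:

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.7,   # illustrative value, not an official recommendation
    top_p=0.8
)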

License

This model is released under the Qwen License Agreement by Alibaba Cloud. See the LICENSE file for more information.

Special Thanks

A huge thanks to my fellow team members at Ovantage Labs for providing the H100s that made this training possible.
