---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
  - rubenroy/GammaCorpus-v2-5m
  - rubenroy/GammaCorpus-CoT-Math-170k
  - rubenroy/GammaCorpus-Fact-QA-450k
language:
  - en
base_model:
  - Qwen/Qwen2.5-72B-Instruct
pipeline_tag: text-generation
tags:
  - qwen2
  - chat
  - conversational
  - gilgamesh
  - gammacorpus
---

# 🔥 Gilgamesh 72B 🔥

Gilgamesh (GGM) 72B is a finetune of Alibaba's Qwen 2.5 72B Instruct model.

*Gilgamesh AI art*

## Model Details

  • Developed by: Ruben Roy
  • Funded by: The Ovantage Society
  • License: Qwen
  • Base Model: Qwen/Qwen2.5-72B-Instruct
  • Type: Causal Language Model
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 72.7B
  • Number of Parameters (Non-Embedding): 70.0B
  • Number of Layers: 80
  • Number of Attention Heads (GQA): 64 for Q and 8 for KV

Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

## Datasets used

Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capabilities and reasoning. The datasets used include:

  • GammaCorpus-v2-5m: A large 5 million line general-purpose dataset covering many topics to enhance broad knowledge and conversational abilities.
  • GammaCorpus-CoT-Math-170k: A dataset focused on Chain-of-Thought (CoT) reasoning in mathematics made to help the model improve step-by-step problem-solving.
  • GammaCorpus-Fact-QA-450k: A dataset containing factual question-answer pairs for reinforcing important current knowledge.

These datasets were all built and curated by me; I thank my fellow team members at Ovantage Labs for assisting in their creation and curation.

## Usage

You can try Gilgamesh 72B with the following example using the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# Load the model and tokenizer; device_map="auto" shards the weights
# across all available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"

messages = [
    {"role": "user", "content": prompt}
]

# Format the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Strip the prompt tokens so only the newly generated text remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## License

This model follows the Qwen License Agreement by Alibaba Cloud. See the LICENSE file for more information.

## Special Thanks

A huge thanks to my fellow team members at Ovantage Labs for providing the H100s that made this training possible.