---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
- mistralai/Ministral-8B-Instruct-2410
library_name: transformers
tags:
- reasoning
- hybrid
- gemini-2.0
- deepseek-r1
- synthetic data
- unsloth
- trl
---

# **DeepNeo: A hybrid model with precision and power**
## **Overview**
DeepNeo is a hybrid model: it can be used like any other LLM, but it also offers a reasoning mode, inspired by [NousResearch/DeepHermes-3-Llama-3-8B-Preview](https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview), in which the model produces a CoT-style response. The mode is toggled through the system prompt. Unlike [NousResearch/DeepHermes-3-Llama-3-8B-Preview](https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview), DeepNeo is slightly more flexible in its sizes: we have released an 8B and a 12B model, both based on **Mistral AI's models**.
## **Model Details**
### **DeepNeo 8B key features**
- Developed by: [Spestly (Open-Neo)](https://x.com/Spestly) & [Kazex (Open-Neo)](https://x.com/32GIGABYTES_YT)
- Released under the **Mistral Research License**; reach out to **Mistral AI** for a commercial license
- Trained with a **128k context window** with **interleaved sliding-window attention**
- Trained on a large proportion of **multilingual and synthetic reasoning data**
- Supports **function calling** (see the sketch after the feature table below)

| Feature | Value |
|:---------------------:|:--------------------:|
| **Architecture** | Dense Transformer |
| **Parameters** | 8,019,808,256 |
| **Layers** | 36 |
| **Heads** | 32 |
| **Dim** | 4096 |
| **KV Heads (GQA)** | 8 |
| **Hidden Dim** | 12288 |
| **Head Dim** | 128 |
| **Vocab Size** | 131,072 |
| **Context Length** | 128k |
| **Attention Pattern** | Ragged (128k,32k,32k,32k) |
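
Because the model supports function calling, the sketch below shows one way to expose a tool through the chat template. This is a minimal sketch rather than an official recipe: it assumes DeepNeo inherits the tool-calling template of its Ministral-8B base and that your `transformers` version supports the `tools` argument of `apply_chat_template`; `get_weather` is a hypothetical example tool.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny, 22°C in {city}"  # hypothetical stub

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-8B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-8B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the weather like in Paris right now?"}]
# Recent transformers versions render the function's signature and docstring
# into the prompt so the model can emit a structured tool call
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.8, do_sample=True)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the model emits a tool call, your application executes the function and appends the result as a `tool` message before generating again.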
## **Usage**
### **Intuitive mode**
This mode is active by default, so nothing needs to change: any system prompt (or none at all) works, and the model answers like a standard instruct model. An example is given below.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model (fp16, spread across available devices)
tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-8B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-8B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
)

# No system message: the model stays in its default (intuitive) mode
messages = [
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)
print(f"Generated tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")
```
### **Reasoning mode**
To activate this mode, a few extra steps are needed. Almost any system instruction should work as long as it mentions the `<think>` and `</think>` tags the model uses to enclose its reasoning (the same convention as DeepHermes-3). An example system prompt is given below; it may need tweaking for your specific use case.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-8B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-8B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
)

# The system message toggles reasoning mode; the wording below is illustrative
# and can be adapted, as long as the <think> tags are mentioned
messages = [
    {"role": "system", "content": "You are a deep-thinking AI model. You must put your thoughts in the <think> </think> tags and your output in the response that follows them."},
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")
```