---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
- mistralai/Ministral-8B-Instruct-2410
library_name: transformers
tags:
- reasoning
- hybrid
- gemini-2.0
- deepseek-r1
- synthetic data
- unsloth
- trl
---

# **DeepNeo: A hybrid model with precision and power**
## **Overview**
DeepNeo is a hybrid model: it can be used like any other LLM, but it also offers a reasoning mode, inspired by [NousResearch/DeepHermes-3-Llama-3-8B-Preview](https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview), in which the model produces a CoT-style response. The mode is toggled through the system prompt. Unlike [NousResearch/DeepHermes-3-Llama-3-8B-Preview](https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview), DeepNeo is slightly more flexible in its sizes: we have released an 8B and a 12B model, both based on **Mistral AI's models**.
## **Model Details**
### **DeepNeo 8B key features**
- Developed by: [Spestly (Open-Neo)](https://x.com/Spestly) & [Kazex (Open-Neo)](https://x.com/32GIGABYTES_YT)
- Released under the **Mistral Research License**; reach out to **Mistral AI** for a commercial license
- Trained with a **128k context window** with **interleaved sliding-window attention**
- Trained on a large proportion of **multilingual and synthetic reasoning data**
- Supports **function calling** (see the sketch after the feature table below)

| Feature | Value |
|:---------------------:|:--------------------:|
| **Architecture** | Dense Transformer |
| **Parameters** | 8,019,808,256 |
| **Layers** | 36 |
| **Heads** | 32 |
| **Dim** | 4096 |
| **KV Heads (GQA)** | 8 |
| **Hidden Dim** | 12288 |
| **Head Dim** | 128 |
| **Vocab Size** | 131,072 |
| **Context Length** | 128k |
| **Attention Pattern** | Ragged (128k,32k,32k,32k) |
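
Because the model supports function calling, the sketch below shows one way to expose a tool through the chat template. This is a minimal sketch rather than an official recipe: it assumes DeepNeo inherits the tool-calling template of its Ministral-8B base and that your `transformers` version supports the `tools` argument of `apply_chat_template`; `get_weather` is a hypothetical example tool.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny, 22°C in {city}"  # hypothetical stub

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-8B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-8B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the weather like in Paris right now?"}]
# Recent transformers versions render the function's signature and docstring
# into the prompt so the model can emit a structured tool call
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.8, do_sample=True)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the model emits a tool call, your application executes the function and appends the result as a `tool` message before generating again.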
## **Usage**
### **Intuitive mode**
This mode is active by default, so nothing needs to change: any system prompt (or none at all) works, and the model answers like a standard instruct model. An example is given below.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model (fp16, spread across available devices)
tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-8B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-8B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
)

# No system message: the model stays in its default (intuitive) mode
messages = [
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)
print(f"Generated tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")
```
### **Reasoning mode**
To activate this mode, a few extra steps are needed. Almost any system instruction should work as long as it mentions the `<think>` and `</think>` tags the model uses to enclose its reasoning (the same convention as DeepHermes-3). An example system prompt is given below; it may need tweaking for your specific use case.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-8B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-8B-Preview",
    torch_dtype=torch.float16,
    device_map="auto",
)

# The system message toggles reasoning mode; the wording below is illustrative
# and can be adapted, as long as the <think> tags are mentioned
messages = [
    {"role": "system", "content": "You are a deep-thinking AI model. You must put your thoughts in the <think> </think> tags and your output in the response that follows them."},
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")
```