metadata

license: llama2
base_model:
  - meta-llama/CodeLlama-13b-hf
base_model_relation: adapter
tags:
  - QML
  - Code-Completion

Model Overview

Description:

CodeLlama-13B-QML is a large language model customized by the Qt Company for Fill-In-The-Middle code completion tasks in the QML programming language, especially for Qt Quick Controls compliant with Qt 6 releases. The CodeLlama-13B-QML model is designed for companies and individuals that want to self-host their LLM for HMI (Human Machine Interface) software development instead of relying on third-party hosted LLMs.

This model reaches a score of 79% on the QML100 Fill-In-The-Middle code completion benchmark for Qt 6-compliant code. In comparison, Claude 3.5 Sonnet scored 68%, the base CodeLlama-13B scored 66%, and GPT-4o scored 62%. This model was fine-tuned on raw data from over 4000 human-created QML code snippets using the LoRA fine-tuning method. CodeLlama-13B-QML is not optimised for generating Qt 5-compliant, C++, or Python code.

Terms of use:

By accessing this model, you agree to the Llama 2 license terms and conditions, the acceptable use policy, and Meta’s privacy policy. By using this model, you furthermore agree to the Qt AI Model terms & conditions.

Usage:

CodeLlama-13B-QML is a medium-sized language model that requires significant computing resources to achieve inference (response) times suitable for automatic code completion. It should therefore be run on a GPU accelerator, either in a cloud environment such as AWS, Google Cloud, or Microsoft Azure, or locally.

Large Language Models, including CodeLlama-13B-QML, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building AI systems.

The repository contains multiple files with adapters. Please note that the .gguf file, which is used by Ollama, is quantised (q_4 quant type).

How to run CodeLlama-13B-QML in cloud deployment:

The configuration depends on the chosen cloud technology.

For optimal performance, run CodeLlama-13B-QML in the cloud with Docker and vLLM. Make sure all required dependencies are installed (the transformers, accelerate and peft modules), and use bfloat16 precision. The setup combines the base model from Hugging Face (which requires an access token) with the adapter weights from this repository. vLLM exposes an OpenAI-compatible API endpoint, which makes integration straightforward, and it implements request batching and queuing for efficient serving. The Docker container should be run on an instance with a GPU accelerator. The configuration has been tested on Ubuntu 22.04 LTS with NVIDIA drivers and A100 80GB GPUs, where it ran stably and efficiently.
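As a rough illustration of what calling the OpenAI-compatible endpoint might look like, here is a minimal Python sketch that uses only the standard library. The URL (vLLM's default port 8000 on localhost) and the adapter name `codellama-13b-qml` are assumptions, not values from this repository; use whatever host, port, and model name your own deployment registers.

```python
import json
import urllib.request

# Assumptions (not taken from this model card): vLLM listens on its default
# port 8000, and the adapter was registered under this hypothetical name.
VLLM_URL = "http://localhost:8000/v1/completions"
ADAPTER_NAME = "codellama-13b-qml"


def build_completion_request(prompt: str, max_tokens: int = 300) -> dict:
    """Build the JSON body for an OpenAI-style /v1/completions call."""
    return {
        "model": ADAPTER_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
        # Stop as soon as the model starts emitting another FIM marker.
        "stop": ["<SUF>", "<PRE>", "<MID>"],
    }


def complete(prompt: str, url: str = VLLM_URL) -> str:
    """POST the request and return the text of the first completion choice."""
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["choices"][0]["text"]
```

With a server running, `complete("<PRE>import QtQuick\n\nWindow {\n<MID>")` would return the suggested middle section as a string.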

How to run CodeLlama-13B-QML in Ollama:

1. Install Ollama

https://ollama.com/download

2. Clone the model repository

3. Open the terminal and go to the repository

4. Build the model in Ollama

ollama create codellama:13b-code-qml -f Modelfile

The model's name must be exactly as above if you want to use the model in Qt Creator.

5. Run the model

ollama run codellama:13b-code-qml

You can start writing prompts in the terminal or send curl requests now.

Here is a curl request example:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "codellama:13b-code-qml",
  "prompt": "<SUF>\n    title: qsTr(\"Hello World\")\n}<PRE>import QtQuick\n\nWindow {\n    width: 640\n    height: 480\n    visible: true\n<MID>",
  "stream": false,
  "options": {
    "temperature": 0,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_predict": 300,
    "stop": ["<SUF>", "<PRE>", "</PRE>", "</SUF>", "< EOT >", "\\end", "<MID>", "</MID>", "##"]
  }
}'
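For scripted use, the same request can be sent from Python. The sketch below uses only the standard library and assumes Ollama is listening on its default local address. Ollama's REST API reads sampling parameters from an `options` object, so they are nested there; the function names are illustrative only.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "codellama:13b-code-qml"


def build_generate_request(prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {
            "temperature": 0,
            "top_p": 0.9,
            "repeat_penalty": 1.1,
            "num_predict": 300,
            "stop": ["<SUF>", "<PRE>", "</PRE>", "</SUF>", "< EOT >",
                     "\\end", "<MID>", "</MID>", "##"],
        },
    }


def generate(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt to Ollama and return the generated completion text."""
    body = json.dumps(build_generate_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With "stream": false the completion is in the "response" field.
        return json.loads(response.read())["response"]
```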

The prompt format:

"<SUF>{suffix}<PRE>{prefix}<MID>"

If there is no suffix, please use:

"<PRE>{prefix}<MID>"
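A small helper can assemble prompts in this format from the code before and after the cursor. This is a minimal sketch; `build_fim_prompt` is a hypothetical name and not part of the repository.

```python
def build_fim_prompt(prefix: str, suffix: str = "") -> str:
    """Assemble a Fill-In-The-Middle prompt from the text around the cursor.

    prefix: code before the cursor; suffix: code after the cursor (may be empty).
    """
    if suffix:
        return f"<SUF>{suffix}<PRE>{prefix}<MID>"
    # Without a suffix, the <SUF> section is omitted entirely.
    return f"<PRE>{prefix}<MID>"


# Example: the cursor sits inside a Window block, before its title line.
prefix = "import QtQuick\n\nWindow {\n    width: 640\n    height: 480\n    visible: true\n"
suffix = '\n    title: qsTr("Hello World")\n}'
prompt = build_fim_prompt(prefix, suffix)
```

The resulting `prompt` string is what goes into the request's prompt field; the model's output is the code that belongs between `prefix` and `suffix`.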

Model Version:

v1.0
