---
license: apache-2.0
datasets:
- krasserm/gba-trajectories
library_name: peft
---

A planner LLM [fine-tuned on synthetic trajectories](https://krasserm.github.io/2024/05/31/planner-fine-tuning/) from an agent simulation. It can be used in [ReAct](https://arxiv.org/abs/2210.03629)-style LLM agents where [planning is separated from function calling](https://krasserm.github.io/2024/03/06/modular-agent/). Trajectory generation and planner fine-tuning are described in the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project.

The planner has been fine-tuned on the [krasserm/gba-trajectories](https://huggingface.co/datasets/krasserm/gba-trajectories) dataset with a [loss over the full sequence](https://github.com/krasserm/bot-with-plan/tree/master/train#gba-planner-7b-v02) (i.e. over prompt and completion). An 8-bit quantized GGUF version of this model is available at [krasserm/gba-planner-7B-v0.2-GGUF](https://huggingface.co/krasserm/gba-planner-7B-v0.2-GGUF).

## Usage example

Load the model and the tokenizer.

```python
import json
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
)

device = "cuda:0"
repo_id = "krasserm/gba-planner-7B-v0.2"

# 4-bit NF4 quantization with double quantization; compute in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map=device,
)
```

Define a prompt that contains the *user request* and past task-observation pairs of the current trajectory (*context information*).

````python
prompt = """User request:

```
Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies.
```

Context information:

```
Task: Find the last 5 movies released by DreamWorks.
Result: The last five movies released by DreamWorks are "The Bad Guys" (2022), "Boss Baby: Family Business" (2021), "Trolls World Tour" (2020), "Abominable" (2019), and "How to Train Your Dragon: The Hidden World" (2019).
Task: Search the internet for the Rotten Tomatoes score of "The Bad Guys" (2022)
Result: The Rotten Tomatoes score of "The Bad Guys" (2022) is 88%.
```

Plan the next step."""
````

Then generate a plan for the next step in the trajectory.

```python
instruct_template = "[INST] {prompt} [/INST]"
instruct_prompt = instruct_template.format(prompt=prompt)

input_ids = tokenizer(instruct_prompt, return_tensors="pt", max_length=1024, truncation=True)["input_ids"]
# Move the inputs to the device the model was loaded on
input_ids = input_ids.to(device)

generation_config = GenerationConfig(
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

with torch.no_grad():
    result = model.generate(input_ids, generation_config=generation_config)

# Keep only the newly generated tokens (drop the prompt tokens)
result = result[:, input_ids.shape[1] :]
decoded = tokenizer.batch_decode(result, skip_special_tokens=True)
decoded_dict = json.loads(decoded[0])
print(json.dumps(decoded_dict, indent=2))
```

```json
{
  "context_information_summary": "The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019). The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
  "thoughts": "Since we have the Rotten Tomatoes score for \"The Bad Guys\", the next logical step is to find the score for the next movie in the list, \"Boss Baby: Family Business\". After obtaining this score, we can proceed to find the scores for the remaining movies in the same manner.",
  "task": "Search the internet for the Rotten Tomatoes score of \"Boss Baby: Family Business\" (2021).",
  "selected_tool": "search_internet"
}
```

The planner selects a tool and generates a task for the next step.
The task is tool-specific and executed by the tool, in this case the [search_internet](https://github.com/krasserm/bot-with-plan/tree/master/gba/tools/search#search-internet-tool) tool, which results in the next observation on the trajectory. If the `final_answer` tool is selected, a final answer is available or can be generated from the trajectory.

## Tools

The planner learned a (static) set of available tools during fine-tuning. These are:

| Tool name          | Tool description                                                                          |
|--------------------|-------------------------------------------------------------------------------------------|
| `ask_user`         | Useful for asking user about information missing in the request.                          |
| `calculate_number` | Useful for numerical tasks that result in a single number.                                |
| `create_event`     | Useful for adding a single entry to my calendar at given date and time.                   |
| `search_wikipedia` | Useful for searching factual information in Wikipedia.                                    |
| `search_internet`  | Useful for up-to-date information on the internet.                                        |
| `send_email`       | Useful for sending an email to a single recipient.                                        |
| `use_bash`         | Useful for executing commands in a Linux bash.                                            |
| `final_answer`     | Useful for providing the final answer to a request. Must always be used in the last step. |

The framework provided by the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project can easily be adjusted to a different set of tools for specialization to other application domains.
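
How the planner output drives tool execution can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the bot-with-plan implementation: `TOOLS`, `run_step`, and `format_context` are hypothetical names, the two tool functions are stubs standing in for real tools, and the `Task:`/`Result:` line layout is assumed from the usage example.

```python
import json


# Hypothetical stand-ins for real tools. Each takes the planner's task
# string and returns an observation string.
def search_internet(task: str) -> str:
    return f"(stub) search result for: {task}"


def calculate_number(task: str) -> str:
    return f"(stub) calculation result for: {task}"


# Hypothetical registry mapping the planner's `selected_tool` to a callable.
TOOLS = {
    "search_internet": search_internet,
    "calculate_number": calculate_number,
}


def run_step(planner_output: str, steps: list) -> bool:
    """Execute one planned step and record its (task, result) pair.

    Returns True once the planner selects `final_answer`, False otherwise.
    """
    plan = json.loads(planner_output)
    tool_name = plan["selected_tool"]
    if tool_name == "final_answer":
        return True
    if tool_name not in TOOLS:
        raise KeyError(f"planner selected an unregistered tool: {tool_name}")
    result = TOOLS[tool_name](plan["task"])
    steps.append((plan["task"], result))
    return False


def format_context(steps: list) -> str:
    """Render recorded pairs as Task/Result lines (assumed layout)."""
    return "\n".join(f"Task: {task}\nResult: {result}" for task, result in steps)
```

In an agent loop, the string returned by `format_context` would go into the *context information* part of the next prompt, and the loop would stop as soon as `run_step` returns `True`.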