|
import reflex as rx

from datasets import load_dataset

# Load the tokenized multiple-choice dataset; each conversation_* column
# holds a [user_message, assistant_message] pair that we quote below.
dataset = load_dataset(
    "derek-thomas/labeled-multiple-choice-explained-falcon-tokenized",
    split="train",
)

df = dataset.to_pandas()
|
|
|
|
|
p1 = '''
# Prompt Order Experiment

This experiment explores various scenarios for **prompt fine-tuning** using structured generation: we test how the order of elements in a prompt affects model performance. The elements we consider are listed below, with a short sketch after the list showing how an order maps to a prompt:

- **(Q)**: Question
- **(AC)**: Answer Choices
- **(R)**: Reasoning
- **(FA)**: Final Answer
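
For intuition, each scenario concatenates the same four pieces in a different order. A hypothetical sketch (not the experiment's actual prompt builder):

```python
# Hypothetical sketch: the same four pieces, concatenated in a chosen order.
question = "..."        # (Q)
answer_choices = "..."  # (AC)
reasoning = "..."       # (R)
final_answer = "..."    # (FA)

PIECES = {"Q": question, "AC": answer_choices, "R": reasoning, "FA": final_answer}

def build_prompt(order):
    # e.g. build_prompt(["Q", "AC", "R", "FA"]) for Scenario 1
    return " ".join(PIECES[key] for key in order)
```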
|
|
|
## Scenarios

We will evaluate the following prompt orders:

### **Scenario 1: Q - AC - R - FA** (Falcon and GPT3.5)

This is the most natural order. The model generates its reasoning before the final answer, so every answer token is conditioned on the reasoning that precedes it; this gives the model the most information prior to making a selection and works with, rather than against, the decoding mechanics.

This is our user message; we can see the question and the answer choices.

<details>
<summary>Click to show prompt!</summary>
'''
|
p2 = f'''
```json
{df['conversation_RFA_gpt3_5'].iloc[0][0]}
```

This is our assistant message. You can see that we are forcing a JSON response (note that I added spacing for readability) and that the reasoning comes first. Fine-tuning on JSON improves our structured generation results, since the model gets used to responding in that "space".

```json
{df['conversation_RFA_gpt3_5'].iloc[0][1]}
```
</details>
'''
|
|
|
p3 = f'''
### **Scenario 2: Q - AC - FA - R** (Falcon and GPT3.5)

An awkward order that places the reasoning after the final answer, so the answer is generated without conditioning on any reasoning tokens. It is faster and saves tokens, but it assumes the model can "know" the reasoning internally before writing it out. We are skeptical of this case, which is exactly why it is worth testing.

<details>
<summary>Click to show prompt!</summary>

```json
{df['conversation_FAR_gpt3_5'].iloc[0][0]}
```

```json
{df['conversation_FAR_gpt3_5'].iloc[0][1]}
```
</details>
'''
|
p4 = '''
### **Scenario 3: Q - AC - FA**

This serves as a fine-tuning control: no reasoning is provided in the output.

### **Scenario 4: Base**

An un-fine-tuned control for comparison purposes.

### Structured Generation

Structured generation ensures consistent response formats, which is crucial for reliable fine-tuning. Initial experiments ran into response-consistency issues, which structured generation can solve.
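
As a rough sketch of the idea (the field names here are illustrative, not necessarily the exact keys used in this dataset), we pin the response to a schema and constrain decoding against it:

```python
# Illustrative only: pin the assistant's reply to a fixed JSON schema.
from pydantic import BaseModel

class Response(BaseModel):
    reasoning: str     # generated first in Scenario 1, last in Scenario 2
    final_answer: str  # one of the answer-choice letters

# Response.model_json_schema() can be handed to a structured-generation
# backend (e.g. guided decoding) so every reply parses into this shape.
```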
|
'''
|
|
|
|
|
def page():
    # Render the full write-up as a single markdown document.
    return rx.vstack(
        rx.markdown(p1 + p2 + p3 + p4),
    )
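

# Register the page so Reflex can serve it (assumes this module is the app
# entry point; rx.App() and add_page() are the standard Reflex wiring).
app = rx.App()
app.add_page(page, route="/")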
|
|