import reflex as rx
from datasets import load_dataset
dataset = load_dataset("derek-thomas/labeled-multiple-choice-explained-falcon-tokenized", split='train')
df = dataset.to_pandas()
p1 = '''
# Prompt Order Experiment
This experiment aims to explore various scenarios for **prompt fine-tuning** using structured generation. We'll test how the order of elements in a prompt affects model performance. The elements we consider are:
- **(Q)**: Question
- **(AC)**: Answer Choices
- **(R)**: Reasoning
- **(FA)**: Final Answer
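
As a minimal sketch of how these elements compose (the field names and format below are illustrative, not the dataset's exact schema), the user message carries Q and AC while the assistant target carries R and FA:

```python
import json

question = "What is 2 + 2?"                  # (Q)
answer_choices = ["A) 3", "B) 4", "C) 5"]    # (AC)
reasoning = "Adding 2 and 2 gives 4."        # (R)
final_answer = "B"                           # (FA)

# User message: question followed by the answer choices.
user_prompt = question + "\n" + "\n".join(answer_choices)

# Assistant target: a JSON object whose key order encodes the element order.
assistant_target = json.dumps({"REASONING": reasoning, "ANSWER": final_answer})
print(assistant_target)
```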
## Scenarios
We will evaluate the following prompt orders:
### **Scenario 1: Q - AC - R - FA** (Falcon and GPT3.5)
This is the most natural order. The model generates reasoning before the final answer, providing the most information prior to making a selection. Because decoding is autoregressive, the answer tokens are conditioned on the generated reasoning, so this order leverages decoding mechanics effectively.
This is our user message; it contains the question and answer choices:
<details>
<summary>Click to show prompt!</summary>
'''
p2 = f'''
```json
{df['conversation_RFA_gpt3_5'].iloc[0][0]}
```
This is our assistant message; you can see that we are forcing JSON output (note: I added spacing for readability) and putting the reasoning first. Fine-tuning on JSON responses will improve our structured generation results, as the model gets used to responding in that "space".
```json
{df['conversation_RFA_gpt3_5'].iloc[0][1]}
```
</details>
'''
p3 = f'''
### **Scenario 2: Q - AC - FA - R** (Falcon and GPT3.5)
An awkward order that places the reasoning after the final answer. It is faster and saves tokens, but it assumes the model can "know" the reasoning internally before generating the answer, so we are skeptical of it; it is still worth testing.
<details>
<summary>Click to show prompt!</summary>
```json
{df['conversation_FAR_gpt3_5'].iloc[0][0]}
```
```json
{df['conversation_FAR_gpt3_5'].iloc[0][1]}
```
</details>
'''
p4 = '''
### **Scenario 3: Q - AC - FA**
This serves as a fine-tuning control. No reasoning is provided in the output.
### **Scenario 4: Base**
An un-fine-tuned control for comparison purposes.
### Structured Generation
Structured generation ensures consistent response formats, which is crucial for reliable fine-tuning and evaluation. Initial experiments struggled with response consistency; structured generation solves this.
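
As a minimal sketch of the idea (the key names and validator below are illustrative, not the experiment's actual implementation): a well-formed response should always parse into the same JSON shape. Structured generation constrains decoding so that only such outputs can be produced; the check below only validates after the fact.

```python
import json

REQUIRED_KEYS = {"REASONING", "ANSWER"}  # illustrative schema

def is_well_formed(response: str) -> bool:
    """Check that a model response parses as JSON with exactly the expected keys."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(parsed) == REQUIRED_KEYS

print(is_well_formed('{"REASONING": "2 + 2 = 4", "ANSWER": "B"}'))  # True
print(is_well_formed("The answer is B."))                           # False
```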
'''
def page():
    return rx.vstack(
        rx.markdown(p1 + p2 + p3 + p4),
    )

# Register the page so Reflex serves it.
app = rx.App()
app.add_page(page)