This document covers the scripts in the `scripts` directory. It currently describes `batch-caption.py`, which runs JoyCaption over many images in bulk. Other scripts may be added in the future.

# batch-caption.py

## Basic Command

To run the script, use the following command:

```sh
./batch-caption.py --glob "path/to/images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone."
```

This command captions every `.jpg` image matched by the glob pattern using the provided prompt, and writes a `.txt` file alongside each image.

## Command-Line Arguments

**Note**: You must specify at least one of `--glob`, `--filelist`, or `--input` to provide images, and either `--prompt` or `--prompt-file` to provide a prompt for caption generation.

| Argument           | Description                                                | Default                                                                                                                     |
| ------------------ | ---------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| `--input`          | Input images                                               | N/A                                                                                                                         |
| `--glob`           | Glob pattern to find images                                | N/A                                                                                                                         |
| `--filelist`       | File containing a list of images                           | N/A                                                                                                                         |
| `--prompt`         | Prompt to use for caption generation                       | N/A                                                                                                                         |
| `--prompt-file`    | JSON file containing prompts                               | N/A                                                                                                                         |
| `--batch-size`     | Batch size for image processing                            | 1                                                                                                                           |
| `--greedy`         | Use greedy decoding instead of sampling                    | False                                                                                                                       |
| `--temperature`    | Sampling temperature (used when not using greedy decoding) | 0.6                                                                                                                         |
| `--top-p`          | Top-p sampling value (nucleus sampling)                    | 0.9                                                                                                                         |
| `--top-k`          | Top-k sampling value                                       | None                                                                                                                        |
| `--max-new-tokens` | Maximum length of the generated caption (in tokens)        | 256                                                                                                                         |
| `--num-workers`    | Number of workers loading images in parallel               | 4                                                                                                                           |
| `--model`          | Pre-trained model to use                                   | [John6666/llama-joycaption-alpha-two-hf-llava-nf4](https://huggingface.co/John6666/llama-joycaption-alpha-two-hf-llava-nf4) |
| `--bf16`           | Load the model in `torch.bfloat16`                         | False                                                                                                                       |

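The three image sources above can be combined into a single list of paths. The sketch below is illustrative only and is not taken from the script: `collect_images` is a hypothetical helper, and it assumes that the file passed to `--filelist` contains one image path per line.

```python
import glob
from pathlib import Path


def collect_images(input_paths=None, glob_pattern=None, filelist=None):
    """Gather image paths from any combination of the three sources (hypothetical helper)."""
    paths = []
    if input_paths:
        # Paths given directly, as with --input.
        paths.extend(Path(p) for p in input_paths)
    if glob_pattern:
        # Sort the matches so the processing order is reproducible.
        paths.extend(Path(p) for p in sorted(glob.glob(glob_pattern, recursive=True)))
    if filelist:
        # Assumption: one image path per line; blank lines are ignored.
        lines = Path(filelist).read_text(encoding="utf-8").splitlines()
        paths.extend(Path(line.strip()) for line in lines if line.strip())
    if not paths:
        raise ValueError("No images found; pass --input, --glob, or --filelist")
    return paths
```

Raising an error when nothing is found mirrors the note that at least one image source is required.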

### Examples

1. **Caption images with a specific prompt**

   ```sh
   ./batch-caption.py --glob "images/*.png" --prompt "Write a descriptive caption for this image in a formal tone."
   ```
   or
   ```sh
   ./batch-caption.py --input "images/dog.png" --prompt "Write a descriptive caption for this image in a formal tone."
   ```

2. **Use a JSON file for prompts**

   ```sh
   python batch-caption.py --filelist "image_paths.txt" --prompt-file "prompts.json"
   ```

3. **Use greedy decoding**

   ```sh
   python batch-caption.py --glob "images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone." --greedy
   ```
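
How the decoding flags interact can be sketched as follows. This assumes the script forwards them to the Hugging Face `transformers` `generate` method, which this document does not confirm. `build_generation_kwargs` is a hypothetical helper; `do_sample`, `temperature`, `top_p`, `top_k`, and `max_new_tokens` are genuine `generate` arguments, and the defaults match the table above.

```python
def build_generation_kwargs(greedy=False, temperature=0.6, top_p=0.9,
                            top_k=None, max_new_tokens=256):
    """Translate the CLI decoding options into keyword arguments (hypothetical helper)."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": not greedy}
    if not greedy:
        # Sampling parameters only apply when sampling is enabled.
        kwargs["temperature"] = temperature
        kwargs["top_p"] = top_p
        if top_k is not None:
            kwargs["top_k"] = top_k
    return kwargs


print(build_generation_kwargs())             # sampling with the documented defaults
print(build_generation_kwargs(greedy=True))  # greedy: sampling options are left out
```

With `--greedy`, the temperature, top-p, and top-k values have no effect, so the sketch omits them rather than passing unused settings.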

## Prompt Handling

- For a list of prompts that the model understands, please refer to the project's root README.

- You can specify a prompt directly using the `--prompt` argument or use a JSON file containing a list of prompts with weights using `--prompt-file`.

- If multiple prompts are specified in the prompt file, the prompt used for each image will be randomly selected.

- **Prompt File Format**: The JSON file should contain a list whose entries are either plain strings or objects with `prompt` and `weight` fields.

  - **Weighting**: The `weight` field sets the relative probability of selecting a particular prompt during caption generation. Higher weights make a prompt more likely to be chosen. For example, if one prompt has a weight of 2.0 and another has a weight of 1.0, the first prompt will be twice as likely to be used.

Example `prompts.json`:

```json
[
  { "prompt": "Describe the scene in detail.", "weight": 2.0 },
  { "prompt": "Summarize the main elements of the image.", "weight": 1.0 }
]
```
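
The weighted selection described above can be sketched with the standard library's `random.choices`. This is an illustration rather than the script's actual code: `parse_prompts` and `pick_prompt` are hypothetical names, and giving bare strings (or objects without a `weight`) a weight of 1.0 is an assumption.

```python
import json
import random


def parse_prompts(entries):
    """Normalize a prompt list into parallel lists of prompts and weights."""
    prompts, weights = [], []
    for entry in entries:
        if isinstance(entry, str):
            # Assumption: a bare string carries a weight of 1.0.
            prompts.append(entry)
            weights.append(1.0)
        else:
            prompts.append(entry["prompt"])
            weights.append(float(entry.get("weight", 1.0)))
    return prompts, weights


def pick_prompt(prompts, weights, rng=random):
    """Draw one prompt with probability proportional to its weight."""
    return rng.choices(prompts, weights=weights, k=1)[0]


entries = json.loads("""[
  { "prompt": "Describe the scene in detail.", "weight": 2.0 },
  { "prompt": "Summarize the main elements of the image.", "weight": 1.0 }
]""")
prompts, weights = parse_prompts(entries)
print(pick_prompt(prompts, weights))
```

Over many images, the first prompt would be drawn roughly two thirds of the time and the second roughly one third, since only the ratio between weights matters.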

## Output

- Captions are saved as `.txt` files in the same directory as the corresponding image.
- If a `.txt` caption file already exists for an image, the script will skip that image.
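
The skip rule above can be sketched as a small filter. The helper names are hypothetical, and the sketch assumes the caption file replaces the image extension (for example `dog.png` becomes `dog.txt`); the script's exact naming may differ.

```python
from pathlib import Path


def caption_path_for(image_path):
    # Assumption: the caption replaces the image extension, e.g. dog.png -> dog.txt.
    return Path(image_path).with_suffix(".txt")


def images_needing_captions(image_paths):
    """Return only the images that do not yet have a caption file next to them."""
    return [Path(p) for p in image_paths if not caption_path_for(p).exists()]
```

Under this rule, an interrupted run can be restarted with the same arguments, and deleting a `.txt` file should cause its image to be captioned again.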