# GraphRAG Indexer Application

## Table of Contents

1. [Introduction](#introduction)
2. [Setup](#setup)
3. [Application Structure](#application-structure)
4. [Indexing](#indexing)
5. [Prompt Tuning](#prompt-tuning)
6. [Data Management](#data-management)
7. [Configuration](#configuration)
8. [API Integration](#api-integration)
9. [Troubleshooting](#troubleshooting)

## Introduction

The GraphRAG Indexer Application is a Gradio-based user interface for managing indexing and prompt tuning in the GraphRAG (Graph Retrieval-Augmented Generation) system. It lets you configure, run, and monitor indexing and prompt tuning tasks, and manage the related data files.

## Setup

1. Ensure you have Python 3.7+ installed.
2. Install the required dependencies:

   ```
   pip install gradio requests pydantic python-dotenv pyyaml pandas lancedb
   ```

3. Set up environment variables in `indexing/.env`:

   ```
   API_BASE_URL=http://localhost:8012
   LLM_API_BASE=http://localhost:11434
   EMBEDDINGS_API_BASE=http://localhost:11434
   ROOT_DIR=indexing
   ```

4. Run the application:

   ```
   python index_app.py
   ```

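
As a minimal sketch of how the variables from step 3 might be consumed, the snippet below reads them from the process environment and falls back to the values shown above. It assumes the `.env` file has already been loaded into the environment (for example with `load_dotenv("indexing/.env")` from python-dotenv). The `Settings` class and `load_settings` helper are hypothetical names used for illustration, not part of the application.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four variables listed in step 3."""

    api_base_url: str
    llm_api_base: str
    embeddings_api_base: str
    root_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the settings from `env`, falling back to the example values."""
    return Settings(
        api_base_url=env.get("API_BASE_URL", "http://localhost:8012"),
        llm_api_base=env.get("LLM_API_BASE", "http://localhost:11434"),
        embeddings_api_base=env.get("EMBEDDINGS_API_BASE", "http://localhost:11434"),
        root_dir=env.get("ROOT_DIR", "indexing"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.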

## Application Structure

The application is divided into three main tabs:

1. Indexing
2. Prompt Tuning
3. Data Management

Each tab is described in its own section below.

## Indexing

The Indexing tab lets you configure and run the GraphRAG indexing process.

### Features:

- Select LLM and embedding models
- Set the root directory for indexing
- Configure verbose and cache options
- Set advanced options for resuming, reporting, and output formats
- Run indexing and check its status

### Usage:

1. Select the desired LLM and embedding models from the dropdowns.
2. Set the root directory for indexing.
3. Configure additional options as needed.
4. Click "Run Indexing" to start the process.
5. Use "Check Indexing Status" to monitor progress.

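
The usage steps above end in a request to the backend. The following is a rough sketch of how the form values might be packaged before they are sent. The payload schema is not documented here, so every key name (`llm_model`, `embed_model`, `root`, `verbose`, `nocache`, `resume`) and the `build_index_request` helper itself are assumptions for illustration.

```python
import json
from typing import Any, Dict, Optional


def build_index_request(
    llm_model: str,
    embed_model: str,
    root: str = "indexing",
    verbose: bool = False,
    use_cache: bool = True,
    resume: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the Indexing tab's form values into a JSON-serializable dict.

    All key names are hypothetical; adapt them to the backend's real schema.
    """
    if not llm_model or not embed_model:
        raise ValueError("Select both an LLM and an embedding model first.")
    payload: Dict[str, Any] = {
        "llm_model": llm_model,
        "embed_model": embed_model,
        "root": root,
        "verbose": verbose,
        "nocache": not use_cache,
    }
    if resume:  # only included when resuming an earlier run
        payload["resume"] = resume
    return payload


# Placeholder model names; the dropdowns would supply the real ones.
print(json.dumps(build_index_request("my-llm", "my-embedder", verbose=True), indent=2))
```

Validating the required fields before sending anything mirrors the advice in the Troubleshooting section to verify that all required fields are filled.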

## Prompt Tuning

The Prompt Tuning tab enables users to configure and run prompt tuning for GraphRAG.

### Features:

- Set root directory and domain
- Choose tuning method (random, top, all)
- Configure limit, language, max tokens, and chunk size
- Option to exclude entity types
- Run prompt tuning and check status

### Usage:

1. Set the root directory and optional domain.
2. Choose the tuning method and configure parameters.
3. Click "Run Prompt Tuning" to start the process.
4. Use "Check Prompt Tuning Status" to monitor progress.

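
The options listed above could be validated before a tuning run is requested. The sketch below is one way to do that, under stated assumptions: the `PromptTuneOptions` class, its field names, and the rule that numeric options must be positive are all hypothetical. Only the three method names come from the feature list.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

# The three tuning methods named in the feature list.
TUNING_METHODS = ("random", "top", "all")


@dataclass
class PromptTuneOptions:
    """Hypothetical holder for the Prompt Tuning tab's form values."""

    root: str
    method: str
    limit: int
    max_tokens: int
    chunk_size: int
    domain: Optional[str] = None  # the domain is optional in the UI
    language: Optional[str] = None
    no_entity_types: bool = False

    def __post_init__(self) -> None:
        if self.method not in TUNING_METHODS:
            raise ValueError(f"method must be one of {TUNING_METHODS}")
        for name in ("limit", "max_tokens", "chunk_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")

    def to_payload(self) -> Dict[str, Any]:
        """Drop unset optional fields so only chosen values are sent."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

For example, `PromptTuneOptions(root="indexing", method="random", limit=15, max_tokens=2000, chunk_size=200).to_payload()` produces a dictionary without `domain` or `language` keys, because neither was set. The numbers in that call are placeholders, not the application's defaults.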

## Data Management

The Data Management tab provides tools for managing input files and viewing output folders.

### Features:

- File upload functionality
- File list management (view, refresh, delete)
- Output folder exploration
- File content viewing and editing

### Usage:

1. Use the File Upload section to add new input files.
2. Manage existing files in the File Management section.
3. Explore output folders and their contents in the Output Folders section.

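
The view and delete operations described above can be approximated with the standard library. This sketch assumes input files live in an `input` subfolder of the root directory; that layout and both helper names are assumptions, not taken from the application's code.

```python
from pathlib import Path
from typing import List


def list_input_files(root_dir: str) -> List[str]:
    """Return the sorted names of regular files in <root_dir>/input."""
    input_dir = Path(root_dir) / "input"
    if not input_dir.is_dir():
        return []
    return sorted(p.name for p in input_dir.iterdir() if p.is_file())


def delete_input_file(root_dir: str, name: str) -> bool:
    """Delete one input file; return True if a file was actually removed."""
    input_dir = (Path(root_dir) / "input").resolve()
    target = (input_dir / name).resolve()
    # Reject names such as "../config.yaml" that escape the input folder.
    if target.parent != input_dir:
        raise ValueError("Refusing to delete a file outside the input folder.")
    if target.is_file():
        target.unlink()
        return True
    return False
```

Resolving both paths before comparing them is a cheap guard against a crafted file name deleting something outside the input folder.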

## Configuration

The application uses a combination of environment variables and a `config.yaml` file for configuration. Key settings include:

- LLM and Embedding models
- API endpoints
- Community level for GraphRAG
- Token limits
- API keys and types

To modify these settings, edit the `.env` file or create a `config.yaml` file in the root directory.

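
One way to picture the two configuration sources working together is a layered merge. The sketch below is illustrative only: the text above does not say which source wins, so the precedence used here (defaults, then `config.yaml`, then environment variables) is an assumption, as are the lowercase key names and the `merge_settings` helper.

```python
import os
from typing import Any, Dict, Mapping

# Hypothetical built-in defaults, mirroring the example .env values.
DEFAULTS: Dict[str, Any] = {
    "api_base_url": "http://localhost:8012",
    "root_dir": "indexing",
}

# Environment variable name -> hypothetical settings key.
ENV_KEYS = {
    "API_BASE_URL": "api_base_url",
    "LLM_API_BASE": "llm_api_base",
    "EMBEDDINGS_API_BASE": "embeddings_api_base",
    "ROOT_DIR": "root_dir",
}


def merge_settings(
    file_settings: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Layer settings: defaults < config.yaml values < environment variables.

    `file_settings` is the already-parsed content of config.yaml, for
    example the result of `yaml.safe_load(...)` from PyYAML.
    """
    merged = dict(DEFAULTS)
    merged.update(file_settings)
    for env_name, key in ENV_KEYS.items():
        if env_name in env:
            merged[key] = env[env_name]
    return merged
```

If the application resolves conflicts the other way around, swapping the order of the two update steps is enough to reflect that.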

## API Integration

The application integrates with a backend API for executing indexing and prompt tuning tasks. Key API endpoints used:

- `/v1/index`: Start indexing process
- `/v1/index_status`: Check indexing status
- `/v1/prompt_tune`: Start prompt tuning process
- `/v1/prompt_tune_status`: Check prompt tuning status

These endpoints are called using the `requests` library, with error handling and logging around each call.

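
The call pattern described above can be sketched as follows. The application itself uses `requests`; this example swaps in the standard library's `urllib` so that it runs without extra dependencies. The helper names, the choice of POST when a payload is given and GET otherwise, and the assumption that responses are JSON are all illustrative guesses rather than the application's actual behavior.

```python
import json
import logging
import urllib.request
from typing import Any, Dict, Optional

logger = logging.getLogger("index_app_sketch")


def endpoint_url(base_url: str, path: str) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def call_api(
    base_url: str,
    path: str,
    payload: Optional[Dict[str, Any]] = None,
    timeout: float = 10.0,
) -> Optional[Dict[str, Any]]:
    """POST `payload` as JSON when given, otherwise GET.

    Returns the parsed JSON body, or None if the request fails.
    """
    url = endpoint_url(base_url, path)
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    try:
        request = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers connection errors, HTTP errors, and timeouts;
        # ValueError covers malformed URLs and invalid JSON bodies.
        logger.error("Request to %s failed: %s", url, exc)
        return None
```

A status check would then look like `call_api("http://localhost:8012", "/v1/index_status")`. Returning `None` on failure keeps the UI callback simple, at the cost of hiding the error details from the caller, which is why the failure is logged.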

## Troubleshooting

Common issues and solutions:

1. **Model loading fails**: Ensure `LLM_API_BASE` is set correctly and the API is reachable.
2. **Indexing or Prompt Tuning doesn't start**: Check API connectivity and verify that all required fields are filled.
3. **File management issues**: Ensure you have read and write permissions in `ROOT_DIR`.

For any persistent issues, check the application logs (visible in the console) for detailed error messages.

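
The first and third checks can be scripted. This is a small diagnostic sketch, not part of the application: `is_reachable` only tests that something accepts TCP connections at the host and port of a base URL, and `is_writable` tests read and write permission on a directory. Both helper names are hypothetical.

```python
import os
import socket
from urllib.parse import urlparse


def is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        parsed = urlparse(base_url)
        host = parsed.hostname
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
    except ValueError:  # raised for a malformed URL or port
        return False
    if host is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_writable(path: str) -> bool:
    """Return True if `path` is an existing directory with read/write access."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK)
```

Calling `is_reachable` with the value of `LLM_API_BASE` and `is_writable` with the value of `ROOT_DIR` narrows down whether a failure is a connectivity problem or a permissions problem before you dig into the console logs.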