---
title: KDDA Global Model - Invoices
emoji: 🐨
---

# Custom LayoutLM Model for Invoice Processing

This repository hosts a custom implementation of the [LayoutLM](https://huggingface.co/microsoft/layoutlm-base-uncased) model, fine-tuned to extract key fields such as amounts, dates, and names from invoice documents.

## Model Overview

This model is based on the LayoutLMv2 architecture and has been fine-tuned on a custom dataset of invoices. It performs token classification to extract the following entities:

- **Amount Including Tax**
- **Due Date**
- **Reference Number**
- **Customer Name**
- **Vendor Name**
- **Issue Date**
- **Amount**

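The model uses a custom set of labels in the BIO scheme to classify these entities within invoice documents: a `B-` label marks the first token of an entity, an `I-` label marks each following token of the same entity, and `O` marks tokens that belong to no entity.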
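As an illustration of how per-token labels turn into extracted fields, here is a minimal sketch that groups BIO-style labels (`B-` opens an entity, `I-` continues it, `O` is outside any entity) into `(entity, text)` pairs. The helper name `group_bio_entities` and the sample words are hypothetical and not part of this repository; the label strings follow the model's label set.

```python
from typing import List, Optional, Tuple


def group_bio_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO labels into (entity_type, text) pairs.

    Hypothetical helper for illustration; not shipped with the model.
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_type, current_words
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for word, label in zip(words, labels):
        if label == "O":
            flush()
            continue
        prefix, entity_type = label.split("-", 1)
        # A "B-" tag, or an "I-" tag that does not continue the open entity,
        # starts a new entity.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_words.append(word)
    flush()
    return entities


# Made-up sample input, assuming one predicted label per word.
words = ["Invoice", "INV-001", "from", "ACME", "Corp", "Total", "120.00"]
labels = [
    "O", "B-Reference Number", "O",
    "B-Vendor Name", "I-Vendor Name",
    "O", "B-Amount Including tax",
]
print(group_bio_entities(words, labels))
```

In practice the model predicts labels per sub-word token, so predictions would first need to be aligned back to words before a grouping step like this one.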

## Label Mapping

The model has been trained with the following `label2id` and `id2label` mappings:

### `label2id` Mapping

```python
label2id = {
    'I-Customer Name': 0,
    'B-Issue Date': 1,
    'I-Issue Date': 2,
    'I-Due Date': 3,
    'I-Amount': 4,
    'B-Due Date': 5,
    'O': 6,
    'B-Amount Including tax': 7,
    'B-Customer Name': 8,
    'B-Amount': 9,
    'I-Amount Including tax': 10,
    'B-Vendor Name': 11,
    'I-Vendor Name': 12,
    'I-Reference Number': 13,
    'B-Reference Number': 14
}
```

### `id2label` Mapping

```python
id2label = {
    0: 'I-Customer Name',
    1: 'B-Issue Date',
    2: 'I-Issue Date',
    3: 'I-Due Date',
    4: 'I-Amount',
    5: 'B-Due Date',
    6: 'O',
    7: 'B-Amount Including tax',
    8: 'B-Customer Name',
    9: 'B-Amount',
    10: 'I-Amount Including tax',
    11: 'B-Vendor Name',
    12: 'I-Vendor Name',
    13: 'I-Reference Number',
    14: 'B-Reference Number'
}
```
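Since `id2label` is the exact inverse of `label2id`, it can be derived rather than maintained by hand. The sketch below rebuilds it and decodes a list of predicted class ids into label strings. The function name `decode_predictions` and the sample ids are assumptions made for illustration only.

```python
from typing import Dict, List

label2id: Dict[str, int] = {
    "I-Customer Name": 0,
    "B-Issue Date": 1,
    "I-Issue Date": 2,
    "I-Due Date": 3,
    "I-Amount": 4,
    "B-Due Date": 5,
    "O": 6,
    "B-Amount Including tax": 7,
    "B-Customer Name": 8,
    "B-Amount": 9,
    "I-Amount Including tax": 10,
    "B-Vendor Name": 11,
    "I-Vendor Name": 12,
    "I-Reference Number": 13,
    "B-Reference Number": 14,
}

# Derive the inverse mapping so the two dictionaries cannot drift apart.
id2label: Dict[int, str] = {idx: label for label, idx in label2id.items()}

# Guard against two labels accidentally sharing one id.
if len(id2label) != len(label2id):
    raise ValueError("label2id contains duplicate ids")


def decode_predictions(pred_ids: List[int]) -> List[str]:
    """Map predicted class ids to label strings (hypothetical helper)."""
    return [id2label[i] for i in pred_ids]


# Made-up prediction ids for a four-token span.
print(decode_predictions([6, 14, 13, 6]))
# ['O', 'B-Reference Number', 'I-Reference Number', 'O']
```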


## Citation
```bibtex
@article{Xu2020LayoutLM,
  title={LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
  author={Yang Xu and Yiheng Xu and Tengchao Lv and Lei Cui and Furu Wei and Guoxin Wang and Yijuan Lu and Dinei Florencio and Cha Zhang and Wanxiang Che and Min Zhang and Lidong Zhou},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.14740}
}
```