metadata
title: Nlc Gen
emoji: 💩
colorFrom: gray
colorTo: purple
sdk: streamlit
sdk_version: 1.9.0
app_file: app.py
pinned: false
license: mit
NLC-Explorer
A Natural Language Counterfactual Generator for Exploring Bias in Sentiment Analysis Algorithms
Overview
This project is an extension of Interactive Model Cards. It focuses on giving people more ways to explore the bias of a model by generating alternatives (technically, counterfactuals). We believe that by using alternatives, people can better understand the limitations of a model and develop productive skepticism around its usage and trustworthiness.
Known Limitations
- Words not in the spaCy vocab for en_core_web_lg won't have vectors, so similarity scores can't be computed for them.
- WordNet has many limitations due to its age and lack of funding for ongoing maintenance. It covers a large portion of the English language, but certain words simply do not exist in it.
- There are currently only two lists (Countries and Professions). We would like to find community-curated lists for: Race, Sexual Orientation and Gender Identity (SOGI), Religion, Age, and Protected Status.
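As a rough illustration of how a term list can drive the generation of alternatives, here is a minimal sketch. It is not the project's implementation: the list contents, the `TERM_LISTS` name, and the `generate_alternatives` function are all hypothetical, and a plain whole-word swap stands in for whatever matching the app actually does.

```python
import re

# Hypothetical term lists; the project's real Countries and Professions
# lists are not reproduced here.
TERM_LISTS = {
    "countries": ["France", "Kenya", "Brazil"],
    "professions": ["nurse", "engineer", "teacher"],
}


def generate_alternatives(sentence, term, list_name):
    """Return copies of `sentence` with `term` swapped for each other list entry."""
    # Match the term as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
    if not pattern.search(sentence):
        return []
    return [
        # A function replacement avoids any backslash handling in `alt`.
        pattern.sub(lambda _match, alt=alt: alt, sentence)
        for alt in TERM_LISTS[list_name]
        if alt.lower() != term.lower()
    ]


alternatives = generate_alternatives(
    "The nurse was rude to me.", "nurse", "professions"
)
for text in alternatives:
    print(text)
```

Each alternative differs from the original sentence by a single term, so any change in the model's prediction can be attributed to that term. A list with few entries limits how many such comparisons are possible, which is why broader curated lists matter.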
Key Dependencies and Packages
- Hugging Face Transformers - The model we've designed this iteration for is hosted on Hugging Face. It is: distilbert-base-uncased-finetuned-sst-2-english.
- Streamlit - This is the library we're using to build the prototype app because it is easy to stand up and quick to fix.
- spaCy - This is the main NLP library we're using, and it handles most of the text manipulation in the project.
- NLTK + WordNet - This is the initial lexical database we're using because it is accessible directly through Python and it is free. We will be considering a move to ConceptNet for future iterations based on better lateral movement across edges.
- LIME - We chose LIME over SHAP because LIME has more of the functionality we need. SHAP appears to provide greater performance but is not as well suited to our original designs.
- Altair - We're using Altair because it's well integrated into Streamlit.
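To show how the sentiment model fits in, the sketch below compares the model's score for an original sentence against its scores for alternative sentences. It assumes the standard Transformers `pipeline` API and the model named above; the helper names (`load_classifier`, `positive_score`, `score_shifts`) are hypothetical and not taken from the app's code.

```python
def load_classifier():
    """Build the sentiment pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below work without `transformers` installed.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )


def positive_score(result):
    """Convert one pipeline result dict into a probability of POSITIVE."""
    # The model is binary, so a NEGATIVE score p implies POSITIVE = 1 - p.
    score = result["score"]
    return score if result["label"] == "POSITIVE" else 1.0 - score


def score_shifts(classify, original, alternatives):
    """Return (alternative, shift) pairs relative to the original sentence."""
    results = classify([original] + list(alternatives))
    base = positive_score(results[0])
    return [
        (alt, positive_score(res) - base)
        for alt, res in zip(alternatives, results[1:])
    ]
```

A possible usage, assuming `transformers` and a backend such as PyTorch are installed: `score_shifts(load_classifier(), "The nurse was rude to me.", ["The engineer was rude to me."])`. A large shift for a one-word change suggests the model is sensitive to that term.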