Differences in Output Between the Demo Version and My Execution

#4
by milk12345 - opened

There is a noticeable difference in the output results between the demo version and my execution.
What could be the cause of this discrepancy?
For example, when I input an image of a woman, the demo version consistently produces a detailed, stable description. I ran the same image twice in each environment:

Demo Version Output
■ Example 1

This is a photograph of an attractive East Asian woman with straight, shoulder-length brown hair. She has fair skin and a curvy figure with large breasts. She is sitting on the edge of a wooden desk in an office, wearing a white blouse that accentuates her cleavage and a tight black pencil skirt. The background features a window with blinds, a green plant, and a laptop.

■ Example 2

A photograph of an Asian woman with a fair complexion and shoulder-length brown hair, sitting on an office desk. She has a curvy figure with large breasts and is dressed in a white blouse with a deep neckline and a tight black skirt. She has a confident smile. The office background includes a laptop, papers, and a potted plant.

However, when I run the same process on Google Colab, the output is as follows:

Google Colab Output
■ Example 1

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

You are a helpful image captioner.canfuck

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

You are a helpful image captioner.canfuck

■ Example 2

This image is a 2023-12-01 work by an assistant. The image is a work of an assistant, so it is not an image of a person, but it is an image of a person. The image is an image of a person, but it is not an image of a person, but it is an image of a person. The image is an image of a person, but it is not an image of a person, but it is an image of a person. The image is an image of a person, but it is not an image of a person, but it is an image of a person. The image is an image of a person, but it is not an image of a person, but it is an image of a person.

Of course, there are times when I do get detailed results, but the behavior is highly unstable.
I would like it to run as stably as the demo version.

Could there be differences in the source code? Or is the issue caused by the environment I am running it in?

There is almost no difference in the logic of the source code, so it seems the model processing goes wrong (or right) for some reason under certain circumstances.
If everything were wrong, I could understand it. But if it is only right half the time, what is going on?🤔
Something may have been overlooked, so I will take a look.

Edit:
I forgot that the demo version uses this model.😱 The official Colab setup uses a different one. There are many good models besides this one, so replacing it should improve performance.
Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2

#DEFAULT_MODEL_PATH = "unsloth/Meta-Llama-3.1-8B-bnb-4bit" # Colab default
#DEFAULT_MODEL_PATH = "Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2" # Demo version default
DEFAULT_MODEL_PATH = "John6666/Llama-3.1-8B-Lexi-Uncensored-V2-nf4" # Smaller one
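For reference, the swap amounts to pointing the app's text model at a different checkpoint id. A minimal sketch of how one might keep the three candidates side by side (the `MODEL_CANDIDATES` dict and `pick_model_path` helper are illustrative names, not part of the actual app; only the checkpoint ids come from the snippet above):

```python
# Illustrative sketch: the three checkpoint ids discussed in this thread.
# MODEL_CANDIDATES and pick_model_path are hypothetical names for this example.
MODEL_CANDIDATES = {
    "colab": "unsloth/Meta-Llama-3.1-8B-bnb-4bit",            # Colab default (unstable output)
    "demo": "Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2",     # what the demo version uses
    "small": "John6666/Llama-3.1-8B-Lexi-Uncensored-V2-nf4",  # smaller pre-quantized variant
}

def pick_model_path(variant: str = "demo") -> str:
    """Return the Hugging Face checkpoint id for the chosen variant."""
    return MODEL_CANDIDATES[variant]

# Loading it the usual transformers way (heavy download; run on a GPU machine):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# path = pick_model_path("demo")
# tokenizer = AutoTokenizer.from_pretrained(path)
# model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")
```

Setting `DEFAULT_MODEL_PATH = pick_model_path("demo")` would then reproduce the demo's behavior in Colab.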

Oh, thank you!
Using Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 made it stable!
By the way, do you have any recommended uncensored models?

https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
There is an uncensored LLM leaderboard.😀 The models I have personally tested and found to perform well are registered in the Space below. Note that the ones usable with JoyCaption are limited to Llama 8B derivatives, so identify them by name.
https://huggingface.co/spaces/John6666/text2tag-llm

Wait a moment lol
Natural Text to SD Prompt Translator With LLM alpha
This is incredibly helpful!! It helps me a lot!!
By the way, is there anything that can generate an SD Prompt from an image?

anything that can generate an SD Prompt from an image?

If you're looking for the Danbooru tag format used by anime models, the work of the first three people below is well known. For natural-language captions for SD3 and similar, I think fancyfeast's JoyCaption and gokaygokay's Captioner and Prompt Enhancer are excellent.
Most of the things I've seen have been added to my collection, though they're mostly just bookmarks and not well organized...😅
https://huggingface.co/SmilingWolf
https://huggingface.co/KBlueLeaf
https://huggingface.co/p1atdev
https://huggingface.co/fancyfeast
https://huggingface.co/gokaygokay
https://huggingface.co/collections/John6666/spaces-for-tagger-captioner-prompter-6670d34aef4c0c979665339c

I mainly use realistic SDXL, but this seems to be a useful reference.
Thank you!
