MakiPan committed
Commit 006dceb
Parent: 9e5b250

Update app.py

Files changed (1)
  1. app.py +2 -36
app.py CHANGED
@@ -261,6 +261,7 @@ To preprocess the data there were three options we considered:
   </td>
   </tr></table>
   </center>
+
 - The last option was to use [MediaPipe Holistic](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html) to provide pose face and hand landmarks to the ControlNet. This method was promising in theory, however, the HaGRID dataset was not suitable for this method as the Holistic model performs poorly with partial body and obscurely cropped images.
 
 We anecdotally determined that when trained at lower steps the encoded hand model performed better than the standard MediaPipe model due to implied handedness. We theorize that with a larger dataset of more full-body hand and pose classifications, Holistic landmarks will provide the best images in the future however for the moment the hand encoded model performs best. """)
@@ -290,8 +291,7 @@ We anecdotally determined that when trained at lower steps the encoded hand mode
         with gr.Column():
             output_image = gr.Gallery(label='Output Image', show_label=False, elem_id="gallery").style(grid=2, height='auto')
 
-    if model_type=="Standard":
-        gr.Examples(
+    gr.Examples(
             examples=[
                 [
                     "a woman is making an ok sign in front of a painting",
@@ -324,40 +324,6 @@ We anecdotally determined that when trained at lower steps the encoded hand mode
             fn=infer,
             cache_examples=True,
         )
-    elif model_type=="Hand Encoding":
-        gr.Examples(
-            examples=[
-                [
-                    "a woman is making an ok sign in front of a painting",
-                    "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
-                    "example.png"
-                ],
-                [
-                    "a man with his hands up in the air making a rock sign",
-                    "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
-                    "example1.png"
-                ],
-                [
-                    "a man is making a thumbs up gesture",
-                    "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
-                    "example2.png"
-                ],
-                [
-                    "a woman is holding up her hand in front of a window",
-                    "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
-                    "example3.png"
-                ],
-                [
-                    "a man with his finger on his lips",
-                    "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
-                    "example4.png"
-                ],
-            ],
-            inputs=[prompt_input, negative_prompt, input_image, model_type],
-            outputs=[output_image],
-            fn=infer,
-            cache_examples=False, #cache_examples=True,
-        )
 
     inputs = [prompt_input, negative_prompt, input_image, model_type]
     submit_btn.click(fn=infer, inputs=inputs, outputs=[output_image])
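
Net effect of the commit: the two conditional Examples blocks (one per model) collapse into a single unconditional gr.Examples call, and the selected model simply travels through model_type as one more input to infer. Below is a minimal, self-contained sketch of that wiring, not the Space's actual code: the component names mirror app.py, but the placeholder infer(), the Radio choices, the fourth example value ("Standard"), and cache_examples=False are assumptions made for illustration, and example.png stands in for the sample images shipped with the Space.

# Sketch of the consolidated layout after this commit (illustrative only).
import gradio as gr

def infer(prompt, negative_prompt, image, model_type):
    # Placeholder for app.py's real infer(), which runs the ControlNet
    # pipeline selected by model_type ("Standard" or "Hand Encoding").
    return [image] if image is not None else []

with gr.Blocks() as demo:
    model_type = gr.Radio(["Standard", "Hand Encoding"], value="Standard", label="Model")
    prompt_input = gr.Textbox(label="Prompt")
    negative_prompt = gr.Textbox(label="Negative Prompt")
    input_image = gr.Image(label="Input Image", type="pil")
    submit_btn = gr.Button("Submit")
    output_image = gr.Gallery(label="Output Image", show_label=False, elem_id="gallery")

    # One Examples block shared by both models; model_type is just another example field.
    gr.Examples(
        examples=[
            [
                "a woman is making an ok sign in front of a painting",
                "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
                "example.png",  # sample image shipped with the Space
                "Standard",
            ],
        ],
        inputs=[prompt_input, negative_prompt, input_image, model_type],
        outputs=[output_image],
        fn=infer,
        cache_examples=False,  # app.py caches; disabled here so the sketch runs without prebuilt caches
    )

    inputs = [prompt_input, negative_prompt, input_image, model_type]
    submit_btn.click(fn=infer, inputs=inputs, outputs=[output_image])

if __name__ == "__main__":
    demo.launch()

One plausible reading of why the branches were dropped: at interface-build time model_type is a Gradio component object rather than a string, so if model_type=="Standard" could never be true and neither Examples block was ever registered; a single block whose example rows pass the selected value through to infer sidesteps that.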