feat: Control Net support + Textual Inversion (embeddings) #131
Conversation
For me it fails in a different way: `make -j && ./bin/sd --mode txt2img -p "a person" -m ../models/sd-v1-4.ckpt`

Try specifying

Now it completes, but

Try running it in debug mode.
@ggerganov It was a variable that was causing an overflow; with debug mode I managed to detect it, although NaNs (Not a Number) are still being generated somewhere in the unet computation graph. Anyway, thank you for your time. I am going to keep checking.
@ggerganov Is what I'm doing correct when extracting data from the collected tensors into a vector? I suspect it might be reading a memory address with residual data.
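For reference, a minimal sketch of such an extraction using only the public ggml backend API (the helper name is hypothetical, and it assumes an F32 tensor); the key point is sizing the destination from the tensor itself, so nothing beyond the tensor's data, i.e. residual memory, is ever read:

```cpp
#include <vector>
#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical helper: copy a tensor's contents from any backend into a host vector.
// Sizing the vector with ggml_nelements() and reading exactly ggml_nbytes() avoids
// touching residual memory past the end of the tensor.
static std::vector<float> tensor_to_vector(const struct ggml_tensor* t) {
    GGML_ASSERT(t->type == GGML_TYPE_F32); // assumes an F32 tensor
    std::vector<float> out(ggml_nelements(t));
    ggml_backend_tensor_get(t, out.data(), 0, ggml_nbytes(t));
    return out;
}
```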
@leejet can you test this?
@FSSRepo Where did you download the ControlNet model from? When I used the officially provided weights from https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main, the tensor names didn't match.
@leejet Control Net for stable diffusion 1.5:
@leejet those pth models have the prefix
It looks like ControlNet is in effect.
omg, u guys are bringing everything to sd cpp! thank u for all ur hard work, can't wait for inpainting and outpainting to be a part of controlnet
@leejet Ready, the official pth models should also be working now. I should emphasize that the official releases also include t2i adapter models, which are a different architecture: they make their changes on the input-blocks side, while ControlNet adds its changes to the output blocks and the middle block (see the illustrative sketch below). The missing part is the preprocessors, but that is cumbersome and somewhat complicated, as it requires implementing other models (image2depth and openpose) and algorithms such as Canny edge detection.
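To make the architectural difference concrete, here is a rough, illustrative sketch (the names and structure are mine, not the PR's actual code) of how ControlNet outputs could be merged into the UNet: the last control tensor is added to the middle-block activation, and the remaining ones are added to the skip activations consumed by the output blocks:

```cpp
#include <vector>
#include "ggml.h"

// Illustrative pseudocode, not the PR's implementation: merge ControlNet
// residuals into the UNet. T2I adapters instead inject their features on the
// input-block side, which is why the two architectures are not interchangeable.
static struct ggml_tensor* apply_control(struct ggml_context* ctx,
                                         struct ggml_tensor* middle,               // middle-block activation
                                         std::vector<struct ggml_tensor*>& skips,  // input-block skip activations
                                         const std::vector<struct ggml_tensor*>& controls) {
    // the last control tensor goes to the middle block
    middle = ggml_add(ctx, middle, controls.back());
    // the rest are added to the skip connections that the output blocks concatenate with
    for (size_t i = 0; i < skips.size() && i + 1 < controls.size(); i++) {
        skips[i] = ggml_add(ctx, skips[i], controls[i]);
    }
    return middle;
}
```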
Great job, and thank you for your contributions. I'll find some time to review and merge this PR.
@Green-Sky Could you try this pull request and check whether it works correctly for you, with and without VAE tiling? Also, try removing this #ifdef:

```cpp
__STATIC_INLINE__ void ggml_backend_tensor_get_and_sync(ggml_backend_t backend, const struct ggml_tensor* tensor, void* data, size_t offset, size_t size) {
// delete this #ifdef to use just ggml_backend_tensor_get
#ifdef SD_USE_CUBLAS
    if (!ggml_backend_is_cpu(backend)) {
        // async copy from device memory, then wait for the copy to finish
        ggml_backend_tensor_get_async(backend, tensor, data, offset, size);
        ggml_backend_synchronize(backend);
    } else {
        ggml_backend_tensor_get(tensor, data, offset, size);
    }
#else
    ggml_backend_tensor_get(tensor, data, offset, size);
#endif
}
```
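A hypothetical call site (the variable names are illustrative), reading a result tensor back to host memory after graph compute:

```cpp
// Illustrative only: copy a computed tensor back to the host.
std::vector<float> result(ggml_nelements(out_tensor));
ggml_backend_tensor_get_and_sync(backend, out_tensor, result.data(), 0, ggml_nbytes(out_tensor));
```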
@FSSRepo ControlNet works very well for the first image after the model is loaded. Is it possible to make it work for more than one image after loading the model?
which one? the neon-y artifacts, or the "some tiles look lower resolution"? I ran the branch just now and it appears the tiling is not executed, despite it being set.
... if however I am wrong and the tiling is executed, then it's perfect now. Even the seams would be gone.
@Green-Sky you can try again
toggling this section does not make a difference:
at least it looks like it is deterministic (besides GitHub deciding randomly how large an image is displayed/processed).
also, if the tiles are overlapped, no seams should be visible.
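For illustration, one common way to make overlapped tiles seamless is to feather each tile with linear blend weights, so that the weights of adjacent tiles sum to 1 everywhere inside the overlap; a minimal sketch (this is not the PR's actual tiling code):

```cpp
// Illustrative only: linear feathering weight for position x inside a tile of
// width `tile`, where adjacent tiles overlap by `overlap` pixels. Inside the
// overlap, the ramp-down of one tile and the ramp-up of its neighbour sum to 1,
// so blending the two tiles leaves no visible seam.
static float blend_weight(int x, int tile, int overlap) {
    if (x < overlap)         return (x + 1) / (float)(overlap + 1);     // left ramp-up
    if (x >= tile - overlap) return (tile - x) / (float)(overlap + 1); // right ramp-down
    return 1.0f;                                                       // interior
}
```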
When conducting tests, I am also getting results that seem to be low resolution, and I'm not sure what the cause might be. EDIT: It seems to be something inherent to VAE tiling, as I tried with the CPU backend and obtained the same low-resolution result.
Openpose and scribble work quite well, canny not so well (it depends on the input image, actually), and
Not sure if this is of any help, but I tried to do Canny edge detection with
And here is a general example: https://docs.opencv.org/4.x/da/d5c/tutorial_canny_detector.html And I use this for openpose generation:
Which is based on this example: https://docs.opencv.org/3.4/d7/d4f/samples_2dnn_2openpose_8cpp-example.html And maybe this can be used for image segmentation: https://github.com/ggerganov/ggml/tree/master/examples/sam?
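Since the comment above references the OpenCV Canny tutorial, here is roughly what that preprocessing step looks like (the thresholds and file names are illustrative; good values depend on the input image):

```cpp
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

int main() {
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    cv::Mat edges;
    cv::GaussianBlur(img, img, cv::Size(5, 5), 1.4); // reduce noise before edge detection
    cv::Canny(img, edges, 100, 200);                 // low/high hysteresis thresholds
    cv::imwrite("control.png", edges);               // usable as a --control-image input
    return 0;
}
```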
That implies using very heavy libraries that are not really necessary for the functioning of this project itself; it's better to have it implemented from scratch in ggml.
True (at least
In the past few weeks, I have been busy with pull requests related to svd, so I didn't have a chance to merge this PR. Now that I've addressed some issues, I'm ready to merge it. Thank you for your contribution.
@leejet Thank you for continuing to maintain the project despite having other work. Once I have time in the next few days, after implementing flash attention in llama.cpp, I'll see if I can adapt the web UI I've already created to this project.
Would ControlNet with SDXL work with this update?
I implemented a ControlNet here; it's somewhat similar to the encoder phase of stable diffusion, but with some extra convolutional layers.

Usage

```
./bin/sd --mode txt2img -p "a person" -m ../models/sd-v1-4.ckpt --control-net models/control_openpose-fp16.safetensors --control-image assets/control.png
```

Some examples

NOTE: Requires more than 4 GB of VRAM; peak memory usage is 3.6 GB (out of memory on Windows). Use --control-net-cpu to keep the ControlNet on the CPU backend.
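Putting the flags together, an invocation that keeps the ControlNet on the CPU backend would look like this (same model and image paths as the usage example above):

```
./bin/sd --mode txt2img -p "a person" -m ../models/sd-v1-4.ckpt --control-net models/control_openpose-fp16.safetensors --control-image assets/control.png --control-net-cpu
```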