diff --git a/asset/content/sampler.jpg b/asset/content/sampler.jpg
new file mode 100644
index 0000000..7f86ebd
Binary files /dev/null and b/asset/content/sampler.jpg differ
diff --git a/index.html b/index.html
index 95611f1..a7de04a 100644
--- a/index.html
+++ b/index.html
@@ -2,6 +2,7 @@
+ Sana
@@ -181,9 +182,6 @@
 padding: 10px;
 text-align: center;
 margin-top: 10px;
- box-shadow: 2px 4px 12px #00000054;
- border-top-left-radius: 20px;
- border-top-right-radius: 20px;
 }
 .citation-content {
 text-align: left;
@@ -204,17 +202,22 @@
 font-weight: normal;
 }
 .citation pre {
+ border-radius: 15px; /* Rounded corners */
 max-width: 90%; /* Limit the width to 80% of the screen */
 text-align: left;
 }
 .footer {
- background-color: #222;
- color: #fff;
+ background-color: #f5f5f5;
+ box-shadow: 2px 4px 12px #00000054;
+ color: #333;
 padding: 20px;
 text-align: center;
+ margin-top: -20px;
+ border-top-left-radius: 20px;
+ border-top-right-radius: 20px;
 }
 .footer a {
- color: #00d1b2;
+ color: dodgerblue;
 text-decoration: none;
 }
 .inserted-image {
@@ -226,7 +229,7 @@
 margin-left: auto; /* Center the image horizontally */
 margin-right: auto;
 border-radius: 10px;
- box-shadow: 2px 2px 12px 4px #00000012;
+ box-shadow: 2px 2px 10px 3px #00000030;
 }
 .video-container {
 text-align: center; /* Center the video horizontally */
@@ -236,7 +239,7 @@
 max-width: 80%; /* The video will scale to fit the container */
 height: auto; /* Maintain the video's aspect ratio */
 border-radius: 10px; /* Rounded corners for the video */
- box-shadow: 2px 4px 12px #00000054;
+ box-shadow: 2px 2px 10px 3px #00000054;
 }
 .logo {
 color: black;
@@ -499,18 +502,28 @@

Several core design details for Efficiency

Unlike CLIP or T5, Gemma offers superior text comprehension and instruction-following. We address training instability and design complex human instructions (CHI) to leverage Gemma’s in-context learning, improving image-text alignment.
+

+ + +
+ pipeline for Sana +
+ +
+

    •   Efficient Training and Inference Strategy: We propose automatic labeling and training strategies to improve text-image consistency. Multiple VLMs generate diverse re-captions, and a CLIPScore-based strategy selects high-CLIPScore captions to enhance convergence and alignment.
- Additionally, our Flow-DPM-Solver reduces inference steps from 28-50 to 14-20 compared to the Flow-Euler-Solver, with better performance.

-

+ Additionally, our Flow-DPM-Solver reduces inference steps from 28-50 to 14-20 compared to the Flow-Euler-Solver, with better performance.
+

+
- pipeline for Sana
+ flow-dpms vs flow-euler
-

Performance

+

Overall Performance

We compare Sana with the most advanced text-to-image diffusion models in Table 7. For 512 × 512 resolution, Sana-0.6 demonstrates a throughput that is 5× faster than PixArt-Σ, which has a similar model size, and significantly outperforms it in FID, Clip Score, GenEval, and DPG-Bench. For 1024 × 1024 resolution,
@@ -524,11 +537,6 @@

Performance

Sana performance
-
-

Our Mission

-

Our mission is to develop efficient, lightweight, and accelerated AI technologies that address practical challenges and deliver fast, open-source solutions...

-
-

Sana-0.6B is deployable on a customer-grade 4090 GPU

@@ -541,6 +549,11 @@

Sana-0.6B is deployable on a customer-grade 4090 GPU

+
+

Our Mission

+

Our mission is to develop efficient, lightweight, and accelerated AI technologies that address practical challenges and deliver fast, open-source solutions...

+
+
@@ -565,7 +578,12 @@

BibTeX

-

Total clicks:

+

+ This website is licensed under a Creative
+ Commons Attribution-ShareAlike 4.0 International License.
+

+ Total clicks: