About
Swingline is a fast, pure-GPU implementation of Weighted Voronoi Stippling.
Weighted Voronoi Stippling is an iterative process built on Lloyd's algorithm. Two steps are repeated over and over until the stipple locations converge:
- Render a Voronoi diagram from a set of seed points
- Adjust the seeds so that each one lies at the weighted centroid of its cell
The original paper suggests using the GPU to calculate the Voronoi diagram; Swingline extends this by calculating the cell updates on the GPU as well.
This makes it alarmingly fast!
For small sample counts (1-2K samples), Swingline is between 10 and 20 times faster than StippleGen. For larger sample counts (e.g. 10K), Swingline takes 10s of seconds to converge (and StippleGen simply hangs).
Rendering the Voronoi diagram
To construct the Voronoi diagram, we'll be using a fairly old technique: it was first published at SIGGRAPH '99 and is also referenced in the Red Book.
The core idea is incredibly simple:
At every cell center, render a uniquely-colored cone with
depth culling turned on.
From the side, a set of cones looks like this:
And from the top (with flat shading), it looks quite Voronoi-ish:
This works because the z-height of each cone represents Euclidean distance to the cell center. Turning on z-culling solves for the closest cell center for every pixel; coloring each cone uniquely lets us disambiguate which cell center was chosen.
Cones are drawn using instanced rendering.
We create one vertex buffer that contains a single cone, another buffer that contains
a list of offsets (representing each cell's location), and draw them using
glDrawArraysInstanced
.
The vertex shader is very simple:
layout(location=0) in vec3 pos; /* Absolute coordinates */
layout(location=1) in vec2 offset; /* 0 to 1 */
uniform vec2 scale;
out vec3 color_;
void main()
{
gl_Position = vec4(pos.xy*scale + 2.0f*offset - 1.0f,
pos.z, 1.0f);
// Pick color based on instance ID
int r = gl_InstanceID % 256;
int g = (gl_InstanceID / 256) % 256;
int b = (gl_InstanceID / 65536) % 256;
color_ = vec3(r / 255.0f, g / 255.0f, b / 255.0f);
}
Now, you understand the faint color patches in the background of the demo images: each color patch represents a different Voronoi cell, and the black dots are the stipple locations.
Finding weighted cell centroids
With the Voronoi diagram rendered, we need to find cell centroids.
The centroid of cell $i$ is defined as $$ p_i = \frac{ \sum_{A} x w(x)}{ \sum_{A} w(x)} $$ where $A$ is the set of pixels in cell $i$, $x$ is a pixel's coordinate, and $w(x)$ is a weighting function sampled at point $x$.
A naive implementation iterates over every pixel in the Voronoi diagram, accumulating a position and weight for the given cell's index.
For a $W$ by $H$ image, the naive strategy has a running time of $O(W \times H)$.
Instead, I'll present an $O(W + H)$ strategy that takes advantage of the GPU's massive parallelism. As implied by the running time, this algorithm is a two-stage operation. The first stage is horizontal accumulation and the second is vertical summing.
Here's a visualization of the algorithm, which probably won't make sense yet:
As shown in the picture above, we'll be using an intermediate texture of height $H$ and width $N$, where $N$ is the number of samples. In this texture, a fragment's $x$ coordinate represents which Voronoi cell it's looking for; its $y$ coordinate is the row of the Voronoi image that it searches.
In the first stage, each fragment iterates across every column of its assigned row, checking pixels and accumulating weights and positions for the pixels that match the fragment's Voronoi cell index.
Expressed as a fragment shader, the logic is as follows:
layout (pixel_center_integer) in vec4 gl_FragCoord;
out vec4 color;
uniform sampler2D voronoi;
uniform sampler2D img;
void main()
{
int my_index = int(gl_FragCoord.x);
ivec2 tex_size = textureSize(voronoi, 0);
color = vec4(0.0f);
// Iterate over columns of the source image, accumulating a
// weighted sum of the pixels that match our index
for (int x=0; x < tex_size.x; x++)
{
ivec2 coord = ivec2(x, gl_FragCoord.y);
vec4 t = texelFetch(voronoi, coord, 0);
// Unpack RGB value into a Voronoi cell index
int i = int(255.0f * (t.r + (t.g * 256.0f)
+ (t.b * 65536.0f)));
if (i == my_index)
{
float weight = 1.0f - texelFetch(img, coord, 0)[0];
weight = 0.01f + 0.99f * weight;
color.xy += (coord + 0.5f) * weight;
color.w += weight;
color.z += 1.0f;
}
}
// Normalize to the 0 - 1 range
color.x = color.x / tex_size.x;
color.y = color.y / tex_size.y;
}
Next, we accumulate down columns of the intermediate texture. Remember, each column in this texture represents a particular Voronoi cell. This summation collapses the per-row values into a single result for each Voronoi cell.
layout (location=0) in uint index;
out vec3 pos;
uniform sampler2D summed;
void main()
{
ivec2 tex_size = textureSize(summed, 0);
pos = vec3(0.0f, 0.0f, 0.0f);
float weight = 0.0f;
float count = 0.0f;
for (int y=0; y < tex_size.y; ++y)
{
vec4 t = texelFetch(summed, ivec2(index, y), 0);
pos.xy += t.xy;
weight += t.w;
count += t.z;
}
pos.xy /= weight;
pos.z = weight / count;
}
This shader renders directly back into the VBO that's used for cone positioning, using transform feedback.
We also store the average pixel weight, which is used to adjust the dot's radius. This makes the images look better.
Here's an example comparing constant and variable-radius stipples:
Implementation and Usage
Swingline is a command-line program with the following usage:
Usage: swingline [-n samples] [-r radius]
[-o output] [-i iterations] image
Sets of stipples are saved to an SVG output file.
Swingline is implemented in under 900 lines of C, plus stb_image.h
for image loading. It has GLFW
and libepoxy
as dependencies.
You can view the source on Github.
Further Reading
Thanks to Christian Brunschen, who pointed me to Parallel Banding Algorithm to Compute Exact Distance Transform with the GPU. This paper describes an advanced GPU-only algorithm to calculate Euclidean Distance Transforms (from which centroidal Voronoi diagrams can be extracted).