Fidget

Introduction

Fidget is a library for representing, compiling, and evaluating large-scale math expressions, i.e. hundreds or thousands of arithmetic clauses. It's mainly designed as a backend for implicit surfaces, but the library is flexible enough for many different uses!

Try the demo

What are implicit surfaces?

Implicit surfaces are expressions of the form $f(x, y, z) \rightarrow d$ returning a single distance value $d$. If that value is positive, then the point $(x, y, z)$ is outside the model; if it's negative, then the point is inside the model.

For example, a sphere of radius 1 can be expressed as

$$ f(x, y, z) = \sqrt{x^2 + y^2 + z^2} - 1 $$

By evaluating the expression at many points in space and checking its sign, we can render this function into an image. In this example, we'll render a heightmap, where filled-in voxels are colored based on their distance to the camera:

Picture of a shaded sphere

Fidget focuses on closed-form implicit surfaces, which build their expressions out of basic arithmetic operations. This is in contrast with more flexible representations (e.g. GLSL in a pixel shader), which may run entire Turing-complete programs to compute the distance value.

I like to think of these functions as an "assembly language for shapes": a low-level representation that you're unlikely to write by hand, but easy to target from higher-level representations.

Why implicit surfaces?

Implicit surfaces are a particularly pleasant representation: they're compact, philosophically straight-forward, and amenable to massively parallel evaluation (e.g. using SIMD instructions or on the GPU).

In addition, constructive solid geometry (CSG) operations like union or intersection – famously difficult for meshes and NURBS – are trivial! Here's the union of two exactly-coincident cylinders, which is just min(a, b):

Picture of two cylinders overlapping in a cross shape

Having a closed-form equation opens the door to interesting optimizations. For example, Fidget can capture a trace showing which branches are taken during evaluation; this trace can be used to simplify the expression, reducing the cost of future evaluation.

Finally, for the computer enthusiasts, implicit surfaces offer a deep well of entertaining research topics: everything from compilers to meshing algorithms to GPGPU programming.

In other words, I just think they're neat.

Origins

Over the past ten years, I've spent a lot of time thinking about rendering and evaluation of implicit surfaces:

I've written 4-5 kernels in various languages, at least one of which was used in commercial CAD software.
Back in 2020, I published a SIGGRAPH paper presenting a new strategy for fast rendering on modern GPUs

So, why write something completely new, when the industrial-strength libfive kernel is sitting right there?

This is personal-scale, mostly-unpaid research, so I have to optimize for my own motivation and energy. The questions that I'm currently finding interesting would be hard or frustrating to investigate using libfive as a foundation:

Finding the "right" APIs for implicit kernels, with the possibility of making substantial compatibility breaks
Experimenting with native (JIT) compilation, to improve performance without moving to the GPU
Cross-compiling to WebAssembly and building easily-accessible web demos

libfive is 40K lines of mostly C++, and is extremely challenging to hack on, even as the original author. It's also frustrating to touch: if it's been a few months since I last compiled it, it's inevitable that the build will have broken, and I'll have to mess with CMake for 10 minutes to unbreak it.

Fidget is written in Rust, which I'm also using professionally. It compiles with a single cargo build, cross-compiles to WebAssembly seamlessly, and can be refactored with confidence (thanks to Rust's strong type system and memory safety).

With all that context, let's get started on our tour of the library!

Library structure

The library is built as a stack of three mostly-decoupled layers, along with demo applications. Click on one to jump to that section of the writeup, or keep scrolling to read through in order.

Frontend
Math trees and graphs

Backend
Fast, flexible evaluation

Algorithms
Rendering and more

Demos
GUIs, CLIs, and web apps

Frontend: building math expressions

The frontend of Fidget goes from input script to bytecode:

Diagram showing the following pipeline: Script, evaluation, Math tree, deduplication, Math graph, flattening, SSA tape, register allocation, Bytecode

It's not mandatory to use this specific workflow; users of the library can show up at any point in this diagram.

Let's walk through step by step and see how things look at each stage!

Scripting

Fidget includes bindings for Rhai, an embedded scripting language for Rust. Using operator overloading, it's easy to write scripts to build up math expressions:

let scale = 30;

let x = x * scale;
let y = y * scale;
let z = z * scale;

let gyroid = sin(x)*cos(y) + sin(y)*cos(z) + sin(z)*cos(x);
let fill = abs(gyroid) - 0.2;

let sphere = sqrt(square(x) + square(y) + square(z)) - 25;
draw(max(sphere, fill));

The value passed to draw is a math tree which represents the expression built up by the script.

Tree and graph

The math tree is deduplicated to produce a directed acyclic graph:

math graph

SSA tape

Doing a topological sort lets us flatten the graph into straight-line code.

This code is in single static assignment form; there are arbitrarily many pseudo-registers (named rX), and each one is only written once.

r7  = Input[0]
r6  = MulRegImm(r7, 30.0)
r26 = SquareReg(r6)
r16 = Input[2]
r15 = MulRegImm(r16, 30.0)
r25 = SquareReg(r15)
r24 = AddRegReg(r26, r25)
r10 = Input[1]
r9  = MulRegImm(r10, 30.0)
r23 = SquareReg(r9)
r22 = AddRegReg(r24, r23)
r21 = SqrtReg(r22)
r20 = SubRegImm(r21, 25.0)
r19 = SinReg(r6)
r18 = CosReg(r15)
r17 = MulRegReg(r19, r18)
r14 = SinReg(r15)
r13 = CosReg(r9)
r12 = MulRegReg(r14, r13)
r11 = AddRegReg(r17, r12)
r8  = SinReg(r9)
r5  = CosReg(r6)
r4  = MulRegReg(r8, r5)
r3  = AddRegReg(r11, r4)
r2  = AbsReg(r3)
r1  = SubRegImm(r2, 0.2)
r0  = MaxRegReg(r20, 1)
Output[0] = r0

It's possible to evaluate this tape, but it doesn't scale well: you have to allocate a memory location for every single operation, since pseudo-registers aren't reused.

Bytecode

To improve evaluation, it's helpful to map from pseduo-registers to physical(ish) registers, which can be reused.

For example, the tape above can be packed into 6 reuseable registers:

r3 = Input[0]
r3 = MulRegImm(r3, 30.0)
r0 = SquareReg(r3)
r4 = Input[2]
r4 = MulRegImm(r4, 30.0)
r2 = SquareReg(r4)
r0 = AddRegReg(r0, r2)
r2 = Input[1]
r2 = MulRegImm(r2, 30.0)
r1 = SquareReg(r2)
r0 = AddRegReg(r0, r1)
r0 = SqrtReg(r0)
r0 = SubRegImm(r0, 25.0)
r1 = SinReg(r3)
r5 = CosReg(r4)
r1 = MulRegReg(r1, r5)
r4 = SinReg(r4)
r5 = CosReg(r2)
r4 = MulRegReg(r4, r5)
r1 = AddRegReg(r1, r4)
r2 = SinReg(r2)
r3 = CosReg(r3)
r2 = MulRegReg(r2, r3)
r1 = AddRegReg(r1, r2)
r1 = AbsReg(r1)
r1 = SubRegImm(r1, 0.2)
r0 = MaxRegReg(r0, r1)
Output[0] = r0

Register allocation uses the simple algorithm that I blogged about a few years back. It's a single-pass algorithm that trades efficienty for speed and determinism; that blog post has much more detail and interactive demos.

The bytecode interpreter uses 256 registers, so the register index is stored in a u8. If we run out of registers, then the allocator inserts LOAD and STORE operations to write to secondary memory (using a u32 index).

This bytecode tape is the end of the front-end; we now proceed to the backend for evaluation!

Backend: fast, flexible evaluation

The Fidget backend is decoupled from the front-end with a set of traits (Function, TracingEvaluator, and BulkEvaluator), which roughly represent "things that can be evaluated".

This decoupling means that algorithms can target a generic Function, instead of being tightly coupled to our math tree implementation. Right now, there aren't any non-math-tree implementers of the Function trait, but this could change in the future. In the current implementation, we have two different flavors of math tree evaluation: interpreted bytecode, and JIT-compiled functions.

Here's a high-level sketch of the evaluation process:

Diagram showing the following pipeline: Function -> Tape, Tape + Input arrays -> Evaluator -> Output arrays + Trace

For our math expressions, the Tape object is the same bytecode tape that we constructed earlier. The Evaluator is a very simple interpreter, which iterates over the bytecode tape and executes each opcode.

There are four different evaluation modes:

Single-point and array (SIMD) evaluation
Forward-mode automatic differentiation
Interval arithmetic

Bulk and SIMD evaluation

Bulk evaluation is straight-forward: the user provides an array of input values, and receives an array of outputs. When using the JIT backend, we generate SIMD code which processes 4 (AArch64) or 8 (x86-64) element at a time.

$$ [1, 2, 3, 4] + [5, 6, 7, 8] = [6, 8, 10, 12]$$

Forward-mode automatic differentiation

The derivative evaluator calculates both a value and up to three partial derivatives (with respect to some variables). For implicit surfaces, it's typically used to calculate the tuple

$$(f, \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z})(x, y, z)$$

At the surface (where $f(x, y, z) = 0$), the partial derivatives are a good approximation of the surface normal, which can be used for shading:

Sphere with cutouts, showing normals

Evaluation uses forward-mode automatic differentiation; values in registers are augmented with derivatives, and the chain rule is used at each step of evaluation. The JIT evaluator packs the value and three derivatives into a single 4 x f32 register for efficiency.

Interval arithmetic

Interval arithmetic lets us evaluate expressions over ranges of input values: instead of assigning $x = 1$, we can say $1 \le x \le 5$. The output will also be an interval, e.g. $2 \le f(x, y, z) \le 20$.

The results of interval arithemetic are conservative: they may not tightly bound the true range of the function, but they are guaranteed to contain all possible outputs (for input values within the given input intervals).

Interval arithmetic is a core building block for evaluation of implicit surfaces. If you evaluate a spatial region (using intervals for $x$, $y$, $z$) and find an interval that's unambiguously greater than 0, that entire region is outside your shape and doesn't need to be considered any further!

In the render below, green and red regions were proved inside / outside the shape by interval arithmetic; only the black and white pixels were evaluated by the point-wise evaluation.

hello world

Skipping regions based on interval arithmetic results is already a significant optimization over bulk evaluation, but wait – there's more!

Tracing evaluation and simplification

Our interval arithmetic evaluator also captures a trace of its execution. For example, consider the following min, evaluated using interval arithmetic:

$$\min(a, b) \text{ where } 0 \le a \le 1 \text{ and } 4 \le b \le 5$$

Because $a$ is unambiguously less than $b$ (within this interval), the expression can be simplified from $\min(a, b) \rightarrow a$.

Each min and max operation records a single choice, indicating which arguments affect the result (left-hand side, right-hand side, or both). These choices can be used to simplify the original function. In our $\min(a, b)$ example, we don't need to calculate $b$ or $\min$ if we've simplified the expression to just $a$.

The One Weird Trick

The combination of interval arithmetic and tape simplification is the One Weird Trick that makes working with large-scale expressions feasible. I'm going to circle around this idea from a few different angles, because it's critically important!

Interval arithmetic and tape simplification have a positive synergy: we can both skip inactive (empty / full) regions of space, and evaluating the remaining active regions becomes cheaper.

You can also think of tape simplification as analogous to a spatial acceleration data structure: we're computing a simplified expression that's only valid in a particular spatial region. Unlike typical raytracing acceleration data structures, we're building the accelerator dynamically during evaluation.

For rasterization, the cost of interval evaluation is amortized across many pixels in the interval. Interval evaluation of an $N \times N$ pixel region with a tape length of $T$ is $O(T)$, i.e. only proportional to the tape length, and independent of pixel count.

By the time we get down to single-pixel evaluation (e.g. for an $M \times M$ region, $M < N$), we're working on dramatically shorter tapes. At this point, evaluation must be done for every pixel, but the cost is $O(T' \times M \times M)$, where $T' < T$ (the tape is significantly shorter).

Both of these effects are even more significant for 3D rendering, where we're dealing with $N^3$ voxels instead of $N^2$ pixels!

To get some concrete numbers, let's take a closer look at the hello, world image above. We're rendering a 256×256 image (2D), and the original tape is 254 clauses. Here's how things go down:

Interval evaluation of 32×32 pixel tiles, of which there are 64
- Empty regions are skipped, leaving 47 active tiles
- The average tape length in active tiles has shrunk to 73 clauses
Each of these tiles is subdivided into 16 smaller tiles (8×8 pixel)
Interval evaluation of 8×8 pixel tiles, of which there are 752
- Empty regions are skipped, leaving 351 active tiles
- The average tape length in active tiles has shrunk to 20 clauses
Per-pixel evaluation of remaining 8×8 pixel tiles, of which there are 351

Here are the active tiles positions and tape lengths after simplification:

diagram of tape lengths, showing the same data as above

By the time we're doing per-pixel evaluation, the tape is more than 10× shorter than its original length!

Of course, not every shape is amenable to this kind of on-the-fly simplification. Fidget supports simplifying min and max operations (used in CSG), as well as logical ops (and and or). Shapes without CSG or logic won't benefit from simplification; however, they may still benefit from interval-arithmetic-based skipping of empty and full regions.

JIT compilation

The bytecode interpreter is a tight loop, but suffers from inevitable overhead:

Instruction dispatch is a single unpredictable branch (see "Beating the compiler" for details on interpreter optimization)
Each instruction reads and writes to memory, in the form of register slots within the VM evaluator

For maximum performance, Fidget also includes a JIT compiler, which lowers bytecode down to machine instructions! The machine instructions are straight-line code (without any dispatch), and use physical registers directly (so no reading and writing to memory).

Consider $\sqrt{x^2 + y^2} - 1$, which generates the following bytecode:

r0 = Input[1]
r0 = SquareReg(r0)
r1 = Input[0]
r1 = SquareReg(r1)
r0 = AddRegReg(r0, r1)
r0 = SqrtReg(r0)
r0 = SubRegImm(r0, 1.0)
Output[0] = r0

The input to the JIT compiler is the same bytecode tape as before, but planned with 12 (x86-64) or 24 (AArch64) physical registers, instead of the VM default of 255. (This isn't visible in the bytecode above, which only uses two registers)

These register limits mean that bytecode registers can directly be mapped to physical registers in hardware: v8-31 on AArch64, and xmm4-15 on x86-64.

With the help of godbolt.org, I hand-wrote snippets of assembly for every single opcode × data type × architecture. This was very tedious! However, it makes JIT compilation simple: patch in the desired physical registers within each snippet, then copy them into an mmap'd region of memory.

The bytecode above is turned into the following function on an AArch64 system:

; Preserve frame and link register, and set up the frame pointer
stp   x29, x30, [sp, 0x0]
mov   x29, sp

; Preserve callee-saved floating-point registers
stp   d8, d9, [sp, 0x10]

ldr s8, [x0, 4]         ; r0 = Input[1]
fmul s8, s8, s8         ; r0 = SquareReg(r0)
ldr s9, [x0, 0]         ; r1 = Input[0]
fmul s9, s9, s9         ; r1 = SquareReg(r1)
fadd s8, s8, s9         ; r0 = AddRegReg(r0, r1)
fsqrt s8, s8            ; r0 = SqrtReg(r0)
movz w9, 0x3f80, lsl 16 ; load 1.0 into s3
fmov s3, w9
fsub s8, s8, s3         ; r0 = SubRegImm(r0, 1.0)
str s8, [x1, 0]         ; Output[0] = r0

; Restore frame and link register
ldp   x29, x30, [sp, 0x0]

; Restore callee-saved floating-point registers
ldp   d8, d9, [sp, 0x10]

ret

At the Rust level, this chunk of memory can be cast to a function pointer, i.e.

unsafe extern "C" fn(
    *const f32, // x0: inputs
    *mut f32    // x1: outputs
);

Rust slices can be cast to raw pointers for inputs and outputs, e.g.

let in = [0.0, 1.0]; // [x, y]
let mut out = [0.0];

unsafe {
    (fn_ptr)(in.as_ptr(), out.as_mut_ptr());
}
println!("{out:?}");

JIT compilation has a dramatic effect on evaluation time. Here's a relatively complex example, built from 7867 expressions:

monologue from The Tempest

Doing brute force evaluation on 1024² pixels, the bytecode interpreter takes 5.8 seconds, while the JIT backend takes 182 milliseconds – a 31× speedup!

Note that the speedup is less dramatic with smarter algorithms; brute force doesn't take advantage of interval arithmetic or tape simplification! The optimized rendering implementation in Fidget draws this image in 6 ms using the bytecode interpreter, or 4.6 ms using the JIT backend, so the improvement is only about 25%.

Algorithms

With a foundation of fast evaluation, the sky's the limit!

Rendering

All of the models in this writeup are rendered using algorithms from fidget::render. Rendering uses the same core algorithm presented in my SIGGRAPH paper:

Render large spatial regions using interval arithmetic
Generate shortened tapes based on traces
Subdivide and recurse on ambiguous regions
Calculate normals using partial derivatives (3D rendering only)

(There's nothing particularly exotic here, for folks familiar with the literature; it's a particularly well-tuned implementation of ideas dating back to the 90s)

Models are transformed by a 4×4 homogeneous matrix as part of rendering, which means that perspective transforms are supported. Rendering typically produces two images: a heightmap, and per-pixel normals.

Architectural model, shown with depth and normals

These can be drawn using standard deferred rendering techniques, e.g. SSAO (left as an exercise for the reader).

Meshing

Fidget implements Manifold Dual Contouring for meshing.

Architectural model, shown as a model + wireframe

The implementation in Fidget should always produce meshes that are

Watertight
Manifold
Preserving sharp features (edges / corners)
Adaptive; triangles are less dense in mostly-flat regions

However, it has the following deficiencies:

It will not necessarily preserve thin features
The resulting meshes may include self-intersections
Vertex positioning is susceptible to adversarial cases

Looking closely at the model, the topology leaves something to be desired:

Architectural model, wireframe close-up

It may seem weird to write out all of the ways meshing can go wrong – why not just make it better? – but good meshing of arbitrary implicit surfaces remains an unsolved problem!

Whenever someone publishes a new meshing strategy, I go through my checklist and see how it scores; Manifold Dual Contouring, while not perfect, currently occupies a sweet spot of simplicity and performance.

Demos

The Fidget repository includes a handful of demos, with a great deal of implementation variety. I'd like to focus on the web GUI, because it's the most interesting; there's also a simple CLI (fidget-cli) and native script viewer (fidget-viewer).

You may remember the web editor from the very beginning of the writeup, some 3000 words ago! You can still try it out:

This demo was an opportunity for me to explore more modern (and in some cases, bleeding-edge) web development. There's a lot going on:

The GUI is written in TypeScript!
The Fidget crate is used as a library, rather than driving the event loop.
The text editor is CodeMirror, so I have to use Node modules...
...which requires a bundler; I picked webpack somewhat arbitrarily.
Script evaluation and rendering are both done by web worker threads, to avoid blocking the main event loop.
Rendering is parallelized, using wasm-bindgen-rayon to work around the lack of std::thread in the browser.
Workers share memory with the main event loop. We can cancel long-running renders if the user provides new input, by setting a flag in an Arc<AtomicBool> that's shared with the worker thread.

All of this was a terrible nightmare! Each individual component deserves praise for including a working example; unfortunately, they all used mutually exclusive bundlers, settings, servers, etc. Combining them all into a working system was quite a challenge!

(For example, I had to pin an old version of wasm-bindgen because they recently fixed a bug that was load-bearing for wasm-bindgen-rayon)

Still, I'm pleased with how it turned out. It even works on my phone, albeit without camera controls (since I'm using mouse events instead of touch events).

Fidget demo running on my phone

Aside: Meditations on demos

I always have mixed feelings about making demos for my projects. Fidget is first and foremost a library; people should be embedding it into their own projects as infrastructure, not using my demos as actual CAD tools.

However, the population of Weird CAD People (which is small to begin with!) breaks down into roughly the following groups:

Venn diagram of people who want to use weird CAD tools versus write their own weird CAD tools

Many more people will be interacting with the demos – even to the point of using them for design work – than building tools with the libraries. There's a tension between making the demos more useful for this larger audience, versus making the libraries better for a smaller set of users.

Historically, I've struggled to maintain a kernel and full-fledged CAD UI simultaneously, and have been reducing the scope of demos accordingly – but as you can tell from the web editor, even a "minimal" demo is a significant project.

I don't know how this will turn out with Fidget! My plan is roughly the following:

Keep following my interests, to preserve my motivation and focus
(the most limited resources!)
Accept suggestions from people using the tools, while setting expectations at a reasonable level
Prioritize feedback from people in the smaller circles of tool-builders, because – in an ideal world – they would take pressure off my demos by building full-fledged tools that others could use

Future possibilities

I've been quietly working on Fidget for about 2.5 years. In that time, it has evolved significantly – for example, it started as an LLVM-backed evaluator! I expect it to continue evolving, albeit at a slower pace, as I continue to find the "right" APIs and abstractions.

Let me discuss a few obvious future directions, before wrapping up.

Doing stuff on the GPU

I wrote a whole paper about this, so adding a GPU backend seems like an obvious extension. In fact, I have already done this, in the wgpu-bytecode branch!

However, performance isn't particularly compelling, at least on my Apple M1 Max laptop. In particular, the bytecode interpreter loop appears to be incredibly inefficient. I'm continuing to investigate the root cause, and will likely write up a blog post with more details.

(If you're familiar with Metal shader optimization, feel free to reach out, and I will happily brain-dump in your direction)

Better meshing

For serious users of the library, this is the elephant in the room. Fidget is using the same meshing strategy as libfive, but without a bunch of fine-tuning that makes the latter more robust.

I haven't spent much time on this, because I'm fundamentally unsatisfied with the available options. I would rather spend time implementing a bulletproof meshing algorithm, rather than adding epicycles to dual contouring. Unfortunately, I haven't seen anything in the literature that meets my requirements, and haven't yet devised anything better myself.

We'll see how things shake out! Depending on user demand, I would probably be willing to do some fine-tuning, but also want to keep an eye out for better options.

Standard library of shapes and transforms

Over the past few generations of software, I've been porting the Fab Modules's standard library of shapes into each new tool. This is tedious, but gives a somewhat-standard foundation for higher-level modeling.

libfive hit a high point in complexity, where I wrote each shape in C++, then automatically generated C, Python, and Scheme bindings from the header file. The explanation in the README has a certain "statements dreamed up by the utterly deranged" energy to it:

libfive_stdlib.h is both a C header and a structured document parsed by a helper script.

We're still brainstorming the right approach for Fidget; see fidget#145 for ongoing discussions. Dragging the fab shapes library into Rust would work, but I'm wondering if we can do better – for example, Inigo Quilez's library of primitives takes advantage of GLSL vectors to produce much more elegant code.

Bindings to higher-level languages

I'm a big fan of wrapping interactive shells around a compiled core. Right now, Fidget only has bindings to Rhai, which was chosen because it's one of the few mature Rust-first scripting languages. Rhai has been very easy to integrate (and I love that it compiles into WebAssembly), but I suspect that most users would prefer Python or Node bindings.

Then, there's the question of how to write those bindings: a C API would let people lean on the FFI libraries of their favorite languages, but feels like a step down from Fidget's Rust-first design. Similarly, once we have a standard library (discussed above), it would be great for it to be automatically exposed in each binding, with proper ergonomics (docstrings, default arguments, etc).

As always, feel free to reach out if you have thoughts!

Conclusion

Ever since initial publication, Fidget's README has described it as "quietly public"; 19 versions have been published to crates.io, and a few people have started to build on it, but I haven't been talking about it outside of semi-private spaces.

This blog posts marks the end of the quietly public phase: it's now loudly public.

Go try it out, build some stuff, tell your friends, and let me know how it goes!

The source code is on Github and can be added to a Rust project with cargo add fidget. It is released under a weak copyleft license (MPL 2.0), which is friendly for both OSS and commercial usage.

Thanks to everyone who contributed to this post and the project as a whole: Martin Galese, Bruce Mitchener, Avi Bryant, Blake Courter, Johnathon Selstad, Bradley Rothenberg, Moritz Mœller, Luke V, and various others in all the group chats.

Matt Keeter // Fidget