An x86-64 backend for raven-uxn

Uxn is a fictional CPU, used as a target for various applications in the Hundred Rabbits ecosystem. It's a simple stack machine with 256 instructions:

Screenshot of Uxn opcodes

raven-uxn, my implementation of the Uxn CPU, now has an x86-64 assembly backend, which is about twice as fast as the Rust interpreter. This required porting about 2000 lines of ARM64 assembly to x86-64, which was accomplished with the help of a robot buddy.


Let me provide a little more context.

A few years back, I wrote a Rust implementation of the CPU and peripherals, which was 10-20% faster than the reference implementation. For more background info, see that project's writeup:

Screenshot of Varvara

The Rust implementation is fast, but suffers from the usual downsides of a bytecode-based VM: the main dispatch statement is an unpredictable branch.
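
To make the dispatch problem concrete, here is a minimal sketch of a match-based interpreter loop. The opcode numbering and the `run` function are invented for illustration; they are not Uxn's real encoding or raven-uxn's API. The point is only that every instruction funnels through the same `match`, which typically compiles to one shared indirect branch that the CPU has to predict for all opcodes at once.

```rust
// Toy stack machine with a made-up opcode numbering (not Uxn's real encoding).
const HALT: u8 = 0x00;
const PUSH: u8 = 0x01; // followed by one literal byte
const ADD: u8 = 0x02;
const SUB: u8 = 0x03;

/// Runs `program` until HALT (or the end of the program) and returns the data stack.
fn run(program: &[u8]) -> Vec<u8> {
    let mut stack: Vec<u8> = Vec::new();
    let mut pc = 0usize;
    while pc < program.len() {
        let op = program[pc];
        pc += 1;
        // Every instruction funnels through this one `match`.
        match op {
            HALT => break,
            PUSH => {
                let value = program.get(pc).copied().unwrap_or(0);
                stack.push(value);
                pc += 1;
            }
            ADD => {
                let b = stack.pop().unwrap_or(0);
                let a = stack.pop().unwrap_or(0);
                stack.push(a.wrapping_add(b));
            }
            SUB => {
                let b = stack.pop().unwrap_or(0);
                let a = stack.pop().unwrap_or(0);
                stack.push(a.wrapping_sub(b));
            }
            // Unknown opcodes are ignored in this sketch.
            _ => {}
        }
    }
    stack
}
```

For example, `run(&[PUSH, 5, PUSH, 3, SUB, HALT])` leaves `[2]` on the stack. The assembly backends avoid the shared `match` by ending each opcode handler with its own jump to the next handler.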

I then wrote an assembly implementation of the interpreter, which proved to be about 30% faster than the Rust version. This was hard: it took several days of work, and there were lingering bugs that I didn't discover until I added a fuzz tester to check for discrepancies between the Rust and assembly implementations.

The assembly implementation is written for an ARM64 target, for two reasons:

My blog post about the assembly implementation concludes with an optimistic statement:

On a brighter note, it should be relatively easy to port all of the assembly code to x86-64, but I'll leave that as a challenge for someone else!

I wrote that back in late 2024, and no one had yet risen to the challenge, so I decided to do it (kinda) myself. Because this is early 2026, you may know where this is going: the first draft was written autonomously by Claude Code.

Yes, that's right – it's finally my turn to test out the hip new coding agents on a problem that I know relatively well.

(This blog post was 100% written by me, a fleshy human, because I think that passing off AI-written text as human-authored is an insult to the reader)

How did it do?

In short, it did a great job of going from "zero to one": if I was given a blank text editor and asked to write the x86 implementations of every Uxn opcode, I would have done much worse.

The resulting implementation worked, passing both my unit tests and the fuzzer.

This was all basically autonomous: I deliberately did not help the agent with any implementation or debugging details, limiting my feedback to high-level strategy.

The assembly itself was of middling quality – and I then spent a while improving it – but the agent provided an invaluable boost of momentum to kick off the work.

The whole thing cost about $29, billed through an enterprise plan. I'm not sure how this would have gone with an unmetered plan, e.g. whether I would have hit usage limits midway through the process.

The implementation took a few hours of work, but only 15-20 minutes of hands-on time; the main speed limit was me noticing that it was waiting for approval to run a new command.

(This was all running on a disposable Oxide Computer VM, so I probably should have just run it with --dangerously-skip-permissions)

The implementation process

I started by giving the agent an overview of the problem and a description of my existing implementation:

The raven-uxn project implements a fictional CPU. There are two implementations: a safe Rust implementation, and a native code implementation. In the native implementation, we have hand-written assembly functions for each of the 256 opcodes, written with tail recursion so each instruction jumps to the next instruction. This is fast because there's no big case statement dispatching. However, the x86 implementation isn't yet working. Get it working: it should build with cargo build --features=native.

It successfully added an x86 assembly backend and got it compiling, which required a few rounds of tweaking the assembly syntax and re-running cargo build. At this point, I told the agent how to run unit tests:

Now that it's building, it should pass tests with cargo test -praven-uxn --features=native.

The agent found a bunch of failing unit tests, and fixed them mostly autonomously. My contribution at this stage was telling it to fix one opcode at a time, because it was trying to devise a Grand Plan to fix every single failing test in one pass, and going off the rails. Because failures typically presented as SIGSEGV instead of assertion errors, I also instructed the agent to run tests one by one; otherwise, it was impossible to tell which specific opcode test caused the segfault.

Finally, once tests were passing, I told the agent how to run the fuzzer:

Great, now let's test it with fuzzing. You can run cargo +nightly fuzz run --release fuzz-native -- -max_total_time=10 -verbosity=0 to fuzz for 10 seconds and see if it finds anything.

A bunch more failures cropped up at this point, and it continued to debug them autonomously. Interestingly, at this point it started writing ad-hoc gdb scripts to probe the interpreter's internal state:

break _DEI
commands
  info registers rsp rbp
  x/4gx $rsp+0xd0
  x/4gx $rsp+0xd8
  x/4gx $rsp+0xe0
  cont
end
break raven_uxn::native::dei_entry
run --test-threads=1 dei
info registers rdi rsi
x/4gx $rsi
p *(raven_uxn::native::DeviceHandle*)$rsi

The internal thinking traces are a fascinating artifact; the thing debugs like a goldfish with logorrhea:

The comment says "orig - 8 - 48 - 0x98 = orig - 0xd0" and then "[orig+8] = [rsp + 0xd0 + 8] = [rsp + 0xd8]". But this is wrong. If "orig" = rsp BEFORE the call (i.e., rsp_caller), then: 7th arg is at [orig - 8] BEFORE the call (pushed last by caller)... no, in SysV x86_64:

Actually the comment's definition of "orig" is inconsistent. Let me figure out what "orig" means in the comment:

So "orig" is the reference point where the return address sits at [orig - 8], the 7th argument at [orig + 8], and the 8th argument at [orig + 16]. This matches the x86_64 SysV ABI where after the call instruction, the stack layout has the return address at the top and the caller's arguments positioned above it. I'm working through the stack frame layout and realizing the comment's offset claims don't align with the actual memory addresses—the math just doesn't check out. But from the debugger output, I can see the 7th argument (vm) is actually at [rsp+0xd...], which gives me the concrete data I need to move forward.

After a bunch more round-trips through the fuzzer, it stopped finding bugs and just started running the fuzzer for longer and longer periods. After 60 seconds of fuzzing with no hits, I declared success.

Why did this work?

This won't be a surprising sentiment if you've read blog posts of this nature: the agent worked well because there was a comprehensive test suite and a fuzzing harness, so it could easily close the loop.

The first implementation did not compile; once it compiled, it did not pass unit tests; once it passed unit tests, it did not pass fuzz testing. Having all of these layers of (machine-checkable) tests was necessary to get a fully working implementation.
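
The fuzzing layer is essentially differential testing: run the same input through both implementations and flag any difference. Below is a minimal sketch of that idea. It is not the project's actual harness (which runs under cargo fuzz); `reference_div`, `candidate_div`, and the xorshift generator are hypothetical stand-ins for the two interpreters and the fuzzer's input generation.

```rust
/// Reference behaviour: 16-bit unsigned divide where dividing by zero yields 0.
fn reference_div(a: u16, b: u16) -> u16 {
    a.checked_div(b).unwrap_or(0)
}

/// Stand-in for the implementation under test, with a deliberately planted bug:
/// it returns the dividend instead of 0 when the divisor is zero.
fn candidate_div(a: u16, b: u16) -> u16 {
    if b == 0 { a } else { a / b }
}

/// Tiny xorshift generator so the sketch needs no external crates.
fn next_random(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}

/// Feeds the same pseudo-random inputs to both implementations and returns the
/// first input pair on which they disagree, if any.
fn first_divergence(
    reference: fn(u16, u16) -> u16,
    candidate: fn(u16, u16) -> u16,
    iterations: u32,
) -> Option<(u16, u16)> {
    let mut state = 0x2545_f491u32;
    for _ in 0..iterations {
        let bits = next_random(&mut state);
        let a = (bits >> 16) as u16;
        // Keep the divisor small so that zero turns up often.
        let b = (bits & 0x7) as u16;
        if reference(a, b) != candidate(a, b) {
            return Some((a, b));
        }
    }
    None
}
```

Calling `first_divergence(reference_div, candidate_div, 10_000)` should report an input with a zero divisor, which is exactly the kind of concrete, machine-checkable failure an agent can iterate against.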

I suspect it also worked because the problem is translation flavored: there was a full ARM64 assembly implementation, and translating from one assembly flavor to another is easier than writing it from a high-level specification (or even from the Rust code).

How was the code?

I'm not an x86 assembly expert, but even I could tell that there were a few questionable decisions. Let me give you a few examples.

Claude seemed to confuse caller-saved and callee-saved registers: it properly handled callee-saved registers in the function prologue and epilogue, but also insisted on saving them before calling an external function. This increased stack usage and added a bunch of unnecessary instructions to each external function call:

; Save all interpreter state to the stack frame and set up args for C call
; C calling convention: arg1=rdi (VM ptr), arg2=rsi (DeviceHandle ptr)
.macro precall
    ; Write stack indices back through the pointers saved at entry
    mov rax, qword ptr [rsp + 0x30]   ; stack_index pointer
    mov byte ptr [rax], r12b
    mov rax, qword ptr [rsp + 0x38]   ; ret_index pointer
    mov byte ptr [rax], r14b

    ; Save interpreter registers
    ; [Human note: all of these are callee-saved!]
    mov qword ptr [rsp + 0x58], rbx
    mov qword ptr [rsp + 0x60], r12
    mov qword ptr [rsp + 0x68], r13
    mov qword ptr [rsp + 0x70], r14
    mov qword ptr [rsp + 0x78], r15
    mov qword ptr [rsp + 0x80], rbp

    ; Set up args: VM ptr and DeviceHandle ptr
    mov rdi, qword ptr [rsp + 0x40]
    mov rsi, qword ptr [rsp + 0x48]
.endm

It was also obsessed with using eax for everything, to its own detriment! The functions would often shuffle data into eax initially, then move it to a different register to make room for putting more data into eax. In the System V ABI, there are nine scratch (caller-saved) registers available, and I found that I could often tighten the code by using them:

; Claude's initial DIV2 implementation
;
; Note that it pushes to the x86 stack because
; it keeps using rax for temporary values!
_DIV2:
    movzx eax, byte ptr [rbx + r12]
    stk_pop
    movzx ecx, byte ptr [rbx + r12]
    stk_pop
    shl ecx, 8
    or eax, ecx                       ; b (divisor, top short)

    movzx ecx, byte ptr [rbx + r12]
    stk_pop
    movzx edx, byte ptr [rbx + r12]
    shl edx, 8
    or ecx, edx                       ; a (dividend, second short)

    ; 16-bit unsigned divide: a / b
    push rax                          ; save divisor (b) onto x86 stack [?!]
    mov eax, ecx                      ; dividend (a) in eax
    movzx eax, ax
    xor edx, edx
    pop rcx                           ; restore divisor into ecx
    movzx ecx, cx
    test cx, cx
    jz 1f
    div cx                            ; ax = a / b
    jmp 2f
1:
    xor eax, eax                      ; div by zero → 0
2:
    movzx r8d, al                     ; save result_lo
    shr eax, 8
    mov byte ptr [rbx + r12], al      ; store result_hi at current pos
    stk_push r8b                      ; push result_lo on top
    next

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Version tweaked by human
;
; This version loads the divisor / dividend directly into ecx and eax
_DIV2:
    movzx ecx, byte ptr [rbx + r12]
    stk_pop
    movzx r9d, byte ptr [rbx + r12]
    stk_pop
    shl r9d, 8
    or ecx, r9d                        ; ecx = b (divisor)

    movzx eax, byte ptr [rbx + r12]
    stk_pop
    movzx edx, byte ptr [rbx + r12]
    shl edx, 8
    or eax, edx                        ; eax = a (dividend), already zero-extended

    ; 16-bit unsigned divide: a / b
    xor edx, edx
    test cx, cx
    jz 1f
    div cx                             ; ax = a / b
    jmp 2f
1:
    xor eax, eax                       ; div by zero → 0
2:
    mov r8b, al                        ; save result_lo
    shr eax, 8
    mov byte ptr [rbx + r12], al       ; store result_hi
    stk_push r8b
    next
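
For readers who are more comfortable with Rust than x86, here is a rough model of what both listings compute. This is a sketch rather than raven-uxn's actual Rust code: the `Stack` type and its method names are made up for illustration, and it assumes (as the assembly suggests) a 256-byte stack whose wrapping 8-bit index points at the top byte, with the low byte of a short on top. The initial index of 0xff is also an assumption.

```rust
/// Hypothetical model of a Uxn-style stack: 256 bytes and an index that wraps.
struct Stack {
    data: [u8; 256],
    index: u8, // position of the current top byte
}

impl Stack {
    fn new() -> Self {
        // Assumed starting point: the first push wraps the index around to 0.
        Stack { data: [0; 256], index: 0xff }
    }

    fn push(&mut self, value: u8) {
        self.index = self.index.wrapping_add(1);
        self.data[self.index as usize] = value;
    }

    fn pop(&mut self) -> u8 {
        let value = self.data[self.index as usize];
        self.index = self.index.wrapping_sub(1);
        value
    }

    /// Shorts are stored high byte first, so the low byte ends up on top.
    fn push_short(&mut self, value: u16) {
        self.push((value >> 8) as u8);
        self.push(value as u8);
    }

    fn pop_short(&mut self) -> u16 {
        let lo = self.pop() as u16;
        let hi = self.pop() as u16;
        (hi << 8) | lo
    }
}

/// DIV2: pop divisor `b`, pop dividend `a`, push `a / b` (or 0 if `b` is zero).
fn div2(stack: &mut Stack) {
    let b = stack.pop_short();
    let a = stack.pop_short();
    stack.push_short(a.checked_div(b).unwrap_or(0));
}
```

Pushing 1000 and then 7 as shorts and calling `div2` leaves the short 142 on the stack; a zero divisor leaves 0, matching the `jz 1f` path in the assembly.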

Finally, Claude was hesitant to use 8- and 16-bit operations, preferring to use 32-bit operations and then mask the results. This behavior is likely a legacy of translating the ARM assembly, which used the "operation then mask" pattern everywhere because the ISA does not have instructions for 8- or 16-bit wrapping arithmetic.
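
The two styles compute the same result; the difference is only in how much work it takes to get there. A small Rust illustration of the equivalence (the function names are purely illustrative, and how each one is actually compiled is up to the compiler):

```rust
/// "Operate wide, then mask": do the add in 32 bits and truncate afterwards.
fn add8_widen_then_mask(a: u8, b: u8) -> u8 {
    ((a as u32 + b as u32) & 0xff) as u8
}

/// Native 8-bit wrapping add; x86-64 has byte-sized arithmetic for this.
fn add8_native(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Exhaustively checks that the two styles agree on every pair of inputs.
fn styles_agree() -> bool {
    (0..=255u8).all(|a| (0..=255u8).all(|b| add8_widen_then_mask(a, b) == add8_native(a, b)))
}
```

Since the results are identical, switching a handler from the masked form to the narrow form is a pure instruction-count optimization, which is why it is safe to do opcode by opcode with the fuzzer as a backstop.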

Cleaning up these idiosyncrasies made a difference, squeezing another non-trivial speedup out of the test ROMs that I was benchmarking:

Implementation    Fibonacci   Mandelbrot
Rust              4.28 ms     341 ms
x86 (initial)     2.45 ms     213 ms
x86 (improved)    1.70 ms     187 ms

One caveat applies: this was using Sonnet 4.6, and it's possible that Opus 4.6 would do a better job out of the gate. I also didn't yet have a closed-loop harness for performance testing, so I couldn't just tell the AI to make it faster.

Debugging a human-introduced bug

After doing all of this human cleanup, the fuzzer found a crash:

INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3840651835
INFO: Loaded 1 modules   (4523 inline 8-bit counters): 4523 [0x55e78ed1fcd0, 0x55e78ed20e7b),
INFO: Loaded 1 PC tables (4523 PCs): 4523 [0x55e78ed20e80,0x55e78ed32930),
fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native: Running 1 inputs 1 time(s) each.
Running: ../foo.rom
=================================================================
==2999==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ede9fd08800 in thread T0
    #0 0x55e78ec25556  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x10e556) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #1 0x55e78ec7a16e  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16316e) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #2 0x55e78ec7bf41  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x164f41) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #3 0x55e78ec84ae8  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16dae8) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #4 0x55e78ec85448  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16e448) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #5 0x55e78ec8422d  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16d22d) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #6 0x55e78ec8b9d5  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1749d5) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #7 0x55e78eca66d6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x18f6d6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #8 0x55e78ecaf002  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x198002) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #9 0x55e78eccd4f6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1b64f6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #10 0x7fcea07fdd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 095c7ba148aeca81668091f718047078d57efddb)
    #11 0x7fcea07fde3f  (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 095c7ba148aeca81668091f718047078d57efddb)
    #12 0x55e78eb98a24  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x81a24) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)

0x7ede9fd08800 is located 16384 bytes after 65536-byte region [0x7ede9fcf4800,0x7ede9fd04800)
allocated by thread T0 here:
    #0 0x55e78ec259c9  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x10e9c9) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #1 0x55e78ec801fc  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1691fc) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #2 0x55e78ec79954  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x162954) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #3 0x55e78ec7bf41  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x164f41) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #4 0x55e78ec84ae8  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16dae8) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #5 0x55e78ec85448  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16e448) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #6 0x55e78ec8422d  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x16d22d) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #7 0x55e78ec8b9d5  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1749d5) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #8 0x55e78eca66d6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x18f6d6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #9 0x55e78ecaf002  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x198002) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #10 0x55e78eccd4f6  (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x1b64f6) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
    #11 0x7fcea07fdd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 095c7ba148aeca81668091f718047078d57efddb)

SUMMARY: AddressSanitizer: bad-free (/home/ubuntu/raven/fuzz/target/x86_64-unknown-linux-gnu/release/fuzz-native+0x10e556) (BuildId: 0a135d2c356e27bb9ccb7046833c897d032c9b50)
==2999==ABORTING

(to be clear, this was something that I introduced when refactoring)

Out of the gate, this already has "nightmare bug" vibes: the fuzzer isn't failing because the interpreter and assembly implementation diverged in behavior, but due to an AddressSanitizer failure when freeing memory! I've seen this before, and it's never fun: it means that something in the assembly implementation is stomping over unrelated memory.

This was also a case of the fuzzer getting very, very lucky: it found this bug once, but in subsequent tests (with the bug still present) ran for hours without finding it again. (While debugging, I was seriously starting to wonder if it was a bug in libFuzzer itself)

The program that triggers this crash is
SUB EQUk STZ2k ROT2 EQUr EORkr GTHkr SUB JCN2r

Its behavior isn't obvious to me, and it's surprisingly non-trivial. When the program terminates, the return stack is full of alternating 1 and 0, and the data stack has a more complex pattern of values – 120 zeros, then the following:

[0x00, 0x00, 0x00, 0x12, 0x12, 0x00, 0x00, 0xee,
 0x12, 0x12, 0x11, 0x11, 0x12, 0x11, 0x11, 0x11,
 0x11, 0x10, 0x10, 0x11, 0x10, 0x10, 0x10, 0x10,
 0x0f, 0x0f, 0x10, 0x0f, 0x0f, 0x0f, 0x0f, 0x0e,
 0x0e, 0x0f, 0x0e, 0x0e, 0x0e, 0x0e, 0x0d, 0x0d,
 0x0e, 0x0d, 0x0d, 0x0d, 0x0d, 0x0c, 0x0c, 0x0d,
 0x0c, 0x0c, 0x0c, 0x0c, 0x0b, 0x0b, 0x0c, 0x0b,
 0x0b, 0x0b, 0x0b, 0x0a, 0x0a, 0x0b, 0x0a, 0x0a,
 0x0a, 0x0a, 0x09, 0x09, 0x0a, 0x09, 0x09, 0x09,
 0x09, 0x08, 0x08, 0x09, 0x08, 0x08, 0x08, 0x08,
 0x07, 0x07, 0x08, 0x07, 0x07, 0x07, 0x07, 0x06,
 0x06, 0x07, 0x06, 0x06, 0x06, 0x06, 0x05, 0x05,
 0x06, 0x05, 0x05, 0x05, 0x05, 0x04, 0x04, 0x05,
 0x04, 0x04, 0x04, 0x04, 0x03, 0x03, 0x04, 0x03,
 0x03, 0x03, 0x03, 0x02, 0x02, 0x03, 0x02, 0x02,
 0x02, 0x02, 0x01, 0x01, 0x02, 0x01, 0x01, 0x01,
 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00]

It's clearly executing some kind of looping algorithm before it terminates.

To make matters worse, the program also runs fine using the raven-cli executable; the fuzzer gets (un)lucky that the program stomps on RAM that is monitored by AddressSanitizer.

I sicced Claude (Sonnet 4.6) on this, but it mostly spun its wheels; it's hard to tell whether that's because it wasn't making forward progress, or whether Anthropic's servers were particularly overloaded that day.

Eventually, I tracked it down myself: it was an out-of-bounds write of 0 in the STR instruction, which wrote to an address before the start of the VM's RAM. The correct location in the VM's RAM was already 0, so the interpreter and assembly implementation didn't diverge.

There was also a second reason the bug was so hard: as you may notice, STR wasn't in the program! The bytecode program writes data to RAM, then jumps to that address; the VM then treats that data as further bytecode.

Quick aside: The easy way to debug this issue

I spent a while doing printf debugging, which was not the best way to do it; as it turns out, Valgrind finds the out-of-bounds write, even when running the (seemingly-fine) raven-cli:

==61880== Invalid write of size 1
==61880==    at 0x41D4E4B: ??? (in /home/ubuntu/raven/target/release/raven-cli)
==61880==  Address 0x4cefbf0 is 16 bytes before a block of size 65,536 alloc'd
==61880==    at 0x4A8C36C: calloc (vg_replace_malloc.c:1678)
==61880==    by 0x41D8866: alloc_zeroed (alloc.rs:178)
==61880==    by 0x41D8866: alloc_impl_runtime (alloc.rs:190)
==61880==    by 0x41D8866: alloc_impl (alloc.rs:312)
==61880==    by 0x41D8866: allocate_zeroed (alloc.rs:435)
             # etc, etc

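However, the reported instruction address (0x41D4E4B) isn't useful on its own: Valgrind prints it as ??? with no symbol name.

It's possible to get more detail by combining Valgrind with GDB. First, start Valgrind with its GDB server enabled: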

$ valgrind --vgdb=yes --vgdb-error=0 ./target/release/raven-cli --native ../foo.rom

Then, from within GDB:

(gdb) target remote | /snap/valgrind/181/usr/libexec/valgrind/../../bin/vgdb
Remote debugging using | /snap/valgrind/181/usr/libexec/valgrind/../../bin/vgdb
relaying data between gdb and process 61880
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/8c/fa19934886748ff4603da8aa8fdb0c2402b8cf.debug...
0x000000000425c290 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) cont
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000041d4e4b in _STR ()
(gdb) disas
Dump of assembler code for function _STR:
   0x00000000041d4e39 <+0>:     movsbq (%rbx,%r12,1),%rax
   0x00000000041d4e3e <+5>:     dec    %r12b
   0x00000000041d4e41 <+8>:     mov    (%rbx,%r12,1),%cl
   0x00000000041d4e45 <+12>:    dec    %r12b
   0x00000000041d4e48 <+15>:    add    %rbp,%rax
=> 0x00000000041d4e4b <+18>:    mov    %cl,(%r15,%rax,1)
   0x00000000041d4e4f <+22>:    movzbl (%r15,%rbp,1),%eax
   0x00000000041d4e54 <+27>:    inc    %bp
   0x00000000041d4e57 <+30>:    lea    0x651a2(%rip),%rcx        # 0x423a000
   0x00000000041d4e5e <+37>:    jmp    *(%rcx,%rax,8)
End of assembler dump.

This is dead on, and would have saved me a few hours of frustration!

(For what it's worth, I'm comfortable with both Valgrind and GDB, but didn't know how to combine them; Claude Web helpfully provided the right commands)
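
One plausible reading of the disassembly above (an interpretation; the exact root cause isn't spelled out here): `movsbq` sign-extends the 8-bit offset to 64 bits, and the following `add` combines it with the program counter in a full 64-bit register. A small PC plus a negative offset then produces a negative index instead of wrapping around within the 64 KiB address space. A Rust sketch of the two behaviours, with illustrative function names:

```rust
/// Wrapping behaviour: the target address stays inside the 16-bit address space.
fn str_target_wrapping(pc: u16, offset: u8) -> u16 {
    // Reinterpret the byte as signed, sign-extend it, then add modulo 2^16.
    pc.wrapping_add(offset as i8 as u16)
}

/// Non-wrapping behaviour: the same sum in 64 bits, which may leave 0..=0xffff.
fn str_target_unwrapped(pc: u16, offset: u8) -> i64 {
    pc as i64 + (offset as i8) as i64
}

/// Returns true if an unwrapped store would land outside the 64 KiB of RAM.
fn is_out_of_bounds(pc: u16, offset: u8) -> bool {
    let target = str_target_unwrapped(pc, offset);
    target < 0 || target > 0xffff
}
```

With hypothetical inputs `pc = 0x0002` and `offset = 0xee` (that is, -18), the wrapping version targets 0xfff0, while the unwrapped sum is -16, i.e. 16 bytes before the start of RAM.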

Second aside: Hitting it with a bigger model

Out of curiosity, I reintroduced the bug into the codebase and threw Opus 4.6 (1M context window) at it, with --dangerously-skip-permissions:

raven-uxn implements several interpreters for the Uxn virtual machine. There's one implementation in Rust, but the interesting ones are in raw assembly. Writing raw assembly improves performance because we can write threaded code, where each instruction jumps directly to the next (Rust can't do this because it lacks guaranteed tail recursion). Anyways, I've been having a rare issue with the x86 assembly backend: one particular program sequence fails in fuzzing. The program sequence is the following opcodes: SUB EQUk STZ2k ROT2 EQUr EORkr GTHkr SUB JCN2r. When run in the fuzzer, this triggers an AddressSanitizer error. Interestingly, it does not trigger a check for discrepencies between the interpreter and assembly implementations, so it's producing the correct behavior (or incorrect behavior that doesn't change the end state of the VM). You can reproduce this with cargo +nightly fuzz run --release fuzz-native foo.rom. Your mission is to track down whatever is causing this issue.

I then went upstairs to make myself a cup of tea.

When I came back downstairs (after about 10 minutes), it had not solved the problem; indeed, it took a whole 18 minutes to figure it out. Along the way, it fixed five other instances of the incorrect pattern, which I hadn't noticed.

All of this — and subsequent semi-automated cleanups — cost another $25.

Opus did okay but not great at automated cleanup ("find all cases where we do a 32-bit load but only use the lowest 8 bits, and replace them with 8-bit loads"). It would often declare that it had fixed everything, only for me to find more examples of the undesirable pattern.

Is raven-uxn slop now?

I don't know, you tell me – ideally on social media, with personal insults and imprecations about my character!

Back in 2024, if someone had taken me up on my suggestion to write an x86 backend and had opened a PR with the same code that Claude delivered, I would have given it a similar amount of review / editing before merging it in.

(Honestly, I would have made fewer changes to a human PR, because I'm sensitive to completely ripping up someone's work; the diff from the original agent's implementation is substantial)

Is that implementation irrevocably tainted by its source, even after my edits?

A second perspective: if someone in 2026 had opened a PR with this same code and told me that Claude wrote it, I probably wouldn't have merged it – I don't trust strangers to apply the same level of engineering rigor when using LLMs.

Finally, this wouldn't have gotten done without Claude Code: I've got too much else to do, and the activation energy was too high. Is lowering energy barriers worth polluting the cognitive ecosystem with out-of-distribution entities?

What's next?

The PR is now merged, and a new 0.2.0 release is on the way.

This experience hasn't made me a vibe-coding maximalist; I find that the act of writing code myself is necessary to build the mental model of a complex system, and concerns about cognitive debt ring true to me.

However, I was impressed by Opus 4.6's ability to debug the subtle assembly bug, and will consider reaching for it in the future. There's an old aphorism that debugging requires being twice as clever as writing the code initially, so if you write code that's at your cleverness limit, you won't be able to debug it; if LLMs help with debugging, it frees me up to write more clever code!

(and I wish people would stop arguing that these tools don't work)