It Can Happen to You

A few days ago, a fascinating article about Grand Theft Auto Online made the rounds of the tech news ecosystem.

I'd encourage you to read the whole thing, but in short, GTA Online had accidentally quadratic performance when parsing a large JSON blob (due to repeated calls to strlen); fixing this improved loading time by almost 70%.

This sparked a great deal of discussion: Was this C's fault? Perhaps "web shit"? Capitalism and incentives?

Still, folks in the comments section generally agreed: they wouldn't write anything that silly.

(do you feel the foreshadowing?)


One of my side projects is a high-performance 3D viewer named Erizo.

Porsche

Thanks to a lot of careful programming, it will open a 97 MB binary STL file in about 165 milliseconds flat, on a 2013 Macbook Pro. This is blinding fast.

In the interest of compatibility, I wrote a small parser for ASCII STLs as well.

ASCII STLs are a poorly-specified plain-text format that looks like this:

solid cube_corner
          facet normal 0.0 -1.0 0.0
            outer loop
              vertex 0.0 0.0 0.0
              vertex 1.0 0.0 0.0
              vertex 0.0 0.0 1.0
            endloop
          endfacet
          facet normal 0.0 0.0 -1.0
            outer loop
              vertex 0.0 0.0 0.0
              vertex 0.0 1.0 0.0
              vertex 1.0 0.0 0.0
            endloop
          endfacet
          ...
endsolid

I wrote an extremely robust parser, described in a comment as

/*  The most liberal ASCII STL parser:  Ignore everything except
 *  the word 'vertex', then read three floats after each one. */

Loading ASCII STLs always seemed a little slow, but I assumed that's what you get for using an inefficient textual format.

(foreshadowing intensifies)


Over the course of a few days, several things happened:

Here's the timestamped logs (in seconds), loading a 1.5 MB ASCII STL:

[erizo] (0.000000) main.c:10      | Startup!
[erizo] (0.162895) window.c:91    | Created window
[erizo] (0.162900) window.c:95    | Made context current
[erizo] (0.168715) window.c:103   | Initialized GLEW
[erizo] (0.178329) window.c:91    | Created window
[erizo] (0.178333) window.c:95    | Made context current
[erizo] (1.818734) loader.c:109   | Parsed ASCII STL
[erizo] (1.819471) loader.c:227   | Workers have deduplicated vertices
[erizo] (1.819480) loader.c:237   | Got 5146 vertices (7982 triangles)
[erizo] (1.819530) loader.c:240   | Waiting for buffer...
[erizo] (1.819624) loader.c:326   | Allocated buffer
[erizo] (1.819691) loader.c:253   | Sent buffers to worker threads
[erizo] (1.819883) loader.c:258   | Joined worker threads
[erizo] (1.819887) loader.c:279   | Loader thread done
[erizo] (1.821291) instance.c:32  | Showed window

From startup to showing the window, it took over 1.8 seconds!

Looking at the ASCII parser with fresh eyes, the cause was glaringly obvious:

    /*  The most liberal ASCII STL parser:  Ignore everything except
     *  the word 'vertex', then read three floats after each one. */
    const char VERTEX_STR[] = "vertex ";
    while (1) {
        data = strstr(data, VERTEX_STR);
        if (!data) {
            break;
        }

        /* Skip to the first character after 'vertex' */
        data += strlen(VERTEX_STR);

        for (unsigned i=0; i < 3; ++i) {
            SKIP_WHILE(isspace);
            float f;
            const int r = sscanf(data, "%f", &f);
            ABORT_IF(r == 0 || r == EOF, "Failed to parse float");
            if (buf_size == buf_count) {
                buf_size *= 2;
                buffer = (float*)realloc(buffer, buf_size * sizeof(float));
            }
            buffer[buf_count++] = f;

            SKIP_WHILE(!isspace);
        }
    }

You may notice sscanf, happily sitting there, reading a single float off the front of the data stream and checking the length of the whole string each time.

Yes, I had made the exact same mistake as the programmers working on GTA Online: I had an accidentally quadratic parser!

Replacing the sscanf call with strtof improved startup by nearly a factor of 10: from 1.8 seconds to 199 milliseconds.

[erizo] (0.000000) main.c:10      | Startup!
[erizo] (0.178082) window.c:91    | Created window
[erizo] (0.178086) window.c:95    | Made context current
[erizo] (0.184226) window.c:103   | Initialized GLEW
[erizo] (0.194469) window.c:91    | Created window
[erizo] (0.194472) window.c:95    | Made context current
[erizo] (0.196126) loader.c:109   | Parsed ASCII STL
[erizo] (0.196866) loader.c:227   | Workers have deduplicated vertices
[erizo] (0.196871) loader.c:237   | Got 5146 vertices (7982 triangles)
[erizo] (0.196921) loader.c:240   | Waiting for buffer...
[erizo] (0.197013) loader.c:326   | Allocated buffer
[erizo] (0.197082) loader.c:253   | Sent buffers to worker threads
[erizo] (0.197303) loader.c:258   | Joined worker threads
[erizo] (0.197306) loader.c:279   | Loader thread done
[erizo] (0.199328) instance.c:32  | Showed window

This is back down in the noise of "how long does the OS take to give us an OpenGL context", which is a great place for a high-performance tool.


As someone that has been programming for many years, this was a perfectly-timed reminder that there are always pitfalls out there. The documentation for sscanf does not include a time complexity, so this is particularly tricky footgun, and I'm sure it's not the only one lurking in the darkness.

You may not get such a perfectly-timed reminder, but it's worth remembering – next time you read a fascinating story about bad programming – that it can happen to you!

(Obviously, the moral of the story is to not use sscanf to repeatedly parse single tokens off the front of a string; I'm sure you'll do fine if you can just avoid that)


Follow-up and related discussions