It Can Happen to You
A few days ago, a fascinating article about Grand Theft Auto Online made the rounds of the tech news ecosystem.
I'd encourage you to read the whole thing, but in short,
GTA Online had accidentally quadratic
performance when parsing a large JSON blob
(due to repeated calls to strlen
);
fixing this improved loading time by almost 70%.
This sparked a great deal of discussion: Was this C's fault? Perhaps "web shit"? Capitalism and incentives?
Still, folks in the comments section generally agreed: they wouldn't write anything that silly.
(do you feel the foreshadowing?)
One of my side projects is a high-performance 3D viewer named Erizo.
Thanks to a lot of careful programming, it will open a 97 MB binary STL file in about 165 milliseconds flat, on a 2013 Macbook Pro. This is blinding fast.
In the interest of compatibility, I wrote a small parser for ASCII STLs as well.
ASCII STLs are a poorly-specified plain-text format that looks like this:
solid cube_corner
facet normal 0.0 -1.0 0.0
outer loop
vertex 0.0 0.0 0.0
vertex 1.0 0.0 0.0
vertex 0.0 0.0 1.0
endloop
endfacet
facet normal 0.0 0.0 -1.0
outer loop
vertex 0.0 0.0 0.0
vertex 0.0 1.0 0.0
vertex 1.0 0.0 0.0
endloop
endfacet
...
endsolid
I wrote an extremely robust parser, described in a comment as
/* The most liberal ASCII STL parser: Ignore everything except
* the word 'vertex', then read three floats after each one. */
Loading ASCII STLs always seemed a little slow, but I assumed that's what you get for using an inefficient textual format.
(foreshadowing intensifies)
Over the course of a few days, several things happened:
- I revisited Erizo for the first time in a few years to fix a focus issue on macOS
- The GTA Online article was published
- In a follow-up discussion, I learned that
parsing could be quadratic due to repeated calls to
sscanf
- I noticed that ASCII STL loading was really quite slow.
Here's the timestamped logs (in seconds), loading a 1.5 MB ASCII STL:
[erizo] (0.000000) main.c:10 | Startup!
[erizo] (0.162895) window.c:91 | Created window
[erizo] (0.162900) window.c:95 | Made context current
[erizo] (0.168715) window.c:103 | Initialized GLEW
[erizo] (0.178329) window.c:91 | Created window
[erizo] (0.178333) window.c:95 | Made context current
[erizo] (1.818734) loader.c:109 | Parsed ASCII STL
[erizo] (1.819471) loader.c:227 | Workers have deduplicated vertices
[erizo] (1.819480) loader.c:237 | Got 5146 vertices (7982 triangles)
[erizo] (1.819530) loader.c:240 | Waiting for buffer...
[erizo] (1.819624) loader.c:326 | Allocated buffer
[erizo] (1.819691) loader.c:253 | Sent buffers to worker threads
[erizo] (1.819883) loader.c:258 | Joined worker threads
[erizo] (1.819887) loader.c:279 | Loader thread done
[erizo] (1.821291) instance.c:32 | Showed window
From startup to showing the window, it took over 1.8 seconds!
Looking at the ASCII parser with fresh eyes, the cause was glaringly obvious:
/* The most liberal ASCII STL parser: Ignore everything except
* the word 'vertex', then read three floats after each one. */
const char VERTEX_STR[] = "vertex ";
while (1) {
data = strstr(data, VERTEX_STR);
if (!data) {
break;
}
/* Skip to the first character after 'vertex' */
data += strlen(VERTEX_STR);
for (unsigned i=0; i < 3; ++i) {
SKIP_WHILE(isspace);
float f;
const int r = sscanf(data, "%f", &f);
ABORT_IF(r == 0 || r == EOF, "Failed to parse float");
if (buf_size == buf_count) {
buf_size *= 2;
buffer = (float*)realloc(buffer, buf_size * sizeof(float));
}
buffer[buf_count++] = f;
SKIP_WHILE(!isspace);
}
}
You may notice sscanf
, happily sitting there, reading a single float off
the front of the data stream and checking the length of the whole string each time.
Yes, I had made the exact same mistake as the programmers working on GTA Online: I had an accidentally quadratic parser!
Replacing the sscanf
call with strtof
improved startup by nearly a factor of
10: from 1.8 seconds to 199 milliseconds.
[erizo] (0.000000) main.c:10 | Startup!
[erizo] (0.178082) window.c:91 | Created window
[erizo] (0.178086) window.c:95 | Made context current
[erizo] (0.184226) window.c:103 | Initialized GLEW
[erizo] (0.194469) window.c:91 | Created window
[erizo] (0.194472) window.c:95 | Made context current
[erizo] (0.196126) loader.c:109 | Parsed ASCII STL
[erizo] (0.196866) loader.c:227 | Workers have deduplicated vertices
[erizo] (0.196871) loader.c:237 | Got 5146 vertices (7982 triangles)
[erizo] (0.196921) loader.c:240 | Waiting for buffer...
[erizo] (0.197013) loader.c:326 | Allocated buffer
[erizo] (0.197082) loader.c:253 | Sent buffers to worker threads
[erizo] (0.197303) loader.c:258 | Joined worker threads
[erizo] (0.197306) loader.c:279 | Loader thread done
[erizo] (0.199328) instance.c:32 | Showed window
This is back down in the noise of "how long does the OS take to give us an OpenGL context", which is a great place for a high-performance tool.
As someone that has been programming for many years,
this was a perfectly-timed reminder that there are always pitfalls out there.
The documentation for sscanf
does not include a time complexity,
so this is particularly tricky footgun,
and I'm sure it's not the only one lurking in the darkness.
You may not get such a perfectly-timed reminder, but it's worth remembering – next time you read a fascinating story about bad programming – that it can happen to you!
(Obviously, the moral of the story is to not use sscanf
to repeatedly
parse single tokens off the front of a string; I'm sure you'll do fine
if you can just avoid that)
Follow-up and related discussions
- Hacker News discussion!
- Lobste.rs discussion, including an interesting survey across different C libraries
- Relevant FreeBSD patch
glibc
bug tracker- The cppreference.com page for
sscanf
now mentions this pitfall