Unlock This Episode
Our Free plan includes 1 subscriber-only episode of your choice, plus weekly updates from our newsletter.
Over the past many weeks we have built up a pretty impressive parser library. Our parser is a very general type that allows you to parse any kind of nebulous input into any kind of well-structured output. It supports lots of interesting forms of composition that allow you to break large problems down into smaller ones.
On top of all of that we were able to build up an impressive library of parsers and higher-order parsers that work on strings. They allowed us to scan values off the fronts of strings in an efficient manner, such as characters, numbers, prefixes and more. And these parsers dealt with a lower-level API than just plain
Substring is like a view into a string. It’s not the actual string itself, but rather just a representation of a portion of a base string that is stored somewhere else. This means we can consume little bits off the front of a substring while only changing our view of the underlying string, which is a very lightweight thing to do. If we were dealing with raw
Strings then we would need to make a whole copy of the string, which can be a very expensive operation.
So it seems that maybe we are ready to close the book on parsers and open source it! But not so fast. There is a very important topic to consider, especially when it comes to parsers, which is performance. Sometimes we need to parse megabytes or even gigabytes of data, and we need to be as efficient as possible when it comes to scanning the input. And although we have taken a huge step by using
Substrings instead of
Strings, it turns out there is a lot more we can do.
We want to start off by giving everyone a quick deep dive into Swift strings and their performance characteristics. It’s a tricky subject, and there are a few subtle edge cases to think about, but once that’s done we will find that there is an even lower level representation of strings for which parsing is even more efficient than
Substring. It’s really quite amazing to see, but it also kind of opens up a whole can of worms that requires more work to wrangle in.
A Swift library for benchmarking code snippets, similar to google/benchmark.
Swift 5 made a fundamental change to the String API, making the preferred encoding UTF-8 instead of UTF-16. This brings many usability and performance improves to Swift strings.
In this episode we explored different representations of strings and their subsequences (i.e.
UTF8View), but more generally there are collections and slices. This article gives a nice accounting of the zoo of types in Swift’s collections API.
While researching the string APIs for this episode we stumbled upon a massive inefficiency in how Swift implements
removeFirst on certain collections. This PR fixes the problem and turns the method from an
O(n) operation (where
n is the length of the array) to an
O(k) operation (where
k is the number of elements being removed).