Introduction
The interface for mutating substrings is the same as the interface for mutating strings, but we’ll get a bit of a performance boost by working with a view into the string instead of a copy.
We still have the restriction that the entire String must be in memory, which means parsing a very large String isn’t going to be efficient, but the optimizations we’ve made so far were very low-hanging and already buy us a lot, so let’s kick that can a bit down the road.
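To make the "view instead of a copy" idea concrete, here is a minimal sketch. It assumes the Parser type is a struct wrapping a function on an inout Substring; that shape, and the body of the int parser, are illustrative assumptions rather than the exact code from the episode.

```swift
// Assumed shape of the parser type: a function that may consume
// from the front of a Substring and optionally produce a value.
struct Parser<A> {
  let run: (inout Substring) -> A?
}

// Hypothetical integer parser, written for illustration only.
let int = Parser<Int> { str in
  // `prefix` is itself a Substring, so no new String is allocated here.
  let prefix = str.prefix(while: { "0123456789".contains($0) })
  // Fail without consuming anything if there are no digits (or on overflow).
  guard let match = Int(prefix) else { return nil }
  // Consume the matched digits from the front of the view.
  str.removeFirst(prefix.count)
  return match
}

var input = Substring("42 Hello")
let value = int.run(&input)
```

After running, value holds the parsed integer and input holds whatever the parser left unconsumed. The original string storage is shared by the view rather than copied up front.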
Exercises
1. Right now all of our parsers (int, double, literal, etc.) are defined at the top-level of the file, hence they are defined in the module namespace. While that is completely fine to do in Swift, it can sometimes improve the ergonomics of using these values by storing them as static properties on the Parser type itself. We have done this a bunch in previous episodes, such as with our Gen type and Snapshotting type.
   Move all of the parsers we have defined so far to be static properties on the Parser type. You will want to suitably constrain the A generic in the extension in order to further restrict how these parsers are stored, i.e. you shouldn’t be allowed to access the integer parser via Parser<String>.int.

2. We have previously devoted an entire episode (here) to the concept of map, then 3 entire episodes (part 1, part 2, part 3) to zip, and then 5 (!) entire episodes (part 1, part 2, part 3, part 4, part 5) to flatMap. In those episodes we showed that those operations are very general, and go far beyond what Swift gives us in the standard library for arrays and optionals.
   Define map, zip and flatMap on the Parser type. Start by defining what their signatures should be, and then figure out how to implement them in the simplest way possible. One gotcha to be on the lookout for is that you do not want to consume any of the input string if the parser fails.

3. Create a parser end: Parser<Void> that simply succeeds if the input string is empty, and fails otherwise. This parser is useful to indicate that you do not intend to parse any more.

4. Implement a function that takes a predicate (Character) -> Bool as an argument, and returns a parser Parser<Substring> that consumes from the front of the input string until the predicate is no longer satisfied. It would have the signature func pred: ((Character) -> Bool) -> Parser<Substring>.

5. Implement a function that transforms any parser into one that does not consume its input at all. It would have the signature func nonConsuming: (Parser<A>) -> Parser<A>.

6. Implement a function that transforms a parser into one that runs the parser many times and accumulates the values into an array. It would have the signature func many: (Parser<A>) -> Parser<[A]>.

7. Implement a function that takes an array of parsers, and returns a new parser that takes the result of the first parser that succeeds. It would have the signature func choice: (Parser<A>...) -> Parser<A>.

8. Implement a function that takes two parsers, and returns a new parser that returns the result of the first if it succeeds, otherwise it returns the result of the second. It would have the signature func either: (Parser<A>, Parser<B>) -> Parser<Either<A, B>> where Either is defined:

       enum Either<A, B> {
         case left(A)
         case right(B)
       }

9. Implement a function that takes two parsers and returns a new parser that runs both of the parsers on the input string, but only returns the successful result of the first and discards the second. It would have the signature func keep(_: Parser<A>, discard: Parser<B>) -> Parser<A>. Make sure to not consume any of the input string if either of the parsers fails.

10. Implement a function that takes two parsers and returns a new parser that runs both of the parsers on the input string, but only returns the successful result of the second and discards the first. It would have the signature func discard(_: Parser<A>, keep: Parser<B>) -> Parser<B>. Make sure to not consume any of the input string if either of the parsers fails.

11. Implement a function that takes two parsers and returns a new parser that returns the result of the first if it succeeds, otherwise it returns the result of the second. It would have the signature func choose: (Parser<A>, Parser<A>) -> Parser<A>. Consume as little of the input string as possible when implementing this function.

12. Generalize the previous exercise by implementing a function of the form func choose: ([Parser<A>]) -> Parser<A>.

13. Right now our parser can only fail in a single way, by returning nil. However, it can often be useful to have parsers that return a description of what went wrong when parsing.
    Generalize the Parser type so that instead of returning an A? value it returns a Result<A, String> value, which will allow parsers to describe their failures. Update all of our parsers and the ones in the above exercises to work with this new type.

14. Right now our parser only works on strings, but there are many other inputs we may want to parse. For example, if we are making a router we would want to parse URLRequest values.
    Generalize the Parser type so that it is generic not only over the type of value it produces, but also the type of values it parses. Update all of our parsers and the ones in the above exercises to work with this new type (you may need to constrain generics to work on specific types instead of all possible input types).
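Several of the exercises above warn against consuming input when a parser fails. One way to handle that, sketched here under assumptions, is to save the incoming Substring and restore it on failure. The Parser shape, the int and literal helpers, and this particular zip are all hypothetical illustrations, not the series' official solutions.

```swift
// Assumed shape of the parser type used throughout these exercises.
struct Parser<A> {
  let run: (inout Substring) -> A?
}

// Hypothetical helper parsers so the example is self-contained.
let int = Parser<Int> { str in
  let prefix = str.prefix(while: { "0123456789".contains($0) })
  guard let match = Int(prefix) else { return nil }
  str.removeFirst(prefix.count)
  return match
}

func literal(_ p: String) -> Parser<Void> {
  Parser<Void> { str in
    guard str.hasPrefix(p) else { return nil }
    str.removeFirst(p.count)
    return ()
  }
}

// One possible zip: run both parsers in sequence, and roll back
// to the saved input if the second one fails.
func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> {
  Parser<(A, B)> { str in
    // Copying a Substring does not copy the underlying string storage.
    let original = str
    guard let matchA = a.run(&str) else { return nil }
    guard let matchB = b.run(&str) else {
      // `a` already consumed input, so restore it before failing.
      str = original
      return nil
    }
    return (matchA, matchB)
  }
}

let degrees = zip(int, literal("°"))

var good = Substring("40° N")
let matched = degrees.run(&good)

var bad = Substring("40 N")
let unmatched = degrees.run(&bad)
```

In the failing case the integer is parsed first, but because the degree sign is missing the input is reset, so bad is left exactly as it started. The same save-and-restore pattern would apply to the keep/discard exercises.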
References
Parse, don’t validate
Alexis King • Tuesday Nov 5, 2019
This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values but have no record of that check in your types, whereas parsing lets you upgrade the types to something more restrictive so that you cannot misuse the value later on.
Ledger Mac App: Parsing Techniques
Chris Eidhof & Florian Kugler • Friday Aug 26, 2016
In this free episode of Swift Talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars.
Swift Strings and Substrings
Chris Eidhof & Florian Kugler • Friday Dec 1, 2017
In this free episode of Swift Talk, Chris and Florian discuss how to efficiently use Swift strings, and in particular how to use the Substring type to prevent unnecessary copies of large strings.
We write a simple CSV parser as an example demonstrating how to work with Swift’s String and Substring types.
Swift Pitch: String Consumption
Michael Ilseman et al. • Sunday Mar 3, 2019
Swift contributor Michael Ilseman lays out some potential future directions for Swift’s string consumption API. This could be seen as a “Swiftier” way of doing what the Scanner type does today, but possibly even more powerful.
Difficulties With Efficient Large File Parsing
Ezekiel Elin et al. • Thursday Apr 25, 2019
This question on the Swift forums brings up an interesting discussion on how to best handle large files (hundreds of megabytes and millions of lines) in Swift. The thread contains lots of interesting tips on how to improve performance, and contains some hope of future standard library changes that may help too.
Scanner
Apple
Official documentation for the Scanner type by Apple. Although the type hasn’t (yet) been updated to take advantage of Swift’s modern features, it is still a very powerful API that is capable of parsing complex text formats.
NSScanner
Nate Cook • Monday Mar 2, 2015
A nice, concise article covering the Scanner type, including a tip on how to extend the Scanner so that it is a bit more “Swifty”. Take note that this article was written before NSScanner was renamed to just Scanner in Swift 3.