Invertible Parsing: Generalization

Episode #181 • Mar 14, 2022 • Subscriber-Only

Our parser-printer library is looking pretty impressive, but there are a couple problems we need to address. We have made some simplifying assumptions that have greatly reduced the generality our library aspires to have. We will address them by abstracting what it means for an input to be parseable and printable.

Generalization
Introduction
00:05
Why generalize?
03:32
RangeReplaceableCollection
13:00
An overgeneralized abstraction
19:03
AppendableCollection
24:31
Next time: mapping parser-printers
34:15

Unlock This Episode

Our Free plan includes 1 subscriber-only episode of your choice, plus weekly updates from our newsletter.

Introduction

And we think this is pretty incredible. We now have the basic infrastructure of printers in place. At its core it is just a protocol that exposes a single method for turning some type of output back into an input, and that process can fail. It is further expected that this print requirement plays nicely with the corresponding parse requirement, which consumes some input to turn it into an output, and that process can also fail.

This expectation is something we called “round-tripping”. If you parse an input into an output and print that output back into an input, then you should roughly get the same input as what you started with. Further, if you print an output to an input and parse that input to an output, then you should again roughly get the same output as what you started with. We say roughly because parsing and printing can be lossy operations, but the underlying content of the input or output should be unchanged.

With this infrastructure we’ve already converted a pretty impressive parser to also be a printer. We had a parser that could process a list of comma-separated fields into an array of tuples that represent a user’s id, name and admin status, and now it can print an array of tuples values back into a comma-separated string of users.

Further, there is some very nuanced logic embedded in this parser-printer, where we want to be able to parse quoted name fields so that we can allow for commas in the name, but at the same time we want to prefer printing without the quotes if the name doesn’t contain a comma. This logic can be tricky to get right, and previously we had it scattered in two different places: once for the parser and once for the printer. And now it’s all in just one place.

Now, we could keep moving forward by converting more and more parser combinators to be printer combinators, which would allow us to build up more and more complex printers, but there are some problems we need to address first. In order for us to get as far as we did with printers in the previous episodes we needed to make some simplifying assumptions that has greatly reduced the generality that our library aspires to have.

The first problem is that in a few places we had to needlessly constrain our printer conformances because their inputs were far too general for us to know how to print to them. For example, in the Prefix printer we just assumed that we were working on substrings rather than any collection, and for the integer and boolean parsers we assumed we were working on UTF8Views rather than any collection of UTF8 code units. This unnecessarily limits the number of situations we can make use of these parsers.

The second problem we ran into was that we couldn’t figure out how to make the map operation into a printer. The types just didn’t line up correctly, so we abandoned it and decided not to use map anywhere in our parser-printers. This means that we aren’t actually parsing and printing User structs, but rather we are really dealing with just tuples of the data. We currently don’t have a printer-friendly way of bundling up the data in those tuples into a proper User struct. Losing map due to printing is a pretty big deal as it is a fundamental operation for massaging parser outputs into more well-structured data, so ideally we can recover this operation somehow.

This episode is for subscribers only.

Subscribe to Point-Free

Access this episode, plus all past and future episodes when you become a subscriber.

See plans and pricing

Already a subscriber? Log in

References

Invertible syntax descriptions: Unifying parsing and pretty printing

Tillmann Rendel and Klaus Ostermann • Thursday Sep 30, 2010

Parsers and pretty-printers for a language are often quite similar, yet both are typically implemented separately, leading to redundancy and potential inconsistency. We propose a new interface of syntactic descriptions, with which both parser and pretty-printer can be described as a single program using this interface. Whether a syntactic description is used as a parser or as a pretty-printer is determined by the implementation of the interface. Syntactic descriptions enable programmers to describe the connection between concrete and abstract syntax once and for all, and use these descriptions for parsing or pretty-printing as needed. We also discuss the generalization of our programming technique towards an algebra of partial isomorphisms.

This publication (from 2010!) was the initial inspiration for our parser-printer explorations, and a much less polished version of the code was employed on the Point-Free web site on day one of our launch!

Unified Parsing and Printing with Prisms

Fraser Tweedale • Friday Apr 29, 2016

Parsers and pretty printers are commonly defined as separate values, however, the same essential information about how the structured data is represented in a stream must exist in both values. This is therefore a violation of the DRY principle – usually quite an obvious one (a cursory glance at any corresponding FromJSON and ToJSON instances suffices to support this fact). Various methods of unifying parsers and printers have been proposed, most notably Invertible Syntax Descriptions due to Rendel and Ostermann (several Haskell implementations of this approach exist).

Another approach to the parsing-printing problem using a construct known as a “prism” (a construct Point-Free viewers and library users may better know as a “case path”).

Downloads