Generalized Parsing: Part 1

Episode #124 • Nov 9, 2020 • Subscriber-Only

The parser type we built so far is highly tuned to work on strings, but there are many things out in the world we’d want to parse, not just strings. It’s time to massively generalize parsing so that it can parse any kind of input into any kind of output.

Part 1
Introduction
00:05
Why generalize?
01:31
Generalizing the library
11:37
Generalizing our parsers
19:32
Taking advantage of generalized parsers
22:52
Next time: parsing dictionaries and more
32:32

Unlock This Episode

Our Free plan includes 1 subscriber-only episode of your choice, plus weekly updates from our newsletter.

Introduction

Over the last few weeks we have done the work to get everybody up to speed on functional parsing, which is a topic we covered in depth over two years ago and we’ve been wanting to pick up again. And last week we covered something completely new, which is how to change the signature and definition of zip so that it is a little more friendly for parsing because zip on parsers is a little different than zip on other types we’ve encountered it on.

Today we are continuing with parsing, but we are going to unlock a whole new level of functionality. We are going to massively generalize parsing so that it can parse any kind of input into any kind of output. So far all of the parsers we have discussed have been highly tuned to work on strings and substrings. However, there are many things out in the world we’d want to parse, not just strings, and generalizing will allow us to tackle all new problems.

But, even better than opening up new worlds of things we can parse, by generalizing we will also open whole new worlds of composition and we will get access to substantial performance gains, both of which were impossible to see before generalizing.

So, this is incredibly powerful, but it is going to take us time to get there. So let’s start by understanding why it is we’d want to generalize parsing in the first place.

This episode is for subscribers only.

Subscribe to Point-Free

Access this episode, plus all past and future episodes when you become a subscriber.

See plans and pricing

Already a subscriber? Log in

Exercises

  1. Generalize prefix(while:) to not just work on substrings, but any Collection.

    Solution
    extension Parser
    where
      Input: Collection,
      Input.SubSequence == Input,
      Output == Input
    {
      static func prefix(while p: @escaping (Input.Element) -> Bool) -> Self {
        Self { input in
          let output = input.prefix(while: p)
          input.removeFirst(output.count)
          return output
        }
      }
    }
    
  2. Generalize prefix(upTo:) and prefix(through:) to not just work on substrings, but any Collection.

    Solution

    These are a bit trickier because there is no range(of:) operation on Collection. We can recover this behavior with a combination of starts(with:) and removeFirst() to advance the input’s start index for each check:

    extension Parser
    where
      Input: Collection,
      Input.SubSequence == Input,
      Input.Element: Equatable,
      Output == Input
    {
      static func prefix(upTo subsequence: Input) -> Self {
        Self { input in
          guard !subsequence.isEmpty else { return subsequence }
          let original = input
          while !input.isEmpty {
            if input.starts(with: subsequence) {
              return original[..<input.startIndex]
            }
            input.removeFirst()
          }
          input = original
          return nil
        }
      }
    }
    

    Similarly can be done with prefix(through:).

    extension Parser
    where
      Input: Collection,
      Input.SubSequence == Input,
      Input.Element: Equatable,
      Output == Input
    {
      static func prefix(through subsequence: Input) -> Self {
        Self { input in
          guard !subsequence.isEmpty else { return subsequence }
          let original = input
          while !input.isEmpty {
            if input.starts(with: subsequence) {
              let index = input.index(input.startIndex, offsetBy: subsequence.count)
              input = input[index...]
              return original[..<index]
            }
            input.removeFirst()
          }
          input = original
          return nil
        }
      }
    }
    

Downloads