The FOSSA Podcast: Adopting Haskell into an Existing Codebase

It's only appropriate that the first episode of FOSSA's new engineering-themed podcast covers a topic near and dear to many of our developers: the Haskell programming language.

Our engineering organization includes many Haskell enthusiasts; in fact, we used the language to write the current version of the FOSSA CLI. In this episode, we explain why we adopted Haskell, describe key characteristics of the language, and weigh its pros and cons for teams considering it.

Episode Outline

Introductions
Why FOSSA adopted Haskell (and why we switched from Golang): 2:30
Why Haskell isn't hard to learn — with caveats: 10:15
The many positives of Haskell's type system: 19:40
Effect tracking in Haskell — what it is and why we love it: 23:15
Pros and cons of Haskell for developers: 29:55
Bigger-picture pros and cons of Haskell for less technical decision-makers (like the ability to hire Haskell-savvy developers): 34:00

Episode Highlights

Why FOSSA adopted Haskell

Our choice of Haskell was based less on the ideology of the language and more on pragmatism: our original team included people who were interested and experienced in Haskell, some professionally, some simply out of frustration with the alternatives.

Another consideration was that a particular project was well suited to Haskell. We have a blog post on this with a Haskell consulting company, Serokell, but the first thing we actually built in Haskell was our CLI. It's very compiler-ish. You read in data, process it, transform it, and spit out an answer. Our CLI does a lot of text parsing, and Haskell has very powerful parsing libraries, so it was a particularly good fit for the project and the team, and, ultimately, the fastest way forward.
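To give a flavor of what "powerful parsing libraries" means, here is a minimal sketch using ReadP, a small parser-combinator module that ships with GHC's base library. This is not code from the FOSSA CLI: the Dependency type, the "name==version" line format, and all function names are assumptions made up for illustration.

```haskell
import Data.Char (isAlphaNum, isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch1, readP_to_S, sepBy1, string)

-- Hypothetical record for one pinned dependency, e.g. "requests==2.31.0".
data Dependency = Dependency
  { depName    :: String
  , depVersion :: [Int]
  } deriving (Eq, Show)

-- A package name: one or more letters, digits, '-' or '_'.
nameP :: ReadP String
nameP = munch1 (\c -> isAlphaNum c || c `elem` "-_")

-- A dotted version: one or more integers separated by '.'.
versionP :: ReadP [Int]
versionP = (read <$> munch1 isDigit) `sepBy1` char '.'

-- The whole line: name, "==", version, then end of input.
dependencyP :: ReadP Dependency
dependencyP = Dependency <$> nameP <* string "==" <*> versionP <* eof

-- Run the parser and accept only a single, complete parse.
parseDependency :: String -> Maybe Dependency
parseDependency input =
  case readP_to_S dependencyP input of
    [(dep, "")] -> Just dep
    _           -> Nothing
```

Each small parser is an ordinary value, so larger parsers are built by combining smaller ones. Calling parseDependency on a malformed line such as "requests==" returns Nothing rather than a half-filled record.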

Why we switched from Golang to Haskell

We wrote version one of our CLI in Golang. The decision hinged on finding a language that could be compiled to run in all of our customers' Continuous Integration (CI) environments, and we knew Golang had a good cross-compilation story. We also needed a language that was easy to get started with. Golang is the type of language you can pick up in 30 minutes and get an app running.

However, as the app grew, we discovered that writing correct parsers in Golang is hard. Getting them correct and debugging them was like pulling teeth.

A fundamental advantage of Haskell is its support for sum types. They are similar to discriminated unions or tagged union types in other languages, with strong support from the compiler. They are incredibly useful because they allow you to express constraints on the values flowing through your program that you can't express cleanly in languages like Golang. While Golang can approximate unions of types, the compiler isn't much help in dealing with and consuming those types. It's difficult to ensure that all cases have been handled. Interestingly, TypeScript is actually very good at this with tagged unions. Alas, there's no world where we're shipping a Node.js runtime into our customer CIs.
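As a small illustration of sum types and exhaustiveness checking, consider the sketch below. The ScanResult type and its constructors are hypothetical, not taken from the FOSSA CLI. With the -Wincomplete-patterns warning enabled (it is part of -Wall), GHC reports any case expression that forgets a constructor.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Hypothetical outcome of scanning one manifest file.
-- A value is exactly one of these three alternatives.
data ScanResult
  = Parsed Int          -- number of dependencies found
  | Skipped String      -- reason the file was ignored
  | Failed String Int   -- error message and line number
  deriving (Eq, Show)

-- Every constructor is handled. Deleting one branch makes GHC
-- warn that the pattern match is non-exhaustive.
describe :: ScanResult -> String
describe result = case result of
  Parsed n        -> "found " ++ show n ++ " dependencies"
  Skipped why     -> "skipped: " ++ why
  Failed msg line -> "error on line " ++ show line ++ ": " ++ msg
```

If a fourth constructor is added later, every case expression that does not handle it is flagged at compile time, which is the kind of help we found hard to get when emulating unions in Golang.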

Why Haskell isn't as difficult to learn as some may think

A common complaint about Haskell is the amount of specialized knowledge necessary to be productive. The advanced concepts people point to are some of the most interesting features of the language, but it's possible to be productive without them. Like almost any language, Haskell is very deep, but most modern languages have complex features. For example, you can get very far with C++ focusing on basic object-oriented principles, but if you really want to master it, you'll need to spend a lot of time studying advanced features like template metaprogramming. Engineers aren't going to need that most of the time, and you can get started and be very productive with a lot less.

Another concern is just that Haskell doesn't use common syntax. It's not in the ALGOL language family like Java or JavaScript, so it takes a little while to learn how to read it. Once you can read it, the overall syntax is actually quite simple and elegant. It has very few exceptions and special cases. There just aren't that many language constructs (barring compiler extensions).

The biggest barrier is that Haskell requires a different way of thinking than object-oriented programming. This can be uncomfortable if you've only done, say, Java and OOP design patterns. But it is becoming less of a hurdle every day. We are seeing more and more functional programming idioms make their way into other languages. It's kind of rare now to have a language that doesn't let you map over a collection. You can do that with Java Streams and JavaScript Arrays, and you can do advanced transformations with LINQ in C#.

These concepts are beginning to feel more and more familiar to many engineers, but they were originally borrowed from functional languages. They fit naturally in a language like Haskell and haven't always translated well. If you are a front-end developer and you've ever written Redux, you may have thought, "Wow, this boilerplate for writing actions and reducers is really annoying." That's partially because you're fighting against the language. If you try to use that style in Haskell, a lot of that boilerplate disappears.
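To make the Redux comparison concrete, here is a rough sketch of the action-and-reducer style in Haskell. The to-do list domain and every name in it are invented for illustration; it is not a port of any particular Redux application.

```haskell
import Data.List (foldl')

-- Hypothetical actions for a to-do list. In Redux these would be
-- action-type constants plus action-creator functions.
data Action
  = AddTodo String
  | RemoveTodo Int   -- zero-based index of the item to remove
  | ClearAll
  deriving (Eq, Show)

-- The state is just the list of to-do texts.
type State = [String]

-- The reducer is an ordinary pure function from state and action
-- to the next state.
reducer :: State -> Action -> State
reducer todos action = case action of
  AddTodo text -> todos ++ [text]
  RemoveTodo i -> [t | (j, t) <- zip [0 ..] todos, j /= i]
  ClearAll     -> []

-- Replaying a log of actions is a left fold over the reducer.
replay :: [Action] -> State
replay = foldl' reducer []
```

The action type is a single declaration, and the reducer is a pattern match over it, so there is no separate layer of string constants and creator functions to keep in sync.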

What makes Haskell's type system stand out

One of Haskell's best-known attributes is its powerful type system, which lets you express constraints very clearly. A great example that comes to mind is an experience in Java where we had to escape strings between our UI and database. Our application kept having bugs where engineers would either forget to escape a string or double-escape it, and we'd end up with extra characters floating around in the UI. It got so bad that we didn't know which data in the database was escaped and which wasn't.

I kept having to ask, "Can we just express these as different things?" We could have one function, escapeString, that takes an UnescapedString and makes it an EscapedString, and one unescapeString function which does the opposite. It would make it so we can never mess it up, we can never forget to do it, and we can never do it twice.
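In Haskell, that idea might be sketched with newtype wrappers, as below. The HTML-style escaping scheme and the renderToUi function are assumptions chosen for illustration; the original Java application's escaping rules are not described in the episode.

```haskell
-- Two distinct types wrapping the same underlying String.
newtype UnescapedString = UnescapedString String deriving (Eq, Show)
newtype EscapedString   = EscapedString String   deriving (Eq, Show)

-- Assumed escaping scheme for illustration: three HTML entities.
escapeString :: UnescapedString -> EscapedString
escapeString (UnescapedString raw) = EscapedString (concatMap escapeChar raw)
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar c   = [c]

-- The inverse of escapeString for the three entities above.
unescapeString :: EscapedString -> UnescapedString
unescapeString (EscapedString s) = UnescapedString (go s)
  where
    go ('&':'a':'m':'p':';':rest) = '&' : go rest
    go ('&':'l':'t':';':rest)     = '<' : go rest
    go ('&':'g':'t':';':rest)     = '>' : go rest
    go (c:rest)                   = c : go rest
    go []                         = []

-- Hypothetical sink that only accepts escaped data.
renderToUi :: EscapedString -> String
renderToUi (EscapedString s) = "<p>" ++ s ++ "</p>"
```

Passing an UnescapedString to renderToUi, or passing an EscapedString to escapeString a second time, is a type error, so both the forgotten escape and the double escape are rejected before the program runs.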

You can kind of do that in Java, but there's a cost. In contrast, a language like Haskell has a zero-cost abstraction for these sorts of wrappers. At the compiler level, they disappear in your emitted code. So you can add these extra constraints and add extra information to the types without impacting your run-time. That creates a lot more safety in your types. You can make sure that you're not going to get into situations that you could otherwise get into, like saving an unescaped string to the database. It can be used for all sorts of things, like non-empty lists, so that you never accidentally pass an empty list to a function that needs at least one item, which is a super common mistake.
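The non-empty list case can be sketched with Data.List.NonEmpty from GHC's base library. The latestVersion and parseVersions functions are hypothetical names used only for this example.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical function that needs at least one candidate.
-- Because the argument type guarantees one element, taking the
-- maximum here cannot fail on an empty input.
latestVersion :: NonEmpty Int -> Int
latestVersion = maximum

-- At the boundary of the program, an ordinary list is checked once.
-- An empty list becomes Nothing and must be handled by the caller.
parseVersions :: [Int] -> Maybe (NonEmpty Int)
parseVersions = NE.nonEmpty
```

For example, latestVersion (3 :| [7, 5]) evaluates to 7, while an expression like latestVersion [] does not type-check at all.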

What effects are and why Haskell effect-tracking is a game-changer

An effect is a thing that your program does that is not captured by the inputs and the outputs of a function. So, for example, if you have a function that takes some input and then prints some stuff, the act of printing is not represented in the function's type signature. Similarly, let's say your function takes no input but reads from a specific file on disk. In that case, it really does take input; that input just isn't represented as part of the function's type signature.

We call these things effects. If you think about it, depending on what you care about, what your language considers to be an effect or not can change. For example, you could argue that Rust has effect tracking — and what they care about is memory allocation as an effect.

Going back to effect tracking in Haskell, what Haskell tracks is IO. If you've ever debugged a program where you're like, "This piece of code cannot possibly be printing this log line, there's no way that could possibly happen…" and then you jump to definition 15 times and you get all the way down to the performance-sensitive loop that's not supposed to be doing IO and you see a console.log, then you have felt this pain. This has happened to me so many times. Haskell makes this impossible.

Or, for example, if you are debugging a thing that's sending an API request and you're like, "There's no way this should be sending an API request," effect tracking marks that effect of doing IO as part of the type signature of the function that you're running. So you can say, "This function definitely does do IO," or, "This function can possibly do IO." And the compiler will check that for you at compile time.
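A minimal sketch of how this looks in type signatures is shown below. Both function names are made up for this example. Note that Haskell does have deliberate escape hatches such as Debug.Trace, so the guarantee applies to ordinary code that does not reach for them.

```haskell
-- Pure: the type has no IO in it, so this function cannot print,
-- read files, or send requests. It can only compute a result.
countLongLines :: Int -> String -> Int
countLongLines limit = length . filter ((> limit) . length) . lines

-- Effectful: the IO in the return type tells every caller that
-- running this may touch the outside world.
reportLongLines :: FilePath -> IO Int
reportLongLines path = do
  contents <- readFile path
  let n = countLongLines 80 contents
  putStrLn (path ++ ": " ++ show n ++ " long lines")
  pure n
```

Adding a putStrLn inside countLongLines would not compile without changing its signature to return IO Int, and that change would then show up in the signature of everything that calls it.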

There are two knock-on effects here. (1) If you're a fan of dependency injection, effect tracking is dependency injection on steroids. Or, really, dependency injection is a poor man's effect tracking. (2) If you enjoy being able to write unit tests, being able to track effects dramatically simplifies that, because you can show through the compiler, "Hey, this unit test runs code that doesn't do an effect and therefore is re-runnable." Or, alternatively, "At every call site where I would have done an effect, I have substituted a stand-in for the effecting function (that's our 'dependency injection'), and so this unit test is now safe to run."
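One way to sketch the dependency-injection point is with a plain type class that names the effect, plus one instance for production and one for tests. This is a simplified illustration, not the effects library or design FOSSA actually uses; MonadRegistry, lookupLicense, and describeLicense are all hypothetical names, and the IO instance is a stub.

```haskell
import Data.Functor.Identity (Identity (..))

-- Hypothetical effect: look up a package's license in some registry.
class Monad m => MonadRegistry m where
  lookupLicense :: String -> m (Maybe String)

-- Business logic is written against the effect interface only.
-- Its type says it may use the registry and nothing else.
describeLicense :: MonadRegistry m => String -> m String
describeLicense pkg = do
  found <- lookupLicense pkg
  pure $ case found of
    Just license -> pkg ++ " is licensed under " ++ license
    Nothing      -> pkg ++ " has no known license"

-- Production instance (stubbed): a real one would make a network call.
instance MonadRegistry IO where
  lookupLicense pkg = do
    putStrLn ("querying registry for " ++ pkg)
    pure Nothing  -- placeholder result for this sketch

-- Test instance: pure, canned answers, safe to re-run any number of times.
instance MonadRegistry Identity where
  lookupLicense "left-pad" = pure (Just "MIT")
  lookupLicense _          = pure Nothing
```

In a test, runIdentity (describeLicense "left-pad") runs the same business logic with the canned registry, and the Identity type guarantees that no IO happened along the way.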

Considerations for adopting (or not adopting) Haskell

If correctness and avoiding bugs and crashes are priorities for the team, that's when a language like Haskell will shine. If you really don't care (say, you need to get something up quickly that gets through some happy path with minimal time to deployment), then you'll want to look at other languages. That's where languages like JavaScript and Python shine: as long as you've got the main part of the code correct, you're going to be OK, but the language may not catch that there are other inputs you didn't handle, or other edge cases and exceptions that will cause your app to crash. Haskell may slow you down a little bit because it will force you to at least acknowledge that a lot of those things exist. As a programmer, you'll want to say, "I know that getting an empty list here should not be possible," but the compiler won't let you write the program in a way that doesn't deal with that case.

For the FOSSA CLI, performance and correctness are extremely important. Our CLI gets put on customers' machines — and those machines may or may not upgrade it regularly. It's not like a web app where we can just quickly patch a bug if we find it and have all the customers get it within a few hours. This is something people are putting into their CI pipelines, and they want it to work and not crash. So we want to be careful and write something that we know has good quality.

The other case where we at FOSSA used Haskell was for a microservice. We wanted to write a service once, have it work, and then have our team move on to other projects without constantly going back to fix bugs. If you just have one web application, you get bugs, you fix them, and you move on. But when you're doing microservices, you may be putting down some code and letting that run on its own for a while. You aren't going to be doing active development on it for some amount of time, so you want it to work and not have your engineers constantly switching over and getting all of that context back into their heads just to fix a small bug. So, that was another place where I think Haskell worked out pretty well for us.

Also, you want that put-down code to be documented. One of the nice things about types is that they're sort of free documentation, especially when it's very cheap to construct new types. Maintainability of code is all about preserving the intent of the developer, and types are one way you can express that intent. In another language, a field that is actually a discriminated union might just be a string. And then six months later you come back and you're like, "What is this string? Is this actually meant to be free-form text? Are there semantics to it? Does it only take on a certain number of values?"

In Haskell, you can just bake that into the program, and that whole class of lost intent, or lost developer memory, is something you no longer need to worry about.
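As a small sketch of baking that intent into the types, compare a string-typed field with one that uses a dedicated type. The Issue record, the Severity type, and the accepted spellings are hypothetical examples, not FOSSA data structures.

```haskell
-- "Stringly typed" version: nothing says which values are valid.
data LooseIssue = LooseIssue
  { looseTitle    :: String
  , looseSeverity :: String   -- free-form text? a fixed set? unclear
  } deriving (Eq, Show)

-- The intent, written down once: severity is one of three values.
data Severity = Low | Medium | High
  deriving (Eq, Show, Enum, Bounded)

data Issue = Issue
  { issueTitle    :: String
  , issueSeverity :: Severity
  } deriving (Eq, Show)

-- Untrusted text is converted at the edge of the program.
-- Anything outside the known set is rejected explicitly.
parseSeverity :: String -> Maybe Severity
parseSeverity "low"    = Just Low
parseSeverity "medium" = Just Medium
parseSeverity "high"   = Just High
parseSeverity _        = Nothing
```

Someone reading the Issue type six months later can see every value the field can take, and [minBound .. maxBound] lists them in code as well.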

Best Joke of the Episode

"How can Haskell possibly be a good language? Most Haskell programs don't even compile!"

Host and Guests

Sara Beaudet, Support Engineer, FOSSA: Sara is the host of the FOSSA Podcast. They are passionate about cybersecurity, open source software, and helping people explore the world of technology.

Leo Zhang, Software Engineer, FOSSA: Leo is an engineer on FOSSA's Platform team, which owns the back-end analysis services that power FOSSA's underlying data platform.

Drew Haven, Software Engineer, FOSSA: Drew is an engineer on FOSSA's Platform team. He discovered Haskell over a decade ago and has never found another language that fits him as well. He focuses on designing high-quality, extensible, and maintainable applications.
