During the summer, our team at Klarna has been recruiting other teams to join our Haskell monorepo. Getting more teams interested in joining us in doing Haskell has always been our strategy - by doing it in a monorepo we feel that we are able to make sure that all teams write their code in a similar way.
This poses a couple of challenges to us - only one of which we’re dealing with in this article!
There are the basics, from how we build the applications and make sure that tooling works great for everyone involved, to how the CI tests and deploys the artifacts.
Then there are more subtle things: how we make sure that code reviews focus on the right things - and that people don’t introduce new patterns and ways of doing things.
I believe that one of the most important things that you can do as a developer is to make sure that the above is automated. Which leads us to one of my pet peeves:
When there’s no (enforced) canonical way of formatting something in a language - people tend to think up their own way of doing something. Let’s take something as simple as formatting data in Haskell. Here are a couple of alternatives for how to format a record type:
-- JS style:
data Car = Car {
  manufacturingYear :: Year,
  milesRun :: Natural
} deriving stock (Eq, Show)
-- Vertical alignment (my personal hell):
data Car = Car { manufacturingYear :: Year
               , milesRun :: Natural
               } deriving stock (Eq, Show)
-- 2 space indent:
data Car = Car
  { manufacturingYear :: Year
  , milesRun :: Natural
  }
  deriving stock (Eq, Show)
The last one is my personal favorite as it minimizes git-diffs (yeah, I’m one of those people).[1]
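To make the diff argument concrete, here is a small sketch of what adding a field looks like in the leading-comma style. The `Year` and `Color` types and the `color` field are hypothetical, made up for this illustration. Appending the field adds exactly one line, whereas the trailing-comma style would also have to touch the previous last line to give it a comma.

```haskell
{-# LANGUAGE DerivingStrategies #-}

import Numeric.Natural (Natural)

-- Hypothetical stand-in types, only here to make the example compile.
newtype Year = Year Int
  deriving stock (Eq, Show)

data Color = Red | Blue
  deriving stock (Eq, Show)

data Car = Car
  { manufacturingYear :: Year
  , milesRun :: Natural
  , color :: Color -- the only line a diff would show for the new field
  }
  deriving stock (Eq, Show)
```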
This is just the tip of the iceberg though. Once we’ve agreed on how we format data, then we have: newtypes, imports, module headers, docstrings, language pragmas…
Keeping this in sync in a single team is fine, maybe not enjoyable, but fine. In a monorepo with >50kLOC and multiple teams; yeah, that’s not going to be pleasant at all.
There are several options for Haskell:
We chose to go with the last of the bunch, stylish-haskell. The reason is simply that we wanted to be able to customize the way that our code is formatted, and some of our engineers are maintainers of the repo.[2]
With stylish in tow, we added formatting to our CI and the world was a better place, for a while.
We like shiny things! Especially if that means low-latency garbage collection and improved runtime - or a sane option for qualified imports, post qualified!
-- This lets you write:
import A.B.C qualified as C
import D.E.F qualified as F
import G.H.I (J)
-- Instead of having to do this to minimize diffs:
import qualified A.B.C as C
import qualified D.E.F as F
import           G.H.I (J)
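The post-qualified syntax is gated behind the ImportQualifiedPost language extension, which is available from GHC 8.10 onwards. A minimal sketch using only modules from base (the `shout` function is made up for the example):

```haskell
{-# LANGUAGE ImportQualifiedPost #-}

import Data.Char (toUpper)
import Data.List qualified as List

-- Upper-case every word and join the words with single spaces.
shout :: [String] -> String
shout = List.intercalate " " . map (map toUpper)
```

Because `qualified` comes after the module name, the module names line up without any padding on the unqualified imports.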
We upgraded to 8.10 as soon as our friends at IOHK patched haskell.nix to offer the latest greatest version of GHC.
The drawback: enabling the post qualified imports broke stylish haskell 😢
(Enough preamble!) Here’s the recipe for how stylish-haskell formats your haskell source code:
Stylish is able to go into your source file, look at a specific segment of your code and apply a so-called “step” to it. Each step modifies only a single type of structure - an example being how to format language pragmas.
The step has the following definition, where stepFilter is the actual functionality of the step:
data Step = Step
  { stepName :: String
  , stepFilter :: Lines -> Module -> Lines
  }
-- where
type Lines = [String]
The Module is given to the step by parsing the source code with haskell-src-exts. This means that the stepFilter function has both the original source code in terms of Lines as well as an AST representation of said source.
Since the step returns Lines it’s possible to compose several of these together in order to format the entire file.
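As a rough sketch of that composition, assuming a placeholder `Module` type and two made-up steps (`trimTrailing` and `expandTabs` are hypothetical, not steps that ship with stylish), chaining boils down to a fold over the list of steps:

```haskell
import Data.Char (isSpace)
import Data.List (foldl')

type Lines = [String]

-- Placeholder for the parsed AST; the real Module comes from the parser.
data Module = Module

data Step = Step
  { stepName :: String
  , stepFilter :: Lines -> Module -> Lines
  }

-- Hypothetical step: strip trailing whitespace from every line.
trimTrailing :: Step
trimTrailing = Step "Trim trailing whitespace" $ \ls _ ->
  map (reverse . dropWhile isSpace . reverse) ls

-- Hypothetical step: expand each tab into two spaces.
expandTabs :: Step
expandTabs = Step "Expand tabs" $ \ls _ ->
  map (concatMap (\c -> if c == '\t' then "  " else [c])) ls

-- Run the steps left to right, feeding each step's output into the next.
runSteps :: [Step] -> Module -> Lines -> Lines
runSteps steps m ls = foldl' (\acc step -> stepFilter step acc m) ls steps
```

This sketch reuses a single `Module` for every step to stay short. Since a step can move lines around, a real pipeline would most likely need to parse the intermediate output again before handing it to the next step.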
When it comes time for the Step to edit the Lines, stylish has the concept of Block as well as editor functionality operating on blocks and lines called Change.
-- A block is defined as (the type parameter is only a tag for what the block indexes):
data Block a = Block
  { blockStart :: Int
  , blockEnd :: Int
  }
-- and a change as:
data Change a = Change
  { changeBlock :: Block a
  , changeLines :: [a] -> [a]
  }
Here’s a short example:
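import Data.Char (isSpace)

deleteTrailingWhitespace :: Step
deleteTrailingWhitespace =
  makeStep "Delete trailing whitespace" $ \lines _module -> fmap stripWhitespace lines
  where
    -- drop the spaces at the end of a single line
    stripWhitespace = reverse . dropWhile isSpace . reverse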
-- alternatively using the changes API:
deleteTrailingWhitespace :: Step
deleteTrailingWhitespace =
  makeStep "Delete trailing whitespace" $
    \lines _module -> applyChanges (stripWhitespace <$> lineNumbers) lines
  where
    -- one change per line number; each change replaces the line with its stripped version
    stripWhitespace i =
      changeLine i (\line -> [reverse (dropWhile isSpace (reverse line))])
    lineNumbers = -- elided for brevity
Both of these do the same thing - but it’ll become important later that we’re able to modify the file in place, and for that the editor functions really come in handy.
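To give a feel for how such an editor can work, here is a toy version with simplified types. It is a sketch written for this article's explanation, not stylish's actual implementation: the types drop the type parameter, and `stripLine` is a made-up helper.

```haskell
import Data.Char (isSpace)
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Inclusive, 1-based line range.
data Block = Block
  { blockStart :: Int
  , blockEnd :: Int
  }

data Change = Change
  { changeBlock :: Block
  , changeLines :: [String] -> [String]
  }

-- Rewrite a single line; the function may return zero, one or many lines.
changeLine :: Int -> (String -> [String]) -> Change
changeLine n f = Change (Block n n) (concatMap f)

-- Apply non-overlapping changes, bottom of the file first, so that a change
-- which adds or removes lines does not shift the blocks still to be applied.
applyChanges :: [Change] -> [String] -> [String]
applyChanges changes ls = foldl applyOne ls ordered
  where
    ordered = sortOn (Down . blockStart . changeBlock) changes
    applyOne acc (Change (Block start end) f) =
      let (before, rest) = splitAt (start - 1) acc
          (target, after) = splitAt (end - start + 1) rest
       in before ++ f target ++ after

-- Hypothetical helper: strip trailing whitespace on line n only.
stripLine :: Int -> Change
stripLine n = changeLine n (\l -> [reverse (dropWhile isSpace (reverse l))])
```

Every line that is not covered by a block passes through untouched, which is what makes editing in place safe.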
The editor is very useful when combined with the positions from the haskell-src-exts lib. One example being formatting records. A record is represented as a Decl which contains a RecDecl. We can get the starting positions of the record from the Decl and then tell stylish to only format what’s between the start and end line of the record. If a Decl turns out to not be a record, we can choose to emit no change. This means we get preservation of all other parts of the source file - no need to preserve comments or imports around the record, we can focus only on the thing we want to change.
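The idea can be sketched with a toy AST. The `Decl` type below is a made-up stand-in that only carries line positions (the real haskell-src-exts types are far richer), and the formatter is passed in as a plain function:

```haskell
import Data.Maybe (mapMaybe)

-- Toy stand-in for a parsed declaration, with 1-based inclusive line spans.
data Decl
  = RecordDecl Int Int -- first and last line of a record
  | OtherDecl Int Int -- anything else

-- Only records yield a span to reformat; every other declaration is skipped.
recordSpan :: Decl -> Maybe (Int, Int)
recordSpan (RecordDecl start end) = Just (start, end)
recordSpan (OtherDecl _ _) = Nothing

-- Rewrite only the lines inside the span and keep the rest untouched.
rewriteSpan :: ([String] -> [String]) -> (Int, Int) -> [String] -> [String]
rewriteSpan f (start, end) ls = before ++ f target ++ after
  where
    (before, rest) = splitAt (start - 1) ls
    (target, after) = splitAt (end - start + 1) rest

-- Apply the formatter to every record. With declarations in source order,
-- foldr handles the last span first, so earlier spans keep their positions.
formatRecords :: ([String] -> [String]) -> [Decl] -> [String] -> [String]
formatRecords f decls ls = foldr (rewriteSpan f) ls (mapMaybe recordSpan decls)
```

Comments and other declarations never reach the formatter, so there is nothing to preserve by hand.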
In short, stylish haskell does the following:
- Parses the source file into a Module and keeps the original text as Lines
- Decides which Step to enable and how to configure each of them from the stylish config you as the user specified
- Uses Change values in order to edit the file in place by either line numbers or by the concept of Block
The limitations of stylish rest fundamentally on the functionality of haskell-src-exts. This means that any new language feature enabled by a language pragma or other flag needs to first gain support in this dependency before stylish can make use of it.
The upside of using this library is that it’s quite easy to manipulate the resulting AST that you get from parsing. In contrast to most real compiler parsers, it keeps a lot of source file information that normal parsers might discard. Compiler parsers tend to discard things that aren’t useful to compilation - such as comments.[3]
Other tooling projects in Haskell land have started using the GHC parser directly instead, as a means to mitigate this limitation. The parser as well as the AST is available in the ghc-lib-parser package, which is the GHC API but usable as a library.
In order to fix our issue, I started re-writing the parts of stylish that I thought relevant to my team’s needs using ghc-lib-parser. The result of this work is that this PR branch now formats our >50kLOC haskell monorepo on every PR and every branch build.
I think it might be interesting to write a separate article on how the GHC AST works and how I adapted stylish to work with it. It contains a couple of interesting things like a printer monad. I’ll try to put something together soon, hope this was an interesting read!
// Felix
[1] Some of you might argue that it does not minimize diffs. Well, it does when you structure co-products like this:
data Car = Car
  { manufacturingYear :: Year
  , milesRun :: Natural
  }
  deriving stock (Eq, Show)
data Vehicle
  = MkCar Car
  | MkBicycle Bicycle
  deriving stock (Eq, Show)
This has the added benefit that you can get precise types when you deconstruct the co-product.
[2] Funny story, they actually just wanted to see if they could get imports formatted according to our standards and after a few PRs they were made maintainers - after that we thought, well - looks like a well structured project that we can extend to our liking. Let’s go for it!
[3] As a side note, Eugene and Olaf spent an inordinate amount of time getting things like comment positions just right for scalameta. Getting positions correct is really damn difficult. Making an AST easy to use while retaining this information is an art.