About Layer DevOps
We’re a Toronto-based B2B SaaS company that builds developer tools. This case study is about our core product, LayerCI: a platform for continuous integration, continuous delivery, and continuous staging (you can read more about continuous staging with ephemeral environments here).
After raising a funding round backed by Y Combinator in 2020, we needed to focus on growing our engineering team. Setting up our own CI process became a top priority; the more of us working on the product and the more customers using it, the greater the costs associated with postponing the task.
Preparing for new hires in advance makes a huge difference. Though it was tempting to devote all development efforts to the product itself, we would be losing the opportunity to adjust our development process as LayerCI grew. Much like with washing the dishes, putting off the work until later would only result in a huge, painful and disruptive task that would have to be done all at once.
In the meantime we would also be spending more effort on uncoordinated manual testing, dealing with the extra bugs and regressions that inevitably slip by. When we framed the issue this way, the decision became obvious. The developer velocity of both our team and our customers is central to our mission; the CI process could not wait.
Planning it out
Our goal was to ensure no new change could ruin a customer’s day, no matter how quickly progress was made. The simplest way to ensure your product works for your customers is to try it yourself, so we would do just that. For a commit to be merged to our main branch, all critical product features would need to be checked. LayerCI would automatically start these checks for every new pull request, and the request would stay open until all the tests passed.
While designing the CI pipeline itself, we stuck to one guiding principle that made it all viable: fast feedback. Everything should be ordered so that a failing pipeline fails as early as possible. With this in mind, we sorted tests by how much of the product they cover. Unit tests, smallest in scope and quickest to run, would come first. Product-level feature integration tests, which need a full instance of the application running, would come last.
These steps were clear enough, but there was also about half an hour of software installation and deployment to deal with. Still, we were optimistic. We knew that effective configuration and caching would keep the setup and deployment steps from substantially slowing down the pipeline. The goal was achievable.
- Linting, unit tests
- Building and deploying: Docker microservices, Sanic.io, Kubernetes
- API and product integration tests
- Website integration tests
Every LayerCI integration starts with a Layerfile, which outlines the steps required to run your app on a virtual machine. The first few steps usually involve installing core software such as Docker and Node.js, then there are steps for downloading necessary container images and libraries. Following that the app is built, deployed and tested.
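A minimal Layerfile following that shape might look like the sketch below. The directives mirror the common Layerfile style, but the versions, package names, and npm commands are illustrative placeholders, not our production configuration:

```Dockerfile
# Illustrative Layerfile sketch: boot a VM image, install core
# software, fetch dependencies, then build, deploy, and test.
FROM vm/ubuntu:18.04

# Install core software such as Docker and Node.js first.
RUN apt-get update && apt-get install -y docker.io nodejs npm

# Download the libraries the app depends on.
COPY package.json package-lock.json ./
RUN npm ci

# Build the app, start it in the background, and run the tests.
COPY . .
RUN npm run build
RUN BACKGROUND npm start
RUN npm test
```

Because each directive runs in order on the same virtual machine, the file reads top to bottom exactly like the setup-build-deploy-test sequence described above.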
LayerCI makes use of many different technologies; their combined installation and setup time is upwards of 10 minutes. We could install the bare minimum required software before running unit tests, but that would only move the problem down the line. The integration tests have the largest and most critical test surface, and the later end-to-end ones would need the full application running. The rest of the time-consuming setup would have to occur anyway.
Fortunately, this was exactly the kind of issue our product was built to solve. Using its cache, LayerCI automatically skips steps that aren’t affected by the current pipeline’s commit. Since the expensive setup steps would not be changing often, all we had to do was place them at the start of the Layerfile and they would almost always be skipped. The unit tests could happily proceed with catching goroutine panics.
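The ordering trick can be sketched as a Layerfile fragment like the one below. The file and package names are hypothetical; the point is that the expensive, rarely-changing steps sit above the steps that touch every commit:

```Dockerfile
# Hypothetical fragment: expensive setup placed before the
# frequently-changing application source, so the cache can skip it.
FROM vm/ubuntu:18.04
RUN apt-get update && apt-get install -y docker.io golang

# Copy only the dependency manifests first; this step is re-run
# only when the manifests themselves change.
COPY go.mod go.sum ./
RUN go mod download

# The app source changes on every commit, so only the steps from
# here down typically need to re-run.
COPY . .
RUN go test ./...
```

On a typical commit, everything above the final `COPY . .` is served from the cache, and the pipeline drops straight into the unit tests.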
After software installation and unit tests complete, the pipeline runner starts the continuous-staging steps: building and deploying an instance of LayerCI in its own ephemeral environment. This was needed for proper test coverage of the product, but it also brought up a familiar problem: fresh Docker builds of a large application like LayerCI can take well over 20 minutes to complete, depending on machine and network load. A delay that long would certainly break our testing goals, so we had to do something about it.
Each commit deploys its own version of the product, so there was no skipping the step this time. Was there a way to speed it up instead? We knew that builds using the Docker cache were generally much faster, we just had to leverage that somehow. The strategy soon presented itself: we would simply have the LayerCI cache store the Docker cache between builds. It was straightforward to specify this in the Layerfile, and doing so greatly reduced the average pipeline running time.
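In Layerfile terms, a natural way to express this is the `RUN REPEATABLE` directive, which persists a step’s files (including Docker’s layer cache) between pipeline runs; the image name and port below are placeholders:

```Dockerfile
# Hypothetical fragment: RUN REPEATABLE keeps this step's state across
# runs, so docker build reuses cached layers instead of rebuilding
# the whole image from scratch.
RUN REPEATABLE docker build -t layerci-app:latest .

# Deploy the freshly built image for this commit's ephemeral environment.
RUN docker run -d -p 8080:8080 layerci-app:latest
```

With warm layers, only the Docker stages affected by the commit are rebuilt, which is what brought the build step back within our fast-feedback budget.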
Timeline and conclusion
All in all, setting up our CI process went very smoothly. Thanks to the fast feedback and the high-visibility tools at our disposal, we got our Layerfile running in a couple of days, and we were adding new integration tests within a couple of weeks. Our performance decisions also paid off, letting the Layer cache cut pipeline times by over 75% on average. Mission successful.
By acting early, we had averted the imminent disaster. There would be no pileup of devops needs and no era of clumsy, error-prone manual testing for LayerCI. We could get back to building features, confident that they would keep working as our team and product grew.
We’re passionate about the UX of DevOps. If your team is in a similar place and you’re thinking it’s time to set up your CI, consider trying us out. Chances are that the process will go just as smoothly for you as it did for us. If you’re already with another CI provider but are curious about our UX practices, you could check out our new open-source test runner tool. It’s designed to bring some of LayerCI’s debugging and pipeline visibility features to any CI provider.