We're building something new at WonderProxy: a monitoring platform called Observ.io! It's pretty exciting, and it's a big departure from WonderProxy, which is a website wrapped around some off-the-shelf software (Squid).
Building new things is scary: there's so much you don't even know you don't know. To manage that risk, we've taken two big mantras to heart:
- Build the simplest thing that could possibly work.
- Prioritize building things that are likely to break everything.
Build the simplest thing that could possibly work
There are lots of different aspects to a monitoring solution: storing what we're monitoring, determining what success looks like, queuing work, checking to see if something is working, evaluating success, notifying people of problems, etc. etc. etc.
There are great ways to build those sorts of things: interfaces, classes, levels of abstraction, queue solutions, etc. Instead, the first implementation of the "check to see if something is working" bit was a single PHP script executed by four different cron jobs: the once-a-minute cron job ran checker.php 1, the every-five-minutes cron job ran checker.php 5, and so on. This is a horrible piece of architecture, and utterly unscalable, but it worked.
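The post doesn't show checker.php itself. The sketch below is one minimal version of the same idea. It assumes a hard-coded list of checks with hypothetical URLs, and it uses PHP's built-in get_headers() in place of whatever HTTP client Observ.io actually used. The helper names (checksForInterval, statusFromStatusLine, fetchStatusCode) are illustrative, and only the 1- and 5-minute cron jobs mentioned in the text appear in the crontab comment.

```php
<?php
// Hypothetical sketch of a cron-driven checker.php, not Observ.io's code.
// Cron passes the interval (in minutes) as the only argument, e.g.:
//   * * * * *    php checker.php 1
//   */5 * * * *  php checker.php 5

// Illustrative hard-coded checks; a real system would load these from storage.
const CHECKS = [
    ['url' => 'https://example.com/', 'interval' => 1],
    ['url' => 'https://example.org/', 'interval' => 5],
];

// Keep only the checks scheduled for this cron job's interval.
function checksForInterval(array $checks, int $interval): array
{
    return array_values(array_filter(
        $checks,
        fn(array $check): bool => $check['interval'] === $interval
    ));
}

// Extract the numeric code from a status line such as "HTTP/1.1 200 OK".
function statusFromStatusLine(string $line): int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) === 1 ? (int) $m[1] : 0;
}

// Request the URL without following redirects; 0 means no usable response.
function fetchStatusCode(string $url): int
{
    $context = stream_context_create(['http' => [
        'follow_location' => 0,
        'ignore_errors' => true,
        'timeout' => 10,
    ]]);
    $headers = @get_headers($url, false, $context);
    return is_array($headers) && isset($headers[0]) ? statusFromStatusLine($headers[0]) : 0;
}

// Only hit the network when run from cron with an interval argument.
if (isset($argv[1])) {
    foreach (checksForInterval(CHECKS, (int) $argv[1]) as $check) {
        $response = ['status-code' => fetchStatusCode($check['url'])];
        // The original, deliberately simple definition of "up".
        if ($response['status-code'] === 200) {
            echo "UP   {$check['url']}\n";
        } else {
            echo "DOWN {$check['url']} (status {$response['status-code']})\n";
        }
    }
}
```

Adding a new interval means adding a new crontab line, and every run re-reads the whole list. That is part of why this design is simple to get working but hard to scale.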
There are a lot of great things about building something that simply works:
- It lets you see how different pieces of the puzzle fit together. Once you've got a tool that checks to see if a site is up, you might notice that it needs more information to run properly, or that there's a huge bottleneck in an external system.
- You've spent very little time. Very quickly, we reached a point where Observ.io would tell us if our website was down. This let us see all of the components: the front end, the performance of the back end, notifications of problems, etc. Making something that already works perform better is much less of a mental slog than building one perfect gear in a large machine that won't turn on for months.
- You've wasted very little time. If you're building something new, you're going to end up throwing away most of what you've written a few times as you iterate. Your data model will have a fundamental problem and will need to be redone, your interfaces will abstract away the wrong pieces and force your code to perform heroic feats to get the data it needs, and so on. By building something simple, you reduce the amount you're going to throw away.
- You can put it in front of prospective customers. Once you have something that works, you can start showing it to other people. Hopefully the kind of people who might give you money one day! By building something quickly, you've gotten it in front of those people sooner.
Prioritize building things that will break everything
Once you've built something that works, break it by making something else that is better. The goal here is not simply to rewrite something you built quickly to make it better, but to expose the flaws in what you've built by adding a new feature.
Our very first script for checking whether your website was up used a very simple test: if ($response['status-code'] === 200). This worked, and we put it in front of our beta testers, but it was clearly very limited: a site could return a 200 response code with a blank page, or a 200 response code that took 13 seconds to generate. Allowing users to define what "Success" looks like for their tests broke everything: there was new data to store, the checker script needed new abstractions, failure notifications needed to explain what had failed, etc.
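The post doesn't describe how Observ.io models success criteria. One way to sketch the idea is to make each criterion a small PHP closure that returns either null or a failure message. The function names (statusCodeIs, bodyIsNotEmpty, respondsWithin, evaluate), the result array shape, and the 5-second limit are all hypothetical. The two sample results mirror the failure modes described above: a blank 200 page and a 200 that took 13 seconds. Both would pass the original === 200 check.

```php
<?php
// Hypothetical sketch of user-defined success criteria, not Observ.io's model.
// A result is an array with 'status' (int), 'body' (string), 'seconds' (float).
// Each criterion returns null on success or a human-readable failure message.

function statusCodeIs(int $expected): callable
{
    return fn(array $result): ?string => $result['status'] === $expected
        ? null
        : "expected status {$expected}, got {$result['status']}";
}

function bodyIsNotEmpty(): callable
{
    return fn(array $result): ?string => trim($result['body']) !== ''
        ? null
        : 'response body was empty';
}

function respondsWithin(float $maxSeconds): callable
{
    return fn(array $result): ?string => $result['seconds'] <= $maxSeconds
        ? null
        : sprintf('took %.1fs, limit is %.1fs', $result['seconds'], $maxSeconds);
}

// Apply every criterion and collect the failure messages (empty = success).
function evaluate(array $result, array $criteria): array
{
    $failures = [];
    foreach ($criteria as $criterion) {
        $message = $criterion($result);
        if ($message !== null) {
            $failures[] = $message;
        }
    }
    return $failures;
}

// The 5-second limit is an arbitrary example threshold.
$criteria = [statusCodeIs(200), bodyIsNotEmpty(), respondsWithin(5.0)];
$blankPage = ['status' => 200, 'body' => '', 'seconds' => 0.3];
$slowPage = ['status' => 200, 'body' => '<html>...</html>', 'seconds' => 13.0];

print_r(evaluate($blankPage, $criteria)); // one failure: empty body
print_r(evaluate($slowPage, $criteria));  // one failure: too slow
```

Returning a message instead of a bare boolean lets a failure notification say what went wrong, not just that something did. A list of criteria also makes multiple success criteria per check the default rather than a special case.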
By prioritizing this work, we limited the number of components built on top of something that we knew would radically change later. Things like fancier status pages, prettier emails, and overview dashboards will come eventually, but if we'd built them on top of code that only checked for a 200 response, those components would probably have had fatal flaws once the success criteria changed or once we supported multiple success criteria.
When you break stuff as soon as possible:
- You expose flaws with real code and real use cases, not your expectations of what that might look like.
- You minimize the amount of code built on top of stuff you're about to break.
- No one grows overly attached to code that's only been around for two weeks.
Putting it together
By building the simplest thing that could possibly work, and then tearing it down when we integrate something new that challenges previous assumptions, we're able to get new features into our users' hands quickly. We've also avoided building large, complex systems that wouldn't survive the next new feature, along with all the time it would have taken to build them.
This is not the approach we'd take on a project we had more experience with. It introduces frailty, we do end up tossing a fair amount of code, and as professionals we find it awkward to ship code with obvious room for improvement. However, on a new project with an unknown quantity of unknowns, we're really happy with where it's taking us.