Why I Joined CodeZero’s Advisory Board
Hello weary traveler! My name is Marty Weiner. I’d like to tell you why I joined CodeZero’s advisory board through the wonders of storytelling 😀.
Many moons ago, I was with Pinterest as an engineer before it was called “Pinterest”. In my 4.5 years there, it grew from a place nobody had heard of to a household name. And with that rapid growth came all the insanity of trying to keep up with the problems that come with it - rapid hiring, moving buildings, trying to keep the site up, going from a handful of servers to a handful of thousands of servers, breaking apart our monolithic codebase into tons of microservices, etc. Yes, gray hairs became a problem too.
Then I had an amazing offer I couldn’t refuse (not in the Godfather sense) to join Reddit as its first CTO in 2015 for a few rollercoaster years! The size of Reddit’s user base was already large, but the team was far too small for the challenges Reddit faced. So I brought on a recruiting firm, and we went to town. Over the next few years, we grew from 25 engineers to several hundred, and with that came rapid hiring, moving buildings, trying to keep the site up, going from a handful of servers to a handful of thousands of servers, breaking apart our monolithic codebase into tons of microservices, etc. Yes, gray hairs became a bigger problem too.
I’m noticing a pattern…
Two of those problems - rapid server and microservice growth - make for a big problem that doesn’t exist when it’s just a few of us hacking in the founder’s apartment: how do you accurately replicate your entire system somewhere convenient for your developers to make rapid progress against? And, once you do that, how do you make collaboration among multiple developers fun and easy?
In 2010, Pinterest was a simple monolith on a handful of servers. Around Jan 2011, the users started coming… fast. There were only 2 backend+web engineers and 1 iPhone engineer. In those days, we didn’t sleep, and we tested+built in production just to survive another day. I once fixed the website from a paddle boat in Shoreline Lake in Mountain View, CA with my 3-year-old climbing on my back - it was those kinds of times. We tried not to edit in production, but that was by far the fastest, lowest-effort way in a time of massive stress.
But, soon after, the other engineers and I started to split the monolith into services principally to make deploys easier and to reduce the chances of adding bugs in critical areas (like database manipulations). We (more-or-less) stopped making changes to production boxes directly, instead relying on deploying changes. This was due to us starting to grow as a company with more hands touching things, meaning more coordination was necessary. We had to think about employee scale in addition to customer scale. We dreamed of the day when we could effectively develop in our production environment as we once did (but without the danger) because it was so damn fast and effective.
Around 2013, the microservice madness had begun. We had a handful of services: api, web, database, cache, spam, email, search, and the data pipeline. In 2015, that ballooned up to 50 or so services. When we had tens of services, MySQL, HBase, Memcache, Solr, and a handful of others, we could spend some effort to make a magic tool that would deploy all of these in one instance of Linux in AWS to act as a dev environment. In the pre-Docker days, this was pretty hard - we used tools like Puppet to download all the packages, set everything up, and generate good fake data. These dev environments were ultra fragile as the developers’ first priority, of course, was making sure production worked, instead of making sure the development environment was happy. Priority realistically went to speed in the moment, not longer-term investment. Even worse, many of these services were not built with any consistency - different protocols, different ways of logging, etc. One pain we certainly felt was that building services naturally erodes collaboration between groups.
At Reddit, it was the same problem all over again but with some learnings from my time at Pinterest - a monolithic codebase and rapid employee growth = need for rapid (but careful!) growth in services. We started with something really cool - the team built a service generator called Baseplate.py - so that all services we created could conform to a few requirements and developer behaviors: consistent API protocol, testing built in, mocking at all layers, logging built in, common deployment mechanisms, legal/compliance needs, consistent libraries, reporting to a centralized tracking system, etc. At the time, the team also had dreams that it would allow us to spin up portions of the development environment, but it was still difficult. We, at one point, had a large portion of the system even deployable on a single laptop with lots of RAM. Very cool! But this kind of system just becomes very difficult to maintain, and you start to feel the pain on the poor laptop. I theorized that a well-functioning dev environment would need a dedicated team just to keep it operating. And, most likely, this development environment would be difficult to keep on just one laptop (or one cloud machine) for very long at all. The environment would need to be split up.
One thing going for us at Reddit was that our service and server deployment got really good. We got smoother and smoother at spinning up new services, adding machines, and planning for future utilization. At the time, I believe we could have spun up a copy of Reddit’s topology rather rapidly and easily. This meant that we did have the tools to spin up a full staging environment (or portions of it). And now, in a world of Docker + Kubernetes, this ability has become par for the course for any company, which is super cool!
If we now live in a world where spinning up your developer environment is similar to spinning up your production environment, what’s missing are two things. The first, and one I started building at both Pinterest and Reddit before going on to other things, is the ability to see all of your services in one place: who owns them, who to call, how to connect, SLAs, etc. This would solve a major communication problem that crops up when you surpass about 100 engineers - who is responsible for a service, and does a service even exist? (Answering that reduces duplicate work, which becomes a bigger problem the bigger the company.)
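To make that first missing piece concrete, here is a minimal sketch of what a service catalog could look like. Everything in it is an assumption for illustration: the `ServiceEntry` fields, the `ServiceCatalog` class, and the example services and addresses are hypothetical, not what Pinterest, Reddit, or CodeZero actually built.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ServiceEntry:
    """One row in a hypothetical catalog: what exists, who owns it, how to reach it."""
    name: str
    owner_team: str      # who owns it
    on_call: str         # who to call
    endpoint: str        # how to connect
    availability_slo: float  # promised availability, e.g. 0.999


class ServiceCatalog:
    """In-memory registry; a real one would sit behind a shared database or API."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        # Refusing duplicates is the point: it surfaces "does this already exist?"
        existing = self._entries.get(entry.name)
        if existing is not None:
            raise ValueError(
                f"{entry.name!r} already exists, owned by {existing.owner_team}"
            )
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> Optional[ServiceEntry]:
        # Returns None when nobody has registered a service under this name.
        return self._entries.get(name)

    def owned_by(self, team: str) -> List[str]:
        # Sorted names of every service registered to the given team.
        return sorted(e.name for e in self._entries.values() if e.owner_team == team)


catalog = ServiceCatalog()
catalog.register(ServiceEntry(
    "search", "discovery", "discovery-oncall@example.com", "search.internal:9090", 0.999))
catalog.register(ServiceEntry(
    "spam", "trust", "trust-oncall@example.com", "spam.internal:9090", 0.99))
```

With something like this, `catalog.lookup("search")` answers "who do I call?" and a second team trying to register another `search` service finds out right away that one already exists.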
The second thing missing is the ability to develop collaboratively super fast - going back to that feeling of a startup of 3 engineers in a room hacking super fast together. So, when I met the CodeZero guys and they showed me a quick demo, I immediately knew what I was looking at because it addressed both of these pains in one go: discovery/cataloging of all the services available and the ability to hack together with a teammate hyper rapidly on a development environment that can be spun up with the push of a button with companies like Civo. The power comes from being able to “pull a service” to your laptop as if it’s on your laptop and hack on it even though it’s sitting blissfully in the cloud.
Even crazier, with CodeZero, several teammates and I could hack on one continuous flow throughout a rather complex topology. For instance, at Reddit, when a post is made, it has quite a “lifetime”, including being placed in a few databases, logging events get fired, push notifications are sent, recommendations are updated, spam systems are notified, mods may be sent a private message, the search engine is notified, and more. The blast radius of a single post is quite complex. If we had the ability to pull down several of those servers to our laptops, add if and print statements or even breakpoints, we could watch a post propagate at every stage and debug very painful problems. I’m picturing all this in development, but imagine if there was a gnarly problem in production that could not be replicated in development. Of course, you’d have to be careful (both for not breaking the service and ensuring privacy), but being able to pull down a live production server onto your laptop seamlessly seems like one of the most powerful tools I can imagine to debug really hard-to-find problems.
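Here is a toy, single-process sketch of what “watching a post propagate” means. It is purely illustrative: the `PostPipeline` class and the stage names are hypothetical, the real stages at Reddit are separate services rather than functions, and none of this is CodeZero’s API. It only shows the kind of per-stage visibility I’m describing.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[Dict], None]


class PostPipeline:
    """Runs a post through every registered stage and records what happened."""

    def __init__(self) -> None:
        self._stages: List[Tuple[str, Handler]] = []

    def stage(self, name: str) -> Callable[[Handler], Handler]:
        # Decorator that registers a handler under a stage name, in order.
        def decorator(fn: Handler) -> Handler:
            self._stages.append((name, fn))
            return fn
        return decorator

    def publish(self, post: Dict) -> List[Tuple[str, str]]:
        # One failing stage should not hide what the later stages do,
        # so failures are recorded and the loop keeps going.
        trace: List[Tuple[str, str]] = []
        for name, handler in self._stages:
            try:
                handler(post)
                trace.append((name, "ok"))
            except Exception as exc:
                trace.append((name, f"failed: {exc}"))
        return trace


pipeline = PostPipeline()


@pipeline.stage("database")
def store(post: Dict) -> None:
    post["stored"] = True


@pipeline.stage("spam")
def check_spam(post: Dict) -> None:
    if "buy now" in post["title"].lower():
        raise ValueError("looks like spam")


@pipeline.stage("search")
def index(post: Dict) -> None:
    post["indexed"] = True


for stage_name, outcome in pipeline.publish({"title": "Hello world"}):
    print(f"{stage_name}: {outcome}")
```

The trace is the interesting part: it is the print-statement view of a post’s lifetime, one line per stage, which is exactly what gets hard to see once each stage lives in its own service.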
And so, I was sold. I joined CodeZero’s advisory board to help offer my experience of the pain felt developing large service topologies like Pinterest’s and Reddit’s, and I recently became an investor in their latest round. But there’s a problem. Since I’m now “retired” as a full-time dad, enjoying advising companies, it’s harder to get my hands on a large sea of services to play with. This is really hard since now I’ve got this really cool CodeZero hammer and everything looks like a nail. So if you have an even slightly interesting service topology, I wanna try this out with you, if for no other reason than to geek out on cool tech 😀.