@swyxio
Last active May 3, 2023 12:23
Cloud Operating Systems and Reconstituting the Monolith. tweet responses: https://twitter.com/swyx/status/1226257539886669825?s=20

Update

The first post has been published: https://www.swyx.io/writing/cloud-distros

The second post has been adapted for Temporal: https://www.swyx.io/why-temporal/


these are bullet points of a blogpost on a topic i know very little about but i feel like there is something there that is happening as we speak

this might be better as two blog posts

Cloud Operating Systems

  • the Big 3 Cloud Providers are mostly (not exclusively) racing each other towards providing good cloud primitives.
    • arguably this is not the best way to perceive their strategy as it seems GCP/Azure are verticalizing rather than matching AWS horizontally, but that's not relevant here
  • Applications were originally envisioned to be run directly on these clouds, but, increasingly, intermediate providers are rising up to provide a better developer experience and enforce opinionated architectures (like JAMstack)
    • Netlify
    • Zeit
    • Repl.it
    • Begin.com
    • Glitch
    • Render.com
    • Amplify
    • Binaris
    • Stackery
    • ???
  • The working name for this new generation of cloud providers, used by Martin Casado, Amjad Masad, and Guillermo Rauch, is "second layer" or "higher level" cloud providers.
  • Nobody loves these names. They don't tell you the value add. Also, the names imply that more layers atop these layers will happen, and that is doubtful.
  • In the first (serverful) wave of Cloud, the abstraction from hardware to software was often explained as a 3 layer model: IaaS -> PaaS -> SaaS

https://venturebeat.com/wp-content/uploads/2011/11/iaas-paas-saas.jpg?resize=640%2C439&strip=all?w=640&strip=all

  • But all the big clouds are essentially PaaSes now - OSes are increasingly being abstracted away. So maybe we can use "second layer PaaS"?
  • if we view the Big 3 as providing new "cloud primitives", then maybe a better name for "second layer clouds" is "Cloud Operating Systems". especially if the premise (if not the current reality) is your application seamlessly running across multiple clouds.

Reconstituting the Monolith

  • Serverless cannot proclaim total victory until we can recreate DHH's demo from 2005 in 15 minutes.
  • The plain fact is that it has been hard to break up with the monolith - it is simply too handy to have everything in one place.
  • Serverless functions (Lambda) are nice, but not nearly enough to replace everything we used to do in a single runtime.
  • We can piece back everything with services and APIs, but this architecture is still far too bespoke and brittle and slow and leaky. (altho in theory we still get the benefits of everything being distributed, not worrying about horiz/vertical scaling, and pay-per-use pricing)
  • the jobs that monoliths do that we have to reconstitute in serverless-land:
    • static fileserving: often relegated to CDNs anyway
    • functions: marginal compute
    • gateway: for auth/sessions/rate limiting, etc
      • auth is a hard enough problem on its own that it is offered as a standalone service, altho really it is made up of other elements
    • socket management: for live subscriptions, maybe part of the gateway
    • jobrunners: for long running compute (aka batch processing?)
    • queue: for not dropping messages and jobs (aka stream processing?)
    • scheduler: for coordinating functions and jobrunners. at most basic level this is a cronjob, but you will eventually want a smarter scheduler for prioritizing work across limited allocated resources.
    • object/cold storage: slower, immutable, large, (long lived ?) persistence
    • database/hot storage: fast, mutable, small, (short lived ?) persistence
      • related jobs: searching, caching
    • (metajobs: error logging, usage logging, dashboarding, CI/CD)
    • (unique to cloud: latency aka edge computing. see victor bahl at msft)
  • each has to be able to talk to and make use of each other EASILY to match the DX of monoliths
  • keeping up with this stuff is a fulltime job, the media company covering this is literally called The New Stack
  • infinite scalability is nice, but not at the expense of infinite potential cost. a good cost cap + failover story is also important to DX. Users understand "sorry our service is temporarily down because of a sudden surge in demand", but the opposite, "sorry your bill this month is $1m because of a sudden surge in usage and it's up to you to figure out why", is less well accepted by developers and their employers
  • so maybe the answer to breaking the monolith up is to reconstitute the monolith inside the application framework - standard APIs that expose the various functions of a monolith.
  • the Serverless Framework is an early pioneer of this, but seems focused on the IaaC job rather than the unified interface job (and doesn't have as good an answer for non serverless stuff)
  • Zeit and Next.js take the monorepo -> microservices split rather seriously and have vertically aligned themselves all the way down to the frontend library layer - is there more to do here?
  • Redwood is TPW and team's effort to do this atop Netlify, but the db layer is currently on Heroku.
  • i think Cloud Operating Systems are well positioned to offer and coordinate these jobs and expose a good DX layer for users.
    • Binaris and Repl.it focus on functions
    • Zeit and Netlify combine static fileserving with functions
    • Begin combines data with the above
    • Amplify adds storage with the above (and, for some reason, XR?!)
    • what about the other jobs of the monolith? currently, we are told to spin up services the regular old way. or duct tape together a bunch of solutions not designed for this task and not integrated with anything else.
    • not. good. enough.
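
To make the "standard APIs that expose the various functions of a monolith" idea a bit more concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: `CloudOS`, `BudgetExceeded`, the flat per-call price in cents, and the method names are invented for illustration and do not correspond to any real platform's API. It only covers a few of the jobs listed above (functions, queue, scheduler, object storage) plus a hard cost cap, assuming a single process and no persistence.

```python
from __future__ import annotations

import heapq
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable


class BudgetExceeded(RuntimeError):
    """Raised instead of silently running up the bill."""


@dataclass
class CloudOS:
    """Hypothetical facade: one object exposing several 'monolith jobs'."""

    cost_cap_cents: int                 # hard spending limit
    cost_per_call_cents: int = 1        # assumed flat price per function call
    spent_cents: int = 0
    functions: dict[str, Callable[[dict], Any]] = field(default_factory=dict)
    storage: dict[str, bytes] = field(default_factory=dict)   # object storage
    queue: deque = field(default_factory=deque)                # FIFO queue
    _heap: list = field(default_factory=list)                  # priority scheduler
    _seq: int = 0                                              # tie-breaker for the heap

    def function(self, name: str):
        """Decorator that registers a function under a name."""
        def register(fn):
            self.functions[name] = fn
            return fn
        return register

    def invoke(self, name: str, payload: dict) -> Any:
        """Run a function, refusing if it would exceed the cost cap."""
        if self.spent_cents + self.cost_per_call_cents > self.cost_cap_cents:
            raise BudgetExceeded(f"cap of {self.cost_cap_cents} cents reached")
        self.spent_cents += self.cost_per_call_cents
        return self.functions[name](payload)

    def enqueue(self, name: str, payload: dict) -> None:
        """Queue job: processed in arrival order."""
        self.queue.append((name, payload))

    def schedule(self, name: str, payload: dict, priority: int) -> None:
        """Scheduler job: lower priority number runs first."""
        heapq.heappush(self._heap, (priority, self._seq, name, payload))
        self._seq += 1

    def drain(self) -> list:
        """Run scheduled work by priority, then queued work in FIFO order."""
        results = []
        while self._heap:
            _, _, name, payload = heapq.heappop(self._heap)
            results.append(self.invoke(name, payload))
        while self.queue:
            name, payload = self.queue.popleft()
            results.append(self.invoke(name, payload))
        return results


cloud = CloudOS(cost_cap_cents=3)


@cloud.function("resize")
def resize(payload: dict) -> str:
    # Functions reach storage through the same object, no glue code needed.
    cloud.storage[payload["key"]] = b"thumbnail"
    return payload["key"]


cloud.schedule("resize", {"key": "b.png"}, priority=2)
cloud.schedule("resize", {"key": "a.png"}, priority=1)
cloud.enqueue("resize", {"key": "c.png"})
results = cloud.drain()
print(results, cloud.spent_cents)
```

The point of the sketch is the shape, not the implementation: the function, the queue, the scheduler, and storage all hang off one object, so they can "talk to and make use of each other EASILY", and a fourth call to `cloud.invoke` fails loudly with `BudgetExceeded` rather than producing a surprise bill.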

I think the Cloud OS that reconstitutes the monolith earliest will be a natural aggregator of every application developer moving to a serverless first world.

note - kevin scott - reprogramming the american dream, AI given infinite compute. the guy who built a supercomputer on aws.

again the mega caveat to all of the above is that i am a novice in this industry and am ignorant of both how hard it is to do all of this and the full capabilities of every platform

@swyxio
Author

swyxio commented Feb 25, 2020

cool cool. overall, yeah I agree that devs care about Time to market, Cost, and Power. I'm interested in framing it as "Infra team as a service" - Cloud OSes (or to use your words, Cloud Package Manager) democratize infra. I always hear about the kind of stuff that Uber Airbnb etc have entire teams to set up for the rest of their devs, so this is a good "bring this cutting edge cloud tech to the masses" type story to sell and invest in.

I find it funny that people are already looking at a "post Kubernetes world" (not your words, i just hear it more and more now). everybody is being enabled by k8s - so really are we just talking about different forms of "k8s as a service" here? (i'm not a k8s person at all)

I get you with the open source angle, and I like that. Begin.com also has a good angle on that with arc.codes.

I think in 2020 the idea of "cloud agnostic" or "multicloud" has really soured. Everyone understands that diff clouds have different strengths and if you try to be agnostic you're by definition only making use of lowest common denominator stuff (or reimplementing abstractions expensively). In fact it is one way in which my "OS" analogy really fails, Cloud OSes likely won't actually work equally well on different "hardware". but happy to be proven wrong.

@rylandg

rylandg commented Feb 25, 2020

How did they pivot? Been watching them (and was a customer) for a while. AWS isn't the only cloud; they've got tons of services and a PaaS of their own, and the offering they have in k8s (Ocean), in how they evolved their technology to dynamically generate node pools based on usage (not perfect, but still amazing), would be an evolution more than a pivot IMO. But curious if you saw something more here.

IIRC Spotinst's initial value prop was basically the spot instances feature from AWS. AWS didn't have that feature at the time of Spotinst launch. Now AWS has spot instances, spot fleet, Fargate and various improvements to scaling across other services, especially ASGs. "AWS isn't the only cloud" is also a bit hyperbolic. I doubt there are many cases in history where anyone was truly the "only player" in a market. AWS holds 40-50% marketshare which is absolutely unreal in a space like cloud computing.

Heroku's success was based on Github + ruby as well, so github (and git ecosystem) is fueling the world here.

How was Heroku successful because of Github? I wasn't really around at the time but the timelines don't really line up. By 2010 Heroku was already being acquired by Salesforce and GitHub had O(100k) users and wasn't widely adopted. The ruby part is definitely true, but Heroku was specifically targeting rails.

I think netlify making it dead simple for web devs to get their code in git to the cloud globally is the key to their adoption. I spoke to a web dev last week and they told me my product is too hard (it is) to use because there are no options for their react app to be hosted. Right now my platform says "Website" and you have to choose "Nodejs" and your node version versus "React" and its version. BIG DX DIFFERENCE. This is how Netlify has kicked ass IMO.

While I generally agree, the example with your site is way more than just a UI change. Especially with modern CDNs, developers will very rarely serve web-content from an explicitly provisioned NodeJS server. I guess I'm trying to say that it's the open-ended potential of NodeJS that scares people, not ignorance of NodeJS itself. I think a lot of developers see NodeJS and translate it to "not managed for me".

Netlify's strategy was great because they not only understood the pain of their users, but also where the users are when they most often feel that pain.

This is interesting. It took me 7 years to convince a Java dev to get out of their comfort zone and try node. So I think we're waiting on all the java devs to get old and retire? ;)

I wouldn't bet my life on this prediction, but I don't think Java is being primarily used because a bunch of boomers are still around. You have to understand that from a corporate management perspective, Java is a perfect tool:

  • Widely taught at most U.S. universities (it was taught at University of Washington when I went ~5 years ago)
  • Relatively beginner friendly; abstracts away harder comp sci fundamentals
  • Runs on any device (maybe not embedded/IOT because JVM, but there are solutions)
  • Faster CPU perf than most other popular languages, "Java is performant enough" (obviously C/C++/Rust might be better, but we're not in a bubble here)
  • Your developers can't fuck things up

If your job is managing thousands of developers (or controlling any large number of resources honestly), the name of the game is minimizing risk. Java is the ultimate corporate risk minimizer.

You'd be surprised at how low the revenue was when it was acquired. This acquisition was a huge bet on adoption versus revenue and Salesforce threw sales at it to make it a 100M+ ARR business. I don't know the inside deets on Firebase though, but guessing similar story.

I was actually fairly aware of the revenue. Just to be clear, I think Salesforce paid the correct price. That being said, paying the correct price wasn't commonplace in 2010 🤣

I think people leave Heroku for more control and cost optimizations and they ditch the DX, or try to home-grow their own.

I think the only people that Heroku cares about leaving the platform are the ones who scale out of it. It may sound harsh but it's a numbers game and the rest of the users don't add up to much.

I'm finding companies are winning better if they focus on dev/staging for enterprise and take the final artifacts/images and throw them at a devops team to handle. The problem is merging the two: giving DX for faster development while keeping near-same or identical cloud environments, so as to lessen the dev-to-prod insanity. But you're totally right - I'd leave heroku because I'm stuck with mlab.

Good observation. Mostly just seems like another manifestation of risk management. Being "ops aware" in terms of productization will be really advantageous (at least until devops is optimized out).

This concept of simplifying the cloud or a layer on the cloud may be getting old. We're all really good at abstracting stuff all the time (especially people working on frameworks, dev tools, SaaS/PaaS/IaaS). I just listened in on this podcast which talks about how Kubernetes (k8s) will not be a thing in 5 years. Sounds controversial, but the conversation takes it to k8s will be a layer under the layers and just not be the big buzz anymore. At the end of the day, k8s is massively supercharging Docker which was another short buzz that has become the norm in next-gen infra. But no one says they will spawn a container on k8s, they say they will spawn a pod - (same thing for people who don't know k8s).

Lol I wish Kubernetes would not be a thing now. I agree, and as you said, it's nothing to do with Kubernetes and moreso the continuous squashing and integrating that happens in tech. At least for now people only care about if their code runs (that includes scaling etc).

I'll respond to the other stuff after work :)

@Disturbing

I think in 2020 the idea of "cloud agnostic" or "multicloud" has really soured. Everyone understands that diff clouds have different strengths and if you try to be agnostic you're by definition only making use of lowest common denominator stuff (or reimplementing abstractions expensively). In fact it is one way in which my "OS" analogy really fails, Cloud OSes likely won't actually work equally well on different "hardware". but happy to be proven wrong.

Many businesses want this, but can't do it. I know several enterprises that built it (in years) and are now going backwards because it's not working. Data + Multicloud problem is not easy for anyone, especially depending on the amount of data. I wish I could just add multi-cloud to KintoHub for everything, but then you might get the idea that you can spawn a mysql DB multi-cloud block and realize quickly that it doesn't work like that :).


IIRC Spotinst's initial value prop was basically the spot instances feature from AWS. AWS didn't have that feature at the time of Spotinst launch. Now AWS has spot instances, spot fleet, Fargate and various improvements to scaling across other services, especially ASGs. "AWS isn't the only cloud" is also a bit hyperbolic. I doubt there are many cases in history where anyone was truly the "only player" in a market. AWS holds 40-50% marketshare which is absolutely unreal in a space like cloud computing.

Gotcha - I was more wondering if there was a story like Discord coming from Gaming Apps => Gaming Chat which is a significant pivot in the business model.

How was Heroku successful because of Github? I wasn't really around at the time but the timelines don't really line up. By 2010 Heroku was already being acquired by Salesforce and GitHub had O(100k) users and wasn't widely adopted. The ruby part is definitely true, but Heroku was specifically targeting rails.

After Heroku was bought by Salesforce, the relationship didn't end there. They started < 12 months apart in the same city, 1M users on Heroku and a majority of users on GitHub were Heroku users. If Github users are talking about heroku and heroku users are talking about GitHub, it's a big deal in developer mindshare in terms of growing a new dev tool, especially in the first 100K users. This information is more from directly picking James Lindenbaum and Jason Warren's brains on their story for my current venture. Jason recently told me node was a huge player in the later-stages of Heroku as well which blew my mind. Long story short, betting on technology companies that are not yet popular allows you to really be heard by 100% of that community before they blow up. IE: If I shout out about graphql today, 0.01% of people might hear me and talk about me. If I shout out about Hasura today (subset in graphql), 15% of my users are from Hasura and > 50% of their community hears me.

While I generally agree, the example with your site is way more than just a UI change. Especially with modern CDNs, developers will very rarely serve web-content from an explicitly provisioned NodeJS server. I guess I'm trying to say that it's the open-ended potential of NodeJS that scares people, not ignorance of NodeJS itself. I think a lot of developers see NodeJS and translate it to "not managed for me". Netlify's strategy was great because they not only understood the pain of their users, but also where the users are when they most often feel that pain.

Totally agree - but this user, in particular, didn't know nodejs translates to react ;). The more I speak to the open-source and dev tool leaders, the more they keep telling me to apply the KISS principle to every keystroke of content and every idea put in motion towards the vision. Things need to be so dead simple it's laughable, but for new users, especially people learning technologies, it's important and critical for adoption. Thanks to Shawn's feedback, removing cloud-native from the front page of KintoHub is already in motion.

If your job is managing thousands of developers (or controlling any large number of resources honestly), the name of the game is minimizing risk. Java is the ultimate corporate risk minimizer.

Was joking ;) But informative!

@swyxio
Author

swyxio commented Mar 4, 2020

somewhat relevant to this topic (?) i just discovered this term Hyperconvergence: https://en.wikipedia.org/wiki/Hyper-converged_infrastructure which seems to be a bit of a faded meme (which i wasnt around for) but i wonder if all im doing is rediscovering the promises and problems of the old hyperconvergence debate (which again i wasnt around for so i dont really know how it panned out, i assume it was largely correct but also overhyped by vendors)

@rylandg

rylandg commented Mar 4, 2020

@sw-yx I've spent some time around HCI and have worked with quite a few customers that were either actively using it or had used it. You were wondering if it's overhyped, the answer is yes. That being said, I can't think of a single emerging tech that hasn't been overhyped in recent years, it's a real problem in the industry/world IMO. Hype aside, I think HCI is a real indicator of where the market wants to go.

In your original post, you describe an offering that provides all of the individual building blocks needed to create a holistic product, while maintaining your ability to choose which of those blocks are actually used. Even though all of the blocks have been wrapped in a nice "whole product offering" for you (Heroku, Firebase, Amplify, Netlify even), you're still the one who ultimately makes decisions about what services are used. While HCI relates partially to this conversation, there is a key difference. The premise of HCI is that compute and storage are inseparable and if you want one you implicitly want the other. This usually means, that by using HCI you forfeit the ability to scale compute and storage independently.
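
To make the coupling concrete, here is a toy calculation. The workload and node sizes are made-up numbers, not taken from any vendor's specs, and `hci_nodes_needed` is a hypothetical helper. It assumes each HCI node adds a fixed bundle of cores and terabytes, so the node count is driven by whichever resource runs out first.

```python
import math


def hci_nodes_needed(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Coupled scaling: every node adds compute and storage together,
    so the scarcer resource dictates how many nodes you buy."""
    return max(
        math.ceil(cores_needed / cores_per_node),
        math.ceil(tb_needed / tb_per_node),
    )


# Hypothetical storage-heavy workload and node size (assumed values).
cores_needed, tb_needed = 64, 400
cores_per_node, tb_per_node = 32, 20

nodes = hci_nodes_needed(cores_needed, tb_needed, cores_per_node, tb_per_node)
idle_cores = nodes * cores_per_node - cores_needed

# Scaling each resource independently would need far fewer compute nodes.
compute_nodes_if_independent = math.ceil(cores_needed / cores_per_node)

print(nodes, idle_cores, compute_nodes_if_independent)
```

With these assumed numbers, storage forces 20 nodes even though 2 nodes' worth of compute would do, leaving 576 cores paid for and idle. That over-provisioning is the cost of not being able to scale the two independently.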

HCI is really about simplifying things down to a single unit. Because of this, HCI offerings tend to manifest as Appliances (I've heard the term "datacenter in a box" before). It's easy to see why an appliance is the path of least resistance for HCI, if you understand the basic rationale for HCI in the first place (less moving parts). I've personally worked with two different HCI appliances before (albeit minimally), the Hyperflex from Cisco and Nutanix ??? (maybe AOS). I didn't use either solution enough to form a confident opinion, but anecdotally Nutanix was very user friendly and the customer seemed very happy with it. On the other side of the coin, the Hyperflex customer I worked with was in the process of moving away from the solution and was quite frustrated. While I think the appliance route is the most convenient manifestation of HCI, it suffers from a big problem. Generally speaking, large enterprises are the ones who tend to invest in appliances. Unfortunately, HCI is not a great use case for many enterprises who often need to have a strong devops team anyway and also have a cost-driven need to scale compute resources independently. OTOH HCI appliance might actually make sense to a smaller shop that wants a more hands off approach to scaling on-prem infrastructure. From my observations, smaller shops tend to not go the appliance route.

What you should really keep your eye on is vSAN from VMware. Someone might yell at me and tell me that vSAN is not really an HCI, which is obviously true. In practice most people who want HCI would be just as well served by vSAN. With vSAN, instead of going the appliance route, VMware offers a virtualized SAN layer that allows you to access storage from any DAS across the nodes. This provides a lot of the benefits people are looking for with HCI, without needing to physically bundle your compute and storage into a single unit.

I hope I made things clearer!

@swyxio
Author

swyxio commented Mar 4, 2020

whoa, you really did, thank you!! TIL you worked with Nutanix.. old coworker of mine was a senior sales person there and i confess i never really understood what they did until you explained it in terms of HCI (which, btw, to me stands for Human Computer Interaction, haha).

@rylandg

rylandg commented Mar 5, 2020

Glad it made sense. In my short time I've somehow been lucky enough to work with a lot of very diverse types of technology. Even then, I see at least 1 startup a day whose value prop completely goes over my head. When it happens it just excites me because it's another reminder of how much new stuff there is to learn.

In regards to Nutanix, when I worked with their HCI offering I wasn't working directly with them (rather one of their customers). More recently my last company had a small engagement with Nutanix itself and while it didn't materialize, the Nutanix team members I worked with were incredibly intelligent and down to earth people. Left a good impression on me.

HCI used to mean human computer interfaces for me which I think is actually the same thing as human computer interaction. And although I do understand what Nutanix offers, I'm still waiting for someone to explain to me what ServiceNow does (10% joking 90% not lol).

@swyxio
Author

swyxio commented Mar 22, 2020

tagging more info as i learn it - here's Joe Duffy of Pulumi talking about how the Cloud OS idea was first proposed by Dave Cutler (Windows NT architect)

(18 mins in) https://softwareengineeringdaily.com/2020/03/19/pulumi-infrastructure-as-code-with-joe-duffy/

and some links i found:

===

Terraform is kind of "self-rolled distros": https://www.youtube.com/watch?v=h970ZBgKINg

@swyxio
Author

swyxio commented Apr 21, 2020

@swyxio
Author

swyxio commented Jun 22, 2020

https://youtu.be/HlAXp0-M6SY

"we must treat the data center itself as one massive warehouse scale computer." urs holzle
