
@swyxio
Last active May 3, 2023 12:23
Cloud Operating Systems and Reconstituting the Monolith. tweet responses: https://twitter.com/swyx/status/1226257539886669825?s=20

Update

The first post has been published: https://www.swyx.io/writing/cloud-distros

The second post has been adapted for Temporal: https://www.swyx.io/why-temporal/


these are bullet points of a blogpost on a topic i know very little about but i feel like there is something there that is happening as we speak

this might be better as two blog posts

Cloud Operating Systems

  • the Big 3 Cloud Providers are mostly (not exclusively) racing each other towards providing good cloud primitives.
    • arguably this is not the best way to perceive their strategy as it seems GCP/Azure are verticalizing rather than matching AWS horizontally, but that's not relevant here
  • Applications were originally envisioned to be run directly on these clouds, but, increasingly, intermediate providers are rising up to provide a better developer experience and enforce opinionated architectures (like JAMstack)
    • Netlify
    • Zeit
    • Repl.it
    • Begin.com
    • Glitch
    • Render.com
    • Amplify
    • Binaris
    • Stackery
    • ???
  • The working name for this new generation of cloud providers, used by Martin Casado, Amjad Masad, and Guillermo Rauch, is "second layer" or "higher level" cloud providers.
  • Nobody loves these names. They don't tell you the value add. They also imply that more layers will appear atop these layers, and that is doubtful.
  • In the first (serverful) wave of Cloud, the abstraction from hardware to software was often explained as a 3 layer model: IaaS -> PaaS -> SaaS

https://venturebeat.com/wp-content/uploads/2011/11/iaas-paas-saas.jpg?resize=640%2C439&strip=all?w=640&strip=all

  • But all the big clouds are essentially PaaSes now - OSes are increasingly being abstracted away. So maybe we can use "second layer PaaS"?
  • if we view the Big 3 as providing new "cloud primitives", then maybe a better name for "second layer clouds" is "Cloud Operating Systems". especially if the premise (if not the current reality) is your application seamlessly running across multiple clouds.

Reconstituting the Monolith

  • Serverless cannot proclaim total victory until we can recreate DHH's demo from 2005 in 15 minutes.
  • The plain fact is that it has been hard to break up with the monolith - it is simply too handy to have everything in one place.
  • Serverless functions (Lambda) are nice, but not nearly enough to replace everything we used to do in a single runtime.
  • We can piece back everything with services and APIs, but this architecture is still far too bespoke and brittle and slow and leaky. (altho in theory we still get the benefits of everything being distributed, not worrying about horiz/vertical scaling, and pay-per-use pricing)
  • the jobs that monoliths do that we have to reconstitute in serverless-land:
    • static fileserving: often relegated to CDNs anyway
    • functions: marginal compute
    • gateway: for auth/sessions/rate limiting, etc
      • auth is a hard enough problem on its own that it is offered as a standalone service, altho really it is made up of other elements
    • socket management: for live subscriptions, maybe part of the gateway
    • jobrunners: for long running compute (aka batch processing?)
    • queue: for not dropping messages and jobs (aka stream processing?)
    • scheduler: for coordinating functions and jobrunners. at most basic level this is a cronjob, but you will eventually want a smarter scheduler for prioritizing work across limited allocated resources.
    • object/cold storage: slower, immutable, large, (long lived ?) persistence
    • database/hot storage: fast, mutable, small, (short lived ?) persistence
      • related jobs: searching, caching
    • (metajobs: error logging, usage logging, dashboarding, CI/CD)
    • (unique to cloud: latency aka edge computing. see victor bahl at msft)
  • each has to be able to talk to and make use of each other EASILY to match the DX of monoliths
  • keeping up with this stuff is a fulltime job, the media company covering this is literally called The New Stack
  • infinite scalability is nice, but not at the expense of infinite potential cost. a good cost cap + failover story is also important to DX. Users understand "sorry our service is temporarily down because of a sudden surge in demand", but the opposite, "sorry your bill this month is $1m because of a sudden surge in usage and it's up to you to figure out why", is less well accepted by developers and their employers
  • so maybe the answer to breaking the monolith up is to reconstitute the monolith inside the application framework - standard APIs that expose the various functions of a monolith.
  • the Serverless Framework is an early pioneer of this, but seems focused on the IaaC job rather than the unified interface job (and doesn't have as good an answer for non serverless stuff)
  • Zeit and Next.js take the monorepo -> microservices split rather seriously and have vertically aligned themselves all the way down to the frontend library layer - is there more to do here?
  • Redwood is TPW and team's effort to do this atop Netlify, but the db layer is currently on Heroku.
  • i think Cloud Operating Systems are well positioned to offer and coordinate these jobs and expose a good DX layer for users.
    • Binaris and Repl.it focus on functions
    • Zeit and Netlify combine static fileserving with functions
    • Begin combines data with the above
    • Amplify adds storage with the above (and, for some reason, XR?!)
    • what about the other jobs of the monolith? currently, we are told to spin up services the regular old way. or duct tape together a bunch of solutions not designed for this task and not integrated with anything else.
    • not. good. enough.
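
To make the "standard APIs that expose the various functions of a monolith" idea a bit more concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: `MiniCloudOS`, `BudgetExceeded`, the per-call pricing, and the handler names are made up for illustration and do not correspond to any real platform's API. It only shows functions, a queue, hot storage, and a naive scheduler reachable through one object, with a hard cost cap that fails closed instead of running up the bill.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the configured cap."""


class MiniCloudOS:
    """Hypothetical facade: functions, queue, and storage behind one object."""

    def __init__(self, budget_cents: int, cost_per_call_cents: int = 1) -> None:
        self.budget_cents = budget_cents
        self.cost_per_call_cents = cost_per_call_cents  # assumed flat price per call
        self.spent_cents = 0
        self.storage: Dict[str, Any] = {}  # stand-in for database/hot storage
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._queue: Deque[Tuple[str, tuple]] = deque()  # stand-in for a durable queue

    def function(self, name: str) -> Callable:
        """Decorator that registers a handler under a name."""
        def register(handler: Callable[..., Any]) -> Callable[..., Any]:
            self._functions[name] = handler
            return handler
        return register

    def invoke(self, name: str, *args: Any) -> Any:
        """Run a function now, checking the cost cap before charging."""
        if self.spent_cents + self.cost_per_call_cents > self.budget_cents:
            raise BudgetExceeded(f"cap of {self.budget_cents} cents reached")
        self.spent_cents += self.cost_per_call_cents
        return self._functions[name](self, *args)

    def enqueue(self, name: str, *args: Any) -> None:
        """Queue a job for later instead of running it inline."""
        self._queue.append((name, args))

    def drain(self) -> int:
        """Naive scheduler: run queued jobs in FIFO order, return how many ran."""
        ran = 0
        while self._queue:
            name, args = self._queue[0]
            self.invoke(name, *args)  # if this raises, the job stays queued
            self._queue.popleft()
            ran += 1
        return ran


cloud = MiniCloudOS(budget_cents=3)


@cloud.function("signup")
def signup(os_: MiniCloudOS, email: str) -> None:
    os_.storage[email] = {"welcomed": False}  # database job
    os_.enqueue("send_welcome", email)        # queue job


@cloud.function("send_welcome")
def send_welcome(os_: MiniCloudOS, email: str) -> None:
    os_.storage[email]["welcomed"] = True


cloud.invoke("signup", "a@example.com")
cloud.drain()
```

The point of the sketch is the DX: the handler talks to storage and the queue through the same object it was registered on, the way code inside a monolith would. When the cap is hit, `drain` raises and leaves the job in the queue, which is the "temporarily down" failure mode rather than the surprise-bill one.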

I think the Cloud OS that reconstitutes the monolith earliest will be a natural aggregator of every application developer moving to a serverless-first world.

note - kevin scott - reprogramming the american dream, AI given infinite compute. the guy who built a supercomputer on aws.

again the mega caveat to all of the above is that i am a novice in this industry and am ignorant of both how hard it is to do all of this and the full capabilities of every platform

@Disturbing

I think in 2020 the idea of "cloud agnostic" or "multicloud" has really soured. Everyone understands that diff clouds have different strengths and if you try to be agnostic you're by definition only making use of lowest common denominator stuff (or reimplementing abstractions expensively). In fact it is one way in which my "OS" analogy really fails, Cloud OSes likely won't actually work equally well on different "hardware". but happy to be proven wrong.

Many businesses want this, but can't do it. I know several enterprises that built it (in years) and are now going backwards because it's not working. Data + Multicloud problem is not easy for anyone, especially depending on the amount of data. I wish I could just add multi-cloud to KintoHub for everything, but then you might get the idea that you can spawn a mysql DB multi-cloud block and realize quickly that it doesn't work like that :).


IIRC Spotinst's initial value prop was basically the spot instances feature from AWS. AWS didn't have that feature at the time of Spotinst's launch. Now AWS has spot instances, spot fleet, Fargate, and various improvements to scaling across other services, especially ASGs. "AWS isn't the only cloud" is also a bit hyperbolic. I doubt there are many cases in history where anyone was truly the "only player" in a market. AWS holds 40-50% market share, which is absolutely unreal in a space like cloud computing.

Gotcha - I was more wondering if there was a story like Discord coming from Gaming Apps => Gaming Chat which is a significant pivot in the business model.

How was Heroku successful because of Github? I wasn't really around at the time but the timelines don't really line up. By 2010 Heroku was already being acquired by Salesforce and GitHub had O(100k) users and wasn't widely adopted. The ruby part is definitely true, but Heroku was specifically targeting rails.

After Heroku was bought by Salesforce, the relationship didn't end there. They started < 12 months apart in the same city, 1M users on Heroku and a majority of users on GitHub were Heroku users. If Github users are talking about heroku and heroku users are talking about GitHub, it's a big deal in developer mindshare in terms of growing a new dev tool, especially in the first 100K users. This information is more from directly picking James Lindbaumen and Jason Warren's brains on their story for my current venture. Jason recently told me node was a huge player in the later-stages of Heroku as well which blew my mind. Long story short, bet on technology companies that are not popular yet allow you to really be heard by 100% of that community before they blow up. IE: If I shout out about graphql today, 0.01% of people might hear me and talk about me. If I shout out about Hasura today (subset in graphql), 15% of my users are from Hasura and > 50% of their community hears me.

While I generally agree, the example with your site is way more than just a UI change. Especially with modern CDN's, developers will very rarely serve web-content from an explicitly provisioned NodeJS server. I guess I'm trying to say that it's the open-ended potential of NodeJS that scares people, not ignorance to NodeJS itself. I think a lot of developers see NodeJS and translate it to "not managed for me". Netlify's strategy was great because they not only understood the pain of their users, but also where the users are when they most often feel that pain.

Totally agree - but this user, in particular, didn't know nodejs translates to react ;). The more I speak to open-source and dev tool leaders, the more they tell me that the KISS principle applies to every keystroke of content and every idea put in motion towards the vision. Things need to be so dead simple it's laughable, but for new users, especially people learning technologies, that simplicity is critical for adoption. Thanks to Shawn's feedback, removing cloud-native from the front page of KintoHub is already in motion.

If your job is managing thousands of developers (or controlling any large number of resources honestly), the name of the game is minimizing risk. Java is the ultimate corporate risk minimizer.

Was joking ;) But informative!

@swyxio
Author

swyxio commented Mar 4, 2020

somewhat relevant to this topic (?) i just discovered this term Hyperconvergence: https://en.wikipedia.org/wiki/Hyper-converged_infrastructure which seems to be a bit of a faded meme (which i wasnt around for) but i wonder if all im doing is rediscovering the promises and problems of the old hyperconvergence debate (which again i wasnt around for so i dont really know how it panned out, i assume it was largely correct but also overhyped by vendors)

@rylandg

rylandg commented Mar 4, 2020

@sw-yx I've spent some time around HCI and have worked with quite a few customers that were either actively using it or had used it. You were wondering if it's overhyped: the answer is yes. That being said, I can't think of a single emerging tech that hasn't been overhyped in recent years, it's a real problem in the industry/world IMO. Hype aside, I think HCI is a real indicator of where the market wants to go.

In your original post, you describe an offering that provides all of the individual building blocks needed to create a holistic product, while maintaining your ability to choose which of those blocks are actually used. Even though all of the blocks have been wrapped in a nice "whole product offering" for you (Heroku, Firebase, Amplify, Netlify even), you're still the one who ultimately makes decisions about what services are used. While HCI relates partially to this conversation, there is a key difference. The premise of HCI is that compute and storage are inseparable and if you want one you implicitly want the other. This usually means, that by using HCI you forfeit the ability to scale compute and storage independently.
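
A rough way to see what forfeiting independent scaling costs is to size the same workload both ways. This is only an illustrative sketch: the function names and all the numbers (64 cores and 400 TB needed, 32 cores and 20 TB per node) are made up for the example and are not taken from any vendor's spec sheet.

```python
import math


def hci_nodes(cores_needed: int, tb_needed: int,
              cores_per_node: int, tb_per_node: int) -> int:
    """Coupled sizing: each node brings compute and storage together,
    so the scarcer resource dictates the node count."""
    return max(math.ceil(cores_needed / cores_per_node),
               math.ceil(tb_needed / tb_per_node))


def independent_units(cores_needed: int, tb_needed: int,
                      cores_per_unit: int, tb_per_unit: int) -> tuple:
    """Decoupled sizing: buy compute units and storage units separately."""
    return (math.ceil(cores_needed / cores_per_unit),
            math.ceil(tb_needed / tb_per_unit))


# Hypothetical storage-heavy workload.
nodes = hci_nodes(64, 400, cores_per_node=32, tb_per_node=20)
idle_cores = nodes * 32 - 64
compute_units, storage_units = independent_units(64, 400, 32, 20)

print(f"HCI: {nodes} nodes, {idle_cores} cores bought but not needed")
print(f"Independent: {compute_units} compute units, {storage_units} storage units")
```

With these assumed numbers the storage requirement forces 20 coupled nodes, so most of the purchased cores sit idle, whereas the decoupled layout needs only 2 compute units alongside the same 20 storage units.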

HCI is really about simplifying things down to a single unit. Because of this, HCI offerings tend to manifest as Appliances (I've heard the term "datacenter in a box" before). It's easy to see why an appliance is the path of least resistance for HCI, if you understand the basic rationale for HCI in the first place (less moving parts). I've personally worked with two different HCI appliances before (albeit minimally), the Hyperflex from Cisco and Nutanix ??? (maybe AOS). I didn't use either solution enough to form a confident opinion, but anecdotally Nutanix was very user friendly and the customer seemed very happy with it. On the other side of the coin, the Hyperflex customer I worked with was in the process of moving away from the solution and was quite frustrated. While I think the appliance route is the most convenient manifestation of HCI, it suffers from a big problem. Generally speaking, large enterprises are the ones who tend to invest in appliances. Unfortunately, HCI is not a great use case for many enterprises who often need to have a strong devops team anyway and also have a cost-driven need to scale compute resources independently. OTOH HCI appliance might actually make sense to a smaller shop that wants a more hands off approach to scaling on-prem infrastructure. From my observations, smaller shops tend to not go the appliance route.

What you should really keep your eye on is vSAN from VMware. Someone might yell at me and tell me that vSAN is not really an HCI, which is obviously true. In practice, most people who want HCI would be just as well served by vSAN. With vSAN, instead of going the appliance route, VMware offers a virtualized SAN layer that allows you to access storage from any DAS across the nodes. This provides a lot of the benefits people are looking for with HCI, without needing to physically bundle your compute and storage into a single unit.

I hope I made things clearer!

@swyxio
Author

swyxio commented Mar 4, 2020

whoa, you really did, thank you!! TIL you worked with Nutanix.. old coworker of mine was a senior sales person there and i confess i never really understood what they did until you explained it in terms of HCI (which, btw, to me stands for Human Computer Interaction, haha).

@rylandg

rylandg commented Mar 5, 2020

Glad it made sense. In my short time I've somehow been lucky enough to work with a lot of very diverse types of technology. Even then, I see at least 1 startup a day whose value prop completely goes over my head. When it happens it just excites me because it's another reminder of how much new stuff there is to learn.

In regards to Nutanix, when I worked with their HCI offering I wasn't working directly with them (rather one of their customers). More recently my last company had a small engagement with Nutanix itself and while it didn't materialize, the Nutanix team members I worked with were incredibly intelligent and down to earth people. Left a good impression on me.

HCI used to mean human computer interfaces for me which I think is actually the same thing as human computer interaction. And although I do understand what Nutanix offers, I'm still waiting for someone to explain to me what ServiceNow does (10% joking 90% not lol).

@swyxio
Author

swyxio commented Mar 22, 2020

tagging more info as i learn it - here's Joe Duffy of Pulumi talking about how the Cloud OS idea was first proposed by Dave Cutler (Windows NT architect)

(18 mins in) https://softwareengineeringdaily.com/2020/03/19/pulumi-infrastructure-as-code-with-joe-duffy/

and some links i found:

===

Terraform is kind of "self-rolled distros": https://www.youtube.com/watch?v=h970ZBgKINg

@swyxio
Author

swyxio commented Apr 21, 2020

@swyxio
Author

swyxio commented Jun 22, 2020

https://youtu.be/HlAXp0-M6SY

"we must treat the data center itself as one massive warehouse scale computer." urs holzle
