@lukestanley
Created January 16, 2024 11:31
Local-first software with Martin Kleppmann (automated transcript)
And I feel like this idea really changes the abstractions that operating systems should provide, because maybe OSes should not just be providing this model of files as a sequence of bytes, but this higher-level CRDT-like model. And how does that impact the entire way software is developed? And we've talked before about Mark's dabbling in playing the piano. I understand this is a hobby you're starting to look into as well, Martin. Oh, yes. I've been playing the piano, trying to do it a bit more consistently for the last year and a half or so. A lockdown project. And do you have a technique for not annoying your neighbors? For the neighbors? Or is this an electronic piano, or how do you do that? It's an electric piano, although I don't think it's too bad for the neighbors. Lately, I've been trying to learn a Debussy piece for four hands that I can play together with my wife. So she'll play two hands, and I'll play the other two. Nice. I suspect a lot of our listeners know you already, Martin. I think, within your small, narrow niche, you're a pretty high-profile guy. But for those that don't, it'd be great to hear a little bit about your background and what brought you on the journey to the topic we're going to talk about today. Yeah, well, I'm a computer scientist, I guess. I started out as an entrepreneur and started two startups some years ago. I ended up at LinkedIn through the acquisition of the second startup. And I worked on large-scale stream processing with Apache Kafka and was part of that sort of stream processing world for a while. And then I wanted to share what I had learned about building large-scale distributed data systems. And so I then took some time out to write a book, which is called Designing Data-Intensive Applications, which has turned out to be surprisingly popular. Yeah, you wrote a nice kind of tell-all. You showed the numbers on it, which has been financially successful for you, but it's also one of the more popular O'Reilly books just by copies sold in recent times. I like that post. I like the candor there. But yeah, it makes you a pretty successful author, right? Yeah, it's sold over 100,000 copies, which is way more than what I was expecting for something that's a pretty technical, pretty niche book, really. But the goal of the book really is to help people figure out what sort of storage technologies and data processing technologies are appropriate for their particular use case. So it's a lot about the trade-offs and the pros and cons of different types of systems. And there's not a whole lot on that sort of thing out there. You know, there's a lot of sort of vendor talk, hyping the capabilities of their particular database or whatever it might be, but not so much honest comparison between different approaches. So that's what my book tries to provide. Yeah, and then after writing that book, I sort of slipped into academia, sort of half by accident, half by design. So I then found a job at the University of Cambridge where I could do research full time. And since then, I've been working on what we have come to call local-first software, which we're going to talk about today. The nice thing there is that now that I'm in academia, compared to the startup world, I have the freedom to work on really long-term ideas, big ideas which might take five or 10 years until they turn into viable technologies that might be used in everyday software development. But if they do work, they'll be really impactful and really important.
And so I'm enjoying that freedom to work on really long-term things now as an academic. And certainly it struck me when we got the chance to work together through these Ink & Switch projects that you have both: the commercial world, including being a startup founder, but obviously you're also very immersed in the academic kind of machinery now. And again, just that long-term mindset and thinking about creating public goods and all that sort of thing. And I found that I actually really like working with people who have both of those. Another great example would be another former podcast guest, Geoffrey Litt. He was also in the startup world, and now he's doing academic work at MIT. Yes, and I'm actually doing a project with him right now. Right. I forgot about that. There's a current Ink & Switch project there. So I find that maybe if you live your whole life in one of those two, kind of commercial slash industry or academia, you get a "a fish doesn't know what water is" kind of thing. But if you have experienced both models, then it's easier to know the pros and cons and understand the shape of the venue you're doing your work in. And in the end, the point is to have some meaningful impact on humanity through your work, whatever small piece of the world you hope you're making better. In our case, it's computer things. Right. But the venue you're in is not the point. That's just a vehicle for getting to where you want to go. And each of these styles of venue has different trade-offs, and being aware of those maybe makes it easier to have your work have an impact. Yes, I think it is really helpful to have seen both sides. And I find it allows me to be a little bit more detached from the common mindset that you get. Like in every job, in every domain, there are certain things that everyone believes, but, you know, they're kind of unspoken, maybe not really written down either. And so in academia, that's the publishing culture and the competitiveness of publication venues and that sort of stuff, which seems ridiculous to outsiders. But if you're in it, you kind of get accustomed to it. And likewise, in startups, it's the hype, the need to be constantly selling and marketing and promoting what you're doing to the max. Crushing it. Always crushing it. Exactly. And to an outsider, that seems really bizarre. It's kind of a ridiculous show that people put on, frankly. But to an insider, you know, you just get used to it, and that's just your everyday life. I find that having seen both makes me a bit more detached from both of them. And I don't know, maybe I see a little bit more through the bullshit. So as you hinted, our topic today is local-first software. This is an essay that I'll link to in the show notes. It's about two years old. And notably, there are four authors on this paper. Three of them are here, kind of almost a little reunion. And actually, the fourth author, Peter van Hardenberg, we hope to have on as a future guest. But I thought it would be really fun to not only kind of summarize what that philosophy is, particularly because we're actively pursuing it for the Muse syncing persistence model, but also to look at what we've learned since we published that essay and revisit it a little. What do we wish we'd put in? How has the movement, if that's the right word for it, evolved? What have we learned in that time?
But I guess before getting into all that, maybe, Martin, you can give us the elevator pitch, if I'm to reference startup terminology: the brief summary of what is local-first software? Yeah, local-first software is a reaction to cloud software. And by cloud software, I mean things like Google Docs, where you have a browser window and you type into it and you can share it really easily. You can have several people contributing to a document really easily. You can send it around for comments very easily, and so on. So it has made collaboration a ton easier, but it's come at a great cost to our control and our ownership of the data. Because whenever you're using some cloud software, the data is stored on the cloud provider's servers, like Google's servers, for example. And, you know, as users, we are given access to that data temporarily, until that day when Google suddenly decides to lock your account and you are locked out of all of the documents that you ever created with Google Docs. Or until the startup whose software-as-a-service product you're using suddenly goes bust and decides to shut down their product with two weeks' notice, and maybe allows you to download a zip file full of JSON files as your data export. And I find that tragic, because as creative people, we put a ton of effort, time, and our souls, and really our personalities, into the things that we create. And so much now, the things that we create are computer-based things. You know, whether you're writing the script for a play or whether you're negotiating a contract or whether you're doing any sort of endeavor, it's probably a file on a computer somewhere. And if that file is in some cloud software, then there's always this risk that it might disappear and that you might lose access to it. And so what we try to do with local-first software is to articulate a vision for a future where that does not happen, where we have the same convenience that we have with cloud software, that is, we have the same ability to do real-time collaboration. It's not back to the old world of sending files back and forth by email. We still want the same real-time collaboration that we get with Google Docs. But at the same time, we also want the files stored on our own computers, because if the files are on our own computers, then nobody can take them away. They are there. We can back them up ourselves. We can optionally back them up to a cloud server if we want to. There's nothing wrong with using a cloud service, as long as the software still continues working without the cloud service. Moreover, we want the software to continue working offline, so that if you're working on a plane or working on a train that's going through a tunnel or whatever, the software should just continue to work. And we want better security and privacy, because we don't want cloud services scanning through the content of all of our files. I think for creativity, it's important to have that sense of privacy and ownership over your workspace. And so those are some of the ideas that we try to encapsulate in this idea of local-first software. How can we have the best of both worlds: the convenience of cloud software, but the data ownership of having the files locally on your own device? Yeah, for me, the core of it is really agency, and much of the value of cloud.
I think there's a version of this also for mobile apps, let's say, and app stores and that sort of thing, which is not what we're addressing in the paper, but maybe that's something that we can do. Maybe there's a theme in computing that we've made computers vastly more accessible by, in many cases, taking agency from people. And that's actually a good thing in many cases, right? You don't need to defrag your hard drive anymore. You lose your device, and your email and your photos and all those things are still in this cloud that's managed by experienced sysadmins and product managers and so forth at companies like Google, and they can often do a better job. Yeah. I mean, I think of managing my own email servers, SMTP servers, years back, and needing to deal with data backup and spam filtering and all that kind of thing. And Gmail came along and I was just super happy to outsource the problem to them. Absolutely. They did a better job managing it. So I think that's basically, in many ways, a good trend, or a net good in the world. And I don't think we necessarily want to go back to a world where everyone has to do all of that themselves. Yeah, I think that's really important. Maybe they should do more of those data management tasks. But I think for the area of creative tools, or, I guess you'd call them, power users, it's like you said: if you're writing a play, that's just a very different kind of interaction with a computer than the average person doing some calendar and email and messaging. Yeah. Maybe they want different tradeoffs. It's worth doing a little bit more management and taking a little more ownership to get that greater agency over something like, yeah, my work product. Yeah. The script of my play or my master's thesis or whatever it is that I'm working on is something that really belongs to me, and I want to put in a little extra effort to have that ownership. Right, exactly. And I feel like it's not reasonable to expect everyone to be a sysadmin and to set up their own services. You know, you get this self-hosted cloud software, but most of it is far too technical for the vast majority of users, and that's not where we want to go with this. I think you still want exactly the same kind of convenience of cloud software: it just works out of the box and you don't have to worry about the technicalities of how it's set up. But one part of local-first software is that because all of the interesting app-specific work happens client-side, on your own device, it now means that the cloud services that you do use for syncing your data and for backing up your data become generic. And so you could imagine Dropbox or Google Drive or AWS or some other big cloud provider just giving you a syncing service for local-first apps. And the way we're thinking about this, you could have one generic service that could be used as the syncing infrastructure for many different pieces of software. So regardless of whether the software is a text editor or a spreadsheet or a CAD application for designing industrial products or music software, or whatever, all of those different apps could potentially use the same backup and syncing infrastructure in the cloud.
And you can have multiple cloud providers that are compatible with each other, and you could just switch from one to the other. So at that point, it just becomes, okay, who do you pay six cents a month to in order for them to store your data? It becomes just a very generic and fungible service. And that, I see, actually makes the cloud almost more powerful, because it removes the lock-in where you have to use the single cloud service provided by the software author. Instead, you could switch from one cloud provider to another very easily. And you still retain the property that you're using all of the cloud provider's expertise in providing a highly available service, and you don't have to do any sysadmin yourself. It's not like running your own SMTP server. So I feel like this is a really promising direction that local-first software enables. Yeah. For sure. Indeed. And you could even describe local-first software, I think, as sort of generalizing and distributing the capabilities of the different nodes. So in the classic cloud model, you have these thin clients: they can dial into the server and render whatever the server tells them. And then you have the servers, and they can store data and process it and return it to clients. And when you have both of those at the same time, you know, it works great. But then if you're a client, like you said, who's in a tunnel, well, too bad. You can't do anything. And the local-first model is more that any node in that system can do anything. It can process the data. It can validate it. It can store it. It can communicate it. It can sync it. And then you can choose what kind of topologies you want. So it might be that you just want to work alone in your tunnel. Or it might be that you want to subscribe to a cloud backup service that does the synchronization and storage part for you, while you still maintain the ability to process and render data locally. This actually gets to how I first got into what we're now calling local-first software. I was in a coffee shop with Peter van Hardenberg, who's one of the other authors that Adam mentioned. And we were talking about working together at the lab when he was a principal there. He's now the director. And he showed me the Pixelpusher prototype. So Pixelpusher was this pixel art app where you color individual pixels to make a kind of retro graphic thing. And it was real-time collaborative. But the huge thing was that there was no server. So you had this one code base and this one app, and you got real-time collaboration across devices. And that was the moment that I realized, you know, I was a fish in the cloud infrastructure water, and I didn't realize it. It's assumed: oh, you need servers and AWS, you need a whole ops team, you're going to be running that for the rest of your life. It's a whole thing. Well, actually, no. You could just write the app and point it at the other laptop and there you go. And we eventually kind of realized all these other benefits that we would eventually articulate as the desiderata, or properties, in the local-first software article. But that was the thing that really kicked it off for me. Yeah. And there's that aspect that the apps become really self-contained, that you just don't have a server anymore. Or if you have a server, it's a really simple and generic thing. You don't write a specific server just for your app anymore.
That's something that I'm not sure we really explored very well in the local-first essay as it was published, but I've been continuing to think about it since. You know, this has really profound implications for the economics of software development. Because right now, as you said, if you're a startup and you want to provide some SaaS product, you need your own ops team that is available 24-7, with everyone on pager duty, so that when the database starts running slow or a node falls over and you need to reboot something or whatever, you know, there's all this crap that you have to deal with, which makes it really expensive to provide cloud software, because you need all of these people on call. And you need all these people to write these scalable cloud services. And it's really complicated, as evidenced by my book, a lot of which is basically: oh, crap, how do I build a scalable cloud service? And with local-first software, potentially that problem simply goes away, because you've just got each local client, which just writes to storage on its own local hard disk. You know, there are no distributed systems problems to deal with, no network timeouts and so on. You just write some data locally. And then you have this syncing code, for which you just use an open source library like Automerge, which will do the data syncing between your device and maybe a cloud service and maybe other devices. And the server side is just non-existent. And you've just, like, removed the entire backend team from the cost of developing a product. And you don't have the ops team problem anymore, because you're using some generic service provided by some other cloud provider. And, you know, that has the potential to make the development of collaborative software so much cheaper, which in turn will mean that we get more software, developed by smaller teams, faster. It'll improve the competitiveness of software development in general. Like, it seems to have so many positive effects once you start thinking it through. Yeah, absolutely. For me, maybe similar to both of you, my motivations were both as a user and as a, let's say, software creator or provider. On the user side, we have these seven different points we articulate, and in fact we even set it up this way: you can give yourself a little scorecard and see which of the boxes you tick. It'll be fun to do that for the Muse syncing service when that's up and running. But the offline capability is a huge one for me. And it's not just the convenience. I mean, yeah, it's every time I'm working on the train and my train goes through a tunnel and suddenly I can't type into my documents anymore, for example. Or, I don't know, I like to go to more remote places to work and have solitude, but then I can't load up Figma or whatever else. And yeah, that for me as a user is just this feeling of... it comes back to the loss of agency, but also just practically, it's annoying. Absolutely. And, you know, we assume always-on Internet connections, but I wonder how much of that is because the software engineers are people sitting in offices, or maybe now at home, in San Francisco, on fast Internet connections with always-connected devices, versus the realities of life, walking around in this well-connected, but not perfectly so, world we all live in. That's on the user side. Yeah, I feel like there's a huge bias there towards, oh, it's fine.
We can assume everyone always has an Internet connection, because, yes, we happen to be that small segment of the population that has a reliable Internet connection most of the time. There are so many situations in which you simply can't assume that. And that might be anything from a farmer working in their fields, using an app to manage what they're doing with their crops, or something like that. And, you know, they won't necessarily have reliable cellular data coverage, even in industrialized countries, let alone in other parts of the world where you just can't assume that sort of level of network infrastructure at all. Yeah. It's funny you mention this, because we often run into this on the summits that we have for Muse. So we were recently in rural France and we had pretty slow Internet, especially on upload. I think it was a satellite connection. And we kept having this experience where there are four of us sitting around a table, and you're looking at each other, but you can't, you know, send files around, because they need to go to, you know, whatever, Virginia, and come all the way back. It's crazy if you think about it. It's ridiculous. Yeah. And I don't think you even need to reach as far as a farmer or a team summit at a remote location. I had a kitchen table in the house I lived in right before this one that was like a perfect place to sit and work with my laptop. But the location of the refrigerator, which really couldn't be any other place, just exactly blocked the path to my router. And the router couldn't really be any other place either. I guess I could run a wire or something, but I really wanted to sit right there and work. But again, it's this ridiculous thing where you can't even put a character into a document, and I can pick up the laptop and walk a meter to the left, and now suddenly I can type again. Yeah, totally. And you compare that to something like Git, which does have more of a local model. It's probably one of the closest things to true local-first software, where you can work locally, and yes, you need an internet connection to share that work with others, but you're not stopped from moment to moment typing things into your computer. Yep. And furthermore, from the implementation perspective, even when you have a very fast internet connection, you're still dealing with this problem. So if I'm using an app and I type a letter on my keyboard, between the time when I do that and the time the server has confirmed it, my change is effectively unsynced, and apps tend to treat that, and offline in general, as a kind of second special-case thing. Like, oh, it's only going to be like this for 100 milliseconds, so just do a kind of hacky solution and make it so that most of the time the right letter shows up. And that's why you have this behavior where apps will have an offline mode, but it never works. Because, I think we've mentioned this on the podcast before, there are systems that you use all the time and systems that don't work. That's a maxim we can link to. But again, with local-first, you're exercising that core synchronization approach all the time, including when it's just you and a server on a good connection. Yeah, and from a sort of fundamentals-of-distributed-systems point of view, I find that very satisfying, because I just see this as different amounts of network latency. Like, if you're online, you have network latency of 50 or 100 milliseconds.
If you're offline, you have network latency of three hours, or however long it's going to be until you next come back online again. To me, those are exactly the same. You know, I don't care if they're a few orders of magnitude apart. Both are network latency, and both need to be dealt with. And if we can use the same techniques for dealing with both standard online latency and being offline, that just simplifies the software dramatically. Going back to the infrastructure, fewer-moving-parts thing, and speaking to our personal motivations: for me, the experience of running Heroku was a big part of my motivation, or fed into my interest in this, because Heroku was an infrastructure business. I didn't quite grasp what that meant when we went into it. I just wanted a better way to deploy apps. And in the end, I enjoy writing software. I enjoy creating products that solve problems for people. But infrastructure is a whole other game. And, you know, it got to the point where, once you're... I don't know if mission-critical is the right word, but once you're something people really care about working well and you're in the critical path... So, for example, our routing infrastructure: if it was down for three seconds, people would complain about the slightest hiccup. And as they should; that was part of the service that the company was providing. And so that's fair enough. But then when I go, okay, well, I'm building software... When I think of, for example, Muse, where I'm providing this productivity tool to help people think and that sort of thing, I don't want to be paged because someone went to move a card five centimeters to the right and our server was down or overloaded or something, so then they can't move the card, and so then they're writing into support angrily. I'm pretty comfortable with: there's some kind of cloud syncing problem, and okay, I can't easily push my changes to someone else. That is still a problem, but it feels like it's on a slightly different timeline. You're not blocking the very most basic, fundamental operation of the software. And so, exactly as you said, it changes the economics. For me personally, I want to spend more of my time writing software and building products, and less of my time setting up, maintaining, and running infrastructure. So I guess looking back on the two years that have elapsed, I would say that this is probably... it's hard to know for sure, but of the Ink & Switch essays, there are a number of them that I think have had a really good impact, but this one probably... just my anecdotal feeling of seeing people cite it by linking it in Twitter comments and things like that, it feels like one of the bigger-impact pieces that we published. And I do really see quite a lot of people referencing that term. You know, we've sort of injected that term into the discussion, at least in a certain very niche, narrow world of things. So yeah, I'd be curious to hear from both of you: first, whether there are things that, looking back, you wish we'd put in or would add now, and then how that interacts with what you make of the local-first movement, or other work that people are doing on this now. I'm very happy that we gave the thing a name. That's something we didn't have initially when we started writing this, when we were just writing this, like, manifesto for software that works better, basically. And then at some point we thought, it would be really good to have some way of referring to it.
And, you know, people talk about offline-first or mobile-first, and those were all kind of established things, you know, terms that people would throw around. And we also wanted some term X where we could say, I'm building an X-type app. And so I'm very glad that we came up with this term, local-first, because I've also seen people even outside of our direct community starting to use it and just, you know, put it in an article casually without even necessarily explaining what it means, just assuming that people know what it is. And I think that's a great form of impact, if we can give people a term to articulate what it is they're thinking about. Yeah. Language, a shared vocabulary to describe something, is a very powerful way to, one, just advance our ability to communicate clearly with each other, but also, yeah, there are so many ideas. I mean, it's a twenty-something-page paper and there are so many ideas, but you can wrap it all up in this one term. And for someone who has downloaded some or most of these ideas, that one term can carry all the weight, and then you can build on that. You can take all that as a given and then build forward from there. Yeah. One thing I wish we had done more on is, I think, trying to get a bit more into the economic implications of it. I guess that would have made the essay another five pages longer, and at some point we just had to stop, but I feel like it's quite an important aspect. Like what we talked about earlier, of not having to worry about backends, or even just not having to worry generally about the distributed systems problems of: you make a request to a server, the request times out, you have no idea whether the server got the request or not. Do you retry it? If so, how do you make the retries idempotent so that it's safe to retry, and so on? All of those problems just go away if you're using a general-purpose syncing infrastructure that somebody else has written for you. And there are other implications as well that are less clear, like: what about the business model of software as a service? Because a lot of companies' business model right now is basically, pay us, otherwise you're going to get locked out of your data. So it's using this idea of holding data hostage, almost, as the reason to pay. And with local-first software, where you keep your own copy of the data, it's not clear that that business model will still work so cleanly. But of course, software developers still have to be paid for their time somehow. So how do we find a way of building sustainable software businesses for collaboration software, but without holding data hostage? I think that's a really deep and interesting question. Yeah, and as an aside, I think what you might call the political economy of software is understudied and under-considered. And I would put in here the economics of software businesses, but also the interaction with things like regulation and governments, and the huge amount of path dependence that's involved. I think that's just a huge deal. I think we're starting to realize it. But yeah, there's a ton of stuff we could do and think about just for local-first. Just one kind of branch that I hope we get to explore is, we mentioned how local-first enables you to do totally different topologies.
So with cloud software, almost by definition, you have this hub-and-spoke model where everything goes through the central server and the central corporation. Well, with local-first, you can very easily, for example, have a federated model, where you dial into one of many synchronization nodes. And you could even have more like a mesh system, where you request and incentivize packets to be forwarded through a mesh to their destination. Sort of like TCP/IP networking, but for the application layer. And it may be, it's still kind of TBD, but it may be that a mesh or a distributed approach has totally different political implications from a centralized model. And that might become important. So I just think there's a lot to think about and do here. Yeah, I think so too. I would take email as an analogy, maybe, which is a federated system, just like what you described. You send your email to your local SMTP server and it forwards it to the recipient's SMTP server. And the system works really well. It certainly has criticisms; spam filtering, for instance, is difficult to do in a really decentralized way. Maybe spam is not a problem that local-first software will have as much, because it's intended more for collaboration between people who know each other, rather than as a way of contacting people you don't know yet. But certainly, I think, taking some inspiration from that federation and seeing how it can be applied to other domains would be very interesting. Yeah. And this brings us to the topic of industrialization and commercialization. I feel like there's more promise than ever around local-first, and more people are excited about it, but I still feel like we're just in the beginning phases of getting the ball rolling on industrial and commercial applications. And if I'm being really honest, I feel like it might have been slower than I had initially hoped over the past few years. So I was curious, Adam and Martin, if you would reflect on that. It's always hard to say, right? The thing with any technology, and certainly in my career in computing this has always proven to be the case, is that something seems right around the corner and it stays that way for 20 years. I don't know, maybe VR is in that category. But then there'll be a moment where suddenly it's just everywhere, like broadband internet or something like that. So as people who are both trying to advance the state of the art, but also making business decisions (should I start a company? should I invest in a company? should I keep working on the company I'm working on?) based on what technologies exist or where you see things going, you're always trying to make accurate predictions. So yeah, I agree: on one hand, it felt very close to me on the basis of the prototypes we'd built with the Automerge library that Martin referenced. I'll link that in the notes here, but basically it's a JavaScript implementation of something called a CRDT. Which, I guess as a sidebar: it could be easy to think that CRDTs and local-first software are one and the same, because they're often mentioned together, and in fact our paper talks about them together. But CRDTs are a technology we find incredibly promising for helping to deliver local-first software, whereas local-first is a set of principles. It doesn't require any particular technological solution. Yeah.
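To make that distinction concrete, here is a minimal sketch of the kind of merging a CRDT library provides, written against the Automerge JavaScript/TypeScript API mentioned in this episode. The document shape is a hypothetical example, and the exact function names may differ between Automerge versions; the point is only to show two devices editing independently and then merging without any server.

```ts
import * as Automerge from "automerge"

// Hypothetical document shape for a small card board.
type Card = { title: string; done: boolean }
type Board = { cards: Card[] }

// Device A creates the initial document.
let deviceA = Automerge.from<Board>({ cards: [] })

// Device B starts from a copy (e.g. synced once before going offline).
let deviceB = Automerge.load<Board>(Automerge.save(deviceA))

// Both devices now make edits independently, with no server in the picture.
deviceA = Automerge.change(deviceA, "add card on device A", doc => {
  doc.cards.push({ title: "Write outline", done: false })
})
deviceB = Automerge.change(deviceB, "add card on device B", doc => {
  doc.cards.push({ title: "Record episode", done: false })
})

// When the devices can talk to each other again, merging is automatic.
const merged = Automerge.merge(deviceA, deviceB)
console.log(merged.cards.length) // 2: both edits survive, no backend required
```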
Based on the strength of those prototypes, many of which worked really well... there's the networking side of it, and whether you can have that be fully decentralized versus needing more of a central coordination server. But once you get past that hump, it does work really, really well. But I think that comes back to the point you both made there about the economic model side of things, which is: we have a whole software industry that's built around people paying for software when there's a service connected to it, right? So SaaS, in particular B2B SaaS, is just a fantastic business to be in. And as a result, we've seen a huge explosion of software around that, but connected to a service. A good example of that is the freemium model, like what you see with Slack; Google Docs is one of those, Notion is one of those. They're free for individuals, but then you pay when you're a business, and then you need to come up with the kinds of features that select for you being a business with more serious needs; something like retaining your message history is one of those. I wrote a whole other, shorter essay about paying for software. I'll link that in the notes. But I think the industry got itself into a weird corner, painted itself into a corner, with things like Google giving you so much incredibly high-quality software (Gmail, Google Docs, Google Maps, et cetera) for quote-unquote free, where how you're really paying for it is with your attention and your data, right? That being monetizable through being able to essentially serve you ads. And I think that's fine, and I'm very glad for Google's existence, but it almost feels like it taught people that good software should be free and that you shouldn't pay for it. Maybe that's a little bit connected to the fact that, once the R&D is done, software basically costs nothing to copy: you make this big upfront investment, and then the software exists and you can duplicate it endlessly. I think there's a lot of things flawed about all of that. But the end place that gets you to is: okay, if someone has my data and I'm paying them to maintain it and run the servers that it's on, I can stomach that. Okay, now I'll pay $5 a month, $10 a month, for my Dropbox account or something like that. But other than that, we've become accustomed to: oh, if it's an app on the App Store (and the App Store is a good example of these kinds of consumer economics), we just expect it to be vastly lower cost or free. And good software costs money to make. And as we kind of talked about earlier, I would rather be building the software, not maintaining the infrastructure. But when you set it up so that the only way you can make money is to build software that has infrastructure connected to it,
you're actually incentivized to build that backend as soon as you can and get the user's data in there, not necessarily to hold it hostage, but just to take ownership of it, because that's what people will pay for. They won't pay for software where they own the data themselves. Yeah, one thing that a friend has suggested is that when talking about the business model of local-first software, we should just call it SaaS: label it as SaaS, market it in exactly the same way as SaaS. Don't even tell people that it's local-first software, and just use the fact that it's a lot cheaper and easier to implement local-first software for your own benefit, in order to build the software more cheaply, but don't actually market the local-first aspect. And I thought that's quite an interesting idea, because it is a model that people are accustomed to. And to be honest, I think the amount of piracy that you would get from people ripping out the syncing infrastructure and pointing it at something else, and then continuing to use the app without paying for it, is probably pretty limited. So you probably only need to put in a very modest hurdle there of saying: okay, this is the point at which you pay, regardless of whether that point of payment is necessarily enforced in the infrastructure. It might just be an if statement in your client-side app, and maybe that's fine. Yeah, well, Muse is basically an example of that. We have this membership model. It is a subscription. It's your only option. And there are a lot of folks that complain about that or take issue with it, and I think there are many valid complaints you can make, but I think in many cases it is just a matter of what folks are accustomed to. We want to be building and delivering great software that improves and changes over time and maps to the changing world around it. And that's something where, as long as you're getting value, you pay for it, and if you're not getting value anymore, you don't have to pay anymore. A model like that basically works best for everyone, we think. Again, not everyone agrees. But then again, you do get this pushback of: you're only running a small service, and it's not super critical to the application. But maybe that would be a good moment to speak briefly about the explorations we're doing on the local-first sync side, Mark. Yeah, so right now, Muse is basically a local-only app. It's a traditional desktop app where files are just saved to the local device, and that's about it. You can manually move bundles across devices, but otherwise, it just runs locally. And the idea is to extend Muse with, first, syncing across your devices, and then eventually collaboration across users, using a local-first approach. Now, we don't plan to do, at least initially, the kind of fully distributed mesh networking peer-to-peer thing. It will be a sync service provided by Muse and kind of baked into the app. But it will have all those nice local-first properties: it works offline, it's very fast, all the different nodes are first class, and so forth, while eventually supporting syncing and collaboration.
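As a rough sketch of what that device-to-service syncing can look like, here is the Automerge sync protocol (initSyncState / generateSyncMessage / receiveSyncMessage, per the classic Automerge 1.x-style API) run between two in-memory peers, e.g. a device and a generic sync service, until they converge. This is an illustrative assumption about the library's usage, not Muse's actual implementation.

```ts
import * as Automerge from "automerge"

type Doc = { notes: string[] }

let local = Automerge.from<Doc>({ notes: ["written on this device"] })
let remote = Automerge.init<Doc>() // the peer starts out empty

let localSync = Automerge.initSyncState()
let remoteSync = Automerge.initSyncState()

// Keep exchanging sync messages until neither side has anything left to send.
// In a real app, each message would travel over a WebSocket or similar.
let exchanged = true
while (exchanged) {
  exchanged = false

  const [nextLocalSync, msg] = Automerge.generateSyncMessage(local, localSync)
  localSync = nextLocalSync
  if (msg) {
    const [nextRemote, nextRemoteSync] = Automerge.receiveSyncMessage(remote, remoteSync, msg)
    remote = nextRemote
    remoteSync = nextRemoteSync
    exchanged = true
  }

  const [nextRemoteSync2, reply] = Automerge.generateSyncMessage(remote, remoteSync)
  remoteSync = nextRemoteSync2
  if (reply) {
    const [nextLocal, nextLocalSync2] = Automerge.receiveSyncMessage(local, localSync, reply)
    local = nextLocal
    localSync = nextLocalSync2
    exchanged = true
  }
}

console.log(remote.notes) // ["written on this device"]: the peer has caught up
```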
So yeah, we're going through this journey of: we had a lot of experience with basic prototypes in a lab, but there's a big jump to a commercialized and industrialized product. Not just in terms of charging for it and the business model and stuff, but in terms of the performance and all the weird things that you deal with in the real world, like versioning and schemas and the idiosyncrasies of networking. And so it's a lot of fun. And longer term, the hope is that developers can use a sync service that's already hosted for them, or they can spin up their own server, and that path should be super easy and straightforward. And that's kind of where my research is focusing, trying to get the technologies to that point. So right now, we have some basic implementations of this stuff. So Automerge is a library that does this kind of data synchronization. It includes a JSON-like data model that you can use to store the state of your application, and it has a sort of basic network protocol that can be used to sync up two nodes. But there's so much more work to be done on making the performance really good. At the moment, it's definitely not very good. We're making progress with that, but there's still a long way to go: making the data sync protocol efficient over all sorts of different types of network link in different scenarios, making it work well if you have large numbers of files, for example, not just a single file, and so on. And so there's a ton of work still to be done there on the technical side, I think, before this is really in a state where people can just pick up the open source library and run with it. Part of it is also just getting the APIs right and making sure it has support across all of the platforms. Just having a JavaScript implementation is fine for initial prototypes, but obviously iOS apps are written in Swift, and Android apps will be written in Kotlin or whatever people use. And so you need to have support across all of the commonly used platforms. And we're gradually getting there, but it's a ton of work. And conceptually, seeing how Automerge is evolving and how people are trying to use it, sometimes very successfully, sometimes less so... I see this as a case of technology transfer, which is an area I'm incredibly interested in, because I think it's kind of a big unsolved problem in HCI research, computer science, honestly maybe all research, but I'll stick to my lane in terms of what I know. Which is: there is often this very excellent, cutting-edge research that sits in the lab, so to speak, and never graduates, because it's very hard, or there often isn't a good path, for it to jump over that hump into what's needed in the production world. And of course, in the research world, you're trying to do something new and different and push the boundaries of what was possible before, while on the production, commercial side, you want to choose boring technologies and do things that are really reliable and known and stable. And between those two there's often a bridge that's hard to cross, a hard divide. Sitting in your seat as, again, someone who's enmeshed in the academic world right now, and you're creating this library... it started as, call it a proof of concept, for lack of a better term. And then you have customers, if that's the right word to put on it.
But as an academic, you shouldn't have customers, but you sort of do, because people want to use this library, and in fact are, for their startups and things like that. How do you see that transition happening? Or is there a good role model you've seen elsewhere? Or do you just kind of figure it out as you go? Well, I think where we're trying to go with this is: it's great for Automerge to have users. I don't think of them as customers. I don't care about getting any money from them. But I do care about getting bug reports from them, and experience reports of how they're getting on with the APIs, and reports of performance problems, and so on. And those things are all tremendously valuable, because they actually feed back into the research process. And so I'm essentially using the open source users and contributors as a source of research problems. So with my research hat on, this is great, because I have, essentially, right in front of me, a goldmine of interesting research problems to work on. I just take the top issue that people are complaining about on GitHub and have a think about how we might solve it. And often there's enough of a nugget of a research problem in there that when we've solved the problem, we can write a paper about it. It can be an academic contribution, as well as moving the software forward. And so I think that's a really good way to start moving the open source ecosystem gradually towards a point where we've ironed out all of those major technical problems and hopefully made something that is more usable in production. So I actually feel those worlds are pretty compatible at the moment. There are some things which are a bit harder to make compatible, like the basic work of porting stuff to new languages or new platforms. That's necessary for real-life software engineering, but there's no interesting research to be done there, to be honest. But so far, I've found that quite a lot of the problems that we have run into actually do have interesting research that needs to be done in order to solve them. And as such, I think they're quite compatible at the moment. I like imagining the mental picture of: someone submits a bug report, and one year later you come back and say, here's the fix, and also the paper we published about it. I've literally had cases where somebody turns up on Slack and says, I found this problem here, what about it? And I said, oh yeah, I wrote a paper about it, and the paper has a potential algorithm for fixing it, but I haven't implemented it yet, sorry. And they go, like, WTF? What? You put all of this thought into it, you've written a paper, and you haven't implemented it? And I go, well, actually, sorry, for me that's easier, because if I want to implement it, I have to put in all of the thought and convince myself that it's correct, and then I also have to implement it, and then I also have to write all the tests for it, and then I have to make sure that it doesn't break other features and doesn't break the APIs, and I need to come up with good APIs for it, and so on. So for me, actually, implementing it is a lot more work than just doing the research, in a sense. But actually, doing the research and implementing it together can be a really useful part of making sure that we've understood it properly from a research point of view, so that at the end, what we write in the paper ends up being correct. In this particular case, actually, it turned out that the algorithm I had written down in the paper was wrong, because I just hadn't thought about it.
I hadn't thought about it deeply enough. And a student in India emailed me to say, hey, there's a bug in your algorithm. And I said, yeah, you're right, there's a bug in our algorithm, we'd better fix it. And so, probably, through implementing it, maybe I would have found the bug, maybe not. But I think this just shows that it is hard getting this stuff right. But the engagement with the open source community I have found a very valuable way of both working towards a good product and doing interesting research. I think it's also useful to think of this in terms of the research-and-development frame. So research is coming up with the core insights, the basic ideas, those universal truths, to unlock new potential in the world. And it's my opinion that with local-first, there's a huge amount of development that is needed, and that's a lot of what we're doing with Muse. So an analogy I might use is a car and an internal combustion engine. If you came up with the idea of an internal combustion engine, that's amazing. It's pretty obvious that it should be world-changing. You can spin this shaft at 5,000 RPM with 300 horsepower. It's amazing. But you're really not there yet. You need to invent suspension and transmission and cooling. And it's kind of not obvious how much work that's going to be until you go to actually build a car and run it at 100 miles an hour. So I think there's a lot of work that is still yet to be done on that front. And eventually that kind of work does boil down to, or emit, research ideas and bug reports and things like that. But it's also kind of its own whole thing, and there's a lot to do there. And, to continue the analogy, I think once the research and the development examples get far enough along, you should see some unanticipated applications of the original technology. So this would be someone saying, what if we made an airplane with an internal combustion engine? I don't think we've quite seen that with local-first, but I think we will once it's more accessible. Because right now, to use local-first, you've got to be basically a world expert on local-first stuff to even have a shot. But once it's packaged enough and people see enough examples in real life, they should be able to more easily come up with their own new wild stuff. So I think that's a big thing. Yeah, we have seen some interesting examples of people using our software in unexpected ways. One that I like is The Washington Post, as in the newspaper; everyone knows it. They have an internal system for allowing several editors to update the layout of the home page. So that's the placement of which article goes where, with which headline, with which image, in which font size, in which column. All of that is set manually, of course, by editors. And they adopted Automerge as a way of building the collaboration tool that helps them manage this home page. Now, this is not really a case that needs local-first, particularly, because it's not like one editor is going to spend a huge amount of time editing offline and then sharing their edits to the home page.
But what they did want is a process whereby multiple editors can each be responsible for a section of the home page, and they can propose changes to their section, and then hand those changes over to somebody else who's going to review them and maybe approve them or maybe decline them. And so what they need, essentially, is a process of version control, Git-style version control, almost, but for the structure representing the home page. And they want the ability for several people to update that independently. And that's not because people are working offline, but because people are using, essentially, branches, to use the Git metaphor. So different editors will be working on their own local branch until they've got it right, and then they'll hit a button where they say, okay, send this to another editor for approval. And that I found really interesting. It's using the same basic technologies that we've developed with CRDTs (tracking the changes to these data structures, being able to automatically merge changes made by different users) but applying them in a different way, in this interesting, unexpected context. And I hope that as these tools mature, we will expand the set of applications for which they can be sensibly used, and in that expansion, we will then also see more interesting, unexpected applications where people start doing things that we haven't anticipated. Maybe this reflects my commercial-world bias, or maybe I'm just a simple man, but I like to see something working more than I like to read a proof that it works. And both are extremely important, right? So the engineering approach to seeing if something works is you write a script, you know, fuzz testing, right? You try a million different permutations, and if it all seemed to work, that's kind of the Monte Carlo simulation test of something: it seems to work in all the cases you can find, so it seems like it's working. And then there's, I think, the more proof style, in the sense of a mathematical proof: here is an airtight, logical, deductive-reasoning case, or mathematical case, that shows that it works in all scenarios. It's not a Monte Carlo estimate of the area under the curve; it's calculus to determine the area under the curve precisely, to infinite resolution. And I think they both have their place. Kind of to Mark's point, you need to both conceptually come up with the combustion engine and then build one, and then all the things that go with that. And I think we all have our contributions to make. Much as I like the research world, at some point, when there's an idea that truly excites me enough (and local-first broadly and CRDTs specifically are in this category), I want to see it, I want to try it, I want to see how it feels. In fact, that was our first project together, Martin: we did this sort of Trello clone, essentially, that was local-first software and could basically merge together the work of two people who had worked offline, and it had a little bit of a version history. I did a little demo video. I'll link that in the show notes. But for me, it was really exciting to see that working. And I think maybe your reaction was a bit of a, well, of course, you know, we have five years of research, look at all these papers that prove that it would work. But I want to see it working and, moreover, feel what it will be like, because I had this hunch that it would feel really great for the end user to have that agency.
But seeing it, slash, experiencing it, for me, that drives it home and creates the internal motivation far more than the thought experiments, if that's the right word, the conceptual-realm work, even though I know that there's no way we could have built that prototype without all that deep thinking and hard work that went into the science that led up to it. Yeah. And it's totally amazing to see something like that working for the first time, and it's very hard to anticipate how something's going to feel, as you said. You can sort of rationalize about its pros and cons and things like that, but that's still not quite the same thing as the actual first-hand experience of really using the software. All right, so local-first, the paper and the concept: I think we're pretty happy with the impact that it made and how it's changed a lot of industry discussion. And furthermore, while the technology maybe is not as far along as we'd like, it has come a long way, and we're starting to see it make its way into more real-world applications, including Muse in the very near future. But I guess looking forward, for either that kind of general movement or the technology, what do you both hope to see in, say, the next two years, or even further out? Well, the basic thing that I'd like to see is the development of the core ideas, and to see them successfully applied in commercial and industrial settings. Like I said, I think there's a lot of work to do there, and some people have started, but I'd like to see that really land. And then, assuming we're able to get the basic development landed, a particular direction I'm really excited about is non-centralized topologies. I just think that's going to become very important, and that's a unique potential of local-first software. So things like federated syncing services, mesh topologies, end-to-end encryption, generalized sync services like we talked about; I'm really excited to see those get developed and explored. Yeah, those are all exciting topics. For me, one thing that I don't really have a good answer to, but which seems very interesting, is: what does the relationship between apps and the operating system look like in the future? Because right now, we're still essentially using the same 1970s Unix abstraction: we have a hierarchical file system, a file is a sequence of bytes, and that's it. A file has a name, and the content has no further structure other than being a sequence of bytes. But if you want to allow several users to edit a file at the same time and then merge those edits together again, you need more knowledge about the structure of what's inside the file. You can't just do that with an opaque sequence of bytes. And I see CRDTs as essentially providing a sort of general-purpose, higher-level file format that apps can use to express and represent the data that they want to have, just like JSON and XML are general-purpose data representations. And CRDTs further refine this by not just capturing the current state, but also capturing all the changes that were made to the state.
And thereby, they much better encapsulate what the intent of the user was when they made a certain change, and capturing those intents of the user through the operations they perform then allows different users' changes to be merged in a sensible way. And I feel like this idea really changes the abstractions that operating systems should provide, because maybe OSes should not just be providing this model of files as a sequence of bytes, but this higher-level, CRDT-like model. And how does that impact the entire way software is developed? I think there's a potential for just rethinking a lot of the stack that has built up a huge amount of cruft over the past decades, and potential to really simplify and make things more powerful at the same time. Yeah, a local-first file system, to me, is kind of the end state. And maybe that's not quite a file system in the sense of how we think about it today, but a persistence layer that certainly has these concepts baked into it, but I think it also just reflects the changing user expectations. People want Google Docs and Notion and Figma, and they expect that their email and calendar will seamlessly sync across all their devices, and then you have other collaborators in the mix. So your files go from being these pretty static things on the disk, where you press Command-S or Control-S and every once in a while it does a binary dump of your work that you can load later, and instead they become a continuous stream of changes coming from a lot of different sources. They come from, I've got my phone and my computer, and Mark's got his tablet and his phone, and Martin, you've got your computer, and we're all contributing to a document, and those changes are all streaming together and need to be coalesced and made sense of. And I think that's the place where, for example, Dropbox, much as I love it, or iCloud, which I think in a lot of ways is a really good direction, both of those are essentially dead ends, because they just take the classic static binary file and put it on the network, which is good, but it only takes you so far because, again, people want Google Docs. That's just the end of it. And that means every single company that's going to build an application of this sort has to build the kind of infrastructure necessary to do that. And we've seen where, I think, Figma is the most dramatic example. They just took Sketch and ported it to kind of a real-time, collaborative, web-first environment. And the word "just" there is carrying a lot of weight because, in fact, it's this huge engineering project, and they needed to raise a bunch of venture capital, but then once you had it, it was so incredibly valuable to have that collaboration. And then, of course, they built far beyond the initial let's-just-do-Sketch-on-the-web. But any company that wants to do something like that, and increasingly that's, if not table stakes, then close to it from a user expectation standpoint, has to do that same thing. You've got to drop tens of millions of dollars on big teams to do that. And it seems strange, when I think many if not most productivity applications, at least, want something similar to that.
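As a small illustration of the point above about capturing changes and intent rather than just the latest state, here is another hedged sketch using the classic Automerge API; the card data, field names, and change messages are made up, and the getHistory call is simply the classic API's helper for inspecting the recorded changes (details vary between library versions). Two replicas make concurrent edits with different intents, and the merge keeps both, whereas syncing whole-file snapshots would have let one save overwrite the other.

```typescript
// Sketch only: concurrent edits with different intents, classic Automerge API.
// The board shape and messages are illustrative, not from the episode's Trello clone.
import * as Automerge from "automerge";

type Board = { cards: { title: string; list: "todo" | "done" }[] };

let alice = Automerge.from<Board>({ cards: [{ title: "Ship beta", list: "todo" }] });
let bob = Automerge.merge(Automerge.init<Board>(), alice);

// Alice's intent: rename the card. Bob's intent: mark it done. Both edit offline.
alice = Automerge.change(alice, "Rename card", (doc) => {
  doc.cards[0].title = "Ship public beta";
});
bob = Automerge.change(bob, "Move card to done", (doc) => {
  doc.cards[0].list = "done";
});

// Because the operations were recorded (not just the resulting bytes),
// merging preserves both intents instead of one snapshot clobbering the other.
const merged = Automerge.merge(alice, bob);
console.log(merged.cards[0]); // both the rename and the move are present

// The change history travels with the document itself.
console.log(Automerge.getHistory(merged).map((state) => state.change.message));
```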
So if that were built into the operating system the same way that a file system is, or this idea we propose of a Firebase-style thing for local-first and CRDTs, which could be maybe more developer infrastructure, maybe that's all we need. But it's also what you guys were speaking about earlier, that AWS could run a generic sync service. I don't know exactly what the interface looks like, whether it's more of a developer thing or an end-user thing. But basically every single application needs this. And the fact that it is a huge endeavor that costs so much money and requires really top talent to do at all, let alone continue running over time and scale up, just seems like a mismatch with what the world needs and wants right now. Yeah, and now I want to riff on this, Adam, because the strongest version of that vision is not only are all these apps using a local-first file system, they're all using the same one, in the same way that now, for our legacy apps, all your files from different applications are written to the same disk in the same way. And furthermore, any application can access, and read and write, any other application's data. So you sort of disconnect the data from the application, and you can layer them on top of each other. And then this gets to the final thing here, which is that some of those programs could be programs that users write. So you sort of have end-user programming against real-time synced and collaborative data. And not only is that cool because end-user programming is interesting, but programming against data doesn't really work when the data is halfway around the world. Like, just physically, if you need to navigate the data or follow links, it's just too slow. You need all the data locally, which is indeed the whole promise of what we're talking about here. Well, not to mention, maybe, the fact that, you know, there's the auth token dance and whatever, you've got to register your application. It just comes back to this: yeah, end-user programming is 100% about agency, which, as we said at the start, is kind of at the core of local-first. And yeah, it's gotten increasingly harder to program your own stuff for a bunch of reasons. But one is, yeah, the data is way over there, in the care of this company, and they give you their one front end to it. If you're very lucky, they'll build an API. And if you're even luckier, they'll let you allocate an API token as an individual, not a company, to just write a little script to do something. Whereas I did a lot more automation in my personal life back when so much was just a Unix shell and a file system on my local computer, a world where you can write, not so much scripts, but what I think of more as bots. Yeah. I think we even prototyped this at the very tail end of that Trello clone project, where we said, now that you've got this stream of changes that you're consuming from different places, the bot could be just one more of these. And if you want to do something like automatically moving a card to a new location when something is triggered, that should be straightforward to do. And some world like that, where you have these streams of events, streams of data being coalesced, that include not just the devices of all the people, but also the individual programs that you may choose to write, which sort of contribute to this whole evolving document, that's a very exciting future for me. Yes.
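The bot idea mentioned above can be sketched in the same style: a bot is just another peer that reads the shared document and contributes changes of its own, which then merge like anyone else's edits. The trigger rule, list names, and the "urgent" flag below are hypothetical examples, not anything from the actual prototype; the only real API assumed is the classic Automerge change/merge interface.

```typescript
// Sketch only: a "bot" is just another contributor of changes to the shared doc.
// Trigger condition and field names are hypothetical.
import * as Automerge from "automerge";

type Board = { cards: { title: string; list: string; urgent?: boolean }[] };

// The bot's rule: whenever a card is flagged urgent, move it to the "today" list.
function urgentCardBot(doc: Automerge.Doc<Board>): Automerge.Doc<Board> {
  return Automerge.change(doc, "Bot: escalate urgent cards", (d) => {
    for (const card of d.cards) {
      if (card.urgent && card.list !== "today") {
        card.list = "today";
      }
    }
  });
}

// In a real setup the bot would hold its own replica and sync like any device;
// here we just run it on a second replica and merge the result back.
let doc = Automerge.from<Board>({
  cards: [{ title: "Fix sync bug", list: "later", urgent: true }],
});
const botDoc = urgentCardBot(Automerge.merge(Automerge.init<Board>(), doc));
doc = Automerge.merge(doc, botDoc);
console.log(doc.cards[0].list); // "today"
```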
And if we can get to that point where it's so easy to write collaborative software that you can just have software be collaborative by default, and so easy to have this streaming integration with bots that you just do it by default, then we're in a situation where this can actually be used in practical reality. Well, I think we should wrap it there. Thanks, everyone, for listening. If you have feedback, write to us on Twitter. We're at MuseAppHQ. And we're on email, hello at museapp.com. You can help us out with a review on Apple Podcasts. And Martin, I'm so glad that you're pushing this vision of the world forward, even though we're not working together as directly right at the moment. We hope that our efforts over here to try to prove out local-first in a commercial context, both that it can be viable for a small team and that it produces a great user experience, will be, at least for now, a great contribution, and you're continuing to push the state of the art in the science world. Hopefully we can, together and along with all the other folks who are doing great work in this field, reconvene maybe in two years and have some good news to report. Definitely. Thanks for having me. Thank you.