Transcript
Merrill Lutsky (00:00): LLMs are great at generating code. They're also really, really good now at reviewing it and understanding it and improving it.
Daniel Darling (00:19): Welcome to the Five Year Frontier Podcast, a preview of the future through the eyes of the innovators shaping our world. Through short, insight-packed discussions, I seek to bring you a glimpse of what a key industry could look like five years out. I'm your host, Daniel Darling, venture capitalist at Focal, where I spend my days with founders at the very start of their journey to transform an industry. The best have a distinct vision of what's to come, a guiding North Star they're building towards, and that's what I'm here to share with you.
Today's episode is about the future of coding. We cover multiplying engineering output, vibe coding bottlenecks, agents as reviewers, AI roll-ups, and the future of developing software. Our guide will be Merrill Lutsky, CEO of Graphite, a company bringing AI acceleration and automation to code review. Founded in 2020 out of New York, Graphite has become a key part of the developer ecosystem.
Daniel Darling (01:06): As more code is generated with AI, they enable developers to scale the evaluation, testing, and review process before it is released, a growing bottleneck that has become incredibly important. The startup has raised over $70 million from leading VCs such as Accel, A16Z, and Menlo Ventures, as well as receiving strategic investment from model provider Anthropic. Last year, Graphite grew its revenue 20-fold, and it's trusted by over 45,000 developers at top engineering organizations such as Shopify and Figma. Graphite is Merrill's second startup, and he has helped develop and manage software products for high-output engineering companies such as Square, Oscar Insurance, and Self Made. He holds a degree in Applied Mathematics and Economics from Harvard. Hi Merrill, great to see you. Thanks for coming on to chat with me.
Merrill Lutsky (01:48): Thanks for having me Daniel.
Daniel Darling (01:49): You're building Graphite in one of the hottest spaces of AI, which is really about increasing the productivity of developers writing code. Can you help us quantify just how much more productive an AI-enabled developer is compared to a peer who isn't taking advantage of all these AI tools?
Merrill Lutsky (02:11): That's a great question. I'll caveat this by saying that I think we've only begun to scratch the surface of how much more productive developers can be with AI. One of our beliefs at Graphite is that a lot of the bottleneck at large companies is not how quickly you can write code, but actually how quickly you can get it through what we call the outer loop of the development process: code review, testing, deployments, everything that happens once you put that PR up. That's often what's slowest, versus the time to actually write code. But even with that, we're typically seeing anywhere from 10 to 30% more pull requests and more volume of code from teams that are using tools like Cursor and Windsurf today. There have been some studies as well that look at just how much more quickly engineers can solve tasks with these tools.
Merrill Lutsky (03:04): there was like a McKinsey one that was like 20 to 45%. There was like GitHub did one and I think MIT did one as well that found it was about, engineers are about 2x faster on simple tasks when they're using AI. So it really does. And even just anecdotally using it ourselves here, we find that for a lot of the simpler tasks, it really improves how quickly we can just get the code up and running and achieve what we want to achieve. So it's pretty undeniable that these tools are incredibly powerful for helping developers be more productive. I think the question now is just how do we adapt the rest of the process to kind of keep up with them. And there's a gap between that.
Merrill Lutsky (03:46): You're 2x faster, but you're only producing 10 to 20 to 30% more code. Where is that delta there? It to me indicates that there's still so much untapped potential there. And then as the models get better, I expect that gap will even grow even bigger until we figure out the rest of the story.
Daniel Darling (04:04): Yeah, and I think that's a great framework, because both developers and even non-developers have never been more enabled to write code. You've got this whole movement around vibe coding, or copilot-assisted code generation, and that's producing a huge amount of code in the market. But while your peers like Cursor and Windsurf are really pushing forward the generation side, you're focused on what comes next: you need to check the code, assess it, and make sure it's production-ready. Maybe you can give us a little bit about how you're doing that and why you see that as the bottleneck.
Merrill Lutsky (04:42): Our belief here has been that code review has always been one of the most time-intensive and painful parts of being a software engineer. LLMs are great at generating code. They're also really, really good now at reviewing it, understanding it, and improving it. We saw a great opportunity, starting from the great enterprise customers and fast-growing startups we were already working with, to take that same technology, apply it to code review, and help them accelerate their process. We've had such a good reception to that that we've split it off into its own product, which you can buy either with or independent of the code review platform we were building previously. The way it works in practice: the code review agent is called Diamond.
Merrill Lutsky (05:32): You just add the Diamond GitHub app to your repo and pretty much immediately it'll start reviewing every single code change that your engineers are generating. If one of them puts up a PR within a few seconds, they'll get high-quality, high signal feedback from Diamond. It's scanning for bugs, it's for security vulnerabilities, performance improvements, style guide inconsistencies with your repo and you can even give it custom rules in English or you can just dump your code base style guide or your cursor rules into Diamond and have it just enforce those same rules and guidelines and code review. You can think of it like it's taking the time from typically engineers will wait three to four hours, sometimes longer on average for that first round of feedback from a peer on a PR. With Diamond now we bring that down to a few seconds and the magic of that is that you can then as the author, you can go and resolve all those nits and small comments within a few minutes and then by the time a human, your colleague looks at it, it's pretty much ready to go. And most of the time it just gets approved first try and you don't have to go back. You save at least one round of that back and forth that normally happens in code review.
Daniel Darling (06:47): It almost feels a little bit real time in that aspect. Do you anticipate Diamond or your agent eventually reviewing code as you write it and sort of condensing or collapsing that whole process?
Merrill Lutsky (06:59): Yeah, it's a good question. I think there is a period of time where that will be helpful, and we've thought about offering a local Diamond experience; we already have our VS Code extension and CLI for creating pull requests and submitting them, and we've thought about offering real-time feedback as you're coding. That's something we think would be pretty interesting. One of the reasons we're hesitant to really commit to it is that our belief is that in the not so distant future, the inner loop and the local development experience as we know it today kind of goes away. As agents get better and better at taking tasks and just generating the code for them, most of that compute is going to be happening remotely. It's not going to be on a developer's laptop in an IDE anymore. So I think there will be this modality shift in the next couple of years where most development goes from happening locally to happening remotely and being primarily agent-driven.
Merrill Lutsky (07:58): And if we bet on that world, I think then the pull request itself becomes the most interesting space and becomes the place. Today we say that Graphite is the second most important tool in the developer's workflow, at least in terms of amount of time spent per day. Mostly they're in their IDE. Over the next few years you'll see that pendulum start to swing towards the PR being the primary place where engineering is done, rather than it being like in the IDE.
Daniel Darling (08:25): Fascinating. And when you think about that and the review process you're looking at at Graphite, how much of it will still rely on having the human in the loop? What parts will be automated, and which parts will consistently remain in the hands of a developer?
Merrill Lutsky (08:42): Our belief here is that we'll still have human developers in the loop for the foreseeable future. Until you get to the point where large enterprises are comfortable giving an AI full responsibility for the end user experience, and all the legal implications of that experience, it's going to be very hard for them to say, okay, agent, go build this thing and ship it to production without any oversight. I think you still want somebody who is reviewing this, approving it, and at least taking responsibility and accountability for it when it actually goes out. Instead, humans can focus more on high-level architecture, user experience, functionality. How does this feel? How is this architected? Higher-level and higher-order questions that are better for humans to think about and just more interesting for us. It makes code review feel less tedious than having to carefully scan line by line and make sure the logic is implemented exactly right, because now we have this second layer of checks to make sure the fundamentals are there, which allows us to focus our attention on the things humans are still best at.
Daniel Darling (09:57): And that's a really interesting challenge for you guys, the trust factor: at what point do teams completely trust the system to do that kind of review? I'm not sure if you have a gauge of where you sit in terms of being fully trusted, of teams saying, okay, if Diamond gives it the okay, we can ship it. And also, how are the larger organizations, the enterprises, structuring themselves to trust this type of system, or putting the safeguards in place to trust it?
Merrill Lutsky (10:25): Much as is the case with any of these technologies, it's the small, fast-moving, early-adopting startups that are leaning the most heavily into fully agentic review. Enterprises still very much want to have a human in the loop, and that is likely the way this will stay for some time. I just think you'll see more and more of the workload shift from human reviewers to Diamond.
Daniel Darling (10:48): Given this landscape you're painting for developers, do you see more developers being needed? They've always been a bottleneck; the market has always wanted more of them. But if the code generation itself is largely agentic, and more and more of the review process requires a high-level understanding of architecture and design, do you think more developers are needed in the market, or a different, higher-order level of developer? How do you view that?
Merrill Lutsky (11:18): My view is that we'll likely have fewer of what we'd call professional software engineers and developers today. There's kind of this meme right now of the single person building a billion-dollar company, which I think is a bit far-fetched. But I do think you're starting to see teams that need fewer engineers to achieve great outcomes, and that trend is likely to continue as the models get better and agents can write more and more code. The remaining professional software engineers are going to be those who have a really deep understanding of architecture and are really good at working with AI. And in a world where anyone can create just about anything in minutes or hours or days, it becomes much more a matter of what you are creating: does it have brand or network effects or other attributes that make it stand out from everything else, if it can be generated so easily? So I think the engineers with deeper fundamental understanding, and also with broader cross-disciplinary skills, will be the ones that remain, or that are the most valuable, in this agentic world.
Merrill Lutsky (12:31): The flip side of that though is that although there will be less software engineers by trade, I think then everyone will become more of a software engineer and every job will become more software engineering like or will increasingly involve creation of software, interacting with or customizing software. And the barrier to that is being lowered every single day. And I think that's more what the future looks like where you're doing something in a vertical but everything increasingly becomes software eats the world. But it's not. Software engineering doesn't quite eat the world. It's like every job just becomes a bit more software engineering like.
Daniel Darling (13:05): Yeah, exactly, you're more enabled to create software in any function within an organization. But getting specific on a developer: drop us into 2030. What would their workflow look like, from the moment they have an idea for a feature to shipping it?
Merrill Lutsky (13:24): Most of the ideas are actually being developed or surfaced by AI at that point, but certainly some still come from humans too. You'll start, much like what we do today, by defining a spec: what exactly do you want to build, what's the user need you're trying to solve, what needs to change to make this happen, and what are the constraints that can't change in making this update or adding this feature. You'll then kick off a set of agents, probably one agent orchestrating many others, to go and actually generate the code and implement the change. Once that's ready, you'll get back the pull request and a sandbox environment where you can play around with it and make sure it's doing what you want. You'll be able to chat with it, iterate on it, give it feedback through a variety of different mechanisms to indicate whether this is what you want and what you want to improve. I think Replit's agent and similar tools are kind of a toy preview of what this will look like: you'll be iterating on something, you'll have different versions of it, you'll update it, and eventually you'll get to something that's ready to ship. And the agent will then also handle deployments, rollout, everything else that's required in that pipeline for you.
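That loop is concrete enough to write down. Here is a loose sketch of the control flow Merrill outlines, with every function a hypothetical stand-in; it pins down the spec-to-ship cycle he describes, not any real product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Spec:
    goal: str                                             # the user need being solved
    constraints: list[str] = field(default_factory=list)  # what must not change

@dataclass
class PullRequest:
    branch: str
    revision: int = 0

# Hypothetical stand-ins for the orchestrator agent and the delivery pipeline.
def plan_and_implement(spec: Spec) -> PullRequest: ...
def deploy_preview(pr: PullRequest) -> str: ...            # returns a sandbox URL
def human_approves(sandbox_url: str) -> bool: ...
def revise(pr: PullRequest, feedback: str) -> PullRequest: ...
def ship(pr: PullRequest) -> None: ...

def develop(spec: Spec) -> None:
    pr = plan_and_implement(spec)       # one agent orchestrates many sub-agents
    sandbox = deploy_preview(pr)        # ephemeral environment to play around in
    while not human_approves(sandbox):  # the human stays in the loop at the PR
        feedback = input("What should change? ")
        pr = revise(pr, feedback)
        sandbox = deploy_preview(pr)
    ship(pr)                            # the agent handles rollout and deployment
```

Notice where the human sits in this sketch: not in the code generation step, but at the review gate, exactly the point where Merrill argues the PR becomes the primary workspace.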
Daniel Darling (14:38): I love that. And it really feels like a huge unlock in terms of the quality of what you can produce as well. Do you think that will lead to far more iterative software products as a result, in terms of being adaptive to user experience? More tailored, more personalized, more versions of it?
Merrill Lutsky (14:59): Yeah, absolutely. You increase the iteration speed and reduce the iteration cycle time, and that just lends itself a lot better to shipping faster. It also means you can handle more complexity, more customization.
Daniel Darling (15:12): And that's a fascinating train of thought because you're almost then giving the customer or the user the power to create the software rather than the organization at that point. Do you think it'll be more of a marriage between those two or how would that start to evolve where it starts to adapt to a user's request?
Merrill Lutsky (15:31): I mean, obviously you need a lot of guardrails around something like that, right? Having a pathway where a user can modify production data is a pretty glaring attack vector, and I don't think we've yet figured out exactly how to constrain that. But I do think there will be bounded cases, simple extensions of something, integrations, being able to talk to local resources or other tools, where it'll become a lot easier for end users to modify the software itself and interact with it.
Daniel Darling (16:10): And that speaks to something that's bandied about quite a bit, which is English becoming the primary coding language for users, which is incredibly enabling and democratizes a lot of the access to software development. Some people predict that a Google-like search experience, instead of returning results, will actually return applications and products, developing things on the fly. Do you agree with that vision of the future, and how does it map out?
Merrill Lutsky (16:36): I think we're already seeing this with the move over to ChatGPT and others, where the format of the output varies greatly based on what the input or query is. When you're not limited to just a list of search results, you can be a lot more tailored about the best way to present the information. Having more adaptable UI in responses gives you a lot more power and makes it much more useful, because you're not constrained to one modality of response anymore.
Daniel Darling (17:12): Is there an unlock that you're starting to look for in the foundation models that would start to kick this into gear or anything on the horizon that you think is coming that maybe isn't on the radar yet of the rest of us?
Merrill Lutsky (17:24): For me, the biggest gains are going to be inference costs coming down and the speed of inference increasing. Those are the things I'm most excited about, along with just deeper understanding of certain languages. We see right now that the models are better at some languages than others, and there are certain quirks they're not as well adapted to. My bet for the larger model provider space is that they're increasingly going to have to focus on specific use cases. Anthropic is pretty clearly focused on enterprise, code understanding, and code generation, while OpenAI, at least historically, has gone more for consumer.
Daniel Darling (18:09): Do you think there'll be more roll-ups, like we're seeing with OpenAI's $3 billion purchase of Windsurf, with model providers starting to gobble up these code generation and code review companies?
Merrill Lutsky (18:21): I'm starting to think so. It's pretty clear now that having a good flywheel of training data is really important, having the human feedback on what the model suggests versus what then gets accepted. That's critical to building a high-quality code generation and understanding model, and I think OpenAI buying Windsurf is a great example of them trying to get closer to that data, to how useful the suggestions they're generating are, and to own more of it. I also think it's an indicator that there's a ton of value in the app layer. There's this constant debate back and forth about whether the value accrues to the base model layer or the app layer, and this is a pretty definitive signal that, while there's obviously a ton of value in the base model layer, there's also a ton of value in the app layer to be had. The fact that they're willing to pay such a premium for Windsurf shows both of those things to me.
Daniel Darling (19:24): Absolutely. Yeah. Super exciting to see, and hopefully we start to see more of it. I want to talk a little bit about Graphite, because you're growing incredibly fast. You grew 20x last year in terms of the amount of features shipped. How do you operate a company at that speed?
Merrill Lutsky (19:42): Yeah, it was 20x on revenue. We're all in-person, working from the office five days a week in New York. It's not for every team, but for us that's been a pretty core part of our culture and how we collaborate. It reduces the barrier to talking through a problem: we can just hop in a conference room, get on a whiteboard, and map something out. Having that type of culture has enabled us to move really quickly. We also have, I'd say, pretty tight iteration cycles.
Merrill Lutsky (20:08): So we run on two-week cycles for building things. So we want it to be long enough where you can build a meaningful feature and if it's longer than. If it takes more than that, you probably should break it up. The idea is that you should then have a smaller piece that you can ship in two weeks and then have another milestone the cycle after that. We also then have a lot of this is about sharing internally, testing things internally. We have a weekly show and tell on Fridays where everyone kind of shares what they've been building that we also heavily dog food our own product. So we're using Graphite every day. We're having Diamond review every single PR.
Merrill Lutsky (20:51): We very much feel both the highs and lows of our product as we're building it. And I think that having that really tight feedback loop and building something that we use every day to build it is really valuable for us in that sense. Last thing I'll say there is obviously we've leaned pretty heavily into AI tooling kind of across the company. So pretty much everyone is using Cursor, Windsurf, Claude code as well. We see a lot of our code is now being written or assisted by AI code generation. We've also used a lot for migrations, for helping with some of our copywriting, reviewing things. We try to look for opportunities. I think there's some companies that super lean in on AI.
Merrill Lutsky (21:39): Everything is done by AI in some capacity and I think we're not quite on that end of the spectrum, but I think that we are leaning pretty heavily into it and just looking for ways that we can get a lot of efficiency from it without being complete AI maxi on that side.
Daniel Darling (21:56): Complete AI maxi yet, maybe. Is there a department internally, outside of engineering, where you've been able to automate far more quickly and successfully in a way that surprised you?
Merrill Lutsky (22:08): I'd say on the go-to-market side we've had a lot of gains. We're good friends with the team at Clay; we use their tools, they use Graphite. It's been cool to see what you can do in terms of enrichment and customization for outbound campaigns, and understanding who's visiting Graphite and who's interacting with it. That piece has helped a lot. A lot of our use of AI has been primarily engineering, though. We recently migrated our docs to Mintlify, and that migration was all done by AI.
Merrill Lutsky (22:42): Another fun one that we built internally was we have like, along with Show and Tell, we have a weekly summary of like every change that was made and like what were kind of the main projects and themes that we were working on. And that's all like AI-generated. So finding I think it's like finding little things like that even I'd say in the product like having like in Graphite. If you use it to create pull requests will auto-generate like the description and title of the PR for you. We'll suggest fixes to CI when it fails. Just little things like that that are starting to automate more and more of the pull request life cycle. I think add up obviously Diamond, that's our biggest bet that we've made and our biggest value add on the AI side. But we're also looking for those tiny wins that we can get with AI along the way that fix some of the thousand cuts that exist today
Merrill Lutsky (23:32): in the pull requests cycle.
Daniel Darling (23:33): It's a great example of how enabled an organization, especially a fast-growing startup, can be with AI. So if you had to project five years out, which I know is a very long time in this world, what would Graphite look like as an organization?
Merrill Lutsky (23:46): As an organization, my bet is that we'll probably grow to a couple hundred engineers, maybe a bit larger than that. But I think we'll likely hit some point of diminishing returns as the models get better and as we're able to give more and more of our engineering workload over to agents. I expect to still have a really strong, still in-person, really high-quality engineering team here, but increasingly that will be focused on new features, and as we expand, more and more of the other work will go to agents. Same thing on go-to-market: I think we'll have fewer, really specialized, really great people in each role per customer we serve. We'll still add people as we grow, but I expect the number of humans needed to generate a given unit of output just continues to go down pretty asymptotically as this gets better.
Daniel Darling (24:45): Absolutely. Well look Merrill, thanks so much. I love looking at the future through your eyes. It's such an interesting moment in time for software development and you're right at the heart of it. So appreciate you coming on and sharing your vision with us.
Merrill Lutsky (24:57): Yeah, thanks so much for having me.
Daniel Darling (24:58): What an insightful chat with Merrill. It's clear that Graphite is more than an AI coding tool; it's a glimpse into the future of how software will be built. By automating code review and orchestrating agent and human collaboration, the company is transforming productivity and shifting engineering from line-by-line work to architectural leadership. As we edge towards a world where agents generate and humans guide, Graphite's approach to trust, control, and iteration will define the next era of software creation. It's a future that won't just impact developers, but all of us, as we become increasingly enabled to code and to create. To follow Graphite and the work of Merrill, head over to his account on X @MerrillLutsky. I hope you enjoyed today's episode. Until next time, thanks for listening and have a great rest of your day.
