Can Pull Requests Be Replaced?
January 3, 2021
Yesterday Kief Morris published a thought-proviking article, Why your team doesn’t need to use pull requests. The central thesis is that pull requests introduce a harmful human delay into the software delivery process, and that there are better alternatives. This promise excited me, as I read the introduction to the piece. But I found that the proposed solutions left me with more questions than answers.
The post generated a lot of knee-jerk reactions on Twitter, my own among them:
Thought-provoking, but I'm not convinced. The only option here I consider possibly tenable is pair programming, but I've come to believe that PP is no substitute for code review. Those who wrote the code will inevitably have a certain code blindness that a fresh reviewer won't.— Jonathan Hall (@TinyDevOps) January 2, 2021
But after more consideration, I think there’s more going on here than a tweet can adaqueately address. My goal in this article is to explain some of my thoughts on Morris’s article, and hpefully continue a discussion that may lead to some satisfying conclusions.
Agreement on the problem
Let me start by addressing the core point on which I completely agree with Morris:
“pull requests sacrifice … delivery time”
This is pretty obviously true to me. Even the smallest of pull requests represents a break in the author’s workflow:
- Complete a change.
- Wait for someone to review
- If feedback: Respond, jump to #2
When I’m training teams on trunk-based development and code review, one of the points I constantly hammer home is “Keep your PRs small, and quick to review!” Of course there are many reasons for this beyond the scope of this discussion, but for now the point is that small PRs help reduce the duration of the “wait” stage, and reduce the cost of context switching, etc.
But this is no panacea. I’ve spent countless hours, and emotional energy, reminding people to keep PRs small, and reminding them to review each other’s code. It seems that no amount of effort, policy, or good will, ever completely solves this problem. Someone is always complaining that their PRs are sitting idle too long.
Every obvious solution just amounts to new rules and more nagging, in an effort to convince people to review in a timely manner. As applied to this problem, I see WIP limits, Slack notifications, priority lists (“review code before starting a new task”), artificial schedules (“Friday is review day”), etc, all as band-aid “rules” to minimize the harm from the inevitable code review delay. Not a very sustainable approach.
It’s a real problem. It needs a better solution!
I don’t know much about Morris, but he works for ThoughtWorks, which I trust as an organization of thought leaders in this space, so I don’t want to be at all quick to dismiss his opinion.
I suspect that my disagreements with Morris mostly stem from different assumptions, and different experiences. I work mostly with small teams (30 developers or fewer), with a wide range of skill sets (from fresh bootcamp grads, to 20+-year software veterans). From his comments, and based on the fact that he works at ThoughtWorks, I expect he’s much more accustomed to working with very seasoned developers. No doubt this colors our respective views on this topic.
So let’s dive in. He summarizes some of his key points in this Tweet thread, which make responding point-by-point very easy:
Pull requests add overhead to cope with low-trust situations, e.g. to allow people you don’t know to offer contributions to your project.
I suppose pull requests do provide a safeguard for low-trust situations, but they do so, so much more. A few reasons that I think pull requests make sense on a trusted team, are:
- A timely mentoring opportunity. There’s no better time to mentor a new team member, than when they’ve just written some code, and it’s still fresh on their mind.
- Keep mainline pristine. The teams I work with always run automated tests (and linters, and potentially other automated gatekeeping) against each PR, prior to merge. This guarantees that test on mainline always pass, since it’s impossible to merge code that causes tests to fail.
- Improve code quality. A PR provides for a natural time to review a code’s quality (by whatever measure of “quality” you choose), before it arrives in mainline. A post-merge review (as discussed later) means there is at least a small time when “low quality” code lives in mainline, which may then be built upon by another team member before review. This point may not matter much on a team of seasoned developers, but in my experience, such teams are rare.
- Manual checks. A PR is a natural place to put any other manual checks that must happen in a software development workflow. Many teams create a check list in their PR template, to track things like manual QA sign-off, writing documentation, stakeholder notifications, etc. Of course these things can always be handled outside of a PR, but if you need anything like this, then including it in the PR means no extra process overhead.
PRs delay feedback on whether your code will pass tests once it is merged and made ready for deployment.
In my experience, this simply isn’t true. On every project I’ve worked on in the last decade (excepting one team that refused automated testing entirely–I didn’t work with them for long), we have been able to run our entire automated test suite before allowing a merge. For certain types and sizes of applications, this isn’t possible, but except in extreme situations, it should still be possible to run all unit tests prior to merge.
PRs also create extra work to make sure the code is deployable after merging.
This is also contrary to my experience. As I mentioned above, I always encourage teams to run automated tests before merging. GitHub, GitLab, and practically any other VCS interface, can be easily configured to prohibit merging when tests fail. And GitHub allows requiring up-to-date branches before merging, too, which addresses the concern of “deployable after merging”, as well.
Automatically deploying and testing merged code is immediate feedback that proves your code is ready for deployment. You can get to this point on every commit, without interrupting anyone else.
I agree with this, but unless I’m missing some profound point, I don’t see how it’s an argument against pull requests. I routinely see all of these benefits on teams using PRs. The task of automatically deploying an in-progress pull request does require a certain amount of up-front engineering, but when possible, I believe it’s absolutely worth it.
This means you’re not wasting anyone’s time having them review code that might not work when merged and made deployable.
This has never been a problem on any of my teams. As explained above, automated tests are run for each PR, so nobody ever has to bother with a review of a PR with failing tests.
The point is, use automation to make sure every commit is immediately assembled into a potentially deployable state (which means merged with everyone else’s work), and deploy and test it, before asking someone to interrupt their work to look at your code.
I completely, whole-heartedly agree with, and preach this. Where I seem to differ with Morris is that I believe the goal should be to do all of this before hitting “merge”.
Pull requests are an alternative to CI and CD, and that’s fine if it works for you.
I honestly don’t understand this statement. To me pull requests work hand-in-hand with CI and CD. It’s obviously possible to do pull requests without CI and CD, and it is also possible to do CI and CD without pull requests. It seems to me they solve entirely different problems, so I don’t see them as alternatives.
Morris offers three specific alternatives to the Pull Request approach:
- Pair programming, in which two developers write each piece of code during the same session, thus, according to Morris, eliminating the need for a separate code review
- Periodic review, in which reviews are batched and done at possibly scheduled intervals
- Pipeline approvals, in which review happens after merge with mainline, but before deployment
None of these approaches feels right to me, but each for different reasons, so let me address my concerns on each. But I’ll go in reverse order.
This particular approach is the least attractive to me. First, because I see it as attempting to solve a problem I’ve never faced. Specifically, to quote from the article:
This is conceptually similar to a pull request in that it is a gate in the delivery process, but you place the gate after code integration and automated tests.
As I detailed above, I believe the goal ought to be to run automated tests before merging is even an option. This allows the gate to occur after automated tests, without introducing extra burden.
I also explained above that I typically require all PR branches to be up to date with mainline before allowing merge, which solves the problem of gating after code integration.
For the latter point, sufficiently busy repos will have high contention, and may trigger rebase races. In such a case, perhaps some form of pipeline approvals would be appropriate. But I’d rather leave this question to people who deal with this specific problem. GitLab has a Merge Trains feature, which aims to help solve this. I’ve never had a reason to use it, so can’t comment on its efficacy. I expect GitHub, Bitbucket, and others, have similar features, or you may be able to roll-your-own, if necessary.
My second contention with this solution is that I’m uneasy with it’s “shift right” approach (contrasting with shift left). This is of course very subjective, so I don’t expect anything like a consensus on this point. But I have spent a lot of time trying to find creative ways to shift as much human interaction as possible “left” of the “merge” button. Making the choice to move something “right” of the merge button feels to me like a step in the wrong direction.
A lot of ink has been spilled on the topic of batch sizes, lead largely by the Lean Manufacturing methodology, from which so many Agile and DevOps authors and practitioners have borrowed. This causes me to puzzle over the suggestion of increasing batch size for the purpose of review. Of course, none of this batch-size talk is Gospel truth, and I believe every team should use the batch size that works best for them, and for their situation.
But speaking from anecdotal evidence, smaller batch sizes inevitably increase the quality of code review. The larger a code review, the more tempting it is to skim, and skip over bits.
Perhaps that is appropriate for certain types of review, or for certain purposes. Morris doesn’t go into detail about what he expects from a code review, so maybe this is an appropriate trade-off for his assumed use-case.
But for the reasons I advise teams to use code review, I’m highly confident this would mean a big loss. For clarity, here are some ways I believe periodic batch reviews would lead to lower-quality reviews:
- Less thorough review, as the reviewers are more likely to skim, especially near the end of the designated review period, leading up to beer’o’clock.
- Longer feedback loop. As a code author, a review I receive 30 minutes after a push will be more impactful than one I receive 5 days later. Getting back into a 5-day-previous mindset can be a chore, and is far less efficient than addressing concerns immediately.
- Lower code quality. One of the chief reasons I advocate code review is to improve code readability. If unreadable code is already in mainline, much of that benefit is lost.
- Less motivation to improve. Developers are lazy. I’m speaking about myself. If I have working code in production, and a colleague says “You know, you could improve this thing easily with…”, I’m far more likely to accept the suggestion during pre-merge review, than during post-merge review. The temptation to shrug it off as “yeah, good idea, maybe next time” is a lot higher in the latter case.
Like many people, I expect, I’m a fan of the idea of pair programming (PP). I’ve only ever done it in isolation, either as an experiment, or as an exercise in cross-training and team building. I have worked with people who advocate PP as part of their daily routine. I’ve also worked with people who consider PP a reasonable alternative to after-the-fact code review.
There’s plenty of room for good people to differ on this point, but in my view, while pair programing offers a lot of great potential benefits, it is not a substitute for code review. There’s a certian “code blindness” that comes from writing code (even in a pair, or mob). To me the most important thing a code review can tell us is “will this code make sense to a stranger in 6-18 months?” Pairing doesn’t adequately answer this question.
And this is to say nothing of the question of whether pairing is a viable alternative to solo programming in all cases. Many proponents (I expect Morris among them) say it is. I believe these proponents have good reasons to say so, and I’m in no position to disagree. But I also recognize that most people don’t agree, and I hope we can find a solution that works well for them.
For those who agree with Morris that pairing is always appropriate, and that it is sufficient for code review, then I take no issue with this approach.
But I’ll also point out that this approach in no way conflicts with pull requests–it can be treated as complimentary. With one group I helped a few years ago, we used the typical PR workflow, and one of the teams practiced occasional pair programming. Their approach was simply for one of the developers to create and push the PR, and the other to immediately approve it. No special cases, or different tooling was required.
A good alternative?
Now that I’ve expressed my concerns about Morris’s three alternatives, I wish I had my own alternative to offer instead. Unfortunately, I don’t.
But I do have a bit of an inkling of some characteristics a good alternative ought to have:
- Feedback should be as immediate as possible.
- A developer should be able to complete a task with the minimum of interruptions.
- It should be simple to execute.
- It should make no assumptions about the seniority of the team members.
I don’t have a conclusion. I guess that’s kind of the point.
I completely agree with Morris that reviewing PRs introduces a disruption. I think I disagree with him about how big that disruption needs to be, but all the same, I’d love to find a way to kill the PR code review delay, and all of “rules” it implies.
Unfortunately, up to now, this seems the least bad option.
What other alternatives are there to consider? What alternative or contrarian view do you take to anything I’ve said?
When is a pull request too big?
Smaller pull requests are faster to write and easier to review. Here are 4 tests to see if your PR might be too big.
The one code review method to rule them all
Find the method that works for you, and don't let anyone guilt you into thinking your way is "wrong".
Improve your software delivery