On Fri, Oct 04, 2013 at 11:42:29AM +0100, Andy Parkins wrote:
> On Friday 04 October 2013 12:30:07 Mike Hearn wrote:
> > Git makes it easy to fork peoples work off and create long series of
> > commits that achieve some useful goal. That's great for many things.
> > Unfortunately, code review is not one of those things.
> > 
> > I'd like to make a small request - when submitting large, complex pieces of
> > work for review, please either submit it as one giant squashed change, or
> 
> Don't do this.  It throws away all of the good stuff that git lets you record.  
> There is more to a git branch than just the overall difference.  Every single 
> log message and diff is individually valuable.  It's easy to make a squashed 
> diff from many little commits; it's impossible to go the other way.
> 
> Command line for you so you don't have to think about it:
> 
>   git diff $(git merge-base master feature-branch) feature-branch 
> 
> git-merge-base finds the common ancestor between master and feature-branch, 
> and then compares feature-branch against that.

Git is a revision *communication* system that happens to also make for a
good revision *control* system.

Remember that every individual commit is two things: what source code
has changed, and a message explaining why you thought that change should
be made. Commits aren't valuable in of themselves, they're valuable
because they serve to explain to the other people you are working with
why you thought a change should be made. Sometimes it makes sense to
explain your changes in 10 commits, sometimes it makes sense to squash
them all up into one commit, but there's no hard and fast rule other
than "Put yourself in your fellow coders' shoes - what's the best way to
explain to them what you are trying to accomplish and why?" You may have
generated a lot of little commits in the process of creating your patch
that tell a story that no-one else cares about, or equally by squashing
everything into one big commit you wind up with a tonne of changes with
little explanation as to why they were made.

Two caveats apply however: git-bisect works best if every commit in the
tree you are trying to debug works well enough that you can run tests
without errors - that is you don't "break the build". Don't make commits
that don't compile at the very least, and preferably everything you do
should be refactored to the point where the commit as a whole "works".

The second caveat is more specific to Bitcoin: people tend to rebase
their pull-requests over and over again until they are accepted, but
that also means that code review done earlier doesn't apply to the later
code pushed. Bitcoin is a particularly high profile, and high profit,
target for people trying to get malicious code into the codebase. It may
be the case that we would be better off asking reviewers making small
changes to their pull-requests to add additional commits for those
changes rather than rebasing, to make it clear what changes were
actually made so that code reviewers don't have to review the whole
patch from scratch. After all, the place where the most eyes will
actually look at the commits is during the pull-req process; after the
code has been pulled the audience for those commits is in most cases
almost no-one.

Having said that, there's currently a lot of other holes in the review
and source code integrity process, so fixing this problem is probably
not the low-hanging fruit right now.


FWIW personally I tend to review patches by both looking at the
individual commits to try to understand why someone wanted to make a
change, as well as all commits merged into one diff for a "what actually
changed here?" review.

-- 
'peter'[:-1]@petertodd.org