Advanced merging on file level

When merging two branches, I usually just start a merge with git merge <target> and resolve everything with git mergetool (which is configured to use kdiff3 in my case). However, resolving the conflicts can sometimes be tough, especially if the merge is large and there are non-trivial conflicts for which you don’t immediately know the correct solution. In such cases it is convenient to be able to unwind the process of resolving a conflicted file a bit. You may want to resolve a file partially, which kdiff3 doesn’t allow. Or you may want to edit the three file versions (ours, theirs and the common ancestor) before starting the merge, for example to make the formatting consistent across all three.

Luckily, git allows that. This page and this page are helpful reads when it comes to merging files “manually”. The following Python script automates creating the three file versions (ours, theirs and common) and doing stuff with them. It is intended to be tweaked and adjusted to whatever the circumstances require.
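As a rough illustration of the idea, here is a minimal sketch, not necessarily the original script. It assumes you run it from the repository root while a merge is in progress, and it relies on git keeping the common ancestor, ours and theirs in index stages 1, 2 and 3 of a conflicted path. The function names (stage_spec, extract_versions, remerge) and the file suffixes are made up for this example.

```python
import subprocess
from pathlib import Path

# Index stages that git fills in for a conflicted path during a merge.
STAGES = {"common": 1, "ours": 2, "theirs": 3}


def stage_spec(stage_name, path):
    """Return the ':<n>:<path>' spec that addresses one side of a conflict."""
    return ":{}:{}".format(STAGES[stage_name], path)


def extract_versions(path):
    """Write <path>.common, <path>.ours and <path>.theirs next to the file."""
    written = {}
    for name in STAGES:
        content = subprocess.run(
            ["git", "show", stage_spec(name, path)],
            check=True, capture_output=True,
        ).stdout
        target = Path("{}.{}".format(path, name))
        target.write_bytes(content)
        written[name] = target
    return written


def remerge(path, versions):
    """Redo the three-way merge on the (possibly hand-edited) copies.

    The result, including any conflict markers, is written back to <path>.
    Returns the exit code of git merge-file: 0 for a clean merge,
    a positive number when conflicts remain.
    """
    result = subprocess.run(
        ["git", "merge-file", "-p",
         str(versions["ours"]), str(versions["common"]), str(versions["theirs"])],
        capture_output=True,
    )
    Path(path).write_bytes(result.stdout)
    return result.returncode
```

A typical session would call extract_versions("src/foo.cpp"), tidy up the three copies by hand, and then call remerge with the returned dictionary. Since the copies stay on disk, you can repeat the edit and remerge steps as often as needed.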

Deleting merged local branches in git

When working on a project, the number of branches tends to increase linearly with the work being done. Personally, I don’t like to have all this clutter in my branch list. When I look at the branches, I just want to see the active stuff. A common way to deal with this problem in git is to archive branches. In a previous post I showed a script that helps you archive these merged branches automatically and interactively. If this is not an option, e.g. because the conventions of the project don’t allow it, you can settle for just deleting your local branches. The following script automates that.

Here is an example of the script in action.

$ python ~/dotfiles/delete_local_merged_branches.py -h
usage: delete_local_merged_branches.py [-h] [master]

positional arguments:
  master      Name of the master branch. Defaults to 'master'.

optional arguments:
  -h, --help  show this help message and exit

$ python ~/dotfiles/delete_local_merged_branches.py NextMajor
Merged local branches:
    DR-011776_prototype-2
    DR-011870
    DR-011923
    DR-011974

Delete local branch DR-011776_prototype-2? [y/n]y 
Deleted branch DR-011776_prototype-2 (was d814812). 
Delete local branch DR-011870? [y/n]y 
Deleted branch DR-011870 (was efbe881). 
Delete local branch DR-011923? [y/n]y 
Deleted branch DR-011923 (was 1e0d52d). 
Delete local branch DR-011974_add_cppcheck? [y/n]n
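A minimal sketch of how such a script could work, assuming it builds on git branch --merged and git branch -d. The function names parse_merged_branches and delete_merged_branches are hypothetical, and the prompt text is simply copied from the session above.

```python
import subprocess


def parse_merged_branches(branch_output, master="master"):
    """Pick branch names out of `git branch --merged <master>` output.

    The master branch itself is skipped, as are lines marked with '*'
    (the current branch) or '+' (checked out in another worktree).
    """
    branches = []
    for line in branch_output.splitlines():
        name = line.strip()
        if not name or name.startswith(("*", "+")) or name == master:
            continue
        branches.append(name)
    return branches


def delete_merged_branches(master="master", ask=input):
    """Ask for each merged local branch whether it should be deleted."""
    output = subprocess.run(
        ["git", "branch", "--merged", master],
        check=True, capture_output=True, text=True,
    ).stdout
    for branch in parse_merged_branches(output, master):
        answer = ask("Delete local branch {}? [y/n]".format(branch))
        if answer.strip().lower() == "y":
            # -d refuses to delete unmerged work, which is a useful safety net.
            subprocess.run(["git", "branch", "-d", branch], check=True)
```

Keeping the parsing separate from the git calls makes the fiddly part easy to check without a repository at hand.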

Archiving git branches

When working on a project, the number of branches tends to increase linearly with the work being done. Personally, I don’t like to have all this clutter in my branch list. When I look at the branches, I just want to see the active stuff. A common way to deal with this problem in git is to archive branches. Git doesn’t natively support the concept of an archived branch, but the usual way to emulate it is to create a tag called archive/<branchname> and remove the branch after that. If you have many branches, it is convenient to be able to do this automatically and interactively. I wrote a little Python script so that the whole thing takes little time and effort.

Here is an example of the script in action.

$ python archive_merged_branches.py
Archive branch DR-010559? [y/n]y
Archive branch DR-010579? [y/n]y
Archive branch DR-010580? [y/n]y
Archive branch DR-010586? [y/n]y
Archive branch DR-010707? [y/n]y
Archive branch chiel/doc? [y/n]n

Created archive tags:
    DR-010559
    DR-010579
    DR-010580
    DR-010586
    DR-010707
Push archive tags to remote? [y/n]y
To <remote host>
 * [new tag]         archive/DR-010559 -> archive/DR-010559
To <remote host>
 * [new tag]         archive/DR-010579 -> archive/DR-010579
To <remote host>
 * [new tag]         archive/DR-010580 -> archive/DR-010580
To <remote host>
 * [new tag]         archive/DR-010586 -> archive/DR-010586
To <remote host>
 * [new tag]         archive/DR-010707 -> archive/DR-010707

Corresponding remote branches:
    origin/DR-010559
    origin/DR-010579
    origin/DR-010580
    origin/DR-010586
    origin/DR-010707
Delete remote branches? [y/n]y
To <remote host>
 - [deleted]         DR-010559

error: unable to delete 'DR-010579': remote ref does not exist
error: failed to push some refs to <remote host>

To <remote host>
 - [deleted]         DR-010580

To <remote host>
 - [deleted]         DR-010586

To <remote host>
 - [deleted]         DR-010707


Corresponding local branches:
    DR-010559
    DR-010579
    DR-010580
    DR-010586
    DR-010707
Delete local branches? [y/n]y
Deleted branch DR-010559 (was a3cdf9b).
Deleted branch DR-010579 (was 398ca49).
Deleted branch DR-010580 (was b8107c6).
Deleted branch DR-010586 (was 918a0c5).
Deleted branch DR-010707 (was 898fe26).
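
The archive/<branchname> convention boils down to four git commands per branch. Here is a minimal sketch of that sequence, assuming a remote called origin. The helper names archive_tag, archive_commands and archive_branch are invented for this example and are not taken from the script shown above.

```python
import subprocess


def archive_tag(branch):
    """Name of the tag that stands in for an archived branch."""
    return "archive/" + branch


def archive_commands(branch, remote="origin"):
    """Return the git commands that archive one branch, in order."""
    tag = archive_tag(branch)
    return [
        ["git", "tag", tag, branch],                   # create the archive tag
        ["git", "push", remote, "refs/tags/" + tag],   # publish the tag
        ["git", "push", remote, "--delete", branch],   # delete the remote branch
        ["git", "branch", "-d", branch],               # delete the local branch
    ]


def archive_branch(branch, remote="origin"):
    """Run the commands and report which ones failed.

    A failure does not stop the sequence: the remote branch may already be
    gone, as happened for one of the branches in the session above.
    """
    failed = []
    for command in archive_commands(branch, remote):
        if subprocess.run(command).returncode != 0:
            failed.append(command)
    return failed
```

To bring an archived branch back, git checkout -b <branchname> archive/<branchname> recreates it from the tag.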

Grep, git-grep, Ack and Ag

There are quite a few tools around for finding stuff in a bunch of files, something I wanna do quite frequently. Which tool should we use? Let’s briefly compare four of the best-known tools: grep, git grep, ack and ag.

Grep

Grep is a command-line utility for searching plain-text data sets for lines matching a regular expression. Grep was originally developed for the Unix operating system, but is available today for all Unix-like systems. It’s what I’ve been using so far to search through source files. It can be slow in big repositories with many binary artifacts, even when telling grep to skip those. But on individual files it is blazingly fast.

Ack

Ack was started in 2005 as an improvement over grep. If you know GNU grep, you know most of ack’s switches, too: word-only searching with -w, case-insensitive searching with -i, et cetera. However, I find that it’s not really an improvement over grep. The claimed improvements from betterthangrep.com, each followed by my take on it, are

  1. It’s fast | Ack is slow compared to Grep
  2. It’s portable | Grep is available even on the most ancient Unix systems, and on Windows via MinGW or Git Bash
  3. It ignores VCS directories | This is a file find feature, not a pattern match feature
  4. Better search results | I.e. a whitelist of filetypes, which I don’t find an improvement. Again, this is a file find feature.
  5. Easy filetype specifications | This is a file find feature
  6. Creates lists of files without searching | This is a file find feature
  7. Match highlighting | Grep can do this with the --color switch
  8. Perl regular expressions | Grep can do this with the -P switch
  9. Command switches much like GNU grep | Well, guess what…
  10. “ack” is shorter than “grep” to type | alias ack=grep

So, Ack is slower, and its easier command-line interface and slightly more advanced file-finding features don’t justify a complete rewrite. It could have been just a wrapper around grep.

Ag

Ag is a mostly-compatible clone of Ack, started in 2011. Written in C, mostly by Geoff Greer. It uses tricks like pthreads, memory-mapped IO, Boyer-Moore-Horspool strstr(), and PCRE’s JIT to improve performance. It also supports some extra features, such as obeying gitignore/hgignore/svn:ignore.
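
To give an idea of what Boyer-Moore-Horspool buys you, here is a minimal Python sketch of the algorithm. It is not Ag’s C implementation, and the name horspool_find is made up for this example. The point is that on a mismatch the search window jumps ahead based on the last character it covers, so much of the text is never inspected.

```python
def horspool_find(haystack, needle):
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # For every character of the needle except the last one: how far the
    # window may move when that character sits under the window's end.
    # Later occurrences overwrite earlier ones, giving the smallest safe shift.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Characters that do not occur in the needle allow a full-length jump.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

Searching for "world" in "hello world" takes three window positions (0, 3 and 6) instead of seven.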

Git grep

The search tool shipped with git, git grep, defaults to searching the files that are tracked by git. This is fast and convenient. Git grep can also easily search through previous revisions, although I rarely need this. Compare searching the parent commit directly with doing the same through plain grep:

git grep foo HEAD^
git checkout HEAD^; grep -R foo .; git checkout -

Git grep also improves the way filetypes can be specified

git grep 'test 123' -- \*.cpp \*.h

as opposed to

grep -r 'test 123' --include=\*.cpp --include=\*.h .

Most importantly, git grep is faster than ack and ag, since it combines the VCS file index with blazingly fast grep-style pattern matching. So, we should not use Ack or Ag, but stick with

alias gg='git grep -IPn --color' # For searching in repositories
alias gr='grep -rIPn --color' # For searching outside repositories (or maybe non-tracked files)
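
If you would rather not remember which alias applies where, the choice can be automated. Below is a minimal sketch, assuming git rev-parse --is-inside-work-tree is used to detect a repository. The function names are hypothetical and the flags are the ones from the two aliases above.

```python
import subprocess


def build_search_command(pattern, inside_repo):
    """Return the git grep or grep command line for the given pattern."""
    if inside_repo:
        return ["git", "grep", "-IPn", "--color", "-e", pattern]
    # Outside a repository: fall back to a recursive grep from the current directory.
    return ["grep", "-rIPn", "--color", "-e", pattern, "."]


def inside_git_repo():
    """True when the current directory is part of a git work tree."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:  # git is not installed
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"


def search(pattern):
    """Run the appropriate tool and return its exit code."""
    command = build_search_command(pattern, inside_git_repo())
    return subprocess.run(command).returncode
```

Passing the pattern through -e keeps patterns that start with a dash from being mistaken for options.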