There are quite a few tools around for searching through a bunch of files, something I want to do quite frequently. Which tool should we use? Let's briefly compare four of the best-known ones: grep, git grep, ack and ag.
Grep is a command-line utility for searching plain-text data sets for lines matching a regular expression. It was originally developed for the Unix operating system but is available today on all Unix-like systems. It's what I've been using so far to search through source files. It can be slow in big repositories with many binary artifacts, even when it is told to skip them, but on individual files it is blazingly fast.
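A minimal sketch of such a recursive search, assuming GNU grep: -I treats binary files as if they contained no match, and --exclude-dir keeps grep out of VCS metadata. The temporary directory and file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: recursive grep that skips binary files and the .git directory.
# Assumes GNU grep (-I and --exclude-dir are GNU options).
dir=$(mktemp -d)
cd "$dir"
mkdir -p src .git
printf 'int foo = 1;\n' > src/main.c     # text file containing the pattern
printf 'foo\000bar\n' > src/blob.bin     # the NUL byte makes grep treat it as binary
printf 'foo\n' > .git/config-copy        # stands in for VCS metadata

# -r: recurse, -I: skip binary files, -n: print line numbers
result=$(grep -rIn --exclude-dir=.git foo .)
echo "$result"    # prints: ./src/main.c:1:int foo = 1;
```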
Ack started in 2005 as an improvement over grep. If you know GNU grep, you know most of ack's switches, too: word-only searching with -w, case-insensitive searching with -i, etc. However, I don't find it much of an improvement over grep. The improvements claimed on betterthangrep.com, each followed by my comment, are:
- It's fast | Ack is slow compared to grep
- It's portable | Grep is available even on the most ancient Unix systems, and on Windows via MinGW or Git Bash
- It ignores VCS directories | This is a file find feature, not a pattern match feature
- Better search results | I.e. a whitelist of file types, which I don't consider an improvement. Again, this is a file find feature.
- Easy filetype specifications | This is a file find feature
- Creates lists of files without searching | This is a file find feature
- Match highlighting | Grep can do this with the --color switch
- Perl regular expressions | Grep can do this with the -P switch
- Command switches much like GNU grep | Well, guess what…
- “ack” is shorter than “grep” to type | alias ack=grep
So ack is slower, and its friendlier CLI with slightly more advanced file-finding features doesn't justify a complete rewrite; it could have been just a CLI wrapper around grep.
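To make the "just a CLI wrapper" point concrete, here is a minimal sketch of a few ack-like defaults on top of GNU grep. The function name ackish is hypothetical, and -P assumes a grep built with PCRE support; all other flags are standard GNU grep options.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: ack-like defaults on top of GNU grep.
ackish() {
  # -r recurse, -n line numbers, -I skip binary files,
  # -P Perl-compatible regexes (needs grep built with PCRE),
  # --color=auto highlights matches only when writing to a terminal,
  # --exclude-dir skips VCS metadata directories.
  grep -rnIP --color=auto \
    --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn \
    "$@" .
}

# Small demo tree: one source file, one file inside .git
dir=$(mktemp -d)
cd "$dir"
mkdir -p .git lib
printf 'sub answer { 42 }\n' > lib/Answer.pm
printf 'answer\n' > .git/HEAD-copy

word=$(ackish -w 'answer')                 # word-only search, like ack -w
pm_only=$(ackish --include='*.pm' '\d+')   # file type filter plus a Perl regex
echo "$word"
echo "$pm_only"
```

Extra grep options pass straight through "$@", so -i, -w or more --include patterns work unchanged.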
Ag (The Silver Searcher) is a mostly compatible clone of ack, started in 2011 and written in C, mostly by Geoff Greer. It uses tricks like pthreads, memory-mapped I/O, a Boyer-Moore-Horspool strstr(), and PCRE's JIT to improve performance. It also supports some extra features, such as obeying .gitignore, .hgignore and svn:ignore.
The search tool shipped with Git, git grep, searches only the files tracked by Git by default. This is fast and convenient. Git grep can also easily search through previous revisions, although I rarely need this:
git grep foo HEAD^
rather than the clumsier
git checkout HEAD^; grep -R foo .; git checkout -
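A throwaway repository makes the revision search easy to try. This sketch assumes git is installed; the file name, commit messages and local identity are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: search an older revision with git grep, without a checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                # local identity so commits work anywhere
git config user.email demo@example.com

printf 'foo\n' > notes.txt
git add notes.txt
git commit -qm 'add foo'

printf 'bar\n' > notes.txt               # 'foo' disappears in the next commit
git commit -qam 'replace foo'

# Working tree: no match, so git grep exits with status 1
git grep foo || echo "no foo in the working tree"
# Previous revision: each match is prefixed with the revision name
old=$(git grep foo HEAD^)
echo "$old"                              # prints: HEAD^:notes.txt:foo
```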
Git grep also makes it easier to restrict the search to certain file types:
git grep 'test 123' -- \*.cpp \*.h
as opposed to
grep -r 'test 123' --include=\*.cpp --include=\*.h .
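A small sketch comparing both ways of filtering by file type on a throwaway repository, assuming git and GNU grep. Note that --include only filters files that grep finds while recursing, hence the -r. In a Git pathspec, * also matches across directory separators, so '*.h' matches include/a.h. The file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: restrict a search to C++ sources and headers, two ways.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p src include
printf 'test 123\n' > src/a.cpp
printf 'test 123\n' > include/a.h
printf 'test 123\n' > README.txt         # should be filtered out by both commands
git add .                                # git grep only searches tracked files

# Pathspecs go after '--'; quoting stops the shell from expanding the globs.
via_git=$(git grep 'test 123' -- '*.cpp' '*.h' | sort)
# GNU grep prints './' prefixes; strip them so both outputs are comparable.
via_grep=$(grep -r 'test 123' --include='*.cpp' --include='*.h' . | sed 's|^\./||' | sort)

echo "$via_git"
echo "$via_grep"
```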
Most importantly, git grep is faster than ack and ag, since it uses the Git index to decide which files to search and does its own pattern matching, using multiple threads. So we shouldn't use ack or ag, but stick with:
alias gg='git grep -IPn --color'  # For searching in repositories
alias gr='grep -rIPn --color'     # For searching outside repositories (or in untracked files)