drush grep: search raw content in Drupal with regular expressions

I was looking for a way to search the raw content of my Drupal blog (that is, the content before input filters are applied) using regular expressions, à la grep. I googled to see what other people had come up with for the same problem and found an article about Searching the Drupal Database by Regular Expression, which also pointed to the scanner module. However, both solutions have some limitations (ad-hoc Drupal scripting, MySQL-only support), and I didn't want to install a module just for that anyway. So I tried the Drush way, and I found it the most convenient.

Drush is a good tool to know if you are into Drupal at all: with Drush you can do almost anything you can do from the Drupal admin UI, only faster and in a scriptable way from the command line.

The power of Drush is that it can be extended very easily to meet our needs by writing new commands; a good first resource on that is the example command file sandwich.drush.inc. Oddly enough, a “How do I write a new Drush command?” question was not even in the FAQ, but that's fixed now.

Anyhow, I wrote a “grep” Drush command for my needs; you can find it in the dgrep git repository. Here is an example run on my blog, checking where I used the syntaxhighlighter filter:

$ cd .../my_drupal_installation_dir
$ drush grep '/syntaxhighlighter[^}]*/'

Node: 11        Title: Web scraping with PHP and XSL
                URL: blog/2009/07/26/web-scraping-php-and-xsl
                Match: syntaxhighlighter brush:php

Node: 12        Title: Renaming a DOM element with XSL
                URL: blog/2009/08/06/renaming-dom-element-xsl
                Match: syntaxhighlighter brush:php

Node: 13        Title: Translating XML documents with XLIFF
                URL: blog/2009/09/09/translating-xml-documents-xliff
                Match: syntaxhighlighter brush:xml

Node: 16        Title: git-commit with date in the past
                URL: blog/2009/10/30/git-commit-date-past
                Match: syntaxhighlighter brush:bash

Node: 19        Title: Vim buffers: status(line) symbol
                URL: blog/2009/11/12/vim-buffers-statusline-symbol
                Match: syntaxhighlighter brush:plain

Node: 26        Title: Branding patches with git and vim
                URL: blog/2010/01/05/branding-patches-git-and-vim
                Match: syntaxhighlighter brush:bash

Node: 34        Title: On piping in shell scripts and var scoping
                URL: blog/2010/03/26/piping-shell-scripts-and-var-scoping
                Match: syntaxhighlighter brush:shell

Node: 37        Title: Neat compile/run cycle with git and OpenEmbedded
                URL: blog/2010/05/27/neat-compilerun-cycle-git-and-openembedded
                Match: syntaxhighlighter brush:bash

Node: 43        Title: AO2 runs into autorun.inf
                URL: blog/2010/09/19/ao2-runs-autoruninf
                Match: syntaxhighlighter class="brush: bash"

Node: 46        Title: List header files first in a patch with git
                URL: blog/2010/10/13/list-header-files-first-patch-git
                Match: syntaxhighlighter class="brush: cpp;" title="dinner.h"

TODO: grepping blocks is not supported yet

How cool is that?
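To make the idea concrete outside of Drush, here is a minimal Python sketch of the matching step only. It is not the dgrep code, which is a Drush command. The sketch takes nodes as plain dicts holding the raw, unfiltered body and compiles a PHP-style delimited pattern like the /syntaxhighlighter[^}]*/ used above, where [^}]* keeps the match from running past the closing brace of the filter tag. It then prints the first match per node in a layout similar to the output above. Fetching nodes from the database is left out. The helper names compile_delimited, grep_nodes and format_result, the made-up sample bodies, and the first-match-only reporting are all assumptions for illustration.

```python
import re

# Hypothetical mapping of a common subset of PCRE modifiers to Python flags.
_MODIFIERS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def compile_delimited(pattern):
    """Compile a PHP-style delimited pattern such as '/foo[^}]*/i'.

    Assumes the same character opens and closes the pattern; bracket-style
    delimiters like '{...}' are not supported in this sketch.
    """
    delim = pattern[:1]
    if not delim or delim.isalnum() or delim.isspace() or delim == "\\":
        raise ValueError("pattern must start with a delimiter, e.g. '/foo/'")
    end = pattern.rfind(delim)
    if end == 0:
        raise ValueError("missing closing delimiter")
    body, modifiers = pattern[1:end], pattern[end + 1:]
    flags = 0
    for mod in modifiers:
        if mod not in _MODIFIERS:
            raise ValueError(f"unsupported modifier: {mod!r}")
        flags |= _MODIFIERS[mod]
    return re.compile(body, flags)


def grep_nodes(nodes, pattern):
    """Return (nid, title, first_match) for every node whose raw body matches.

    `nodes` is an iterable of dicts with 'nid', 'title' and 'body' keys,
    where 'body' is the text as stored, before any input filter runs.
    """
    regex = compile_delimited(pattern)
    results = []
    for node in nodes:
        match = regex.search(node["body"])  # first match only (assumption)
        if match:
            results.append((node["nid"], node["title"], match.group(0)))
    return results


def format_result(nid, title, match):
    """Lay out one hit roughly like the drush grep output (no URL line)."""
    return f"Node: {nid:<10}Title: {title}\n{' ' * 16}Match: {match}"


if __name__ == "__main__":
    # Made-up sample nodes; a real tool would load these from the Drupal database.
    sample_nodes = [
        {"nid": 1, "title": "Post with code",
         "body": "Intro {syntaxhighlighter brush:php}<?php echo 'hi'; ?>{/syntaxhighlighter}"},
        {"nid": 2, "title": "Plain post", "body": "Just prose, no code."},
    ]
    for hit in grep_nodes(sample_nodes, "/syntaxhighlighter[^}]*/"):
        print(format_result(*hit))
        print()
```

Because the pattern is matched against the stored body rather than the rendered page, tags like {syntaxhighlighter ...} are still visible to it, even though the filter would replace them with highlighted markup on output.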

The current dgrep code is just a prototype, but I'd like it to become useful for the whole Drupal community. So if you want to help, go try it, comment on it, fork it, and report back with any feedback or code changes you might have, either here or in the related issue on drupal.org. Thanks!
