Sunday, April 10, 2011

System-wide Ctags, part I

I've been using Ubuntu Server on my laptop, since it seems to consume far fewer resources than a standard desktop Ubuntu install and gives me much more choice about what goes on my machine. For example, I don't want GNOME... I'd rather use a much lighter-weight window manager and login manager (DWM + SLiM at the moment). Configuring X really isn't all that hard. But I digress, and will save a more detailed discussion of this topic for another post...

Like everyone else, I'll often download 'dev' packages, some of which have little to no documentation on their APIs. On top of that, I of course have a million source repos floating around, since I often like to compile things from source and keep scripts handy to stay up to date with those projects. So what I'm getting at is that I'd like to keep a set of tags files (or a single one) for all of the source code found on my box (for the languages that ctags supports, of course).
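To make that concrete: Exuberant Ctags can recurse through a source tree and write its output wherever you point it, and it can append to an existing file, so both the per-project and single-file approaches are easy to script. A minimal sketch (the paths are just hypothetical examples):

  # One tags file per project (paths made up for illustration)
  ctags -R -f ~/.tags/dwm.tags ~/src/dwm

  # ...or append everything into one big system-wide tags file
  ctags -R -a -f ~/.tags/all.tags ~/src/dwm
  ctags -R -a -f ~/.tags/all.tags /usr/include

One thing to keep in mind: -a appends blindly, so a rebuild script should regenerate from scratch rather than re-append, or the file fills up with duplicate and stale entries.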

This brings up three main questions in my mind:
  • What is the most efficient way to do this in terms of disk I/O?
  • What is the most efficient way to do this in terms of disk space usage?
  • What is the most efficient way to do this in terms of CPU scheduling? Should this be a periodic task run by, say, cron (sketched just below), or an on-demand task?
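For the cron option, the entry itself is trivial; a minimal sketch, assuming a hypothetical wrapper script that regenerates everything under ~/.tags:

  # crontab entry: rebuild all tags files nightly at 03:00
  # (script path is hypothetical; cron hands this line to /bin/sh)
  0 3 * * * nice ionice -c3 $HOME/bin/retag-all.sh

Wrapping the job in nice(1) and ionice(1) at least keeps a big rebuild from competing with interactive work for CPU and disk.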
It is of course wasteful to generate tags files you will never need, for source code you'll never read. But the point of all this is that I want a tags file ready immediately should I decide to go digging through a code base. How expensive is generating tags files with Exuberant Ctags, really? Could a better tool be created? I know fast source-code indexing and cross-referencing has been studied extensively in connection with Xcode, for example, and I believe in relation to Apple's GCC extensions. (I'm a confessed Vim addict, but I'm also open to just about any other editor; it seems that most good editors support ctags anyway, so that's definitely something else to consider.)
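Rather than guessing at the cost, it's easy to measure; a rough sketch using GNU time, with a kernel tree standing in as an example of a large code base:

  # Wall time, CPU time, and peak memory for one full run
  # (/usr/bin/time rather than the shell builtin, for the -v report)
  /usr/bin/time -v ctags -R -f /tmp/test.tags ~/src/linux-2.6

  # ...and what the result costs in disk space
  du -h /tmp/test.tags

On the editor side, Vim at least makes consuming the results painless: something like 'set tags=./tags;,~/.tags/all.tags' in .vimrc searches upward from the current file for a project tags file, then falls back to a system-wide one.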

A little searching turned up a nice post here; the most promising idea in it seems to be "incrontab", i.e. running cron-style jobs based on filesystem (inotify) events rather than time periods (see the sketch after this list). By combining this tool with the correct selection of filesystem event triggers and some estimations of:
  • Potential tag file size based on code base size (min-max)
  • Potential running time / CPU resources consumed in tag file generation
I believe it should be possible to come up with an optimal solution to this problem.
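As a starting point, here is roughly what such an incrontab rule might look like, using the same hypothetical paths as above (incron's table format is: path, event mask, command):

  # Regenerate this project's tags whenever a file in the watched
  # directory is written or moved in (masks are standard inotify events)
  /home/me/src/dwm IN_CLOSE_WRITE,IN_MOVED_TO ctags -R -f /home/me/.tags/dwm.tags /home/me/src/dwm

Two caveats: inotify watches are per-directory, so a real setup needs a watch on each subdirectory of the tree, and a rule this naive re-runs ctags on every single save, so some batching or rate-limiting would be needed to keep the running-time and CPU estimates above under control.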