Digging Into Spotlight

I came across a nice article on advanced searching using Spotlight that talked about search keywords like kind and date, and as I read it I thought to myself, "I wonder what other cool keywords you can use in Spotlight?" At first blush, this seems like the type of innocent question that should be easy to answer quickly, but as is often the case with technology, things are a bit more complicated than they first appear.

Dropping to the Command Line

My first thought was just to search for help on Spotlight, and that's not a bad starting place, as the page covers a number of keywords, more correctly called metadata attributes. Spotlight's help covers kind, author, date, created and by as well as the boolean operators AND, OR and NOT. But I knew there were many other metadata fields commonly used in files such as images and audio files. I had come across the mdls shell command before, which lists the metadata fields on a file. A quick check of a JPEG image revealed all kinds of interesting data:

da-imac-01:stuff david$ mdls IMG_3564.JPG
kMDItemAcquisitionMake         = "Canon"
kMDItemAcquisitionModel        = "Canon EOS 10D"
kMDItemAperture                = 0.970855712890625
kMDItemBitsPerSample           = 32
kMDItemColorSpace              = "RGB"
kMDItemContentCreationDate     = 2008-04-22 18:40:25 -0400
kMDItemContentModificationDate = 2008-04-22 18:40:25 -0400
...
kMDItemFlashOnOff              = 0
kMDItemFNumber                 = 1.399999976158142
kMDItemFocalLength             = 50
...

This is a truncated list of the 50+ fields in one of my image files. Note some of the interesting ones like the aperture, flash setting and focal length.
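
If you only care about one or two of those fields, a small filter over the mdls output helps. This is just a sketch: mdls_value is a name I made up (it is not part of mdls), and it assumes the "name = value" layout shown above.

```shell
# mdls_value ATTRIBUTE
# Reads mdls-style "name = value" lines on stdin and prints the value of
# one attribute. Hypothetical helper; assumes the layout shown above.
mdls_value() {
    awk -v key="$1" '
        $1 == key && $2 == "=" {
            sub(/^[^=]*=[ \t]*/, "")   # drop the name and the "=" sign
            sub(/^"/, "")              # strip a leading quote, if any
            sub(/"$/, "")              # strip a trailing quote, if any
            print
            exit
        }'
}

# Usage on a Mac (file name taken from the example above):
#   mdls IMG_3564.JPG | mdls_value kMDItemFocalLength
```

mdls can also do some of this itself: its -name option limits the output to a single attribute, which may be all you need.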

I played around with another of the Spotlight/metadata shell commands: mdfind. This lets you do the equivalent of a Spotlight search from the command line, and after a bit of trial and error, guessing the keyword names and value formats was fairly easy:

da-imac-01:stuff david$ mdfind make:canon focallength:50 flash:0 iso:125
/Users/david/Desktop/Turks, April 2008/IMG_3564.JPG
/Users/david/Pictures/iPhoto Library/Originals/2008/Turks, April 2008/IMG_3564.JPG
/Users/david/Pictures/iPhoto Library/Originals/2008/Museum Visit/IMG_3273.JPG
/Users/david/Pictures/iPhoto Library/Originals/2008/Museum Visit/IMG_3276.JPG
/Users/david/Pictures/iPhoto Library/Originals/2008/Mar 23, 2008/IMG_3288.JPG
...
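
Notice that the same photo turns up both on the Desktop and inside the iPhoto Library. One way to cut that down is to limit the search to a single folder. Here is a rough sketch: find_photos is a made-up wrapper, and the only mdfind feature it relies on beyond the keyword syntax above is the -onlyin option.

```shell
# find_photos DIR KEY VALUE [KEY VALUE ...]
# Turns each KEY VALUE pair into a KEY:VALUE search term, as in the search
# above, and runs mdfind restricted to DIR. Hypothetical wrapper; mdfind
# itself exists only on macOS.
find_photos() {
    dir=$1
    shift
    n=$(( $# / 2 ))
    while [ "$n" -gt 0 ]; do
        set -- "$@" "$1:$2"   # append the joined term to the argument list
        shift 2               # and drop the pair it was built from
        n=$(( n - 1 ))
    done
    # A trailing unpaired word is passed through as a plain search term.
    mdfind -onlyin "$dir" "$@"
}

# Usage, assuming photos live under ~/Pictures:
#   find_photos ~/Pictures make canon focallength 50 flash 0 iso 125
```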

Peeling the Onion with DTrace

Although these shell commands are very useful, the man pages for the commands do not list the valid search keywords. I knew there must be a list of the keywords used in the Spotlight search bar that mapped to these constant names, so I thought *what a great time to learn dtrace!*

For those of you who haven't heard of dtrace, I encourage you to play around with it. It's a very powerful tool for doing live probing and tracing of low-level activity in the operating system. After skimming this nice tutorial I tried this command in one window:

da-imac-01:bin david$ sudo dtrace -n 'syscall::open*:entry /execname == "mdfind"/
    { printf("%s %s", execname, copyinstr(arg0)); }'
Password:
dtrace: description 'syscall::open*:entry ' matched 3 probes

and then ran my mdfind command again in another Terminal window. The dtrace "script" says to trace all system calls whose name begins with "open" when the system call is entered, but only if they were called from the mdfind process, and then print out the process name and the first argument (which in the case of open is the file or device name). The name of the system call itself appears in the FUNCTION:NAME column that dtrace prints by default. That resulted in lots of calls like this, showing mdfind opening all kinds of metadata importer files, which I assume are libraries that know how to manipulate certain types of metadata attributes:

CPU     ID                    FUNCTION:NAME
  0  18390              open_nocancel:entry mdfind /System/Library/Spotlight/\
    Audio.mdimporter
  0  18390              open_nocancel:entry mdfind /System/Library/Spotlight/\
    Audio.mdimporter/Contents
  0  17604                       open:entry mdfind /dev/autofs_nowait
  0  17604                       open:entry mdfind /System/Library/Spotlight/\
    Audio.mdimporter/Contents/Info.plist
  0  18390              open_nocancel:entry mdfind /System/Library/Spotlight/\
    Chat.mdimporter
  0  18390              open_nocancel:entry mdfind /System/Library/Spotlight/\
    Chat.mdimporter/Contents
...
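
With that much output it helps to boil the trace down to just the importer names. A quick sketch, assuming the trace has been saved to a text file (trace.txt is a hypothetical name, as is the list_importers helper):

```shell
# list_importers
# Reads dtrace output on stdin and prints each *.mdimporter bundle name
# once. Hypothetical helper; relies on grep -o, which both the GNU and
# BSD versions of grep provide.
list_importers() {
    grep -o '[^/ ]*\.mdimporter' | sort -u
}

# Usage:
#   list_importers < trace.txt
```

The paths in the listing above are wrapped with a backslash for display only. The real trace has one path per line, but the pattern matches either way.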

But the part of the trace I was most interested in was near the very end:

...
  1  17604                       open:entry mdfind /System/Library/Frameworks/\
    CoreServices.framework/Versions/A/Frameworks/Metadata.framework/\
    Resources/MDPredicate.plist
  1  17604                       open:entry mdfind /dev/autofs_nowait
  1  17604                       open:entry mdfind /System/Library/Frameworks/\
    CoreServices.framework/Versions/A/Frameworks/Metadata.framework/\
    Resources/English.lproj/MDPredicateKeywords.plist
  1  17604                       open:entry mdfind /dev/autofs_nowait
  1  17604                       open:entry mdfind /System/Library/Frameworks/\
    CoreServices.framework/Versions/A/Frameworks/Metadata.framework/\
    Resources/English.lproj/schema.strings
...

Note the files MDPredicateKeywords.plist and schema.strings in the /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Resources/English.lproj folder. I tried looking at the schema.strings file, but it was in a binary format. So I tried open schema.strings and sure enough, Xcode launched and loaded the file, which contains over 400 lines of mostly metadata keyword definitions. The ones we are interested in are near the end and of the form kMDItemXXX.ShortName = yyy:

"kMDItemPixelHeight.ShortName"                = "pixelheight,height";
"kMDItemPixelWidth.ShortName"                 = "pixelwidth,width";
"kMDItemWhiteBalance.ShortName"               = "whitebalance";
"kMDItemAperture.ShortName"                   = "aperture,fstop";
"kMDItemAudioEncodingApplication.ShortName"   = "audioencodingapplication";
"kMDItemComposer.ShortName"                   = "composer,author,by";
"kMDItemLyricist.ShortName"                   = "lyricist,author,by";
"kMDItemStarRating.ShortName"                 = "starrating";

These are just a few of the dozens of entries to whet your appetite. For the most part, I've found them to work as expected, with one exception: starrating. I never got any hits using it, so I ran mdls on an MP3 that I knew had a rating set in iTunes, and there was no metadata attribute on it for the iTunes rating. So I guess all you Mac developers out there should "do as Apple says, not as Apple does."
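
Since one attribute can have several keywords, and one keyword (like author or by) can map to several attributes, it is handy to flip the ShortName entries into a keyword-to-attribute list. A sketch, assuming the entries have been saved as plain text in the form shown above (keywords.txt and short_names are hypothetical names):

```shell
# short_names
# Reads strings-file lines such as
#   "kMDItemAperture.ShortName" = "aperture,fstop";
# on stdin and prints one "keyword<TAB>attribute" line per keyword.
# Hypothetical helper; assumes the text layout shown above.
short_names() {
    awk -F'"' '
        $2 ~ /\.ShortName$/ {
            attr = $2
            sub(/\.ShortName$/, "", attr)    # keep just the kMDItem... name
            n = split($4, words, ",")        # keywords are comma separated
            for (i = 1; i <= n; i++)
                printf "%s\t%s\n", words[i], attr
        }' | sort
}

# Usage:
#   short_names < keywords.txt
```

Run over the sample entries above, this lists author and by twice, once for kMDItemComposer and once for kMDItemLyricist.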

Satisfaction

One of the things that I really like about OS X is the ability to work with the system at varying levels of depth. This diversion started with me playing around with the Spotlight search bar: a very advanced "desktop search" feature found only in the most modern operating systems. But when I wanted to learn more, I was able to easily muck around at the command line and experiment with the very same infrastructure that Spotlight is built on. Finally, I was able to leverage a very powerful, low-level system tool, dtrace, to probe the details of what was going on inside OS X, which led me to the answer I was looking for.
