Parsing the Things Database

I've been using Things to manage my to-do lists recently. I like the app's simplicity and ease of use, and even in beta it covers most of what I want from a task management application. One of the reasons I like it is that it uses an XML file to store all its content, which makes it easy for me to process the data myself when there is a feature Things doesn't support. The other day I wanted to print out my to-do list, and Things doesn't have much support for printing yet, so I decided to write a little script to parse the data file and print it out as text.

If you haven't taken a look at Things, it's a very slick application for managing to-do lists. While it's not specifically designed around the GTD process, it is close enough that people who practice GTD, or something similar to it, can easily use Things to manage that process. Recently, I've had two different cases come up where I needed to print or export data from Things. Unfortunately, that's a feature area that is not finished. So I decided to take a closer look at the XML file format and see how difficult it would be to parse the file and create my own report.

Objects and Relationships

Things' data file is a fairly simple XML file that is primarily a collection of object elements. These elements contain attribute elements, which are the properties of an object, and relationship elements, which can model a one-to-one or a one-to-many relationship to other objects in the file. Here is a snippet of a test file I used.

<object type="TODO" id="z159">
    <attribute name="focustype" type="int32">131072</attribute>
    <attribute name="focuslevel" type="int16">0</attribute>
    <attribute name="datemodified" type="date">227707669.31836900115013122559</attribute>
    <attribute name="datecreated" type="date">227707452.24475499987602233887</attribute>
    <attribute name="title" type="string">Todo 1.1.1</attribute>
    <attribute name="index" type="int32">0</attribute>
    <attribute name="identifier" type="string">7F63B75E-11F4-4153-B222-7506882CAD79</attribute>
    <attribute name="compact" type="bool">1</attribute>
    <relationship name="parent" type="1/1" destination="THING" idrefs="z169"></relationship>
    <relationship name="author" type="1/1" destination="COWORKER"></relationship>
    <relationship name="delegate" type="1/1" destination="COWORKER"></relationship>
    <relationship name="focus" type="1/1" destination="FOCUS" idrefs="z150"></relationship>
    <relationship name="recurrenceinstance" type="1/1" destination="TODO"></relationship>
    <relationship name="recurrencetemplate" type="1/1" destination="TODO"></relationship>
    <relationship name="scheduler" type="1/1" destination="GLOBALS"></relationship>
    <relationship name="children" type="0/0" destination="THING"></relationship>
    <relationship name="tags" type="0/0" destination="TAG" idrefs="z130 z102"></relationship>
    <relationship name="reminderdates" type="0/0" destination="REMINDER"></relationship>
</object>

You can see the relationship with the name parent which has one ID in its idrefs list. Further down, you can see the tags relationship has two references. There are three primary types of object elements:

  1. TODO: an actual todo item
  2. TAG: a tag object
  3. FOCUS: a generalized object type used for Projects, Areas, and groupings of these (like Today and Next).
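To make the attribute and relationship structure concrete, here is a minimal sketch that reads a trimmed copy of the snippet above with Python's xml.dom.minidom. The SAMPLE string, the describe_object helper, and the database root element name are my own assumptions for illustration; the real file's root element may be named differently.

```python
from xml.dom import minidom

# Trimmed copy of the TODO object shown above; only a few children are kept.
# ASSUMPTION: the <database> wrapper is a stand-in for the real root element.
SAMPLE = """<database>
<object type="TODO" id="z159">
    <attribute name="title" type="string">Todo 1.1.1</attribute>
    <attribute name="index" type="int32">0</attribute>
    <relationship name="parent" type="1/1" destination="THING" idrefs="z169"></relationship>
    <relationship name="author" type="1/1" destination="COWORKER"></relationship>
    <relationship name="tags" type="0/0" destination="TAG" idrefs="z130 z102"></relationship>
</object>
</database>"""


def describe_object(obj):
    """Return (type, attributes, relationships) for one object element.

    Hypothetical helper, for illustration only.
    """
    attributes = {}
    for attr in obj.getElementsByTagName('attribute'):
        # An empty attribute element has no text child.
        text = attr.firstChild.data if attr.hasChildNodes() else None
        attributes[attr.getAttribute('name')] = text
    relationships = {}
    for rel in obj.getElementsByTagName('relationship'):
        # getAttribute returns '' when idrefs is absent, so split() gives [].
        name = rel.getAttribute('name')
        relationships[name] = rel.getAttribute('idrefs').split()
    return obj.getAttribute('type'), attributes, relationships


doc = minidom.parseString(SAMPLE)
for obj in doc.getElementsByTagName('object'):
    obj_type, attributes, relationships = describe_object(obj)
    print(obj_type, attributes['title'])
    for name, idrefs in relationships.items():
        print('  %s -> %s' % (name, idrefs))
```

Note that a relationship without an idrefs attribute, like author here, simply comes back as an empty list.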

All I do to parse the document is create a dict for each object, using the attribute and relationship sub-elements as fields in the dict. Relationships are initially stored as lists of the string idref values. Then, at the end of the parsing function, once I have all the objects loaded into an ID map, I resolve the references to the actual dict objects. This leaves me with a "Pythonic" graph of the data, which is easier to work with (IMHO) for querying and processing.

from xml.dom import minidom


def parse_things_xml(database):
    """Parse the XML object elements into dicts, one per object.
    Attribute elements become dict fields, and relationship elements
    (parent, children, tags, and so on) are resolved so that related
    nodes link to each other directly. Returns a list of all the
    node dicts.
    """

    def parse_relationship(relationships, name):
        """Parse the specified relationship out of the list of relationships.
        Assumes there is only one relationship of the specified name.
        """
        rels = [r for r in relationships
                if r.attributes['name'].value == name]
        if not rels:
            return None
        # getAttribute returns '' when idrefs is missing, so split() yields []
        return rels[0].getAttribute('idrefs').split()

    xmldoc = minidom.parse(database)
    objects = xmldoc.getElementsByTagName('object')

    # parse each object element into a dict storing attribute
    # child elements as dict fields and child relationship
    # elements as lists of idref values
    node_map = {}
    for obj in objects:
        node = {'id': obj.attributes['id'].value,
                'type': obj.attributes['type'].value}
        attrs = obj.getElementsByTagName('attribute')
        for attr in attrs:
            val = attr.firstChild.data if attr.hasChildNodes() else None
            node[attr.attributes['name'].value] = val
        rels = obj.getElementsByTagName('relationship')
        for rel in rels:
            relname = rel.attributes['name'].value
            node[relname] = parse_relationship(rels, relname)
        node_map[node['id']] = node

    # resolve idrefs by replacing their value with a reference to the
    # actual dict
    for node in node_map.values():
        # snapshot the items, since the node is updated inside the loop
        for relname, idrefs in list(node.items()):
            if isinstance(idrefs, list):
                relnodes = [node_map[idref] for idref in idrefs]
                if relnodes and 'index' in relnodes[0]:
                    # index is stored as a string, so compare numerically
                    relnodes.sort(key=lambda n: int(n.get('index') or 0))
                node[relname] = relnodes

    return list(node_map.values())
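The parser keeps every attribute value as a string, even though each attribute element carries a type such as int32, bool, or date. Here is a rough sketch of how those strings could be converted to Python values. The coerce_attribute helper is hypothetical, and the date handling rests on an assumption the file itself doesn't confirm: that date values count seconds from 1 January 2001 UTC, the Cocoa reference date.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: date attributes count seconds from the Cocoa reference date.
REFERENCE_DATE = datetime(2001, 1, 1, tzinfo=timezone.utc)


def coerce_attribute(type_name, text):
    """Convert attribute text to a Python value based on its type string.

    Hypothetical helper, for illustration only.
    """
    if text is None:
        return None
    if type_name in ('int16', 'int32'):
        return int(text)
    if type_name == 'bool':
        return text == '1'
    if type_name == 'date':
        return REFERENCE_DATE + timedelta(seconds=float(text))
    return text  # 'string' and any type not listed above


print(coerce_attribute('int32', '131072'))
print(coerce_attribute('bool', '1'))
print(coerce_attribute('date', '227707452.24475499987602233887'))
```

Under that assumption, the datecreated value from the snippet lands on 20 March 2008, which at least looks plausible for a test file.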

I also have a function to query the object graph based on some simple selection criteria, and a very simplistic printing routine to print out the results. The code is a little rough, but it's a start. I've organized it into two files: thingslib.py, which does the parsing and querying, and things.py, which is a script that you can call to select the items of interest and print them out. The code is available from my download page.
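As a rough illustration of the idea (a simplified sketch, not the code from thingslib.py), selecting and printing over the resolved dicts can look something like this. The select and format_outline names and the hand-built sample dicts are made up for the example.

```python
def select(nodes, **criteria):
    """Return the nodes whose fields equal every given criterion."""
    return [n for n in nodes
            if all(n.get(field) == value for field, value in criteria.items())]


def format_outline(nodes, depth=0):
    """Return indented outline lines, following resolved 'children' links."""
    lines = []
    for node in nodes:
        title = node.get('title') or '(untitled)'
        lines.append('    ' * depth + '- ' + title)
        lines.extend(format_outline(node.get('children') or [], depth + 1))
    return lines


# Hand-built nodes shaped like the parser's output (illustrative data only).
todo_a = {'id': 'z2', 'type': 'TODO', 'title': 'Buy paper', 'children': []}
todo_b = {'id': 'z3', 'type': 'TODO', 'title': 'Print list', 'children': []}
project = {'id': 'z1', 'type': 'TODO', 'title': 'Printing',
           'children': [todo_a, todo_b]}
tag = {'id': 'z4', 'type': 'TAG', 'title': 'errands'}
nodes = [project, todo_a, todo_b, tag]

# Pick one item by type and title, then print it with its children.
matches = select(nodes, type='TODO', title='Printing')
print('\n'.join(format_outline(matches)))
```

Because the relationships are already resolved to dicts, the printing routine can walk children directly without looking anything up by ID.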
