Parsing Things Database
I've been using Things to manage my todo lists recently. I like the simplicity and ease of use of the app, and even in beta it covers most of what I want from a task management application. One of the reasons I like it is that it uses an XML file for storing all the content, which makes it easy for me to process the data if there is a feature Things doesn't support. The other day I wanted to print out my todo list, and Things doesn't really have much support for printing yet, so I decided to write a little script to parse the data file and print it out as text.
If you haven't taken a look at Things, it's a very
slick application for managing todo lists. While
it's not specifically designed around the GTD
process, it is close enough that it's easy for
people who practice GTD - or something similar to
it - to use Things to manage this process.
Recently, I've had two
different cases come up where I needed to
print or export data from Things.
Unfortunately, that's a feature area that is
not finished. So I decided to take a closer
look at the XML file format and see how
difficult it would be to parse the file and
create my own report.
Objects and Relationships
Things' data file is a fairly simple XML file
that is primarily a collection of
object elements. These elements
contain attribute elements which are
the properties of an object and
relationship elements which can model
a one-to-one or a one-to-many relationship to other
objects in the file. Here is a snippet
of a test file I used.
    <object type="TODO" id="z159">
      <attribute name="focustype" type="int32">131072</attribute>
      <attribute name="focuslevel" type="int16">0</attribute>
      <attribute name="datemodified" type="date">227707669.31836900115013122559</attribute>
      <attribute name="datecreated" type="date">227707452.24475499987602233887</attribute>
      <attribute name="title" type="string">Todo 1.1.1</attribute>
      <attribute name="index" type="int32">0</attribute>
      <attribute name="identifier" type="string">7F63B75E-11F4-4153-B222-7506882CAD79</attribute>
      <attribute name="compact" type="bool">1</attribute>
      <relationship name="parent" type="1/1" destination="THING" idrefs="z169"></relationship>
      <relationship name="author" type="1/1" destination="COWORKER"></relationship>
      <relationship name="delegate" type="1/1" destination="COWORKER"></relationship>
      <relationship name="focus" type="1/1" destination="FOCUS" idrefs="z150"></relationship>
      <relationship name="recurrenceinstance" type="1/1" destination="TODO"></relationship>
      <relationship name="recurrencetemplate" type="1/1" destination="TODO"></relationship>
      <relationship name="scheduler" type="1/1" destination="GLOBALS"></relationship>
      <relationship name="children" type="0/0" destination="THING"></relationship>
      <relationship name="tags" type="0/0" destination="TAG" idrefs="z130 z102"></relationship>
      <relationship name="reminderdates" type="0/0" destination="REMINDER"></relationship>
    </object>
You can see the relationship with
the name parent which has one ID in
its idrefs list. Further down, you can
see the tags relationship has two
references. There are three primary types of
object elements:
- TODO: an actual todo item
- TAG: a tag object
- FOCUS: a generalized object type used for Projects, Areas and groupings of these (like Today, Next).
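To make the element layout concrete, here is a minimal sketch that reads one object element with minidom, converting each attribute according to its type and splitting each idrefs list. The names SAMPLE, REFERENCE_DATE, convert and read_object are made up for this illustration, and the date handling rests on an assumption the file itself does not confirm: that date values count seconds from 2001-01-01 UTC.

```python
from datetime import datetime, timedelta, timezone
from xml.dom import minidom

# A shortened copy of the object element shown above.
SAMPLE = """
<object type="TODO" id="z159">
  <attribute name="title" type="string">Todo 1.1.1</attribute>
  <attribute name="index" type="int32">0</attribute>
  <attribute name="compact" type="bool">1</attribute>
  <attribute name="datecreated" type="date">227707452.24475499987602233887</attribute>
  <relationship name="parent" type="1/1" destination="THING" idrefs="z169"></relationship>
  <relationship name="delegate" type="1/1" destination="COWORKER"></relationship>
  <relationship name="tags" type="0/0" destination="TAG" idrefs="z130 z102"></relationship>
</object>
""".strip()

# ASSUMPTION: "date" values are seconds since 2001-01-01 UTC.
REFERENCE_DATE = datetime(2001, 1, 1, tzinfo=timezone.utc)


def convert(type_name, text):
    """Convert an attribute's text to a Python value based on its type."""
    if text is None:
        return None
    if type_name.startswith('int'):
        return int(text)
    if type_name == 'bool':
        return text == '1'
    if type_name == 'date':
        return REFERENCE_DATE + timedelta(seconds=float(text))
    return text  # "string" and anything unrecognized stay as text


def read_object(obj):
    """Return (attributes, relationships) for one object element."""
    attributes = {}
    for attr in obj.getElementsByTagName('attribute'):
        text = attr.firstChild.data if attr.hasChildNodes() else None
        attributes[attr.getAttribute('name')] = convert(
            attr.getAttribute('type'), text)
    relationships = {}
    for rel in obj.getElementsByTagName('relationship'):
        # getAttribute returns '' when idrefs is absent, and split()
        # with no argument turns that into an empty list.
        relationships[rel.getAttribute('name')] = \
            rel.getAttribute('idrefs').split()
    return attributes, relationships


obj = minidom.parseString(SAMPLE).documentElement
attributes, relationships = read_object(obj)
print(attributes['title'], attributes['index'], attributes['compact'])
print(relationships)
```

A relationship with no idrefs, like delegate here, simply comes back as an empty list, so one-to-one and one-to-many relationships can be handled the same way.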
All I do to parse the document is to create a
dict for each object, using the
attribute and
relationship sub-elements as fields in
the dict. Relationships are initially
stored as lists of the string idref
values. Then at the end of the parsing method, once
I have all the objects loaded into an ID map, I
resolve the references to the actual
dict objects. This leaves me with a
"Pythonic" graph of the data, which is easier
to work with (IMHO) for querying and
processing.
    from xml.dom import minidom


    def parse_things_xml(database):
        """Parse the XML object elements and save them in dicts.

        Parses attribute elements for dict fields and resolves
        relationship elements (parent, children, tags, ...) to link up
        related nodes.  Returns a list of all the parsed nodes.
        """

        def parse_relationship(relationships, name):
            """Parse the specified relationship out of the list of
            relationships and return its idrefs as a list.

            Assumes there is only one relationship of the specified name.
            Returns None if there is no relationship of that name.
            """
            rels = [r for r in relationships
                    if r.getAttribute('name') == name]
            if not rels:
                return None
            # getAttribute returns '' when idrefs is missing, and split()
            # with no argument turns that into an empty list
            return rels[0].getAttribute('idrefs').split()

        xmldoc = minidom.parse(database)
        objects = xmldoc.getElementsByTagName('object')

        # parse each object element into a dict storing attribute
        # child elements as dict fields and child relationship
        # elements as lists of idref values
        node_map = {}
        for obj in objects:
            node = {'id': obj.getAttribute('id'),
                    'type': obj.getAttribute('type')}
            for attr in obj.getElementsByTagName('attribute'):
                val = attr.firstChild.data if attr.hasChildNodes() else None
                node[attr.getAttribute('name')] = val
            rels = obj.getElementsByTagName('relationship')
            for rel in rels:
                relname = rel.getAttribute('name')
                node[relname] = parse_relationship(rels, relname)
            node_map[node['id']] = node

        # resolve idrefs by replacing their value with a reference to the
        # actual dict
        for node in node_map.values():
            for relname, idrefs in list(node.items()):
                if isinstance(idrefs, list):
                    relnodes = [node_map[idref] for idref in idrefs]
                    if relnodes and 'index' in relnodes[0]:
                        # attribute values are strings, so sort numerically
                        relnodes.sort(key=lambda n: int(n.get('index') or 0))
                    node[relname] = relnodes
        return list(node_map.values())
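Once the references are resolved, moving around the graph is plain dict and list access. As a small illustration, the sketch below builds three nodes by hand in the shape the parser produces (relationships hold lists of dicts rather than id strings) and walks the parent relationship upward. The ancestors helper, the titles and all ids except z159 and z169 are invented for the example.

```python
# Hand-built nodes shaped like the parser's output after resolution.
# The titles and the id "z1" are made up for this example.
area = {'id': 'z1', 'title': 'Work', 'parent': []}
project = {'id': 'z169', 'title': 'Project 1.1', 'parent': [area]}
todo = {'id': 'z159', 'title': 'Todo 1.1.1', 'parent': [project]}


def ancestors(node):
    """Return the titles from the top-most ancestor down to node."""
    titles = []
    seen = set()
    while node is not None and node['id'] not in seen:
        seen.add(node['id'])  # guard against a cycle in a damaged file
        titles.append(node.get('title') or '(untitled)')
        parents = node.get('parent') or []
        # parent is a 1/1 relationship, so the list has at most one entry
        node = parents[0] if parents else None
    return list(reversed(titles))


print(' > '.join(ancestors(todo)))  # prints "Work > Project 1.1 > Todo 1.1.1"
```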
I also have a method to query the object graph
based on some simple selection criteria and then a
very simplistic printing routine to print out
the results. The code is a little rough, but it's a
start. I've organized it into two files:
thingslib.py which does the parsing
and querying, and things.py which is a
script that you can call to select the items of
interest and print them out. The code is available
from my download page.
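The two files are not reproduced here, but querying and printing a resolved graph could look roughly like the sketch below. It is not the code from thingslib.py or things.py: select, has_tag and format_todo are hypothetical names, and the sample nodes are toy data in the same dict shape the parser produces.

```python
def select(nodes, **criteria):
    """Return the nodes whose fields equal every keyword criterion."""
    return [n for n in nodes
            if all(n.get(field) == value
                   for field, value in criteria.items())]


def has_tag(node, tag_title):
    """True if one of the node's resolved tags has the given title."""
    return any(t.get('title') == tag_title for t in node.get('tags') or [])


def format_todo(node):
    """Format one todo as a checkbox line, with its tags if it has any."""
    tags = ', '.join(t.get('title') or '' for t in node.get('tags') or [])
    line = '[ ] ' + (node.get('title') or '(untitled)')
    return line + ('  (' + tags + ')' if tags else '')


# Toy data: one tag and two todos, with the tag reference already resolved.
errand = {'id': 'z130', 'type': 'TAG', 'title': 'Errand'}
nodes = [
    errand,
    {'id': 'z159', 'type': 'TODO', 'title': 'Buy stamps', 'tags': [errand]},
    {'id': 'z160', 'type': 'TODO', 'title': 'Write report', 'tags': []},
]

for todo in select(nodes, type='TODO'):
    print(format_todo(todo))

errands = [n for n in select(nodes, type='TODO') if has_tag(n, 'Errand')]
```

Keyword equality only covers the simplest selections. Anything richer, such as filtering by tag, can be layered on with an ordinary list comprehension, as the last line does.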