breaking schema changes

Posted on March 30, 2016

It’s been a long time since I wrote any prose, for reasons which will become clear. Last time, I’d just invented a table to express the many-many relation between folders and messages. I was considering adding owners to this table, so I could express in the schema the one folder per message per user restriction:

  folder FolderId
  message MessageId
  owner UserId
  OwnerMessage owner message

Imagine my surprise when persistent rejected this definition, because:

Multiple declarations of ‘OwnerMessage’

I’d clean forgotten about the Access table. When I was first putting flare together, I had decided that there would be messages and tags, and that would be all. This simplicity had some advantages, for example I didn’t spend a lot of time tweaking the schema before I knew what I wanted. On the other hand, the code to impose meaning on this unstructured tangle was heroic and fragile. If I recall correctly, I did briefly have access times as tags too, but quickly realised that it was crazy to have a timestamp type that SQL did not understand. Hence:

  owner UserId
  message MessageId
  time UTCTime
  OwnerMessage owner message

Clearly the FoMe and Access tables were crying out to be combined. And this is the answer to the riddle at the end of my last post. Not to alter the Message table: Message stores the basic data about any message, independently of whether it belongs to a particular user, several or none, and is a fine abstraction. Instead, we need to add a new table which models the many-many relation between Messages and Users. Inspired by the fine sounds of Matt Bellamy and co, I named this table Muse.

I quickly realised that by adding a few more columns to Muse, it could subsume all the typed tag nastiness, as well as Access and FoMe. At present, the table is defined thus:

  owner UserId
  message MessageId
  folder FolderId
  access UTCTime
  isRead Bool
  class Text Maybe -- Nothing implies not yet classified
  surety Int
  trained Bool
  OwnerMessage owner message

(I’m slowly getting better at designing persistent schemata. For example, I initially made access nullable, with null representing a message that had never been accessed. But the code becomes rather cleaner if access is initially set to the time of a message (or rather muse) creation, and there is a separate isRead flag.)

The fields class, surety, and trained are a reworking of the former TagSpam tags to mesh better with bfilter, and generally be more awesome. I should probably write a separate post on this.

The Muse table necessitated a fairly major rewrite of large chunks of code. The whole idea of typed tags has gone, as has the peculiar dbMessage type that made it sane to manipulate a message and all its tags.

As discussed in previous posts, folders now have a real existence, and there are some per-folder options for automatically dealing with old messages. An interesting property of the improved schema is that most of this is handled in pure SQL.

Currently my pace of development is extremely slow, due to family commitments. Still, I got this rewrite done, to the point where I am now using it for my real email, in about 3 months. All in all, I’m very pleased with the new schema and code, and excited about future work on the project.

Oh! And in the last post I expressed the worry that “If we [duplicate the owner id in the Muse and Folder tables], then we again have a condition that SQL cannot enforce.” The condition I had in mind here is that a row in the Folder table must have the same OwnerId as any Muse table rows that reference that Folder row. But I was wrong, this is easy to enforce in SQL:

alter table folder
  add constraint "__manual_unique_for_muse_foreign"
  unique (owner, id);
alter table muse
  add constraint "__manual_muse_folder_owner_fkey"
  foreign key (folder,owner) references folder (id,owner);