2014-09-30

Posted on September 30, 2014

Time to start looking at the other end of fetch_mref(): locating an mref in a STD11 input message. I’ve already made one wrong turn in my sketch, because we want to use mmap() rather than read() if possible. (Obviously at some point we’ll need to handle messages coming from standard input, but that’s not crucial yet.)
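A minimal sketch of the mmap() route, assuming the message lives in a regular file. The names map_message() and unmap_message() are made up for illustration and are not part of libmref. Note that mmap() fails on a pipe, which is why standard input would need a read() fallback later.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole message file read-only.
 * On success returns the mapping and stores its length in *len.
 * Returns NULL on any failure, including an empty file (mmap()
 * rejects zero-length mappings). */
const char *map_message(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_message(). */
void unmap_message(const char *base, size_t len)
{
    munmap((void *)base, len);
}
```

The mapped bytes are not NUL-terminated, so anything searching them has to carry the length around.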

And a second wrong turn: I’ve been using NUL-terminated strings, but using a pointer and a length makes way more sense. For instance, I don’t need to allocate memory for the result when searching a STD11 message. I have been thinking, though, that in a STD11 Mref: header, spaces should be allowed after the commas, and at some point we’ll have to strip those out. The simplest way by far to do that would be to make the fields a collection of pointers and lengths (or offsets and lengths, doesn’t really matter). That bumps the size of struct mref up from 64 bytes to 112 bytes, which seems a lot. Instead, I could keep a single base pointer and store the offsets as shorts: a base pointer, a length, and 12 field offsets, for a total of 34 bytes (8 + 2 + 12 × 2, assuming an 8-byte pointer and 2-byte shorts, and before any padding the compiler adds). That seems entirely more reasonable.
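Roughly what the compact layout could look like. The member names and the MREF_NOFFSETS constant are my own placeholders, and the meaning of each of the 12 offsets is left open here. The members add up to 34 bytes with an 8-byte pointer, though on a typical 64-bit ABI the compiler will probably pad sizeof(struct mref) up to 40 to keep the pointer aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define MREF_NOFFSETS 12    /* count taken from the text; names are placeholders */

struct mref {
    const char *base;                 /* start of the mref text, e.g. inside a mapped message */
    uint16_t    len;                  /* length of the mref text in bytes */
    uint16_t    off[MREF_NOFFSETS];   /* field offsets, relative to base */
};

/* Pointer to the byte that offset number i refers to. */
const char *mref_at(const struct mref *m, size_t i)
{
    return m->base + m->off[i];
}
```

Storing offsets as 16-bit values limits an mref to 65535 bytes, which seems a safe bet for something that sits in a header line.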

Now, with a pair of offsets for each field, we can easily handle spaces in between fields, without having to copy stuff around. The only minor snag is that we can’t then simply hand a single chunk of memory to the hash function for the mref hash. But we could just hand it each field individually. That, however, gives me the idea of hashing just the fields, without the commas.
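A sketch of how the offset pairs might be filled in without copying anything. struct span and split_fields() are names invented for this example, and it only skips spaces in front of a field (so, after a comma), which is all the text asks for.

```c
#include <stddef.h>
#include <stdint.h>

/* One field, as a pair of offsets into the header value. */
struct span {
    uint16_t start;   /* offset of the first byte of the field */
    uint16_t end;     /* offset one past the last byte of the field */
};

/* Split s[0..len) at commas, skipping any spaces in front of each field.
 * Nothing is copied; out[] just records where each field sits.
 * Returns the number of fields, or -1 if there are more than max
 * fields or len does not fit in a 16-bit offset. */
int split_fields(const char *s, size_t len, struct span *out, int max)
{
    if (len > UINT16_MAX)
        return -1;

    int n = 0;
    size_t i = 0;
    for (;;) {
        while (i < len && s[i] == ' ')   /* spaces before the field */
            i++;
        size_t start = i;
        while (i < len && s[i] != ',')   /* the field itself */
            i++;
        if (n == max)
            return -1;
        out[n].start = (uint16_t)start;
        out[n].end = (uint16_t)i;
        n++;
        if (i == len)
            return n;
        i++;                             /* step over the comma */
    }
}
```

For the value `ab, cd,e` this records the spans [0,2), [4,6) and [7,8), leaving the space after the first comma outside every field.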

Although that appeals, it’s not right (and not merely because it would break all existing mrefs: there aren’t so many of them in the scheme of things!). It would mean that there are different mrefs with the same hash! Without the commas, the fields “ab” and “c”, say, would feed the hash exactly the same bytes as “a” and “bc”. I don’t think I want to build collisions into the system at the moment. So, mrefs stay as they are, and libmref will just have to write each field, followed by a comma, to the hash function.
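To make the collision concrete, here is a small sketch. The hash is 64-bit FNV-1a, used purely as a stand-in because it is easy to feed incrementally; it is not the hash mrefs really use, and struct span, hash_update() and hash_fields() are illustrative names. I have taken "each field, followed by a comma" literally, so the last field gets a comma too.

```c
#include <stddef.h>
#include <stdint.h>

/* One field, as a pair of offsets into the mref text. */
struct span {
    uint16_t start;   /* offset of the first byte */
    uint16_t end;     /* offset one past the last byte */
};

/* Stand-in incremental hash: 64-bit FNV-1a. NOT the real mref hash. */
typedef uint64_t hash_state;

void hash_init(hash_state *h)
{
    *h = 0xcbf29ce484222325ULL;          /* FNV offset basis */
}

void hash_update(hash_state *h, const char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *h ^= (unsigned char)p[i];
        *h *= 0x100000001b3ULL;          /* FNV prime */
    }
}

/* Feed the fields to the hash one at a time. With with_commas set,
 * a comma is written after every field, so the hash sees the same
 * bytes as a space-free mref written out in one piece. */
uint64_t hash_fields(const char *base, const struct span *f, int n,
                     int with_commas)
{
    hash_state h;
    hash_init(&h);
    for (int i = 0; i < n; i++) {
        hash_update(&h, base + f[i].start, (size_t)(f[i].end - f[i].start));
        if (with_commas)
            hash_update(&h, ",", 1);
    }
    return h;
}
```

With with_commas set to 0, the fields of `ab,c` and `a,bc` both reach the hash as the three bytes `abc` and so must produce the same value; with commas they arrive as `ab,c,` and `a,bc,`, which are different inputs.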

All that can wait, though; for now, I need to get back to locating mrefs in STD11 messages.