Guy picked up a bug ticket. It was a Heisenbug: sometimes, saving a new entry in the application produced a duplicate-primary-key error, which should never happen.
The error was in the message-bus implementation someone else at the company had inner-platformed together, and it didn't take long to understand why it failed.
/**
 * This generator is used to generate message ids.
 * This implementation merely returns the current timestamp as long.
 *
 * We are, thus, limited to insert 1000 new messages per second.
 * That throughput seems reasonable in regard with the overall
 * processing of a ticket.
 *
 * Might have to re-consider that if needed.
 *
 */
public class IdGenerator implements IdentifierGenerator
{
    long previousId;

    @Override
    public synchronized Long generate(SessionImplementor session, Object parent) throws HibernateException {
        long newId = new Date().getTime();
        if (newId == previousId) {
            try { Thread.sleep(1); } catch (InterruptedException ignore) {}
            newId = new Date().getTime();
        }
        return newId;
    }
}
This generates IDs based on the current timestamp in milliseconds. If requests come in fast enough that we see a repeated ID, we sleep for a millisecond and then grab a fresh timestamp.
This… this is just an autoincrementing counter with extra steps, one which most (though I suppose not all) databases supply natively. It does save you the trouble of storing the current counter value outside of a running program, I guess, but at the cost of having your application take a nap whenever it's under heavier-than-average load.
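For comparison, the "no extra steps" version of an in-process autoincrementing counter is a one-liner (a sketch; the class name is mine, not from the original code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the generator above: a plain in-process counter.
// Unlike the timestamp scheme, it never sleeps and never collides within one
// JVM, but the value resets on restart and is not shared across nodes.
public class CounterIdGenerator {
    private final AtomicLong counter = new AtomicLong(0);

    public long generate() {
        return counter.incrementAndGet(); // atomic and thread-safe, no synchronized block needed
    }
}
```

Of course, this shares the same fundamental flaw as the original once you run more than one node, which is exactly where this story is headed.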
One thing you might note is absent here: generate doesn't update previousId. Which does, at least, mean we won't ever sleep. But it also means we're doing nothing at all to avoid collisions. And that, as it turns out, isn't even the biggest problem. Why?
Because this application doesn't just run on a single server. It's distributed across a handful of nodes, both for load balancing and resiliency. Which means even if the code properly updated previousId, that still wouldn't prevent collisions across multiple nodes, unless they suddenly started syncing previousId amongst each other.
I guess the fix might be to combine a timestamp with something unique to each machine, like… I don't know… hmmm… maybe the MAC address on one of their network interfaces? Oh! Or maybe you could use a sufficiently large random number, like really large. 128 bits or something. Or, if you're getting really fancy, combine the timestamp with some randomness. I dunno, something like that really sounds like it could get you to some kind of universally unique value.
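The standard library already does all of this, naturally. java.util.UUID.randomUUID() produces a version-4 UUID, 128 bits with 122 of them random (version-1 UUIDs are the timestamp-plus-MAC-address variety):

```java
import java.util.UUID;

public class UuidExample {
    public static void main(String[] args) {
        // Version 4: 122 random bits, so collisions are vanishingly
        // unlikely even across many uncoordinated nodes.
        UUID id = UUID.randomUUID();
        System.out.println(id);           // e.g. 3f2504e0-4f89-41d3-9a0c-0305e82c3301
        System.out.println(id.version()); // prints 4
    }
}
```

No shared state, no sleeping, no syncing anything between nodes.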
Then again, since the throughput is well under 1,000 messages per second, you could probably also just let your database handle it, and maybe not generate the IDs in code.
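Since the code already runs under Hibernate, that would mean dropping the custom IdentifierGenerator in favor of a database-generated key. A sketch of the mapping (the Message entity is my invention; the annotations are standard JPA):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

// Hypothetical entity: let the database assign the primary key
// (e.g. an auto-increment / identity column) instead of generating it in code.
@Entity
public class Message {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // ... other fields ...
}
```

The database serializes inserts on the identity column for you, across every node, with no application-level clock tricks required.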
