In many places, driver's licenses work on a point system. As you commit infractions, you gain or lose points, when your point score hits a certain threshold, your insurance company raises your rates or you may even lose your driver's license. Where Christopher Walker lives, you start with twelve points, and each infraction takes a few away. Once a year, you have the option to attend a workshop on safe driving, where you can then regain a few of those points.

It's complicated and tedious, so several organizations, from the local department of motor vehicles to various insurance companies, have set up systems to manage this information. One of those organizations built a PHP application about fifteen years ago, and it gradually grew in kruft and complexity and confusion from that point forward. It works, but it's entirely unmaintainable.

So Christopher was hired to help upgrade it to something hopefully supportable. It's still in PHP, but it's redesigned to use some best practices, switch to Laravel as its framework, and basically be as modular and component-oriented as possible.

The real challenge was porting the existing data into the new system. The old schema was a mess. The "simple" problems were all around the fact that once upon a time the database only used ASCII, but was eventually upgraded to use UTF-8, but however that was done made it so that many characters like 'é' got mangled into '‡' or '§'.

But all of that was nothing compared to the problems updating the revision history tables. The other developers had given up on the revision/audit history many years ago. Instead of providing detailed reports, they simply displayed "[username] changed this participant."

The application tracked an audit log, and it was fairly thorough. At first glance, it even seemed pretty sensible. It had a timestamp, an action code (like "USRUPDATE" or "USRCREATE"), a "detailsaction" which contained what appeared to contain the new value of a modified field, and then a "request" which just seemed to log the raw SQL run to alter the table. That last one didn't seem important, so Christopher went ahead and started porting the old table to the new database.

That's when Christopher hit the first speed bump. Some of the records were sane, comprehensible audit logs. Some of them simply weren't. For some of them, the audit fields conveyed no information. For others, you needed to look at the request field and try and reconstruct what happened from the raw SQL. Except that was easier said than done: many of the queries in the audit log referenced tables and fields which no longer existed, or had been renamed at some point. By combing through the huge pile of data, Christopher was able to determine that there were only about 20 different ways those queries got deprecated, so it wasn't too hard to come up with a script that could translate them into the new architecture.

The other unusual edge case were that instead of storing SQL in the field, many stored a condensed array representing the row that was altered, like:

a:23:{s:14:"participantsid";i:123456;s:5:"titre";s:8:"Monsieur";s:3:"nom";s:5:"[LAST_NAME]";s:6:"nom_jf";s:0:"";s:6:"prenom";s:6:"[FIRST_NAME]";s:10:"profession";s:1:"0";s:14:"naissance_date";s:10:"xxxx-xx-xx";s:14:"naissance_lieu";s:15:"STRASBOURG (67)";s:8:"adresse1";s:20:"[REDACTED]";s:8:"adresse2";s:0:"";s:11:"code_postal";s:5:"12345";s:5:"ville";s:9:"[REDACTED]";s:4:"tel1";s:14:"[REDACTED]";s:4:"tel2";s:0:"";s:5:"email";s:24:"[REDACTED]@gmail.com";s:6:"membre";s:0:"";s:15:"immatriculation";s:0:"";s:2:"ac";s:3:"NON";s:12:"permisnumero";s:12:"[REDACTED]";s:10:"permisdate";s:10:"2019-01-21";s:10:"permislieu";s:9:"PREFET 67";s:8:"remarque";s:0:"";s:14:"naissance_pays";s:0:"";}

That wasn't terrible to manage, aside from the fact that the dumps didn't actually reference existing tables and fields. Christopher could figure out what the replacement tables and fields were and map the data back to actual audit log entries.

That got Christopher 90% of the way there. But 90% isn't all the way, and the last ten percent was going to take a lot more time. Or perhaps was going to be impossible to do. Because the remaining audit log records stored queries that had nothing to do with the entity that was changed. Many of them weren't even modification statements.

For example, the audit log entry that seemed to be about updating a workshop's status from "active" to "cancelled" was purportedly done by this query: SELECT lieux.departement FROM lieux JOIN stages ON stages.lieuxid = lieux.lieuxid WHERE stages.types = 'PAP' AND stages.stagesid ='123456'.

Christopher summarizes:

I don't know who decided that this was a good idea or even that this made sense, but I do understand why one of the previous developers of the app decided that "[username] changed this participant." was going to be the only info given in the revisions history.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!