"Magic bytes" are a common part of a file header. The first few bytes of a file can often be used to identify what type of file it is. For example, a bitmap file starts with "BM", and a PGM file always starts with "PN" where "N" is a number between 1 and 6, describing the specific variant in use, and WAV files start with "RIFF".
Many files have less human-readable magic bytes, like the ones Christer was working with. His team was working on software to manipulate a variety of different CAD file types. One thing this code needed to do is identify when the loaded file was a CAD file, but not the specific UFF file type they were looking for. In this case, they need to check that the file does not start with 0xabb0
, 0xabb1
, or 0xabb3
. It was trivially easy to write up a validation check to ensure that the files had the correct magic bytes. And yet, there is no task so easy that someone can't fall flat on their face while doing it.
This is how Christer's co-worker solved this problem:
const uint16_t *id = (uint16_t*)data.GetBuffer();
if (*id == 0xabb0 || *id == 0xABB0 || *id == 0xabb1 || *id == 0xABB1 || *id == 0xabb3 || *id == 0xABB3)
{
return 0;
}
Here we have a case of someone who isn't clear on the difference between hexadecimal numbers and strings. Now, you (and the compiler) might think that 0xABB0
and 0xabb0
are, quite clearly, the same thing. But you don't understand the power of lowercase numbers. Here we have an entirely new numbering system where 0xABB0
and 0xabb0
are not equal, which also means 0xABB0 - 0xabb0
is non-zero. An entirely new field of mathematics lies before us, with new questions to be asked. If 0xABB0 < 0xABB1
, is 0xABB0 < 0xabb1
also true? From this little code sample, we can't make any inferences, but these questions give us a rich field of useless mathematics to write papers about.
The biggest question of all, is that we know how to write lowercase numbers for A-F
, but how do we write a lowercase 3?