Wanted: Gift, Got: Gift
The Case of the Invisible Differences
I've seen this kind of output from test programs more than I'd like to remember:
    not ok 6172 - sent expected message
You'll stare at that for a long time before having the clever idea of piping the test output through cat -A, only to find that you're expecting just a newline but getting a carriage return and a newline. After the hundredth time, you learn to bring cat -A into play as soon as you see this kind of weird test result, and that lesson helps. The problem is that it doesn't help when you're diagnosing CPAN Testers reports from your code running on SqueeOS True56, a platform you've never heard of and whose line endings nobody knows. What's going on with the output? You can't say, so you contact the smoker and ask them to re-run the tests and tell you what the bytes in question are, but that's a pretty lousy approach. The solution is to fix the diagnostic messages so they show you everything that might be relevant.
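Even before reaching for a dedicated module, you can get part of the way by escaping invisible bytes in your own diagnostics. The sketch below is a hypothetical helper named visible, not part of Test::BinaryData or any other module, and the "Foo" strings are made-up stand-ins for the real message. Where cat -A marks a carriage return as ^M and each line end as $, this helper uses backslash escapes instead.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Test::BinaryData or any other module:
# rewrite every byte that would be invisible in a terminal as an escape,
# so CR, LF, tab and other control or high-bit bytes show up explicitly.
sub visible {
  my ($bytes) = @_;
  my %named = ("\r" => '\r', "\n" => '\n', "\t" => '\t');
  (my $out = $bytes) =~ s{([^\x20-\x7E])}{
    exists $named{$1} ? $named{$1} : sprintf('\x%02X', ord $1)
  }ge;
  return $out;
}

# A made-up got/expected pair, made visible:
print visible("Foo\r\n"), "\n";   # prints: Foo\r\n
print visible("Foo\n"),   "\n";   # prints: Foo\n
```

Feeding both sides of a failed comparison through a helper like this makes a smoker's report show the stray carriage return directly. It still leaves you comparing two escaped strings by eye, though.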
Test::BinaryData replaces the usual "have X, want Y" diagnostic with output inspired by xxd. The above test would have produced:
    not ok 6172 - sent expected message
It's pretty obvious here what happened. There's an extra byte in the "have," and a quick glance at the compared lines shows that the bogus byte is an 0x0d (carriage return) before the shared 0x0a (newline).
If we've got a bunch of identical content before the difference, we get leading content for context:
    # have (hex)           have         want (hex)           want
The column between the "have" and "want" sides shows the relationship between the two subsequences on each row: either they are equal (=) or they are not (!).
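Here's a minimal sketch of how an xxd-style, side-by-side comparison like this can be produced. The hex_compare function and its helpers are hypothetical, written for illustration rather than taken from Test::BinaryData, and the eight-byte row width and '-' padding are assumptions. Each row shows the hex and printable forms of one slice of "have," the = or ! marker, and then the same slice of "want." Unlike the module's output described above, it prints every row instead of trimming to nearby context.

```perl
use strict;
use warnings;

# Hex digits for one row, padded with '-' so short rows stay aligned.
sub _hex_cell {
  my ($bytes, $width) = @_;
  my $hex = unpack 'H*', $bytes;
  return $hex . ('-' x ($width * 2 - length $hex));
}

# Printable view of one row: anything outside ASCII 0x20..0x7E becomes '.'.
sub _text_cell {
  my ($bytes, $width) = @_;
  (my $text = $bytes) =~ tr/\x20-\x7E/./c;
  return sprintf '%-*s', $width, $text;
}

# Hypothetical xxd-style comparison of two byte strings: one output row per
# $width bytes, with '=' between the sides when that row matches, else '!'.
sub hex_compare {
  my ($have, $want, $width) = @_;
  $width ||= 8;    # assumed row width, chosen for illustration

  my ($lh, $lw) = (length $have, length $want);
  my $max = $lh > $lw ? $lh : $lw;

  my @rows = (sprintf '%-*s %-*s   %-*s %s',
    $width * 2, 'have (hex)', $width, 'have', $width * 2, 'want (hex)', 'want');

  for (my $pos = 0; $pos < $max; $pos += $width) {
    # substr past the end of a string returns undef, so guard each side.
    my $h = $pos < $lh ? substr($have, $pos, $width) : '';
    my $w = $pos < $lw ? substr($want, $pos, $width) : '';
    push @rows, join ' ',
      _hex_cell($h, $width), _text_cell($h, $width),
      ($h eq $w ? '=' : '!'),
      _hex_cell($w, $width), _text_cell($w, $width);
  }
  return @rows;
}

print "# $_\n" for hex_compare("Hello, world!\r\n", "Hello, world!\n");
```

With this input, the first row ("Hello, w") is marked = and the second is marked !, with 0d0a on the "have" side against a lone 0a on the "want" side. Because rows are fixed-width slices, one inserted byte shifts everything after it, so the first ! row is the one to read. In an actual test you would call is_binary($have, $want, $name) from Test::BinaryData instead of rolling your own.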
Comparing Encoded Text
Originally, I wrote this module exclusively for comparing line endings. It had obvious applications for testing the results of encoding text strings, but I didn't know much about encoding at the time. Once I became further mired in encoding issues, though, I came back to Test::BinaryData and found it really useful. For example, why would the same operation fail when run on different operating systems, with no visible difference in the results?
    # have (hex)           have         want (hex)           want
Both encode the name of Queensrÿche, complete with heavy metal umlaut, but the byte sequences differ. The first is LATIN SMALL LETTER Y WITH DIAERESIS, but the second is a combining sequence: LATIN SMALL LETTER Y, followed by COMBINING DIAERESIS.
These forms look identical when rendered, but computers see them as different sequences, and this can cause really bizarre, awful bugs. As a side note, this kind of bug led me to find Unicode::Normalize, which any programmer dealing with "funny characters" should know about.
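The difference is easy to reproduce with core Perl. This sketch assumes both strings were encoded as UTF-8 (the example above doesn't say which encoding produced its bytes). It prints the bytes of each form and then shows that normalizing to NFC with Unicode::Normalize before encoding makes them agree.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC);

# Two character strings that render identically as "Queensrÿche".
my $precomposed = "Queensr\x{FF}che";     # U+00FF LATIN SMALL LETTER Y WITH DIAERESIS
my $combining   = "Queensry\x{308}che";   # 'y' followed by U+0308 COMBINING DIAERESIS

# Encode both as UTF-8 (an assumption; the original encoding isn't stated).
my $bytes_a = encode('UTF-8', $precomposed);
my $bytes_b = encode('UTF-8', $combining);

print unpack('H*', $bytes_a), "\n";   # 517565656e7372c3bf636865
print unpack('H*', $bytes_b), "\n";   # 517565656e737279cc88636865
print $bytes_a eq $bytes_b ? "same bytes\n" : "different bytes\n";

# NFC composes 'y' + U+0308 into U+00FF, so the encoded bytes now match.
my $nfc_b = encode('UTF-8', NFC($combining));
print $nfc_b eq $bytes_a ? "same after NFC\n" : "still different\n";
```

Normalizing both sides to the same form (NFC here, though NFD works just as well if applied consistently) before encoding is what makes byte-level comparisons of text meaningful.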
Test::BinaryData is meant to compare byte strings, and will reject any comparison in which either side contains "wide characters" (characters above 0xFF), a sign that the content is probably an unencoded character string. And anyway, cramming a four-hex-digit "byte" into the diagnostic display wouldn't work!
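A check in the spirit of that rule takes one regular expression. The has_wide_chars function below is hypothetical and not Test::BinaryData's own code, whose actual check may differ. It also shows the usual fix, which is to encode character strings with Encode before comparing them as bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical check, not Test::BinaryData's own code: a string holding any
# character above 0xFF cannot be a byte string, so refuse to hex-dump it.
sub has_wide_chars {
  my ($str) = @_;
  return $str =~ /[^\x00-\xFF]/ ? 1 : 0;
}

my $chars = "Queensry\x{308}che";       # character string: 'y' + U+0308
my $bytes = encode('UTF-8', $chars);    # encoded byte string

print has_wide_chars($chars) ? "wide characters: encode first\n" : "ok\n";
print has_wide_chars($bytes) ? "wide characters: encode first\n" : "ok\n";

# The check is one-sided: unencoded text whose characters all fall in the
# Latin-1 range, like the precomposed form below, has no wide characters
# and passes even though it was never encoded.
print has_wide_chars("Queensr\x{FF}che") ? "wide characters\n" : "ok\n";
```

That last case is why wide characters are only a sign of unencoded text, not proof of its absence. A precomposed "Queensrÿche" that was never encoded looks just like a Latin-1 byte string.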
See Also
perlunitut and Unicode::Normalize - for learning to do Unicode right
Test::Differences and Test::Diff - more "like is but better" asserts