24 days of Perl code from RJBS! Feed

Like <DATA>, Only Less Stupid

Data::Section - 2009-12-09

First, An Alternative

Data::Section was actually written when I realized I couldn't use Inline::Files. I'd wanted to use that ages ago, but it had a number of little problems. The biggest problem, unfortunately, was that it just tried to be way, way too powerful.

I like overpowered modules as much as the next guy, but here's about where I draw the line: Inline::Files is a form of source filter, and includes this warning:

It is possible that this module may overwrite the source code in files that use it. To protect yourself against this possibility, you are strongly advised to use the -backup option described in "Safety first".

The DATA Section

Data::Section (and Inline::Files) are meant to make the __DATA__ section both more powerful and more reliable. For those of you who haven't encountered it, Perl lets you access a virtual filehandle that reads non-code content at the end of your program file. In other words, if you write this program:


1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
10: 
11: 
12: 
13: 
14: 

 

my $total = 0;
while (<DATA>) {
  chomp;
  printf "Got number: %u\n", $_;
  $total += $_;
}

printf "Total: %u\n", $total;

__DATA__
2
4
6
8

 

...then your program would output:

  Got number: 2
  Got number: 4
  Got number: 6
  Got number: 8
  Total: 20

Why Data::Section Beats DATA

No Need to seek

There are a few problems with DATA, though. The most annoying is that you can't easily seek on it. Its read position is actually relative to the whole source file, so you'd need to try to tell it first, then save that, and things get messy.

Data::Section caches your data for you, so you can reread it any time you want.

Multiple "Files" Per Module

The DATA filehandle is one big entity, but Data::Section lets you provide multiple virtual files in your data:


1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 

 

package Your::Package;
use Data::Section -setup;

my $sections = Your::Package->merged_section_data;

for my $filename (keys %$sections) {
  printf "== %s ==\n%s\n", $filename, Your::Package->section_data($filename);
}

__DATA__
___[ so-far.txt ]___
Sub::Exporter
Number::Nary
Email::Pipemailer::DieHandler
Pod::Coverage::TrustPod
Games::Bowling::Scorecard
__[ coming-up.txt ]__
Mixin::Linewise
App::Cronjob
String::Formatter
Data::Section
Email::MIME::Kit

 

You'll get two "filenames" and each one will have five lines in it.

(By the way, that use Data::Section line? Yeah, that's all you have to do. Obviously, Data::Section uses Sub::Exporter, so you can rename its methods, and there are a few parameters you can use to customize them.)

Inherited Files

If you're wondering what's "merged" in the merged_section_data method, it's inherited files. In other words, if you wrote a subclass of the Your::Package code above, you could add new files in its DATA section, but the old files would still be accessible -- unless you added a so-far.txt definition to your subclass. It behaves very much like an overridden method, in that way.

Practical Uses

When is this useful? It's great to use instead of here-docs for a lot of cases. It's also a nice way to avoid having to install and manage text files. A package can store its own bundle of configuration or template files, and it will have methods to access them. Then if you want, you can specialize your pack of template files by subclassing the original one.

Even without all its new features, the simple ability to never re-seek DATA is a big one. (Please note that Data::Section caches the contents, it doesn't magically take care of re-seeking. That ends up being a real problem.) DATA is a global variable, and if code ever tries to read from it more than once, the second time will fail, and that is a really annoying bug to track down -- trust me, I've had to do so many times. So, forget about plain old DATA sections and start using Data::Section. Your future self will thank you.

See Also