24 days of Perl code from RJBS!

Putting less on Your Wish List

fewer - 2010-12-11

One of Perl's Lesser-Known Features

I do a lot of work with email, including a lot of message rewriting. Last year, I wrote about my favorite message-rewriting helper, but this year I'll write about something somewhat more removed.

Although quite a lot of my mail-handling uses Email::MIME or Email::Simple, the most heavily-munged mail goes through the venerable MIME::Entity, which has much more sophisticated facilities for things like "message body stored on disk." When you're going to build email messages with huge attachments, and then rewrite them, this is pretty important.

Sometimes, though, we know we're going to deal with very small messages, and using the disks to store MIME content will just grind on IO. For example, we might have a program that does something like this:



use File::Find::Rule;
use MIME::Parser;
use MIME::Visitor;

my $parser = MIME::Parser->new;

for my $file ( File::Find::Rule->file->in('Maildir') ) {
  ...
  my $entity = $parser->parse_open( $file );

  MIME::Visitor->rewrite_parts($entity, sub { ... });
}

 

By default, this will produce temporary files in /tmp for the message and temporary working data. This is often (perhaps surprisingly) much faster than working in memory, because disk IO is native, while Perl's file-in-memory IO is implemented in (relatively) slow Perl. So, while the default behavior may be faster, it's also more expensive, at least as regards disk IO. If, like me, you often find yourself bound by disk operations, you might want a way to switch your programs into a mode that will use fewer disk IO operations, or "iops."

This is easy. First, we update our program to have the optional code path:



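use File::Find::Rule;
use MIME::Parser;
use MIME::Visitor;
use less ();  # load less.pm so less->of can be called; this requests nothing

my $parser = MIME::Parser->new;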

if ( less->of( 'iops' ) ) {
  $parser->output_to_core(1);
  $parser->tmp_to_core(1);
}

for my $file ( File::Find::Rule->file->in('Maildir') ) {
  ...
  my $entity = $parser->parse_open( $file );

  MIME::Visitor->rewrite_parts($entity, sub { ... });
}

 

Now, most of the time that path will not be entered. The call to less->of will usually return false. To make it true, we would need to add a line like this to the top of the program:



use less 'iops';
 

We don't want to edit our program every time, though, and we don't have to. Instead, we can invoke it in one of two ways. The first is the normal way, which will produce lots of disk IO:

  $ perl scan-all-mail

If we want less IO, we run:

  $ perl -Mless=iops scan-all-mail

The -M flag to perl basically injects that use line at the top of our program, causing the less->of call to return true, and all message handling to be done in memory.
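To see the switch flip without touching MIME::Parser at all, here is a minimal sketch that only uses the core less module. The variable names are made up for illustration, and `use less ();` is there just to load the module without requesting anything.

```perl
use strict;
use warnings;

# Load less.pm without calling import, so nothing is requested yet.
use less ();

my $before = less->of('iops') ? 1 : 0;

my $inside;
{
    # Same effect as -Mless=iops, but limited to this block.
    use less 'iops';
    $inside = less->of('iops') ? 1 : 0;
}

my $after = less->of('iops') ? 1 : 0;

# prints: before=0 inside=1 after=0
print "before=$before inside=$inside after=$after\n";
```

Running the same file with -Mless=iops should turn all three values into 1, since the request then covers the whole program file.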

Disregard that, this is less 'useful'

less has long been regarded as a joke module, which did nothing until perl 5.10.0. In that release, it became something of a demonstration of how to write a lexical pragma that looks at the "hints hash" found in the 11th (!) element returned by caller. From the outside, it actually looked useful, as seen above.
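For the curious, the mechanism looks roughly like the sketch below. Sketch::Thrifty and its wants method are hypothetical names invented for this example, not the real less.pm. The BEGIN block stands in for a use line so that everything fits in one file.

```perl
use strict;
use warnings;

# Hypothetical pragma for illustration only; this is not less.pm.
package Sketch::Thrifty;

# Runs at compile time. %^H is the hints hash, which perl scopes
# lexically, so the entry vanishes at the end of the enclosing block.
sub import {
    my ($class, $what) = @_;
    $^H{'Sketch::Thrifty'} = $what;
    return;
}

# Runs at run time. Index 10 (the 11th element) of caller's return
# list is the hints hash in effect where our caller was compiled.
sub wants {
    my ($class, $what) = @_;
    my $hints = (caller 0)[10] || {};
    return( ($hints->{'Sketch::Thrifty'} // '') eq $what );
}

package main;

my @seen;
{
    # Stands in for: use Sketch::Thrifty 'iops';
    BEGIN { Sketch::Thrifty->import('iops') }
    push @seen, Sketch::Thrifty->wants('iops') ? 'inner: yes' : 'inner: no';
}
push @seen, Sketch::Thrifty->wants('iops') ? 'outer: yes' : 'outer: no';

# prints "inner: yes" then "outer: no"
print "$_\n" for @seen;
```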

Unfortunately, less is just the sort of pragma where dynamic, rather than lexical, scope would be useful. We want to put that parser-optimizing code down in some "give me a new MIME parser" library that anything can call, getting a usefully-optimized parser based on what the programmer has most recently asked to use less of, not just what was asked for in the lexical scope that calls less->of.
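A small sketch makes the limitation concrete. Here new_parser_mode is a hypothetical stand-in for that "give me a new MIME parser" library routine. It returns a string rather than a real parser so that the example needs nothing beyond core Perl.

```perl
use strict;
use warnings;
use less ();

# Hypothetical library routine; imagine it lives in its own module.
sub new_parser_mode {
    # less->of looks at the scope this line was compiled in,
    # not at the scope of whoever called new_parser_mode.
    return less->of('iops') ? 'core' : 'disk';
}

my ($direct, $via_helper);
{
    use less 'iops';
    $direct     = less->of('iops') ? 'core' : 'disk';
    $via_helper = new_parser_mode();
}

# prints: direct=core via_helper=disk
print "direct=$direct via_helper=$via_helper\n";
```

The request made in the block never reaches the helper, which is exactly the problem.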

With that limitation in place, can you use less for anything useful? Possibly, but in the end, less is less useful than less could be if less used less lexicality.

Wait, RJBS didn't write less!

That's true. I didn't write less, and I don't think I can easily fix it given its existing interface. I did fix another big problem with it, though.

This code snippet looks great:



use less 'cpu';

process_lots_of_files( @filenames );

 

Or this one:



use less 'memory';

process_lots_of_files( @filenames );

 

This one, though, is just unbearable:



use less 'filehandles';

process_lots_of_files( @filenames );

 

It's not that I mind the idea of trying to minimize open filehandles. That can be a reasonable optimization, sometimes. The problem is that filehandles are countable things, and you don't use "less" with countable objects. The word "less" is for things that cannot be counted. For example:

This advent calendar entry has less merit than many others.

...but...

Rik deserves fewer words of praise today than he did yesterday.

If you are running perl 5.12.0 or newer, you can fix the nasty English suggested by the previous code snippet by installing fewer and writing:



use fewer 'filehandles';

process_lots_of_files( @filenames );

 

I hope this is of use.

See Also