24 days of Perl code from RJBS!

Option Processing -- This Time, Invented Here!

Getopt::Long::Descriptive - 2010-12-08

Everybody's Favorite Getopt

This isn't everybody's favorite getopt. It's my favorite getopt, and it might be the favorite of some other people, but the problem with getopt libraries is that they can be optimized for so many different needs... but most of the time no optimization is needed, so we each get attached to the library we're using most often, forgetting that there are lots of reasons to choose differently.

Well, I'm not here to tolerate other people's libraries. The time for tolerance is past. I'm going to talk about my favorite getopt library. In what may be a first for the RJBS Advent Calendar, it's a library I didn't write. It was written by my former co-worker Dieter, and I remember very clearly how happy I was when I started that the "which getopt?" question had not only been answered already, but had been answered with a library that I liked using.

Our Optimizations

So, if all getopt libraries make tradeoffs, what are ours? We wanted it to be very easy to read and write option specifications, and we needed every command to get a usage message that was always correct and up to date with the options the command actually accepts. Our non-technical staff all use a Unix shell, and run a lot of command-line programs. They need to have useful help messages to forestall questions about how to use things.

To use Getopt::Long::Descriptive (aka GLD), you basically describe the options you'd like to accept and get back the options parsed from @ARGV and an object representing the "usage message." Here's a fairly complete program using GLD:


use Getopt::Long::Descriptive;

my ($opt, $usage) = describe_options(
  '%c %o recipient ...',

  [ 'template|t=s', "the HTML template for the card", { default => 'card.html' } ],
  [ 'from|f=s', "the sending address", { required => 1 } ],
  [],
  [ "text-mode" => hidden => {
    required => 1,
    one_of => [
      [ "html-only" => "send no plaintext part" ],
      [ "autotext" => "generate plaintext from HTML" ],
      [ "text-tmpl=s" => "filename for a separate plaintext template" ],
      ],
  } ],
);

my @rcpts = @ARGV;
$usage->die({ pre_text => "no recipients given!" }) unless @rcpts;

my $html_template = slurp( $opt->template );
my $text_template = $opt->text_mode eq 'html_only' ? undef
                  : $opt->text_mode eq 'autotext' ? textify($html_template)
                  : $opt->text_mode eq 'text_tmpl' ? slurp( $opt->text_tmpl )
                  : ... ;

for my $rcpt (@rcpts) {
  send_card({
    html => render( $html_template, $rcpt ),
    text => render( $text_template, $rcpt ),

    to => $rcpt,
    from => $opt->from,
  });
}

 

Some of this is obvious, especially if you know Getopt::Long, but have another slow read through the code, and then we'll walk through it piece by piece.

describe_options


my ($opt, $usage) = describe_options(
  '%c %o recipient ...',

  [ 'template|t=s', "the HTML template for the card", { default => 'card.html' } ],
  [ 'from|f=s', "the sending address", { required => 1 } ],
  [],
  [ "text-mode" => hidden => {
    required => 1,
    one_of => [
      [ "html-only" => "send no plaintext part" ],
      [ "autotext" => "generate plaintext from HTML" ],
      [ "text-tmpl=s" => "filename for a separate plaintext template" ],
      ],
  } ],
);

 

describe_options doesn't really just describe the options. It parses the command line arguments by reading @ARGV and returns an object describing the switches that were given ($opt) and another object that can be used to print out a usage message in case of error. If @ARGV can't be parsed legally, that message is printed and the program dies, something like this:

  send-holiday-card [-ft] [long options...] recipient ...
    -t --template     the HTML template for the card
    -f --from         the sending address
  
    --html-only       send no plaintext part
    --autotext        generate plaintext from HTML
    --text-tmpl       filename for a separate plaintext template

The first argument to describe_options produces the first line of the usage message. It uses sprintf-like formatting, letting you get the name of the script and a quick summary of available options into the usage message.
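
To make the format string concrete, here is a minimal sketch of a tiny, made-up "greet" command. The option names (shout, times) and the fake @ARGV are assumptions invented for illustration, and it needs Getopt::Long::Descriptive installed from CPAN. The %c in the format string stands for the program name and %o for the summary of options.

```perl
use strict;
use warnings;
use Getopt::Long::Descriptive;

# Pretend the program was run with these arguments (hypothetical input).
local @ARGV = ('--shout', 'World');

my ($opt, $usage) = describe_options(
  '%c %o <name>',   # %c: program name, %o: summary of the options
  [ 'shout|s',   'print the greeting in capitals' ],
  [ 'times|n=i', 'how many times to greet', { default => 1 } ],
);

# The leader line built from the format string, then one line per option.
print $usage->text;

# Parsed switches are removed from @ARGV; only "World" is left behind.
print "left in \@ARGV: @ARGV\n";
```

The exact layout of the printed message can differ between versions of the library, so treat the output as something to read rather than something to parse.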

After that, everything describes options. Each argument is an arrayref with up to three entries:


[ $switch, $description, \%options ]
 

The contents of $switch might look familiar. It's just a Getopt::Long-style description of the switch. GLD is not so ambitious as to try to implement its own argument processor! It's just a layer on top of Getopt::Long and some other tools. So, every $switch value can have a few names, separated by pipes. The first name (with some mild munging, like s/-/_/) becomes the name of the accessor on $opt. That's why later we can call $opt->from to get the value given for --from.
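
As a rough sketch of what sits underneath, the same switch strings can be handed straight to Getopt::Long. The accessor_name helper below is hypothetical: it only approximates the munging described above and is not GLD's actual code.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: guess the accessor name for a switch spec by taking
# the first name and turning dashes into underscores.
sub accessor_name {
  my ($spec) = @_;
  my ($first) = $spec =~ /\A([^|=:!+]+)/;  # text before any pipe or type marker
  (my $method = $first) =~ tr/-/_/;
  return $method;
}

# Plain Getopt::Long, parsing an array instead of the real @ARGV.
my @args = ('-f', 'santa@example.com', '--text-tmpl', 'card.txt');
my %value;
GetOptionsFromArray(\@args, \%value, 'from|f=s', 'text-tmpl=s')
  or die "could not parse options\n";

# Getopt::Long keys the hash by the first name, dashes and all.
print "$_ => $value{$_}\n" for sort keys %value;

print accessor_name('from|f=s'), "\n";      # from
print accessor_name('text-tmpl=s'), "\n";   # text_tmpl
```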

The $description field is just the description for the option in the usage message. The description "hidden" means that an option isn't displayed in the usage message, and an entirely empty option, like [], by the way, means "put a blank line in the usage message."

%options, which is optional, changes how the switch is interpreted. It can take a bunch of options, but the most common are required and default, which change what happens if no value was given. The value can also be validated by using Params::Validate arguments.
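
Here is a small sketch of required and default in action, reusing the two switches from the program above. The parse_card_options wrapper and the example address are made up for illustration; the wrapper exists only so the same specification can be tried against different argument lists.

```perl
use strict;
use warnings;
use Getopt::Long::Descriptive;

# Hypothetical wrapper: parse the given list as if it were the command line.
sub parse_card_options {
  local @ARGV = @_;
  my ($opt, $usage) = describe_options(
    '%c %o recipient ...',
    [ 'template|t=s', 'the HTML template for the card', { default => 'card.html' } ],
    [ 'from|f=s',     'the sending address',            { required => 1 } ],
  );
  return $opt;
}

# --template was not given, so its default fills in.
my $opt = parse_card_options('--from', 'santa@example.com');
print $opt->template, "\n";   # card.html
print $opt->from, "\n";       # santa@example.com

# Leaving out the required --from makes describe_options die with the usage
# message; the eval is here only so the sketch can carry on afterwards.
my $ok = eval { parse_card_options(); 1 };
print "missing --from was rejected\n" unless $ok;
```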

The most interesting kind of option is one_of, as seen in:


  [ "text-mode" => hidden => {
    required => 1,
    one_of => [
      [ "html-only" => "send no plaintext part" ],
      [ "autotext" => "generate plaintext from HTML" ],
      [ "text-tmpl=s" => "filename for a separate plaintext template" ],
      ],
  } ],

 

one_of is usually used in combination with hidden, and creates a sort of virtual option. $opt->text_mode will tell us which of the sub-options was used, and then we can check that option's value. Only one of the sub-options can be specified. That's why we could safely write this:


my $text_template = $opt->text_mode eq 'html_only' ? undef
                  : $opt->text_mode eq 'autotext' ? textify($html_template)
                  : $opt->text_mode eq 'text_tmpl' ? slurp( $opt->text_tmpl )
                  : ... ;

 

one_of options are useful for establishing mutually exclusive kinds of operation, and were originally set up to create something like a "run mode." There'd be a hidden mode option, and the user would pick one_of "delete" or "add" or "list," for example. Our use of virtual options for that has faded after the creation of App::Cmd, a framework for slightly more complicated applications. App::Cmd helps make command-line applications easy to write, but it isn't so ambitious as to try to provide its own getopt implementation: to use App::Cmd properly, you need to first understand Getopt::Long::Descriptive.
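
The "run mode" idea might look something like the sketch below. The mode names and the parse_mode wrapper are assumptions made up for illustration, not code from any real program.

```perl
use strict;
use warnings;
use Getopt::Long::Descriptive;

# Hypothetical wrapper: parse the given list and report the chosen mode.
sub parse_mode {
  local @ARGV = @_;
  my ($opt, $usage) = describe_options(
    '%c %o',
    [ mode => hidden => {
      required => 1,
      one_of   => [
        [ 'add'    => 'add an entry' ],
        [ 'delete' => 'delete an entry' ],
        [ 'list'   => 'list all entries' ],
      ],
    } ],
  );
  return $opt->mode;
}

# The virtual "mode" option reports which sub-option was given.
print parse_mode('--list'), "\n";

# Giving two modes at once is a conflict, so describe_options dies.
my $ok = eval { parse_mode('--add', '--list'); 1 };
print "conflicting modes were rejected\n" unless $ok;
```

A dispatch table keyed on the returned mode name is one natural next step, which is roughly the job App::Cmd takes over in larger programs.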

See Also