24 days of Perl code from RJBS!

Exporting Globs for Fun and Profit

Sub::Exporter::GlobExporter - 2010-12-19

What's a glob?

A glob (or typeglob) is a kind of scalar in Perl. It is a weird, pretty horrible thing. There really isn't any need for them in Perl 5, but instead of being jettisoned, typeglobs were enshrined in the construction of packages and classes. Anyway, that doesn't really explain what a glob is.

A glob is a scalar that may contain a reference to one variable of each type: scalar, array, hash, code, I/O handle, and format. In Perl, packages (and therefore classes) are just hashes with identifiers as keys and globs as values. So, given the following program:


1: 
2: 
3: 
4: 
5: 
6: 
7: 

 

package main; # we would have been in "main" anyway, but let's be explicit
use 5.12.0;

our $x = 1;
our @x = qw(foo bar baz);

sub x { say "I'm in a glob!" }

 

This program sets up a typeglob in $main::{x} with references to a scalar, an array, and code so that we can get the following results in the debugger:


  DB<1> x $main::{x}
0 *main::x

 

We've got a glob there, just as we said.


  DB<2> x *{ $main::{x} }
0 *main::x

 

The * isn't a dereference, because we don't have a glob reference. It just means we want to use it as a glob.


  DB<3> x ${ $main::{x} }
0 1
  DB<4> x @{ $main::{x} }
0 'foo'
1 'bar'
2 'baz'

 

We can dereference the glob as a scalar or array (or anything else) to get at the scalar or array (or whatever) stored in that slot of the glob. We can also access those slots as if the glob were a hash; if we do that, we see the reference itself:


  DB<5> x *{ $main::{x} }{ARRAY}
0 ARRAY(0x1008ad4d8)
   0 'foo'
   1 'bar'
   2 'baz'
  DB<6> x *{ $main::{x} }{SCALAR}
0 SCALAR(0x100829658)
   -> 1
  DB<7> x *{ $main::{x} }{CODE}
0 CODE(0x10083b518)
   -> &main::x in -:7-7
  DB<8> x *{ $main::{x} }{CODE}->()
I'm in a glob!
0 1
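
The same slot lookups work outside the debugger. Here's a small standalone sketch that rebuilds the program above. The one change is that x returns its message instead of printing it, so the result can be checked:

```perl
use strict;
use warnings;

our $x = 1;
our @x = qw(foo bar baz);
sub x { return "I'm in a glob!" }

# %main:: is main's symbol table (its "stash"); each value is a glob
my $glob = $main::{x};

# *{ GLOB }{THING} returns the reference stored in that slot of the glob
my $scalar_ref = *{$glob}{SCALAR};
my $array_ref  = *{$glob}{ARRAY};
my $code_ref   = *{$glob}{CODE};

my $said = $code_ref->();    # call the sub through its CODE slot

print "scalar: $$scalar_ref\n";
print "array:  @$array_ref\n";
print "code:   $said\n";
```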

 

It gets weirder, though. When we wrote *{ $main::{x} } above, we said the * wasn't a dereference, because there was no reference. What if there is one?


use strict;

our $x = "This is X!";

our $y = \*x;

 

Then, in the debugger...


  DB<1> x $x
0 'This is X!'

  DB<2> x *x
0 *main::x

  DB<3> x ${ *x }
0 'This is X!'

  DB<4> p ref $y
GLOB

 

None of this is very surprising, so far. What do we do to get the string out of $y, though? Well, $y is a GLOB ref, and a GLOB is a kind of scalar, so we can dereference it with a dollar sign:


  DB<7> x $y
0 GLOB(0x1008297a8)
   -> *main::x
  DB<8> x $$y
0 *main::x

 

...and to get at the string, we can scalar-dereference that glob to get its scalar entry:


  DB<11> x $$$y
0 'This is X!'

 

And just to make sure there are more ways to do it, we can use a * instead of a $ for some of these dereferences.


  DB<15> x $y
0 GLOB(0x1008297a8)
   -> *main::x
  DB<16> x *$y
0 *main::x
  DB<17> x *{$y}{SCALAR}
0 SCALAR(0x100829700)
   -> 'This is X!'
  DB<21> x ${ *$y }
0 'This is X!'

...and that all makes sense. Then there's:

  DB<22> x $*$y
$* is no longer supported at (eval 26)
  [/Users/rjbs/perl5/perlbrew/perls/perl-5.12.2/lib/5.12.2/perl5db.pl:638] line 2.

 

Oh. Right, because there used to be a $* magic variable, we need to be clearer about our line noise:


  DB<23> x ${ *$y }
0 'This is X!'
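
Outside the debugger, all of these spellings can be checked side by side. A minimal sketch, reusing the $x and $y setup from above:

```perl
use strict;
use warnings;

our $x = "This is X!";
our $y = \*x;                        # a reference to the glob *main::x

my $type       = ref $y;             # 'GLOB'
my $glob       = $$y;                # dereferencing gives the glob itself
my $via_dollar = $$$y;               # one more $ gives the glob's scalar slot
my $via_star   = ${ *$y };           # *$y also gives the glob; braces avoid $*
my $via_slot   = ${ *{$y}{SCALAR} }; # explicit SCALAR slot, then dereference

print "$type $glob\n";               # GLOB *main::x
print "$_\n" for $via_dollar, $via_star, $via_slot;
```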

 

That Looks Really Dumb

Well, it is -- and the above is really just the goofy syntax around globs. In case you're wondering: yes, they do have goofy semantics as well. So, why would you ever go out of your way to deal with globs?

There are basically two reasons. The first is simple: you install subroutines in packages at runtime by putting code into globs. I wrote about this last year. This is how Sub::Exporter and Exporter work.
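
Here's a minimal sketch of that mechanism. The package Greeter and its hello method are made up for illustration, and this shows the general technique, not the actual code inside Sub::Exporter or Exporter:

```perl
use strict;
use warnings;

{
  no strict 'refs';                      # building a glob name from a string
  my $name = 'Greeter::hello';           # hypothetical package and sub name

  # Assigning a code reference to a glob fills only its CODE slot,
  # which is all that "installing a subroutine" means
  *{$name} = sub { my ($class, $who) = @_; return "Hello, $who!" };
}

my $greeting = Greeter->hello('world');  # now an ordinary class method
print "$greeting\n";                     # Hello, world!
```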

The other reason is more complicated. Globs in symbol tables are public, globally shared variables, so you can use them to share a variable everywhere. For example:


*Target::variable = \$Source::variable;
 

This makes $variable in Target an alias to $variable in Source.
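
To see that this really is an alias and not a copy, write through one name and read through the other. A small sketch (the string values are placeholders):

```perl
use strict;
use warnings;

$Source::variable = 'set in Source';

# Putting a scalar reference into a glob replaces only its SCALAR slot,
# so $Target::variable becomes another name for $Source::variable
*Target::variable = \$Source::variable;

$Target::variable = 'changed through Target';

print "$Source::variable\n";     # changed through Target
my $same_scalar = \$Target::variable == \$Source::variable;
```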

You could do this pretty easily with a reference, though:


package Christmas;
use feature 'state'; # state variables need Perl 5.10 or later

sub wishlist {
  state %wishlist;

  return \%wishlist;
}

 

...and then anywhere else in the program:


Christmas->wishlist->{ $person } //= [ ... ];
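
Here's a self-contained version of that idea, as a sketch. The Elf package, its add_wish function, and the person and gifts are hypothetical, there only to show a second package sharing the same hash through the accessor:

```perl
use strict;
use warnings;
use feature 'state';

package Christmas;

sub wishlist {
  state %wishlist;             # created once, shared by every caller
  return \%wishlist;
}

package Elf;                   # hypothetical package that uses the list

sub add_wish {
  my ($person, $gift) = @_;
  push @{ Christmas->wishlist->{$person} }, $gift;   # autovivifies the array
  return;
}

package main;

Elf::add_wish(alice => 'pony');
Elf::add_wish(alice => 'sled');

my $alices = Christmas->wishlist->{alice};
print "@$alices\n";            # pony sled
```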
 

That gets us a shared variable with much tighter control over the interface than just a global variable name. So why would we do it with globs? There's one really good reason: localization. If we don't just alias one part of the glob (like we aliased the scalar part above), but assign the whole glob, then we can use Perl's local keyword to localize values within a dynamic scope.

Dynamic scoping, which Perl provides through local, lets you save the value of a global variable and overwrite it; when you exit the current block, the old value is restored. The usual (and excellent) document explaining scoping in Perl, including local, is Mark Jason Dominus's Coping with Scoping. If you're not clear on what dynamic scoping is, you should read it.
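
A tiny illustration of local, using a hypothetical $level variable: a subroutine called from inside the localizing block sees the temporary value, and the old value comes back when that block exits.

```perl
use strict;
use warnings;

our $level = 'outer';            # hypothetical global for illustration

sub report { return $level }     # reads whatever $level holds right now

sub with_local {
  local $level = 'inner';        # old value saved here...
  return report();               # ...code called from here sees 'inner'...
}                                # ...and the old value is restored on exit

my $during = with_local();
my $after  = report();

print "$during $after\n";        # inner outer
```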

If you are clear, then what you need to know is that if you localize an imported variable, but you didn't import the whole glob, other things that imported that variable will not see your changes. Let that sink in.

Here's an example of how importing variables can make localization fail:


{
  package Source;
  our $variable = 10;
  BEGIN {
    *User_1::variable = \$Source::variable;
    *User_2::variable = \$Source::variable;
  }
}

{ package User_1; sub var { $variable } }

{
  package User_2;
  use feature 'say'; # say needs Perl 5.10 or later

  say $variable; # 10, as expected
  say User_1->var; # 10, as expected

  local $variable = 20;

  say $variable; # 20, as expected
  say User_1->var; # 10, no good!
}

 

Because we only imported the scalar part of *Source::variable, it isn't localized the way we might expect. If we replace \$Source::variable with \*Source::variable in the two assignments inside the BEGIN block, so that the whole glob is aliased, then the final two say statements will both print 20.
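
Here's a working variant of that example, as a sketch. It aliases each user's glob with plain glob-to-glob assignment (*User_1::variable = *Source::variable), which shares every slot. It also declares our $variable in each package so it runs under strict, does the aliasing at runtime rather than in BEGIN, and collects the values into lexicals instead of printing them:

```perl
use strict;
use warnings;

{
  package Source;
  our $variable = 10;
}

# Alias the whole glob, not just its SCALAR slot: the user globs now
# share Source's slots, including the one that local temporarily replaces
*User_1::variable = *Source::variable;
*User_2::variable = *Source::variable;

{
  package User_1;
  our $variable;
  sub var { return $variable }
}

my ($before, $during, $after);
{
  package User_2;
  our $variable;
  $before = User_1->var;       # 10
  local $variable = 20;
  $during = User_1->var;       # 20: User_1 sees the localized value
}
$after = User_1->var;          # 10 again after the block exits

print "$before $during $after\n";   # 10 20 10
```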

I Never Want to Think About This Again!

This isn't the kind of thing you need to think about very often, and once you've been forced to think about it enough, you might realize that you don't want to think about it again. One way to do that is to never have shared, localized variables. That's kind of a bummer, because it's a useful thing to do. Another tactic would be to always use fully-qualified variable names to localize them. That's a bummer, too, because local names are both easier to type and a layer of indirection in front of the real variable. That is: if you only use your package's $variable, and later you want to make it point to $Other::variable instead of $Source::variable, it's only one line to update. If you had to type out $Source::variable each time, it's a much hairier problem.

So, this is a fairly esoteric problem that isn't encountered very often -- but it came up in the construction of two libraries mentioned earlier in this advent calendar: Log::Dispatchouli::Global and Global::Context. To avoid thinking about the grotty details of the problem, I built a library to abstract the idea of an exported, shared glob.

Sub::Exporter::GlobExporter

Finally, today's actual code!

Here's the Sub::Exporter setup for Global::Context, which exports a shared global:


use Sub::Exporter::GlobExporter (); # load it so glob_exporter exists when the setup below runs
use Sub::Exporter -setup => {
  exports => [
    ctx_init => \'_build_ctx_init',
    ctx_push => \'_build_ctx_push',
  ],
  collectors => {
    '$Context' => Sub::Exporter::GlobExporter::glob_exporter(
      Context => \'common_globref',
    )
  },
};

 

ctx_init and ctx_push are set up like plain old exports that you might see in any other use of Sub::Exporter. It's $Context that's more interesting. Because it exports a glob rather than a subroutine, it has to be a "collector," which is Sub::Exporter jargon for "some weird hunk of behavior." When someone uses $Context in their import statement, like this:


use Global::Context '$Context';
 

...then the collector built by glob_exporter kicks in. glob_exporter was passed two arguments. One is the name under which to install the exported glob. We used Context, so it will be installed in *Context. The other is a reference to the name of a method that will return the globref to install. In other words, the above import will result in something like this happening:


my $globref = Global::Context->common_globref;

*Importer::Context = *$globref;
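
To make that concrete without depending on Sub::Exporter, here's a hand-rolled sketch of what such a collector has to do. My::Shared and install_shared_glob are hypothetical names invented for this example; the real glob_exporter is built on Sub::Exporter and its internals may differ:

```perl
use strict;
use warnings;

{
  package My::Shared;                       # hypothetical stand-in for Global::Context
  our $Context = 'shared value';
  sub common_globref { return \*Context }   # a ref to *My::Shared::Context
}

# Hypothetical helper: roughly what a glob-exporting collector must do
sub install_shared_glob {
  my ($class, $method, $into, $as) = @_;
  my $globref = $class->$method;            # a method, so subclasses can override it
  no strict 'refs';
  *{"${into}::${as}"} = *$globref;          # alias the whole glob
  return;
}

install_shared_glob('My::Shared', 'common_globref', 'Importer', 'Context');

my $initial = $Importer::Context;           # 'shared value'

my $seen;
{
  local $Importer::Context = 'localized';   # localize through the importer's name...
  $seen = $My::Shared::Context;             # ...and the owner sees it too
}
my $restored = $My::Shared::Context;        # 'shared value' again

print "$initial / $seen / $restored\n";
```

Because the whole glob is aliased, localizing through either name affects both, which is exactly the property the scalar-only import lacked.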

 

If we wanted, we could have done this:


use Global::Context '$Context' => { -as => 'ctx' };

# ...which would have meant:

my $globref = Global::Context->common_globref;
*Importer::ctx = *$globref;

 

That's how we can pick a different name in our package for a global that goes by another name elsewhere. This isn't the big benefit, though. The big benefit is that because common_globref is a method, subclasses of Global::Context can provide a different globref. We could have two different global contexts that would never interfere with one another by writing the following code:


{
  package Context::Alpha;
  use parent 'Global::Context';
  sub common_globref { \*Context }
}

{
  package Context::Bravo;
  use parent 'Global::Context';
  sub common_globref { \*Context }
}

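# These packages live in this file rather than in Context/Alpha.pm and
# Context/Bravo.pm, so call import directly; "use" would try to load them.
BEGIN { Context::Alpha->import('$Context' => { -as => 'ctx_a' }) }
BEGIN { Context::Bravo->import('$Context' => { -as => 'ctx_b' }) }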

 

We'd end up with $ctx_a and $ctx_b, both properly shared with other things using the Alpha or Bravo contexts, and neither interfering with the other. In other words, by using glob_exporter, we can write libraries that use global variables but are much less limited in how they can be reused.
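
Here's a hand-rolled sketch of that arrangement that doesn't need Sub::Exporter. My::Shared, My::Alpha, My::Bravo, and the helper install_shared_glob are hypothetical; the helper's $as argument plays the role of -as, and each subclass overrides common_globref to hand back a glob in its own package:

```perl
use strict;
use warnings;

{
  package My::Shared;                       # hypothetical base class
  sub common_globref { return \*Context }   # *My::Shared::Context
}
{
  package My::Alpha;
  our @ISA = ('My::Shared');
  sub common_globref { return \*Context }   # *My::Alpha::Context
}
{
  package My::Bravo;
  our @ISA = ('My::Shared');
  sub common_globref { return \*Context }   # *My::Bravo::Context
}

# Hypothetical helper; $as plays the role of Sub::Exporter's -as
sub install_shared_glob {
  my ($class, $method, $into, $as) = @_;
  my $globref = $class->$method;            # subclasses can override this
  no strict 'refs';
  *{"${into}::${as}"} = *$globref;          # alias the whole glob
  return;
}

our ($ctx_a, $ctx_b);
install_shared_glob('My::Alpha', 'common_globref', 'main', 'ctx_a');
install_shared_glob('My::Bravo', 'common_globref', 'main', 'ctx_b');

$ctx_a = 'alpha';
$ctx_b = 'bravo';

# Each name is bound to its own class's glob, so they never interfere
my $base_untouched = !defined $My::Shared::Context;
print "$My::Alpha::Context $My::Bravo::Context\n";   # alpha bravo
```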

See Also