Rolling Your Own Method Dispatch

MRO::Magic - 2010-12-24

EXPERIMENTAL CODE AHEAD

Last week, I wrote about the dangerously experimental Global::Context. This week, I'll be talking about the transcendentally experimental MRO::Magic.

Merry Christmas! Perl sucks!

Lately, just a little more than usual, we've been talking at work about the reasons that Perl sucks. That's not because we think that it's a terrible language, or that it's a mistake to use it, or that we don't like it. Let's face it, though: all programming languages suck. Sometimes, they suck in subtle and interest ways, and sometimes they're just absolute stinkers. Most languages that I've dealt with have some big problems and some little problems, and learning about them all is lots of fun.

For a good long while now, I've been feeling pretty strongly that one of Perl's biggest failings, within its own design aesthetic, is that you can't write per-instance methods. That is, whenever you write $thing->method, the method is resolved entirely based on the package associated with $thing (well, or via UNIVERSAL). Either $thing is a package name or it is a reference associated with ("blessed into") a package name. When you call the method on it, something like the following happens:

if there's a subroutine named method in the package, call it
otherwise, check all the packages listed in this package's @ISA, recursing through their @ISAs too
otherwise check in UNIVERSAL
otherwise, try to call any relevant AUTOLOAD (via the search path described in 1-3)
otherwise, throw an exception

So, imagine you had a Christmas::Present class, and you wanted to make an instance of it that would be fatal to open before December 25. You can't do something like this:

`1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12:`	`my $present = Christmas::Present->new({ to => $impatient_relative, from => $self, }); $present->ADD_METHOD(open => sub { my ($self) = @_; die "It isn't Christmas yet!" unless time >= 1293256800; $self->SUPER::open; });`

For one thing, ADD_METHOD doesn't exist. Then there's the fact that SUPER is incredibly tightly bound to packages. To make this work, we need to do something like:

`1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17:`	`{ package Christmas::Present::FatalToPeekers; use base 'Christmas::Present'; sub open { my ($self) = @_; die "It isn't Christmas yet!" unless time >= 1293256800; $self->SUPER::open; }); } my $present = Christmas::Present::FatalToPeekers->new({ to => $impatient_relative, from => $self, });`

This works reasonably well, but only if we know, when writing our code, all the permutations of behavior we might need, so we can make explicit classes. This isn't always acceptable; sometimes we need special behavior to be figured out at runtime, which means we need to do something like this:

`1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:`	`my $i = 0; my $base = 'Anonymous::Class'; sub ADD_METHOD { my ($object, $name, $code) = @_; my $new_package_name = join q{::}, $base, $i++; { no strict refs; @{ $new_package_name . '::ISA' } = (ref $object); *{ $new_package_name . "::$name" } = $code; } bless $object => $new_package_name; }`

..and that means that we end up with a bunch of horribly-named packages being populated, potentially forming weird, long chains of inheritance. Because packages are not garbage collected, these weird-o packages will stick around in memory long after the only instances blessed into them are gone. Some metaprogramming frameworks (like Moose, to name the most important example) build abstractions over this so that you can forget about the horrible crap going on behind the scenes. Furthermore, you can't call ->SUPER::method in the $code passed to ADD_METHOD, because SUPER:: is broken. Instead, you need to use something like the SUPER module to fix the problem -- another abstraction over a colossal hack. When we abstract a colossal hack to fix a stupid language design issue, this is what is commonly called a "design pattern."

Who cares?

At first, this could seem like a really esoteric problem, only likely to bother people who are doing weird stuff at the edges of good behavior. That's because Perl's object model is at least barely adequate for common use -- but it's not great for common use, because of the "everything relates to the package" paradigm.

For example, a very common implementation for objects in Perl is "blessed hashref." This is convenient, because the object's attribute values can be stored in the hash. Sometimes, though, people want to store class attributes, too. There's not a clear place to put this state. At first, you might want to put things in the package itself, in a package variable -- but then it would be visible in global scope, and as subclasses added new attributes, the state associated with a class is scattered over more and more places, which is just frustrating.

"Stop!" some purists cry! "Classes shouldn't really have state! They're just templates for instances!"

That's true, but if it's true, then it's also true that they shouldn't have new methods. When you write a class, you need two things: a template for new objects and a handle for constructing them. Perl doesn't provide a new operator the way that some languages do. Instead, we write new method that get called on classes -- which are packages -- so that classes are acting both as classes and as factories, but (and this is the problematic part) using the same mechanism. This is what leads to the age-old and always-annoying question:

What should we do if somebody calls new on an object?

In an ideal world, instances and classes wouldn't share one namespace for methods, so calling a class method on an instance or an instance method on a class would have the same, obvious answer: an "unknown method" exception could be raised.

If classes were instances (presumably of the notional class Class) then it would make plenty of sense to give them attributes, and it would be easy to dump them, because our classes wouldn't have to be packages identified by strings. They could be blessed references with attribute state in their guts.

Can't you just...

Yes. Of course. You can get around the template/factory conflation by making two classes and having the "factory" class bless things into the "template" class. This is somewhat clumsy, but it works... but it only solves this problem. It doesn't help with the "per-instance methods." In fact, it compounds that problem by doubling the number of classes that might get pseudo-anonymous subclasses generated at runtime and then forgotten about. Those classes might not be generated with the same name in future program runs using the same libraries, so object serialization tools won't always (ever?) be able to restore objects like this properly. It's a big mess.

Overloading the Arrow

The solution I'd proposed to this quite some time ago was to make it possible to overload the method invocation arrow, so that when an instance method was called like this:

`1:`	`$instance->some_method(@args);`

...and we had defined:

`1: 2: 3:`	`package Class; use overload 'method' => 'invoke_method';`

...then this would happen:

`1:`	`Class->invoke_method($instance, "some_method", \@args);`

This never happened, mostly because nobody had my passion for the idea and the necessary ability to add the feature to Perl. Instead, Florian Ragwitz approached me with a number of strategies for implementing it, all of them quite excellent, and he and I have been playing with a real solution on and off for over a year now. It's called MRO::Magic.

MRO::Magic in Action

MRO::Magic's goal is to let you totally replace what happen when a method is called via the arrow operator. As a simple demonstration of it in action, this package implements classless OO:

`1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29:`	use strict; use warnings; package Classless::Root; my %STATIC; use MRO::Magic passthru => [ qw(import export DESTROY AUTOLOAD) ], metamethod => sub { my ($invocant, $method, $args) = @_; unless (ref $invocant) { die "no metaclass method $method on $invocant" unless my $code = $STATIC{$method}; return $code->($invocant, @$args); } my $curr = $invocant; while ($curr) { return $curr->{$method}->($invocant, @$args) if exists $curr->{$method}; $curr = $curr->{parent}; } my $class = ref $invocant; die "unknown method $method called on $class object"; } ; # ...continued below...

We use MRO::Magic and tell it that some methods -- primarily important internal Perl-recognized methods -- are passed through to be called normally via package entries. That's the passthru argument.

The rest of the call sets up the code that's called when somebody calls a method, just as I described above. If it was called on the class name (a "static" method, as some call it) then we look for an entry in %STATIC.

Otherwise, we assume that the object instance is a hashref, and we look for a named entry in the hash. If we find one, that's the method and we call it. If not, we look for a parent object in the parent entry and start looking there. In other words, this works almost exactly like JavaScript. So, just like in JavaScript, we might want a basic universal parent object. We just set it up like this:

`1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22:`	my $Object = { get => sub { my ($self, $attr) = @_; my $curr = $self; while ($curr) { return $curr->{$attr} if exists $curr->{$attr}; $curr = $curr->{parent}; } return undef; }, set => sub { my ($self, $attr, $value) = @_; return $self->{$attr} = $value; }, new_child => sub { my ($parent, %attrs) = @_; bless { %attrs, parent => $parent } => $class; }, parent => undef, }; bless $Object => __PACKAGE__;

Then all we need is a static new method so we can call Classless::Root->new:

`1: 2: 3: 4: 5:`	`$STATIC{new} = sub { my ($class, %attrs) = @_; $Object->new_child(%attrs); };`

Is this awesome?

Yes, this is awesome. In about fifty lines, we've implemented a pretty decent prototype based OO system that isn't really too shabby. It won't be affected by UNIVERSAL or AUTOLOAD abuse. It won't ever require polluting the global namespace with throwaway packages or code.

Best of all, MRO::Magic isn't just a prototype based OO system, it's a toolkit for building your own OO systems with different tradeoffs than those made by the standard Perl OO toolkit. With MRO::Magic, there's more than one way to do it.

So why isn't everybody using it?

Well, MRO::Magic isn't quite done yet. As Florian and I have looked at other ways to do this, we've found a number of places where perl's internals have gotten in the way. Generally these relate to the way that the method resolution cache works. It isn't quite fair to call them bugs, since they don't cause problems with any existing use of Perl, but they sure are weird. One implementation of MRO::Magic worked around these bugs by globally invalidating the entire method resolution cache every time a method was called. Obviously, that's pretty nuts.

Beyond that, it will take a while to figure out just how to build a good MRO::Magic kit. The Classless::Root example is a nice example, but it's kludgy, and would be hard to extend. Things like passing through universal methods or extending the implementation of OO toolkits need to be seriously thought out, which is hard to do when the basics aren't quite usable.

In the end, I think that if MRO::Magic become a viable way to write code, almost no one will use it. Instead, a few alternate OO systems will be written using it, and people will use that. I look forward to implementing a few of those myself, even if only for fun. Data::Hive sure could benefit from it, though, to eliminate its use of AUTOLOAD.

If you're going to play with MRO::Magic, I suggest you do so using the versions of code on GitHub, linked below, rather than the outdated release currently on CPAN. Be warned, though: both are pretty broken, and you might want to ask on IRC what's supposed to work and what's known to be busted.

RJBS Advent Calendar 2010