24 days of Perl code from RJBS! Feed

Valid Data to All Nations

Data::Rx - 2009-12-13

Params::Validate is Okay

The CPAN has quite a few data validation libraries on it. There's Params::Validate, Data::FormValidator, and probably at least fifty others. Maybe a lot more. I've used a dozen or so of them. Some of them stink, some of them are pretty good. They are, unsurprisingly, pretty Perlocentric. It's useful, in Perl, to be able to say "I'm expecting an instance of a class that does this interface" or "this part of the data should be a regular expression" or "test this data with a custom callback."

The problem with this kind of validator is that it leads to business rules that are only expressible in Perl, or that require substantial post-implementation analysis to translate them into other languages. A while ago, I started to really want to valdiate data in both Perl and JavaScript, specifically data that I was handing back and forth between a Perl web service and two clients, written in Perl and JavaScript. I wanted to be able to validate the I/O in JavaScript, but writing all my validation twice looked like a huge pain that would lead to not wanting to validate at all. I decided I'd look for schema languages.

I found XML Schema and RELAX NG and a few others meant for XML, but they really didn't make a lot of sense for the data structures common to the batch of dynamic programming languages I wanted to support. They were very XML oriented, which is fine, because they were supposed to be. I also found Kwalify and JSON Schema, neither of which seemed really well done to me, for one reason or another. After some token pleading with friends ("Please talk me out of writing a schema system!") I decided to write my own.

Rx Schema

The Rx Schema system is very simple. It's meant to be easy to extend with your own datatypes, but to guarantee only a small set of core types that map to core datatypes in many common languages. We can start with this data structure (shown here as Perl):


1: 
2: 
3: 
4: 
5: 
6: 
7: 

 

my $data = {
  'Armand Assante' => { disposition => 'nice', present => 'Wii' },
  'Ricardo Signes' => { disposition => 'nice', present => 'Sleep' },
  'Violet Beauregarde' => { disposition => 'naughty' },

  ...
};

 

We'll make a slightly tortured schema for this, just to show off a few things. Here, I'm representing the schema in YAML for readability, but they're really just data structures. You could write it in Perl. I usually use JSON.


1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
10: 
11: 

 

---
type: //map
values:
  type: //any
  of :
    - type: //rec
      required: disposition: { type: //str, value: 'naughty' }
    - type: //rec
      required:
        present: //str
        disposition: { type: //str, value: 'nice' }

 

In other words, we require a hashref. Each value is a hashref with either one or two entries. One must be disposition. If it's naughty, there will be no other entries. If it's "nice" there must be an entry for present, which must be a string.

It's easy to use this schema to validate things:


1: 
2: 
3: 
4: 
5: 
6: 
7: 

 

use Data::Rx;

my $schema = Data::Rx->new->make_schema( $that_structure_above );

unless ($schema->check($input)) {
  die "invalid input!";
}

 

But the whole point was to have this work in many languages, and it does:


1: 
2: 
3: 
4: 
5: 
6: 

 

var rx = new Rx;
var schema = Rx.makeSchema( structure );

unless (schema.check( input )) {
  throw "invalid input!";
}

 

...or...


1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 

 

require 'Rx'

rx = Rx.new
schema = Rx.make_schema( structure )

unless schema.check( input )
  throw "invalid input!";
end

 

There are other implementations, too.

The Rx schema system is easy to extend. If you want to use regular expressions you can use the PCRE extension, which many language will be able to implement, then you can write:


1: 
2: 
3: 

 

---
type: /pcre/str
regex: '\A867-[5309]{4}\z'

 

Coming Features (or: Missing Features)

Right now, the various releases of Rx can only check whether or not something is valid. Obviously it's much more useful to be able to know why something isn't valid, if it fails. This is being implemented (in the struct-fail branch), although development has been on hold while I work on some other projects. Once that works, Rx Schema will be much more useful.

I've often talked about other ideas that could built atop Rx like RPC, URI templating, and API library autogeneration. Although I'd love to work on them, I think it's time to admit that I don't see them happening in the near future unless somebody decides to really push for it. For now, a portable schema language is all I'm aiming for -- and I think that's enough for me!

See Also