Subroutines, Coderefs, and Closures

G. Wade Johnson

Houston.pm

What is a Coderef?

It is a reference to a subroutine.
It can be stored and passed around like other scalars.
The code referred to by the coderef can be called at a later time.

Coderefs are like other references: they allow you to access the item they refer to indirectly. They also allow you to treat code as if it were data.

A Short Diversion

Perl 5 added references as a way to indirectly access data.
Perl supports references to: scalars, arrays, hashes, globs, filehandles, subroutines.
References are scalars, so they can be stored anywhere a scalar can be.
Opens the door to complex data structures.

This is only a quick overview of references; read the documentation listed later to get a fuller understanding of them.

References Refresher (cont.)

Use the \ operator to create a reference.
Special syntax ([ ] and { }) for anonymous arrayrefs and hashrefs.
Dereference by using the normal sigils.

Making References

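   my $scalar = 'hello';
   my @array  = ( 1, 2, 3 );
   my %hash   = ( a => 1, b => 2 );
   sub func { return 'called func'; }

   my $sref = \$scalar;
   my $aref = \@array;
   my $href = \%hash;
   my $cref = \&func;
   # anonymous array and hash
   my $aref2 = [ 1, 2, 3 ];
   my $href2 = { a => 1, b => 2 };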

Here are some examples of how to make references. Scalar references are not used nearly as often as array or hash references. Both named and anonymous references to arrays and hashes are quite useful.

Code references are much more useful than most people seem to think.

Note that you do not make a coderef with &func;. This construct calls the subroutine func, implicitly passing it the current @_ (the arguments of the subroutine that called it). This calling convention was required in Perl 4, but not in Perl 5. Its effects are different from what people expect, so the general recommendation is not to call subroutines this way.

Assigning the output of this call to a variable captures the return value of the subroutine call, not a reference to the subroutine.
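A minimal sketch of the difference, using a hypothetical count_args helper that only reports how many arguments it received:

```perl
use strict;
use warnings;

# Hypothetical helper that just reports how many arguments it received.
sub count_args { return scalar @_ }

sub outer
{
    my $via_ampersand = &count_args;    # no parens: reuses outer's own @_
    my $via_parens    = count_args();   # explicit, empty argument list
    my $via_reference = \&count_args;   # takes a reference; nothing is called
    return ( $via_ampersand, $via_parens, ref $via_reference );
}

my @result = outer( 'x', 'y', 'z' );    # ( 3, 0, 'CODE' )
```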

Using References

   print "${$sref}\n";
   my $len = @{$aref}; # the whole array
   $aref->[0] = 15; # an element
   my $keyscnt = keys %{$href}; # the whole hash
   $href->{'c'} = 3; # an element

These are basic examples of using the different kinds of references.

The curly braces are not the part of the syntax that causes the dereference. The sigil performs the dereference. The curly braces serve two purposes. They make the dereference somewhat easier to read and they specify what is being dereferenced. If the reference is stored in a more complicated structure than a single scalar, this becomes much more important.
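A short sketch of why the braces matter once the reference lives inside another structure; %data and its list key are made up for illustration:

```perl
use strict;
use warnings;

my $aref = [ 1, 2, 3 ];
my %data = ( list => [ 10, 20, 30 ] );

# With a simple scalar, the braces are optional.
my $len1 = @{$aref};
my $len2 = @$aref;

# With the reference inside a hash, the braces say what to dereference:
# @{ $data{list} } is the array stored under 'list'. Written as
# @$data{list}, Perl would instead read it as @{$data}{list}, a slice of
# a hash referenced by a scalar $data, which does not exist here.
my $count = @{ $data{list} };
my $first = ${ $data{list} }[0];
my $last  = $data{list}->[-1];          # arrow form, same idea
```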

Back to Subroutines

You create subroutines with the sub operator.
The sub operator returns a coderef.
Parameters are passed in the @_ array.
The return value is either the value of the last statement evaluated or the value given to return.

Now we're back to the main topic. There's really not much to say about creating subroutines.

To be precise, sub returns a coderef only when it is used without a name to create an anonymous subroutine. With a name, it is a declaration that defines the subroutine and does not return a coderef.

Subroutine/Coderef Example

   sub sum
   {
       my $sum = shift;
       $sum += $_ foreach @_;
       return $sum;
   }

   my $sumref = \&sum;
   print $sumref->( 1..6 ), "\n";

This is how you make a reference to a named subroutine and then call it through that reference.

Subroutine/Coderef Example 2

   my $sumref = sub
   {
       my $sum = shift;
       $sum += $_ foreach @_;
       return $sum;
   };

   print $sumref->( 1..6 ), "\n";

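You can also use sub as an operator to create an anonymous subroutine. Note one difference from the named form: a named subroutine is defined at compile time, but the anonymous form only produces its coderef at run time, when the expression is evaluated (its body is still compiled along with the rest of the program).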
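A small sketch of the timing difference; triple and $doubler are illustrative names, not from the talk:

```perl
use strict;
use warnings;

# triple is defined at compile time, so it can be called here, before
# the line that declares it has been reached at run time.
my $early = triple( 2 );                            # 6

# The anonymous sub's coderef only exists once the assignment executes.
my $doubler;
my $before = defined $doubler ? 'set' : 'undef';    # 'undef'
$doubler = sub { return 2 * $_[0] };
my $after = $doubler->( 5 );                        # 10

sub triple { return 3 * $_[0] }
```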

Alternate Coderef Calling

You can also call through a coderef like this:

   &{$cref}( 1..6 );

I really don't like this style, but it's here for completeness.
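For comparison, a sketch showing that the arrow form and the ampersand forms call the same code; the anonymous sum is the one from the previous example:

```perl
use strict;
use warnings;

my $sumref = sub
{
    my $sum = shift;
    $sum += $_ foreach @_;
    return $sum;
};

my $arrow = $sumref->( 1..6 );     # the arrow form used earlier
my $amp   = &{$sumref}( 1..6 );    # the form shown above
my $short = &$sumref( 1..6 );      # braces are optional for a simple scalar
my $kind  = ref $sumref;           # 'CODE'
```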

What's the point?

Why use coderefs?

Passing code to another subroutine.
Storing code in a data structure.
Make a decision only once.

If you've never seen coderefs or function pointers before, it is quite easy to underestimate how useful they are.

Passing Code to a Subroutine

sort, map, grep
UI callbacks
Applying algorithms to complex containers.

People are often surprised to find they have already been using coderefs with the list operators. If you've used Perl/Tk, you'll probably already be familiar with the callback mechanism. (Most other UI frameworks probably do something similar.) The last use just extends the first idea, but few people stumble onto it themselves.

List Operators

   @list = sort { $b <=> $a } @list;

   sub backwards { return $b cmp $a; }
   @list = sort backwards @list;

The code in the curlies acts as an anonymous, in-line sort subroutine. Alternatively, you can use the name of a subroutine. This subroutine is special because it takes its parameters in $a and $b instead of @_. These are package variables, which can be a bit of a surprise under some circumstances, for example when the comparison subroutine lives in a different package from the call to sort.
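The comparison routine can also be a coderef held in a simple scalar, which lets you pick the ordering at run time. A sketch; the %orders table is illustrative:

```perl
use strict;
use warnings;

my @list = ( 3, 10, 1, 7 );

# sort accepts a coderef in a simple (unsubscripted) scalar variable.
my $descending = sub { $b <=> $a };
my @sorted = sort $descending @list;    # ( 10, 7, 3, 1 )

# Choose the ordering at run time. The coderef is copied into $cmp
# first, because sort will not take a subscripted expression there.
my %orders = (
    asc  => sub { $a <=> $b },
    desc => $descending,
);
my $order = 'asc';
my $cmp   = $orders{$order};
my @ascending = sort $cmp @list;        # ( 1, 3, 7, 10 )
```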

Callbacks

    my $quit_btn = $window->Button(
        -text => "Quit",
        -command => sub { exit( 0 ); }
    );

This supplies a callback to be executed when the button is clicked.

Containers

How do you do things generically with complex data structures?

Let's do this with a concrete example: a hierarchical filesystem

One of the problems with complex data structures is access. The more complicated the data structure, the more work you need to do to get to the part of the data you want. Accessing all of the data in the structure may require a significant amount of code.

The file system is not really very complicated, but it requires more work than a list or hash to access. If you've ever tried to write code to manipulate files and directories, you are probably aware of many places where things can go wrong.

Find Large Files

    sub find_large_files
    {
        my ($dir, $size) = @_;
        my @files = grep { -s $_ > $size } $dir->files();
        foreach my $d ($dir->dirs())
        {
            push @files, find_large_files( $d, $size );
        }
        return @files;
    }

In this case, I decided to process all of the files first, then the subdirectories. I could also have done the subdirectories first, or mixed the two.

Since I moved the recursion to the bottom, we could have actually used an iterative solution fairly easily, but that would have been a little harder to grasp.
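A sketch of the iterative version mentioned above. The real $dir object isn't shown in the talk, so this assumes a hypothetical in-memory tree of hashrefs, and takes sizes from the hash instead of calling -s so it runs without touching the disk:

```perl
use strict;
use warnings;

# Hypothetical in-memory tree: each directory is a hashref with a 'files'
# hash (name => size in bytes) and a 'dirs' list of subdirectories.
my $tree = {
    files => { 'a.log' => 5000, 'b.txt' => 20 },
    dirs  => [
        { files => { 'c.iso' => 90000 }, dirs => [] },
        { files => { 'd.txt' => 3 },
          dirs  => [ { files => { 'e.bin' => 7000 }, dirs => [] } ] },
    ],
};

# Same search as find_large_files, but an explicit stack of pending
# directories replaces the recursive call.
sub find_large_files_iter
{
    my ( $root, $size ) = @_;
    my @found;
    my @pending = ($root);
    while ( my $dir = pop @pending )
    {
        my $files = $dir->{files};
        push @found, grep { $files->{$_} > $size } sort keys %$files;
        push @pending, @{ $dir->{dirs} };
    }
    return @found;
}

my @large = find_large_files_iter( $tree, 1000 );
@large = sort @large;    # ( 'a.log', 'c.iso', 'e.bin' )
```

The visiting order differs from the recursive version, which is why the result is sorted before use.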

Find Small Files

    sub find_small_files
    {
        my ($dir, $size) = @_;
        my @files = grep { -s $_ < $size } $dir->files();
        foreach my $d ($dir->dirs())
        {
            push @files, find_small_files( $d, $size );
        }
        return @files;
    }

Notice the two changes that would need to be made to each copy: the comparison and the name in the recursive call. How often do you think the second would be forgotten?

What's Common

Potentially complex container
Logic for walking container
Input parameters
Return values

Notice how much of the code is devoted to walking the container vs. the actual work we wanted to do.

The Problems

Repeated code
Copy and paste for each new version
Any change to container, lots of changes

With most of the code devoted to walking the container, the code we would need to copy would mostly be boilerplate, requiring no real thought or creativity.

A Coderef Version

    sub grep_files
    {
        my ($pred, $dir) = @_;
        my @files = grep { $pred->( $_ ) } $dir->files();
        foreach my $d ($dir->dirs())
        {
            push @files, grep_files( $pred, $d );
        }
        return @files;
    }

Except for the coderef and the grep call, this is basically identical to the other solutions. The coderef is called a predicate because it tests one parameter and returns true or false.

Using Coderef Version

    my @large_files =
        grep_files( sub { -s $_[0] > $bigsize }, $dir );

    my @small_files =
        grep_files( sub { -s $_[0] < $smallsize }, $dir );

This calls grep_files with the appropriate predicates. With a little more magic we could avoid writing the sub keyword, but I'm not going to go there now.
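The talk never shows the $dir object behind files() and dirs(). As a self-contained sketch, here is the same predicate-driven walk over a real directory tree using opendir and the core File::Spec and File::Temp modules; grep_files_on_disk and the temporary tree are hypothetical stand-ins:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for grep_files: walk a real directory tree and
# keep every plain file for which the predicate coderef returns true.
sub grep_files_on_disk
{
    my ( $pred, $dir ) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @entries = map  { File::Spec->catfile( $dir, $_ ) }
                  grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my @files = grep { -f $_ && $pred->( $_ ) } @entries;
    foreach my $d ( grep { -d $_ && !-l $_ } @entries )
    {
        push @files, grep_files_on_disk( $pred, $d );
    }
    return @files;
}

# Build a small throwaway tree: two files at the top, one in a subdirectory.
my $root = tempdir( CLEANUP => 1 );
mkdir( File::Spec->catdir( $root, 'sub' ) ) or die "mkdir failed: $!";
my %bytes = ( 'big.txt' => 500, 'small.txt' => 10, 'sub/big2.txt' => 800 );
for my $name ( sort keys %bytes )
{
    my $path = File::Spec->catfile( $root, split m{/}, $name );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} 'x' x $bytes{$name};
    close $fh;
}

my ( $bigsize, $smallsize ) = ( 100, 100 );
my @large_files = grep_files_on_disk( sub { -s $_[0] > $bigsize },   $root );
my @small_files = grep_files_on_disk( sub { -s $_[0] < $smallsize }, $root );
```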

Storing Code in a Data Structure

List of coderefs
Hash mapping strings to coderefs
Hashes containing data and coderefs

For some reason, people seem to have a harder time coming to grips with this idea.

List of Coderefs

Apply a set of operations to the same data.
Easily change the number of operations

    foreach my $cref (@code)
    {
        @data = $cref->( @data );
    }

Notice that we are looping over the code routines not over the data. This can be especially useful if the routines you are building up come from some sort of configuration or user input.
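A sketch of such a pipeline; the particular stages are made up, and adding another one is just a push onto @code:

```perl
use strict;
use warnings;

# Each stage takes a list and returns a list; @code runs in order.
my @code = (
    sub { return map  { $_ * 2 } @_ },        # double every value
    sub { return grep { $_ > 4 } @_ },        # keep values above 4
    sub { return sort { $a <=> $b } @_ },     # numeric ascending
);
push @code, sub { return map { $_ + 1 } @_ }; # adding a stage is one push

my @data = ( 3, 1, 2, 5 );
foreach my $cref (@code)
{
    @data = $cref->( @data );
}
# @data is now ( 7, 11 )
```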

Table-Driven Code

Part of a four-function calculator.

  my %operations = (
      '+' => sub { return $_[0] + $_[1]; },
      '-' => sub { return $_[0] - $_[1]; },
      '*' => sub { return $_[0] * $_[1]; },
      '/' => sub { return $_[0] / $_[1]; },
      'q' => sub { exit( 0 ); },
  );

You could actually map any strings, which also makes this approach good for language-based interfaces.

This anonymous sub approach works best when the calls fit on a single line. I would not recommend doing this with long subroutines. If you need to do more work, write a real subroutine elsewhere and use \& to take its reference here.
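A sketch of how the table might be used. The calculate function and its input format are assumptions, and the 'q' entry is left out so the example does not exit:

```perl
use strict;
use warnings;

my %operations = (
    '+' => sub { return $_[0] + $_[1]; },
    '-' => sub { return $_[0] - $_[1]; },
    '*' => sub { return $_[0] * $_[1]; },
    '/' => sub { return $_[0] / $_[1]; },
);

# Hypothetical front end: split "3 + 4" into operands and an operator,
# then let the table choose the code to run.
sub calculate
{
    my ($expr) = @_;
    my ( $x, $op, $y ) =
        $expr =~ m{^\s*(-?\d+(?:\.\d+)?)\s*([-+*/])\s*(-?\d+(?:\.\d+)?)\s*$}
        or die "Cannot parse '$expr'\n";
    my $code = $operations{$op} or die "Unknown operator '$op'\n";
    return $code->( $x, $y );
}

my $sum      = calculate( '3 + 4' );    # 7
my $quotient = calculate( '10 / 4' );   # 2.5
my $product  = calculate( '6*7' );      # 42
```

Supporting a new operator means adding one entry to %operations and widening the character class; calculate itself does not change.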

Code and Data

  my $obj = {
      'name' => 'Fred',
      'age' => 37,
      'command' => sub { return $_[0] * 2; },
  };

It is hard to come up with a generic example of this. I have used it several times, so it's funny that I can't come up with a good example. A good use would be when the subroutine is needed to perform a minor transformation on some of the data before it can be used. This would be useful when this structure is being stored with a list of other similar structures. By providing this little sub we could, temporarily, massage the data into a form needed for processing, without losing its original nature.
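One possible shape for this, as a sketch: hypothetical sensor readings stored in different units, each carrying a small coderef that converts its own value to Celsius without changing the stored data:

```perl
use strict;
use warnings;

# Hypothetical readings kept in whatever units each sensor reports; each
# record carries a coderef that converts its own value to Celsius.
my @readings = (
    { sensor => 'attic',  value => 77,   to_celsius => sub { ( $_[0] - 32 ) * 5 / 9 } },  # Fahrenheit
    { sensor => 'garage', value => 20,   to_celsius => sub { $_[0] } },                    # already Celsius
    { sensor => 'lab',    value => 2500, to_celsius => sub { $_[0] / 100 } },              # hundredths of a degree
);

# Process everything in one unit without rewriting the stored values.
my %celsius;
foreach my $r (@readings)
{
    $celsius{ $r->{sensor} } = $r->{to_celsius}->( $r->{value} );
}
# %celsius is ( attic => 25, garage => 20, lab => 25 ); $readings[0]{value} is still 77
```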

Unnecessary Conditional

    my @matches;
    foreach my $file (@files)
    {
        if($is_relative)
        {
            push @matches, $file
                if -s "$base_dir$file" > $size;
        }
        else
        {
            push @matches, $file
                if -s $file > $size;
        }
    }

Notice that the condition will be tested for each item in the list. For a complex conditional, this could be expensive. Worse, if there are multiple conditions, the code quickly becomes really hard to understand.

Remove Unnecessary Test

    # Each version returns ($name) if the file is large enough, else ().
    my $test;
    if($is_relative)
    {
        $test = sub { -s "$base_dir$_[0]" > $size ? ($_[0]) : () };
    }
    else
    {
        $test = sub { -s $_[0] > $size ? ($_[0]) : () };
    }

    my @matches;
    foreach my $file (@files)
    {
        push @matches, $test->( $file );
    }

The code doesn't look much smaller. However, the condition at the beginning is only tested once, not every time through the loop. We could also move the code that builds $test into its own subroutine to make this code easier to maintain.
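A sketch that moves the decision into its own subroutine. make_size_test is a hypothetical name, and the files are throwaway temporaries created with the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: make the relative/absolute decision once and
# return a coderef that yields the file name if it is large enough, or
# an empty list if not.
sub make_size_test
{
    my ( $is_relative, $base_dir, $size ) = @_;
    return $is_relative
        ? sub { -s "$base_dir$_[0]" > $size ? ($_[0]) : () }
        : sub { -s $_[0] > $size ? ($_[0]) : () };
}

# Two throwaway files, 400 and 4 bytes long.
my $base_dir = tempdir( CLEANUP => 1 ) . '/';
for my $spec ( [ 'big.dat', 400 ], [ 'tiny.dat', 4 ] )
{
    open my $fh, '>', "$base_dir$spec->[0]" or die "Cannot write: $!";
    print {$fh} 'x' x $spec->[1];
    close $fh;
}

my $test = make_size_test( 1, $base_dir, 100 );
my @matches;
foreach my $file ( 'big.dat', 'tiny.dat' )
{
    push @matches, $test->( $file );    # the loop never re-checks $is_relative
}
```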

Closures

Anonymous subroutine
Access to any lexical variable in scope when created
Captured variables can be both read and modified (in and out parameters)

Lexical variables are, for practical purposes, the ones declared with my.
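A sketch of a closure using captured lexicals in both directions; the variable names are illustrative:

```perl
use strict;
use warnings;

my $total = 0;     # lexical variables in scope when the sub is created
my $limit = 10;

my $add = sub
{
    my ($n) = @_;
    $total += $n;              # out: writes to a captured variable
    return $total > $limit;    # in: reads another captured variable
};

$add->( $_ ) for 1 .. 4;       # $total is now 10
$limit = 5;                    # the sub sees later changes too
my $over = $add->( 0 );        # true, since 10 > 5
```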

Classic Closure Example

    sub make_multiplier
    {
        my $factor = shift;
        return sub { return $_[0] * $factor; };
    }

    my $doubler = make_multiplier( 2 );
    my $tripler = make_multiplier( 3 );
    print $doubler->( 5 ), "\n";   # 10
    print $tripler->( 5 ), "\n";   # 15

This seems to be the way most people introduce closures, not because it is particularly useful or realistic, but, I think, because it is easy to understand.

Uses for Closures

UI callbacks
Container algorithms
Not quite an object
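A sketch of the "not quite an object" idea: a hypothetical make_counter returns a hash of closures that share one private variable, so each hash behaves a little like an instance:

```perl
use strict;
use warnings;

# Hypothetical make_counter: the three closures share one private $count
# that nothing outside them can reach.
sub make_counter
{
    my $count = @_ ? shift : 0;
    my %methods = (
        increment => sub { return ++$count },
        reset     => sub { $count = 0; return },
        value     => sub { return $count },
    );
    return \%methods;
}

my $c1 = make_counter();
my $c2 = make_counter( 100 );
$c1->{increment}->() for 1 .. 3;
$c2->{increment}->();
# $c1->{value}->() is now 3 and $c2->{value}->() is 101: separate $count each
```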

Closures for UI Callbacks

Given: $obj holds an object we want to activate using its apply method.

    my $apply_btn = $window->Button(
        -text => "Apply",
        -command => sub { $obj->apply(); }
    );

One problem with the normal UI callback method is that it is hard to call methods on a separate object using a simple coderef. This problem is easy to solve by using a closure to capture a reference to the object and the method to call.
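A sketch that runs without a GUI. Settings is a hypothetical class standing in for $obj, a plain hash stands in for the button (this is not Perl/Tk's API), and method_callback is a hypothetical helper that captures the method name and arguments as well as the object:

```perl
use strict;
use warnings;

# Hypothetical class standing in for whatever $obj really is.
package Settings;

sub new { my ($class) = @_; return bless { applied => 0, last => '' }, $class; }

sub apply
{
    my ( $self, @how ) = @_;
    $self->{applied}++;
    $self->{last} = join ',', @how;
    return;
}

package main;

# Hypothetical helper: capture the object, the method name, and any arguments.
sub method_callback
{
    my ( $object, $method, @args ) = @_;
    return sub { return $object->$method( @args ); };
}

my $obj = Settings->new;

# A plain hash stands in for the button; a toolkit would store -command
# and call it later with no idea what $obj is.
my %apply_btn = (
    -text    => 'Apply',
    -command => sub { $obj->apply(); },
);
my $save_cb = method_callback( $obj, 'apply', 'save' );

$apply_btn{-command}->();    # simulated click
$save_cb->();                # simulated click on another control
```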

Currying

Create a new subroutine with some parameters already supplied.
Makes use of a closure to hold the parameter.
Subroutines operating on other subroutines.

Curry Example

    sub add      { return $_[0] + $_[1]; }
    sub subtract { return $_[0] - $_[1]; }

    sub bind_second
    {
        my $coderef = shift;
        my $second = shift;
        return sub { return $coderef->( $_[0], $second ); };
    }

    my $inc = bind_second( \&add, 1 );
    my $dec = bind_second( \&subtract, 1 );

Once again, not particularly useful, but it is easy to understand.
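A sketch of the same bind_second code with usage, plus a hypothetical general curry that binds any number of leading arguments:

```perl
use strict;
use warnings;

sub add      { return $_[0] + $_[1]; }
sub subtract { return $_[0] - $_[1]; }

sub bind_second
{
    my $coderef = shift;
    my $second = shift;
    return sub { return $coderef->( $_[0], $second ); };
}

# Hypothetical general form: bind any number of leading arguments.
sub curry
{
    my ( $coderef, @bound ) = @_;
    return sub { return $coderef->( @bound, @_ ); };
}

my $inc       = bind_second( \&add, 1 );
my $dec       = bind_second( \&subtract, 1 );
my $ten_minus = curry( \&subtract, 10 );

# $inc->( 5 ) is 6, $dec->( 5 ) is 4, $ten_minus->( 3 ) is 7
```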

The Power of Currying

Modify one general subroutine to do multiple specific things.
Bind values to parameters to change the nature of a subroutine.

Another Currying Example

    sub filesize_test
    {
        my $pred = shift;
        return sub { $pred->( -s $_[0] ) };
    } 

    my @small_files =
        grep_files( filesize_test( sub { $_[0] < $max } ), $dir );

    my @large_files =
        grep_files( filesize_test( sub { $min < $_[0] } ), $dir );

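This is pretty complicated, but you can walk through it fairly easily: filesize_test takes a predicate on a size and returns a new predicate on a file name, which looks up the size with -s and hands it to the original test. grep_files never knows the difference. As you become used to this technique, reading these kinds of expressions gets easier.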
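A sketch showing that the predicate built by filesize_test works with an ordinary grep too, tried on a few temporary files; the sizes and the $max and $min limits are made up:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

sub filesize_test
{
    my $pred = shift;
    # Returns a new predicate on a file name: look up the size, then ask $pred.
    return sub { $pred->( -s $_[0] ) };
}

# A few throwaway files of known sizes.
my $dir = tempdir( CLEANUP => 1 );
my %bytes = ( 'a.txt' => 10, 'b.txt' => 2000, 'c.txt' => 50 );
my @paths;
for my $name ( sort keys %bytes )
{
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} 'x' x $bytes{$name};
    close $fh;
    push @paths, $path;
}

my ( $max, $min ) = ( 100, 1000 );
my $is_small = filesize_test( sub { $_[0] < $max } );
my $is_large = filesize_test( sub { $min < $_[0] } );

my @small_files = grep { $is_small->( $_ ) } @paths;   # a.txt and c.txt
my @large_files = grep { $is_large->( $_ ) } @paths;   # b.txt
```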

Conclusion

Not useful for everything
Powerful technique when you need it
Good addition to your toolbox

Subroutines, Closures, Coderefs

Houston.pm • Sept. 11, 2007