Houston.pm Oct 2011

The Perl Compiler

rurban - Reini Urban
cPanel, formerly Graz, Austria

See also a screencast of the YAPC talk at http://vimeo.com/14058377

What's new?

Who am I

rurban has maintained cygwin perl since 5.8.8, along with 3-4 modules, guts, B::* => 5.10, types, Ctypes.

Mostly doing Perl, LISP, C, bash and PHP.

Contents

The compiler was started in 1995 by Malcolm Beattie, abandoned in 2007 by p5p, and revived in 2008 by me.

Very dynamic language: magic; tie; eval "require $foo;" -> which packages to import? Easy to compile, hard to make faster.

Why use B::C / perlcc?

Overview

The Perl Compiler suite B::C contains three separate compilers: B::C, B::CC and B::Bytecode.

perl toke.c/op.c - B::C - perl op walker run.c

Eliminate the whole parsing and dynamic allocation time.

The Walker (Basics)

After compilation walk the "op tree" - run.c

  int
  Perl_runops_standard(pTHX)
  {
   dVAR;
   while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
      PERL_ASYNC_CHECK(); /* gone with 5.14 */
   }
   TAINT_NOT;
   return 0;
  }

Observation

1. The op tree is not a "tree", it is reduced to a simple linked list of ops. Every "op" (a pp_<opname> function) returns the next op.

2. PERL_ASYNC_CHECK was called after every single op, until 5.14.
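
The first observation can be illustrated with a toy model. This is only a sketch: the toy_* types and functions below are hypothetical stand-ins and not Perl's real OP structures. It shows the same loop shape as the walker, where each op function does its work and returns the next op to run.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT Perl's real OP structs. */
typedef struct toy_op toy_op;
typedef toy_op *(*toy_ppaddr)(toy_op *self, long *acc);

struct toy_op {
    toy_ppaddr ppaddr; /* function implementing this op */
    toy_op    *next;   /* successor in execution order, NULL ends the run */
    long       arg;    /* operand; stands in for the Perl stack */
};

/* Each "pp" function does its work and returns the next op. */
static toy_op *toy_pp_add(toy_op *self, long *acc)
{
    *acc += self->arg;
    return self->next;
}

static toy_op *toy_pp_mul(toy_op *self, long *acc)
{
    *acc *= self->arg;
    return self->next;
}

/* The walker: same shape as the runops loop, without async checks. */
long toy_runops(toy_op *start)
{
    long acc = 0;
    toy_op *op = start;
    assert(start != NULL);
    while ((op = op->ppaddr(op, &acc)))
        ; /* empty body: the ops drive themselves */
    return acc;
}

/* Builds (0 + 2) * 21 as a two-op chain and runs it. */
long toy_demo(void)
{
    toy_op mul = { toy_pp_mul, NULL, 21 };
    toy_op add = { toy_pp_add, &mul, 2 };
    return toy_runops(&add);
}
```

The chain is a plain linked list in execution order, so the walker needs no knowledge of the individual ops.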

Perl Phases - the "Perl Compiler"

Normal Perl functions start at INIT, after BEGIN and CHECK.
The O modules start at CHECK, and skip INIT.

Perl Phases - the "B Compilers"

The B::C compiler, invoked via O, freezes the state in CHECK and then invokes the walker.

$ perl -MO=C,-omyprog.c -e'print $a;'
$ cc_harness -o myprog myprog.c
$ ./myprog

B::C - Unoptimised / the walker

  perl -MO=C,-omyprog.c -e'print $a;'

  while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) ;

B::CC - The optimiser / unrolled (1)

  perl -MO=CC,-omyprog.c -e'print $a;'
  while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) ;

is unrolled to:
  lab_6804728:
        PL_op = &op_list[1];
        PL_op = pp_enter();
        PERL_ASYNC_CHECK(); /* <5.13 */
        TAINT_NOT;      /* nextstate */
        sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
        FREETMPS;
  lab_68046b8:
        PUSHMARK(sp);

B::CC - The optimiser / unrolled (2)

  lab_68046b8:
        PUSHMARK(sp);
...
        XPUSHs(GvSVn((GV*)PL_curpad[1]));
        PL_op = (OP*)&listop_list[0];
        PUTBACK; PL_op = pp_print(); SPAGAIN;
  lab_6804708:
        PL_op = (OP*)&listop_list[1];
        PL_op = pp_leave();
        PUTBACK;
        return NULL;
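
The idea behind the unrolling can be sketched with a toy model. All toy_* names are hypothetical and none of this is generated B::CC code. In the real output above, simple ops are expanded inline (PUSHMARK, TAINT_NOT, FREETMPS) while pp_enter, pp_print and pp_leave are still called as functions. The sketch only shows the step from a table-driven loop to straight-line direct calls.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: "ops" are plain functions over an accumulator,
   not Perl pp_* functions. */
typedef void (*toy_fn)(long *acc);

static void toy_enter(long *acc)  { *acc = 0; }
static void toy_push7(long *acc)  { *acc += 7; }
static void toy_double(long *acc) { *acc *= 2; }

/* Walker style: one indirect call per op, driven by a table. */
long toy_walk(void)
{
    static const toy_fn ops[] = { toy_enter, toy_push7, toy_double };
    const size_t n = sizeof ops / sizeof ops[0];
    long acc = -1;
    size_t i;
    assert(n == 3);
    for (i = 0; i < n; i++)
        ops[i](&acc);
    return acc;
}

/* Unrolled style: the same sequence emitted as direct calls,
   which the C compiler is free to inline. */
long toy_unrolled(void)
{
    long acc = -1;
    toy_enter(&acc);
    toy_push7(&acc);
    toy_double(&acc);
    return acc;
}
```

Both functions compute the same value; only the dispatch differs.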

Status

Non-threaded B::C on 5.6.2, 5.8.9 and 5.14 is quite usable and has the fewest known bugs, but the others have also become pretty stable. 5.15 still has some XSLoader problems.

Best, in the following order: 5.14, 5.6.2, 5.8.9, 5.10, 5.12 non-threaded.

Status Targets

Status Summary

Projects

Which software is compiler critical?

Execution time is the same (sans B::CC)

Startup time is radically faster.

Web Apps with fast response times -

1 sec more or less => good or bad software

New Optimisations

Optimise static initialization - strings and arrays

non-threaded ! +10-20% performance

ltrace reveals Gthr_key_ptr, gv_fetchpv, savepvn, av_extend and safesysmalloc as the major culprits, the latter three at startup time.

common constant strings with gcc -Os => automatically optimised

av_extend - run-time malloc => static arrays ?

Static arrays are impossible if not Readonly:

they cannot be extended at run-time; they would first need to be realloc'ed into the heap.

But certain arrays can: -fro-inc (Readonly @INC), and compad names and symbols.
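
The trade-off can be sketched in C. This is a toy model under stated assumptions: toy_av is a hypothetical struct and not Perl's AV, and the paths are made-up sample data. It shows why a statically initialised array needs no malloc at startup, and why extending it forces a copy into the heap first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an array: NOT Perl's AV. */
typedef struct {
    const char **items;
    size_t       fill;      /* used slots */
    size_t       max;       /* allocated slots */
    int          is_static; /* 1 while items points at compile-time storage */
} toy_av;

/* Compile-time initialised storage: no malloc at startup.
   The paths are hypothetical sample values. */
static const char *toy_inc_data[] = { "/usr/lib/perl5", "/usr/share/perl5" };

toy_av toy_inc = { toy_inc_data, 2, 2, 1 };

/* Appends an item; a static array is copied to the heap before it grows. */
int toy_av_push(toy_av *av, const char *item)
{
    assert(av != NULL);
    if (av->fill == av->max) {
        size_t newmax = av->max ? av->max * 2 : 4;
        const char **p;
        if (av->is_static) {
            /* static storage cannot be realloc'ed: allocate and copy */
            p = malloc(newmax * sizeof *p);
            if (!p)
                return -1;
            memcpy(p, av->items, av->fill * sizeof *p);
            av->is_static = 0;
        } else {
            p = realloc(av->items, newmax * sizeof *p);
            if (!p)
                return -1;
        }
        av->items = p;
        av->max = newmax;
    }
    av->items[av->fill++] = item;
    return 0;
}
```

A Readonly array never takes the growth path, which is why it can stay in static storage for its whole lifetime.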

pre-allocate faster with -fav-init or -O3 with independent_comalloc()

Same for hashes and strings (nyi).

Real Life Applications

cPanel has used B::C compiled 5.6 for a decade, and will switch to 5.14.

cPanel offers web hosting automation software that manages provider data, domains, emails, webspace. A typical large webapp. Perl startup time can be too slow for many AJAX calls which need fast initial response times.

mod_perl or pre-loaded perl apps would help with startup time, but not in this case, and not with reduced memory size.

Benchmarks (by cPanel)

Larger code base => more significant startup improvements. Basically O(1)

                          perl     perlcc
Web Service Daemon
  Resident Size           9756     9072

DNS Settings Client
  Startup Time            0.074    0.021

HTML Template Processor
  Startup Time            0.695    0.037
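
To read the figures as ratios, a small helper is enough. The function names speedup and saving are hypothetical, and the slide does not state the units, so the sketch only computes unit-free ratios from the numbers above.

```c
#include <assert.h>

/* Ratio of the interpreted figure to the compiled figure. */
double speedup(double perl, double perlcc)
{
    assert(perlcc > 0.0);
    return perl / perlcc;
}

/* Relative saving: (perl - perlcc) / perl. */
double saving(double perl, double perlcc)
{
    assert(perl > 0.0);
    return (perl - perlcc) / perl;
}
```

With the values above, speedup(0.074, 0.021) is about 3.5, speedup(0.695, 0.037) is about 18.8, and saving(9756, 9072) is about 0.07, i.e. roughly 7% less resident size.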

Plans

2011: Find and fix all remaining B::C bugs for 5.14.

2012: CC type and sub optimisations. use types.

B::C Limitations

run-time ops vs compile-time
BEGIN blocks have only compile-time side-effects.

BEGIN {
   use Package; # okay
   chdir "dir"; # not okay.
   # only done at compile-time, not when the user runs the binary
   print "stuff"; # okay, only at compile-time
   eval "what"; # hmm; depends
}

Move eval "require Package;" to BEGIN

B::CC Limitations

run-time ops vs compile-time +

dynamic range 1..$foo

goto/next/last $label

Undetected modules behind eval "require":
use -uModule to enforce scanning these

Testsuite

user make test (via cpan):

45x (bytecode + c -O0 - O4 + cc -O0 - O2)

=> 8 min

author make test:

45x bytecode + c -O0 - O4 + cc -O0 - O2 (8 min)

modules.t top100 (16 min)

+ testcore.t (16 min)

=> ~40 min

for 5-10 perls (5.6, 5.8, 5.10, 5.12, 5.14 / threaded + non-threaded) 5*2=10

on 5 platforms (cygwin, debian, centos, solaris, freebsd)

=> 33 h (10*5*40 = 2000min) = 1-2 days, similar to the gcc testsuite.
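
The estimate is plain multiplication. As a minimal sketch, with hypothetical helper names:

```c
#include <assert.h>

/* Total author-test minutes: perls x platforms x minutes per run. */
long total_minutes(long perls, long platforms, long minutes_per_run)
{
    assert(perls >= 0 && platforms >= 0 && minutes_per_run >= 0);
    return perls * platforms * minutes_per_run;
}

/* Converts minutes to fractional hours. */
double minutes_to_hours(long minutes)
{
    return minutes / 60.0;
}
```

total_minutes(10, 5, 40) gives 2000, and minutes_to_hours(2000) is about 33.3, matching the 33 h above.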

top100 modules. See webpage or svn repo for results for all tested perls / modules

With 5.8 non-threaded: 3 fails (File::Temp, B::Hooks::EndOfScope, YAML)

With blead debugging + threaded: 27 fails

  log.modules-5.010001:pass MooseX::Types #TODO generally
  log.modules-5.012001-nt:fail MooseX::Types #TODO generally
  log.modules-5.013003-nt:pass MooseX::Types #TODO generally
  log.modules-5.013003d:fail MooseX::Types #TODO generally

CC

CC - User Type declarations

Currently:

  my $<name>_i;    IV integer
  my $<name>_ir;   IV integer in a pseudo register
  my $<name>_d;    NV double
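
A suffix convention like this is cheap to decode. The sketch below is not B::CC's implementation: toy_ctype and toy_type_from_name are hypothetical names, and it only illustrates mapping the _i, _ir and _d suffixes to a type tag.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type tags for the three suffixes. */
typedef enum {
    TOY_T_NONE,    /* no recognised suffix */
    TOY_T_INT,     /* _i  : IV integer */
    TOY_T_INT_REG, /* _ir : IV integer in a pseudo register */
    TOY_T_DOUBLE   /* _d  : NV double */
} toy_ctype;

/* Returns 1 if s ends with suffix, else 0. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Maps a variable name (without the sigil) to its declared type tag. */
toy_ctype toy_type_from_name(const char *name)
{
    assert(name != NULL);
    if (ends_with(name, "_ir"))
        return TOY_T_INT_REG;
    if (ends_with(name, "_i"))
        return TOY_T_INT;
    if (ends_with(name, "_d"))
        return TOY_T_DOUBLE;
    return TOY_T_NONE;
}
```

For example, a name like count_i would map to TOY_T_INT and sum_d to TOY_T_DOUBLE, while a name without a recognised suffix stays untyped.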


Future ideas are type qualifiers such as
my (int $foo, double $foo_d);

and attributes such as
my ($foo:Cint, $foo:Creg_int, $foo:Cdouble);

or MooseX::Types

Links

http://search.cpan.org/dist/B-C/

http://code.google.com/p/perl-compiler/

http://www.perl-compiler.org/

mailto:perl-compiler@googlegroups.com

Questions?