Perl Introduction, Part 2

G. Wade Johnson

Houston.pm

Goal of Perl

Get stuff done.

Several other programming languages have explicit goals relating to forcing a particular programming paradigm or fostering good code design. While these sound like noble goals, they often detract from the ability to solve problems with the language.

From the very beginning, Perl has been focused on solving real problems.

Community

A very important part of the Perl language is the community that surrounds it. Larry has explicitly worked to grow this community. Like any group, some members are difficult to get along with, while others are very friendly.

Documentation, Books

Learning Perl
Programming Perl
Perl Cookbook
Perl Best Practices
Perl Testing: A Developer's Notebook

The O'Reilly books are the standard texts everyone relies on. There are a few other books from other publishers that are worth having, but these should be the core of your library.

Documentation, perldoc

perldoc perltoc
perldoc perl
perldoc perlmodlib
perldoc perlrun
perldoc -f {func}
perldoc -q {keyword}
perldoc {module}

Perl has extensive documentation that comes with every Perl distribution. The perldoc tool gives access to this documentation.

perltoc is an in-depth table of contents for all of Perl's documentation. perl is the overview page, which points to the other perldocs such as the function reference. perlmodlib lists all of the core modules. perlrun covers the command line parameters for the perl program.

On-line Docs

perldoc.perl.org is an on-line version of the perldocs.

A Quick Review: Variables

$scalar holds a number, string, or reference
@array holds a list of scalars
%hash maps strings to scalars

I won't go into these unless someone really needs a refresher.
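For anyone who does want the refresher, here is a minimal sketch of the three variable types. The variable names and values are made up for illustration.

```perl
use strict;
use warnings;

# Scalar: a single number, string, or reference.
my $count = 3;
my $name  = 'Houston.pm';

# Array: an ordered list of scalars.
my @days = ('Mon', 'Tue', 'Wed');

# Hash: maps string keys to scalar values.
my %total_for = (Mon => 10, Tue => 7, Wed => 12);

# A reference is a scalar that points at another value.
my $days_ref = \@days;

print "$name has $count items\n";
print "First day: $days[0], last day: $days_ref->[-1]\n";
print "Total for Tue: $total_for{Tue}\n";
```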

A Quick Review: Flow Control

Conditional: if, else, elsif, unless
Looping: while, until, for, foreach
Exceptions: die, eval
Subroutines: sub
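
The keywords above can be seen working together in a short sketch. The classify subroutine and its inputs are hypothetical, chosen only to exercise each construct.

```perl
use strict;
use warnings;

# Subroutine: classify an integer using if/elsif/else.
sub classify {
    my ($n) = @_;
    die "not a number: $n\n" unless $n =~ /^-?\d+$/;    # raise an exception
    if    ($n < 0)  { return 'negative'; }
    elsif ($n == 0) { return 'zero'; }
    else            { return 'positive'; }
}

# foreach loops over a list; eval traps any exception raised by die.
my @results;
foreach my $item (-5, 0, 42, 'abc') {
    my $kind = eval { classify($item) };
    if (!defined $kind) {
        chomp(my $err = $@);        # $@ holds the message passed to die
        $kind = "error ($err)";
    }
    push @results, "$item: $kind";
}

# while loops for as long as the condition is true.
while (@results) {
    print shift(@results), "\n";
}
```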

Beginning Perl Tips

use strict;
use warnings;
Be aware of command line switches
Perl is really good at munging text

Command Line Switches

-n: wrap the script in a while(<>){ } loop
-p: wrap the script in a while(<>){ } continue { print; } loop
-e: Supply Perl code as an argument
-i: modify file in-place
-M: Load a module
-c: compile but do not run

The perl executable supports a large number of switches that make it a powerful command-line tool for quick-and-dirty work. This is a list of some of the most common. It is by no means a complete list.
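
To make -n and -p less magical, here is a rough sketch of the loops they wrap around your code, written out as ordinary subroutines. The names filter_n and filter_p, the sample data, and its date=number format are assumptions for illustration. In-memory filehandles stand in for the files that the real switches would read through <>.

```perl
use strict;
use warnings;

# Roughly what `perl -ne 'print if /=/' files` does: loop over the input
# and print only when the code says so.
sub filter_n {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $line if $line =~ /=/;
    }
    return;
}

# Roughly what `perl -pe 's/=/ => /' files` does: loop over the input,
# run the code, then print every line whether or not it changed.
sub filter_p {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/=/ => /;
        print {$out} $line;
    }
    return;
}

# Hypothetical sample data held in a string instead of a file.
my $text = "2010.01.01=5\njust a comment\n2010.01.02=7\n";
my ($kept, $changed) = ('', '');

open my $in,  '<', \$text or die "Cannot open in-memory input: $!";
open my $out, '>', \$kept or die "Cannot open in-memory output: $!";
filter_n($in, $out);
close $out;
close $in;

open $in,  '<', \$text    or die "Cannot open in-memory input: $!";
open $out, '>', \$changed or die "Cannot open in-memory output: $!";
filter_p($in, $out);
close $out;
close $in;

print "-n kept:\n$kept";
print "-p printed:\n$changed";
```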

Munging Text

Filter mode: while(<>) { }
chomp removes newlines
Regular expressions
String manipulation: split, join, substr
Hashes for summarizing information
Transliteration: tr///

Although Perl's regular expressions are its most infamous and powerful string-manipulation feature, don't be fooled into thinking they are the only one. You can do a lot of data processing with only a tiny bit of regex magic.

That being said, really understanding regular expressions is a necessity if you truly need to master text processing with Perl.
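
As a small illustration of how far the non-regex tools go, this sketch summarizes some made-up "name,count" lines using chomp, split, tr///, a hash, and join. The input format is an assumption for the example.

```perl
use strict;
use warnings;

# Hypothetical input lines, as they might arrive from while(<>).
my @lines = ("apple,3\n", "pear,5\n", "apple,4\n");

my %total_for;
foreach my $line (@lines) {
    chomp $line;                              # drop the trailing newline
    my ($name, $count) = split /,/, $line;    # break the line into fields
    $name =~ tr/a-z/A-Z/;                     # transliterate to upper case
    $total_for{$name} += $count;              # the hash accumulates per key
}

# join glues the summary back into a single string.
my $report = join ', ', map { "$_=$total_for{$_}" } sort keys %total_for;
print "$report\n";    # prints APPLE=7, PEAR=5
```

The only pattern here is the single comma handed to split.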

A Real-World Problem

Summarizing data from a set of structured text files.

This problem is a slightly modified version of a problem that I needed to solve at work. The general structure of the problem is pretty straight-forward. I have a set of data files storing a number associated with a date. One of the files is supposed to be an accumulation of the other files. I need to verify that assertion.

Approaches

By hand
Spreadsheet
Quick&Dirty Command Line Approach
Real Perl Code

The obvious solution would be to walk through the data by hand, adding up the appropriate lines and comparing with the target. I hope everyone would agree that this is not the right solution.

I've seen people solve similar problems by opening up the files with an editor, changing the format into something that Excel can read, and using it to do the work. Often, the file manipulation takes more time than writing the code would.

I will often apply a quick and dirty approach to explore the problem. Knowledge of the Perl command line switches is critical to this solution.

In many cases, a more structured approach is needed.

Quick and Dirty

Quick one-liner logic to determine if there is a problem.


  perl -ne'$sum+=$1 if /=(\d+)/;}{print "$sum\n";' files

One-liners are really good for answering a simple question. If the question gets too complicated, an actual script is needed.

Some of the really bad Perl code that people complain about is the result of not realizing when a real Perl program was needed.
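
For comparison, here is a sketch of the same summing logic written as a small script. In the one-liner, the }{ closes the loop that -n supplies so that the print runs once at the end. Here the loop is explicit. The sub name sum_values and the sample data are assumptions, and an in-memory filehandle stands in for the real files.

```perl
use strict;
use warnings;

# Add up the number after '=' on every line that has one.
sub sum_values {
    my ($in) = @_;
    my $sum = 0;
    while (my $line = <$in>) {
        $sum += $1 if $line =~ /=(\d+)/;
    }
    return $sum;
}

# Hypothetical sample data. A real script would open each file in @ARGV.
my $data = "2010.01.01=5\n2010.01.02=7\nno number here\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
print sum_values($in), "\n";    # prints 12
close $in;
```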

A more complicated one-liner

Summarizing the data by month.


  perl -ne'$sum{$1}+=$2 if /^(\d+\.\d+)\.\d+=(\d+)/;}
     {print "$_: $sum{$_}\n" foreach sort keys %sum;' files

Next, we would like to know if the problem is wide-spread or relatively contained. One way to do that would be to summarize the data in some relatively large subsets. Given the data, monthly groupings would be a good choice.

This is probably about as complicated as a one-liner should ever get. Much more complicated than this and you will need more help from Perl to get things right.
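
Written as a script, the monthly summary might look like the sketch below. It assumes, based on the regex in the one-liner, that each data line looks like year.month.day=number. The sub name sum_by_month and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Group the numbers by the year.month prefix of each line's date.
sub sum_by_month {
    my ($in) = @_;
    my %sum_for;
    while (my $line = <$in>) {
        $sum_for{$1} += $2 if $line =~ /^(\d+\.\d+)\.\d+=(\d+)/;
    }
    return \%sum_for;
}

# Hypothetical sample data in the assumed year.month.day=number format.
my $data = "2010.01.30=5\n2010.01.31=7\n2010.02.01=4\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my $sum_for = sum_by_month($in);
close $in;

print "$_: $sum_for->{$_}\n" foreach sort keys %$sum_for;
```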

An Exploratory Script

Let's try to find how many days are wrong.

Now we want to figure out how many days are wrong. We know from the previous tests that no more than 120 days would be affected.
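
One way such an exploratory script could look is sketched below. This is not the original script from the talk. It assumes the year.month.day=number line format, and the sub names, file contents, and in-memory filehandles are all made up for illustration.

```perl
use strict;
use warnings;

# Read "date=number" lines from each handle; return a hash ref of
# date => total across all of the handles.
sub read_totals {
    my @handles = @_;
    my %total_for;
    foreach my $fh (@handles) {
        while (my $line = <$fh>) {
            $total_for{$1} += $2 if $line =~ /^(\d+\.\d+\.\d+)=(\d+)/;
        }
    }
    return \%total_for;
}

# Count the days whose accumulated value differs from the sum of the parts.
# A day missing from one side is treated as zero.
sub count_wrong_days {
    my ($expected, $actual) = @_;
    my %seen = map { $_ => 1 } keys %$expected, keys %$actual;
    my $count = grep {
        ($expected->{$_} || 0) != ($actual->{$_} || 0)
    } keys %seen;
    return $count;
}

# Hypothetical data: two part files and the file that should accumulate them.
my $part_a = "2010.01.01=5\n2010.01.02=7\n";
my $part_b = "2010.01.01=1\n2010.01.02=2\n";
my $accum  = "2010.01.01=6\n2010.01.02=8\n";    # second day should be 9

my @parts;
foreach my $text_ref (\$part_a, \$part_b) {
    open my $fh, '<', $text_ref or die "Cannot open in-memory data: $!";
    push @parts, $fh;
}
open my $accum_fh, '<', \$accum or die "Cannot open in-memory data: $!";

my $wrong = count_wrong_days(read_totals(@parts), read_totals($accum_fh));
print "$wrong day(s) differ\n";
```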

A Little More Information

Since only a few days are wrong, let's see them.

This is the next set of changes I would probably make.
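
A sketch of that kind of change: instead of only counting mismatches, report each wrong day with both values. The sub name wrong_days and the two hashes are hypothetical stand-ins for totals that a real script would read from the data files.

```perl
use strict;
use warnings;

# Return one report line per day whose two totals disagree, sorted by date.
# A day missing from one side is treated as zero.
sub wrong_days {
    my ($expected, $actual) = @_;
    my %seen = map { $_ => 1 } keys %$expected, keys %$actual;
    my @report;
    foreach my $day (sort keys %seen) {
        my $want = $expected->{$day} || 0;
        my $got  = $actual->{$day}   || 0;
        next if $want == $got;
        push @report, sprintf '%s: expected %d, found %d (off by %+d)',
            $day, $want, $got, $got - $want;
    }
    return @report;
}

# Hypothetical totals: the sum of the part files and the accumulated file.
my %sum_of_parts = ('2010.01.01' => 6, '2010.01.02' => 9, '2010.01.03' => 4);
my %accumulated  = ('2010.01.01' => 6, '2010.01.02' => 8);

print "$_\n" foreach wrong_days(\%sum_of_parts, \%accumulated);
```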

Good Modules to Know

CPAN
List::Util
List::MoreUtils
Data::Dumper
Devel::REPL
App::Ack
Getopt::Long

There are thousands of modules that you can use to solve parts of your problem. Here are a few that you want to learn about as soon as you can. They will save you loads of wasted time.
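
Here is a small taste of three of these, limited to modules that ship with Perl: List::Util, Data::Dumper, and Getopt::Long. The monthly totals and the --month and --verbose options are invented for the example. GetOptionsFromArray is used in place of the more common GetOptions so the sketch does not depend on the real command line.

```perl
use strict;
use warnings;
use List::Util qw(sum max);
use Data::Dumper;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical monthly totals.
my %total_for = ('2010.01' => 310, '2010.02' => 275, '2010.03' => 298);

# List::Util: common list reductions without hand-written loops.
my $grand_total = sum(values %total_for);    # 883
my $largest     = max(values %total_for);    # 310
print "Total: $grand_total, largest month: $largest\n";

# Data::Dumper: dump any data structure while debugging.
$Data::Dumper::Sortkeys = 1;                 # stable key order in the output
print Dumper(\%total_for);

# Getopt::Long: parse options, here from an array instead of @ARGV.
my @args = ('--month', '2010.02', '--verbose');
my ($month, $verbose);
GetOptionsFromArray(\@args, 'month=s' => \$month, 'verbose' => \$verbose)
    or die "Bad options\n";
print "Total for $month: $total_for{$month}\n" if $verbose;
```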

CPAN

CPAN may be Perl's greatest treasure. There are thousands of modules here, ranging from incredibly powerful to outright silly. After a short while doing Perl, you will learn to rely on CPAN.

More Tips, Tricks, and Traps

See previous Houston.pm presentations

I hope to get last month's talk on-line soon. Other than that, slides from most of the Houston.pm presentations are available at the Houston.pm website. One example is a Tips, Tricks, and Traps talk I gave a few years ago.

Houston.pm - February 10, 2010