) {
chomp;
if (/^\s/) {
escapes();
y/\200-\377/\000-\177/;
print "\n";
print;
print "\n";
next;
}
s{I<(.*?)>}{save("$1")}gesx;
s{C<(.*?)>}{save("$1")}gesx;
s{B<(.*?)>}{save("$1")}gesx;
if (/^=\s*(.*)/) {
my $title = $1;
$title =~ y/\200-\377/\000-\177/;
$cur++;
$slide[$cur] = $title;
print IDX qq(
$title\n);
if ($cur > 1) {
print "\n";
print qq(
);
print qq(Forward to $slide[$cur]\n);
if ($cur > 2) {
$prev = $cur - 2;
print qq(
Back to $slide[$cur - 2]\n);
}
print qq(
Up to index\n);
done();
}
open(SLIDE, "> slide$cur.html")
|| die "can't open slide$cur.html: $!";
select SLIDE;
print <<EOF;
<TITLE>Perl Style: $title</TITLE>
<H1>Perl Style: $title</H1>
EOF
next;
}
s!([\$\@%]\w+)!save("$1")!ge;
escapes();
s/^\*\s*/- /gm;
y/\200-\377/\000-\177/;
print;
}
if ($cur > 1) {
print "
\n";
print qq(
);
if ($cur > 2) {
$prev = $cur - 2;
print qq(Back to $slide[$cur - 2]\n);
}
print qq(
Up to index\n);
done();
}
sub done {
print <<EOF;
Copyright © 1998, Tom Christiansen
All rights reserved.
EOF
}
__END__
=Everyone Has an Opinion
* Style can easily become a religious issue.
* What I am about to tell you is mostly my opinion. It
includes both general philosophy and concrete tips.
* Warning: I may not always follow my own tips. :-)
* I do not expect all of you to agree with me all the time.
Choose a style and stick with it. Consistency is critical.
* I owe indirectly K&P, K&R, S&W, Rob Pike, and Larry Wall for laying
the foundations, and directly Jon Orwant, Mark-Jason Dominus, and Nat
Torkington for reviewing early versions of these notes.
* `Under no circumstances should you program the way I say to because I
say to; program the way you think expresses best what you're trying to
accomplish in the program. And do so consistently and ruthlessly.'
--Rob Pike
=Program Perl, Not C/BASIC/Java/Pascal/etc
* `Just because you I<CAN> do something a particular way doesn't mean
you I<SHOULD> do it that way.' --I<Programming Perl>
* If you find yourself writing code that looks like C code, or BASIC, or
Java, or Pascal, you are probably short-changing yourself. You need to
learn to program idiomatic Perl -- which does not mean obfuscatory Perl.
It means Perl in its own idiom: native Perl.
* Fall not into the folly of avoiding certain Perl idioms for fear
that someone who maintains your Perl code won't understand it because
they don't know Perl. This is ridiculous! That person shouldn't be
maintaining a language they don't understand. You don't write your
English so that someone who only speaks French or German can understand
you, or use Latin letters when writing Greek.
=Elegance
* `It is very hard to get things both right (coherent and correct) and
usable (consistent enough, attractive enough).' --Dennis Ritchie
* Strive always to create code that is functional, minimal, flexible,
and understandable -- not necessarily in that order.
* Think first. Then hack. Now throw it out. Repeat.
Fred Brooks says, `Build one to throw away.'
Always rewrite your code from scratch, preferably B<twice>,
just as you did with drafts of papers in grammar school. It
improves understanding, gets the creative juices flowing, and
produces a far finer end-product.
* Sometimes making code shorter improves maintainability; other
times it does not.
=Defensive Programming
* use strict
* #!/usr/bin/perl -w
* Check B<all> syscall return values, printing $!
* Watch for external program failures in $?
* Check $@ after C<eval> or C<s///ee>.
* Parameter asserts
* #!/usr/bin/perl -T
* Always have an C<else> after a chain of C<elsif>s
* Put commas at the end of lists so your program won't break if someone
inserts another item at the end of the list.
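* A minimal sketch of several of these checks together. The path, the
child command, and the %limit hash are illustrative assumptions, not part
of the original notes:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical path, chosen only so that open() fails and sets $!.
my $path = "/nonexistent/dir/config";
my $open_error = "";
open(my $fh, "<", $path) or $open_error = "can't open $path: $!";

# External program status lands in $?; the exit code is in the high byte.
# $^X is the running perl, used here as a portable child process.
system($^X, "-e", "exit 3");
my $exit_code = $? >> 8;

# eval traps the die; the message is left in $@.
my $result = eval { die "bad input\n"; 1 };
my $eval_error = $@;

# Trailing comma: adding another pair later cannot break the list.
my %limit = (
    soft => 10,
    hard => 20,
);

print "open: $open_error\n";
print "exit: $exit_code\n";
print "eval: $eval_error";
```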
=The Art of Commenting Code
* Explain what the code does, don't just perl2englishify.
* Eschew gaudy block banners.
* Use comments in regexes with C</x>.
* Comment entire blocks, not single lines.
* `Comments on data are usually much more helpful than on algorithms.'
(Rob Pike)
* `Basically, avoid comments. If your code needs a comment to be
understood, it would be better to rewrite it so it's easier to understand.'
(Rob Pike)
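* As a rough illustration of commenting a regex under C</x>, here is a
sketch that assumes an ISO-style date string; the pattern and variable
names are made up for the example:

```perl
use strict;
use warnings;

# The /x flag lets whitespace and comments sit inside the pattern.
my $date_re = qr{
    ^ (\d{4})   # year
    - (\d{2})   # month
    - (\d{2})   # day
    $
}x;

my ($year, $month, $day) = "1998-07-04" =~ $date_re;
print "$year/$month/$day\n";    # prints 1998/07/04
```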
=On the Naming of Names (Form)
* `I eschew embedded capital letters in names; to my prose-oriented
eyes, they are too awkward to read comfortably. They jangle like bad
typography.' (Rob Pike)
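* `C<IEschewEmbeddedCapitalLettersInNames ToMyProseOrientedEyes TheyAreTooAwkwardToReadComfortably TheyJangleLikeBadTypography>.'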
(TheAntiPike)
* While short identifiers like $gotit are probably ok, use underscores to
separate words. It is generally easier to read $var_names_like_this than
$VarNamesLikeThis, especially for non-native speakers of English. It's
also a simple rule that works consistently with VAR_NAMES_LIKE_THIS.
* You may find it helpful to use letter case to indicate
the scope or nature of a variable. For example:
$ALL_CAPS_HERE constants only (beware clashes with perl vars!)
$Some_Caps_Here package-wide global/static
$no_caps_here function scope my() or local() variables
* Function and method names seem to work best as all
lowercase. E.g., $obj->as_string().
=On the Naming of Names (Content)
* `Procedure names should reflect what they do; function names should
reflect what they return.' --Rob Pike.
* Name objects so that they read well in English. For example, predicate
functions should usually be named with `is', `does', `can', or `has'.
Thus, C<&is_ready> is better than C<&ready> for the same function.
* Therefore, C<&canonize> as a void function (procedure),
C<&canonical_version> as a value-returning function, and C<&is_canonical> for
a boolean check.
* The C<&abc2xyz> and C<&abc_to_xyz> forms are also well established
for conversion functions or hash mappings.
* Hashes usually express some I<property> of the keys, and are used with
the English word `of' or the possessive form. Name hashes for their
values, not their keys.
GOOD:
%color = ('apple' => 'red', 'banana' => 'yellow');
print $color{'apple'}; # Prints `red'
BAD:
%fruit = ('apple' => 'red', 'banana' => 'yellow');
print $fruit{'apple'}; # Prints `red'
=Length of Variable Names
* `The appropriate length of a name is directly proportional to the size
of its scope.' --Mark-Jason Dominus
* Length of identifiers is not a virtue; clarity is. Don't write this:
for ($index = 0; $index < @$array_pointer; $index++) {
$array_pointer->[$index] += 2;
}
When you should write:
for ($i = 0; $i < @$ap; $i++) {
$ap->[$i] += 2;
}
(One could argue for a better name than $ap, though. Or not.)
* Global variables deserve longer names than local ones,
because their context is hard to see. For example,
%State_Table is a program global, but $func might be
a local state pointer.
foreach $func (values %State_Table) { ... }
=Parallelism
* Code legibility is dramatically increased by consistency
and parallelism. Compare
my $filename =   $args{PATHNAME};
my @names    = @{ $args{FIELDNAMES} };
my $tab      =   $args{SEPARATOR};
with
my $filename = $args{PATHNAME};
my @names = @{$args{FIELDNAMES}};
my $tab = $args{SEPARATOR};
* Line up your # comments or your C<|| die> all at one column:
socket(SERVER, PF_UNIX, SOCK_STREAM, 0) || die "socket $sockname: $!";
bind  (SERVER, $uaddr)                  || die "bind $sockname: $!";
listen(SERVER, SOMAXCONN)               || die "listen $sockname: $!";
=Embrace && and || for Control and Values
* Perl's && and || operators short circuit like C's, but return
different values: they return the first thing that resolves them.
* This is most often used with ||:
++$count{ $shell || "/bin/sh" };
$a = $b || 'DEFAULT';
$x ||= 'DEFAULT';
* Sometimes it can be done with && also, usually providing the
false value is '' not 0. (False tests in Perl return '' not 0!).
$nulled_href = $href . ($add_nulls && "\0");
=Learn Precedence
* It is a myth that you can just plop in C<and> and C<or>
wherever you'd been using the punctuation versions. They
have different precedences. You B<must> learn precedence.
And a few parens seldom hurt.
print FH $data || die "Can't write to FH: $!"; # NO
print FH $data or die "Can't write to FH: $!"; # YES
$a = $b or $c; # bug: this is wrong
($a = $b) or $c; # really means this
$a = $b || $c; # better written this way
@info = stat($file) || die; # oops, scalar sense of stat!
@info = stat($file) or die; # better, now @info gets its due
* Careful with parens here:
$a % 2 ? $a += 10 : $a += 2
Really means this:
(($a % 2) ? ($a += 10) : $a) += 2
Rather than this:
($a % 2) ? ($a += 10) : ($a += 2)
=Don't Overdo `?:'
* Using C<?:> for control flow may get you talked about. Better
to use an C<if/else>. And seldom if ever nest C<?:>.
# BAD
($pid = fork) ? waitpid($pid, 0) : exec @ARGS;
# GOOD:
if ($pid = fork) {
waitpid($pid, 0);
} else {
die "can't fork: $!" unless defined $pid;
exec @ARGS;
die "can't exec @ARGS: $!";
}
* Best as an expression:
$State = (param() != 0) ? "Review" : "Initial";
printf "%-25s %s\n", $Date{$url}
? (scalar localtime $Date{$url})
: "",
=Never define "TRUE" and "FALSE"
* The language understands booleans. Never define them
yourself! This is terrible code:
$TRUE = (1 == 1);
$FALSE = (0 == 1);
if ( ($var =~ /pattern/) == $TRUE ) { .... }
if ( ($var =~ /pattern/) == $FALSE ) { .... }
if ( ($var =~ /pattern/) eq $TRUE ) { .... }
if ( ($var =~ /pattern/) eq $FALSE ) { .... }
sub getone { return "This string is true" }
if ( getone() == $TRUE ) { .... }
if ( getone() == $FALSE ) { .... }
if ( getone() eq $TRUE ) { .... }
if ( getone() eq $FALSE ) { .... }
* Imagine the silliness of this progression, and stop at the
first one.
if ( getone() ) { .... }
if ( getone() == $TRUE ) { .... }
if ( (getone() == $TRUE) == $TRUE ) { .... }
if ( ( (getone() == $TRUE) == $TRUE) == $TRUE ) { .... }
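* To see why such comparisons go wrong, consider this small sketch. It
reuses C<getone> and $TRUE from above; the result variables are added
only for the demonstration:

```perl
use strict;
no warnings 'numeric';   # silence the warning the bad comparison provokes

my $TRUE = (1 == 1);     # the anti-pattern under discussion
sub getone { return "This string is true" }

# A true string is not numerically equal to 1, so this test is false.
my $compared = (getone() == $TRUE) ? "yes" : "no";

# Plain boolean context gives the answer that was wanted.
my $tested = getone() ? "yes" : "no";

print "compared: $compared, tested: $tested\n";   # compared: no, tested: yes
```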
=Embrace Pattern Matching
* Regular Expressions are your friend. More than that, they're
a whole new way of thinking.
* Just as chess players see patterns in the board positions their pieces
control, Perl adepts look at data in terms of patterns.
* Although most modern programming languages offer primitive pattern
matching tools, usually through an extra library, Perl's patterns are
directly integrated into the language core. C, $1, etc.
* Perl's patterns boast features not found in other languages' pattern
matching, features that encourage a whole different way of looking
at data.
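* One possible illustration of looking at data as a pattern: a match in
list context can turn a whole line into a hash. The log line and field
names below are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical log line, used only for illustration.
my $line = "host=alpha port=8080 user=tom";

# In list context, m//g returns every captured pair at once,
# so the whole line becomes a hash in one statement.
my %field = $line =~ /(\w+)=(\S+)/g;

print "$field{host}:$field{port}\n";   # prints alpha:8080
```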
=Changing I<en passant>
* You can copy and change all at once:
chomp($answer = <STDIN>);
($a += $b) *= 2;
# strip to basename
($progname = $0) =~ s!^.*/!!;
# Make All Words Title-Cased
($capword = $word) =~ s/(\w+)/\u\L$1/g;
# /usr/man/man3/foo.1 changes to /usr/man/cat3/foo.1
($catpage = $manpage) =~ s/man(?=\d)/cat/;
@bindirs = qw( /usr/bin /bin /usr/local/bin );
for (@libdirs = @bindirs) { s/bin/lib/ }
print "@libdirs\n";
| /usr/lib /lib /usr/local/lib
=Negative Array Subscripts
* To get the last element in a list or array, use C<$array[-1]> instead
of C<$array[$#array]>. The former works on both lists and arrays,
but the latter does not.
* Remember that C<substr>, C<index>, C<rindex>, and C<splice>
also accept negative subscripts to count back from the end.
splice(@array, -2); # pop twice
* Remember substr is lvaluable:
substr($s, -10) =~ s/ /./g;
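* A short sketch of negative subscripts in action; the array contents
and the path are arbitrary sample data:

```perl
use strict;
use warnings;

my @queue = qw(a b c d e);
my $last  = $queue[-1];              # "e"
my @tail  = splice(@queue, -2);      # removes and returns ("d", "e")

my $path  = "/usr/local/lib/perl5";
my $end   = substr($path, -5);       # "perl5"
substr($path, -5) = "site";          # substr as an lvalue

print "$last | @tail | @queue | $end | $path\n";
# prints: e | d e | a b c | perl5 | /usr/local/lib/site
```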
=Embrace Hashes
* Until you start thinking in terms of hashes, you're
not thinking in Perl. They can often replace lengthy
loops or complex algorithms.
* Use a hash whenever you want to represent a set, a relation,
a table, a structure, or a record.
* The words `in', `unique', `first', and `duplicate' should
all set off Pavlovian screams of `HASH!' If you find them in
the same sentence as `array', you're probably doing something
wrong.
=Use Hashes for Sets
* Consider finding the union and intersection of
two unique arrays @a and @b:
foreach $e (@a) { $union{$e} = 1 }
foreach $e (@b) {
if ( $union{$e} ) { $isect{$e} = 1 }
$union{$e} = 1;
}
@union = keys %union;
@isect = keys %isect;
* This would be more idiomatically written as:
foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ }
@union = keys %union;
@isect = keys %isect;
=Use Hashes for the First Time
* A hash is a good way to keep track of whether
you've done something before.
* Embrace the C<... unless $seen{$item}++> notation:
%seen = ();
foreach $item (genlist()) {
func($item) unless $seen{$item}++;
}
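* The same notation extracts the unique items of a list while keeping
their order. A minimal sketch with made-up sample words:

```perl
use strict;
use warnings;

my @words = qw(perl awk perl sed awk perl);

my %seen;
# Keep an item only the first time it appears; order is preserved.
my @unique = grep { !$seen{$_}++ } @words;

print "@unique\n";                       # prints: perl awk sed
print "perl seen $seen{perl} times\n";   # prints: perl seen 3 times
```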
=Use Hashes of Records, not Parallel Arrays
* Learn to use hashes of records, and maintain arrays or
hashes of these records, rather than using parallel
arrays. Don't do this:
$age{"Jason"} = 23;
$dad{"Jason"} = "Herbert";
When you should do:
$people{"Jason"}{AGE} = 23;
$people{"Jason"}{DAD} = "Herbert";
Or even: (note use of C<for> here)
for $his ($people{"Jason"}) {
$his->{AGE} = 23;
$his->{DAD} = "Herbert";
}
But think carefully before writing this:
@{ $people{"Jason"} }{"AGE","DAD"} = (23, "Herbert");
=Use $_ in Short Code
* Contrary to beginners' belief, $_ improves legibility.
Compare:
while ($line = <>) {
next if $line =~ /^#/;
$line =~ s/left/right/g;
$line =~ tr/A-Z/a-z/;
print "$ARGV:";
print $line;
}
with:
while ( <> ) {
next if /^#/;
s/left/right/g;
tr/A-Z/a-z/;
print "$ARGV:";
print;
}
=Use foreach() Loops
* A C<foreach> loop's implicit aliasing and localizing
can make for a powerful construct:
foreach $e (@a, @b) { $e *= 3.14159 }
for (@lines) {
chomp;
s/fred/barney/g;
tr[a-z][A-Z];
}
* Remember you can copy and modify all at once:
foreach $n (@square = @single) { $n **= 2 }
* You can use hash slices to modify hash values, too:
# trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
s/\s+$//;
}
=Avoid Byte Processing
* C programmers often try to process strings a byte at a time.
Don't do that! Perl makes it easy to take data in big bites.
* Don't use C<getc>. Grab the whole line and operate on it all at once.
* Even operations traditionally done a char at a time in C, like
lexing, should be done differently. For example:
@chars = split //, $input;
while (@chars) {
$c = shift @chars;
# State machine;
}
Is far too low level. Try something more like:
sub parse_expr {
local $_ = shift;
my @tokens = ();
my $paren = 0;
my $want_term = 1;
while (length) {
s/^\s*//;
if (s/^\(//) {
return unless $want_term;
push @tokens, '(';
$paren++;
$want_term = 1;
next;
}
if (s/^\)//) {
return if $want_term;
push @tokens, ')';
if ($paren < 1) {
return;
}
--$paren;
$want_term = 0;
next;
}
if (s/^and\b//i || s/^&&?//) {
return if $want_term;
push @tokens, '&';
$want_term = 1;
next;
}
if (s/^or\b//i || s/^\|\|?//) {
return if $want_term;
push @tokens, '|';
$want_term = 1;
next;
}
if (s/^not\b//i || s/^~// || s/^!//) {
return unless $want_term;
push @tokens, '~';
$want_term = 1;
next;
}
if (s/^(\w+)//) {
push @tokens, '&' unless $want_term;
push @tokens, $1 . '()';
$want_term = 0;
next;
}
return;
}
return "@tokens";
}
=Avoid Symbolic References
* Beginners often think they want to have a variable contain the name
of a variable.
$fred = 23;
$varname = "fred";
++$$varname; # $fred now 24
* This works sometimes, but is a bad idea. They B<only work on global
variables>. Global variables are bad because they can easily collide
accidentally.
* They do not work under the C<use strict> pragma.
* They are not true references and consequently are not reference
counted or garbage collected.
* Use a hash or a real reference instead.
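* For comparison, a sketch of the same increment done through a real
reference. The C<bump> helper is hypothetical, added only to show a
reference reaching a lexical variable:

```perl
use strict;
use warnings;

my $fred = 23;
my $ref  = \$fred;     # a real (hard) reference, legal under strict
++$$ref;               # $fred is now 24

# A reference works on lexicals too, which symbolic references cannot reach.
sub bump { my $counter = shift; $$counter += 10 }
bump(\$fred);          # $fred is now 34

print "$fred\n";       # prints 34
```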
=Using A Hash Instead of $$name
* Using a variable to contain the name of another variable always
suggests that perhaps someone doesn't understand hashes very well.
While you could write this:
$name = "fred";
$$name{WIFE} = "wilma"; # set %fred
$name = "barney"; # set %barney
$$name{WIFE} = "betty";
Better to write:
$folks{"fred"}  {WIFE} = "wilma";
$folks{"barney"}{WIFE} = "betty";
=Avoid Testing eof
* Don't use this: (deadlock)
while (!eof(STDIN)) {
statements;
}
* Use this instead:
while (<STDIN>) {
statements;
}
* Prompting while not eof can be a hassle. Try this:
$on_a_tty = -t STDIN && -t STDOUT;
sub prompt { print "yes? " if $on_a_tty }
for ( prompt(); <STDIN>; prompt() ) {
statements;
}
=Avoid Gratuitous Backslashes
* Perl lets you choose your own delimiters on quotes and
patterns to avoid Leaning Toothpick Syndrome. Use them.
m#^/usr/spool/m(ail|queue)#
qq(Moms said, "That's all, $kid.")
tr[a-z]
[A-Z];
s { / }{::}gx;
s { \.p(m|od)$ }{}x;
=Reduce Complexity
* Place C<next> and C<redo> near the top of the loop when possible.
* Use C<unless> and C<until>.
* But don't use C<unless ... else>.
* Escape the tyranny of Pascal. Don't go through silly contortions to
exit a loop or a function only at the bottom. Don't write:
while (C1) {
if (C2) {
statement;
if (C3) {
statements;
}
} else {
statements;
}
}
=Reduce Complexity (solution)
* Write this instead:
while (C1) {
unless (C2) {
statement;
next;
}
statements;
next unless C3;
statements;
}
* Or perhaps even:
while (C1) {
statement, next unless C2;
statements;
next unless C3;
statements;
}
=Loop Hoisting
* Hoist repeated code out of blocks:
Before:
if (...) {
X; Y;
} else {
X; Z;
}
After:
X;
if (...) {
Y;
} else {
Z;
}
=Break Complex Tasks Up
* Break subroutines into manageable pieces.
* Don't try to fit everything into one regex.
* Play with your ARGV:
# program expects envariables
@ARGV = keys %ENV unless @ARGV;
# program expects source code
@ARGV = glob("*.[chyC]") unless @ARGV;
# program tolerates gzipped files
# from PCB 16.6
@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV;
=Break Programs into Separate Processes
* Learn how to use the special forms of C<open>.
(See also I in TSA).
# from PCB 16.5
head(100);
sub head {
my $lines = shift || 20;
return if $pid = open(STDOUT, "|-");
die "cannot fork: $!" unless defined $pid;
while (<STDIN>) {
print;
last if --$lines <= 0;
}
exit;
}
=Data-Oriented Programming
* Data structures are more important than code.
* Rob Pike says: `Data dominates. If you've chosen the right data
structures and organized things well, the algorithms will almost
always be self-evident. Data structures, not algorithms, are central to
programming. (See Brooks p. 102.)'
* Capture regularity with data, irregularity with code. (Kernighan)
* If you see similar functionality in two places, unify it.
That is called a `subroutine'.
* Consider making a hash of function pointers to represent
a state table or switch statement.
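* As one way to picture `regularity in data, irregularity in code', here
is a sketch of a size parser. The C<to_bytes> function and the
%bytes_per table are hypothetical names invented for this example:

```perl
use strict;
use warnings;

# The regular part lives in a table ...
my %bytes_per = (K => 1024, M => 1024**2, G => 1024**3);

# ... and the code handles only the irregular part (a missing suffix).
sub to_bytes {
    my ($size) = @_;
    my ($n, $unit) = $size =~ /^(\d+)([KMG]?)$/
        or return;                      # malformed input
    return $unit ? $n * $bytes_per{$unit} : $n;
}

print to_bytes("4K"), " ", to_bytes("2M"), " ", to_bytes("17"), "\n";
# prints: 4096 2097152 17
```

Supporting a new unit then means adding one table entry and one letter
to the character class, not another branch.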
=Configuration Files
* If you need a config file, load it with C<do>.
* This gives you full Perl power.
# from PCB 8.16
$APPDFLT = "/usr/local/share/myprog";
do "$APPDFLT/sysconfig.pl";
do "$ENV{HOME}/.myprogrc";
#in config file
$NETMASK = '255.255.255.0';
$MTU = 0x128;
$DEVICE = 'cua1';
$RATE = 115_200;
* See I and I file in Tom's Script Archive
at http://www.perl.com/CPAN/authors/id/TOMC/scripts/
=Functions as Data
* Use function pointers as function arguments or in data structures:
# from MxScreen in TSA (see also PCB 19.12)
%State_Table = (
Initial => \&show_top,
Execute => \&run_query,
Format => \&get_format,
Login => \&register_login,
Review => \&review_selections,
Sorting => \&get_sorting,
Wizard => \&wizards_only,
);
foreach my $state (sort keys %State_Table) {
my $function = $State_Table{$state};
my $how = ($action == $function)
? SCREEN_DISPLAY
: SCREEN_HIDDEN;
$function->($how);
}
=Closures
* Clone similar functions using closures.
# from MxScreen in TSA
no strict 'refs';
for my $color (qw[red yellow orange green blue purple violet]) {
*$color = sub { qq<<FONT COLOR="\U$color\E">@_</FONT>> };
}
undef &yellow; # lint happiness
*yellow = \&purple; # function aliasing
* Or similarly:
# from psgrep (in TSA, or PCB 1.18)
my %fields;
my @fieldnames = qw(FLAGS UID PID PPID PRI NICE SIZE
RSS WCHAN STAT TTY TIME COMMAND);
for my $name (@fieldnames) {
no strict 'refs';
*$name = *{lc $name} = sub () { $fields{$name} };
}
=Learn to Switch with for
* Though Perl has no built-in switch statement, this is
not a hardship but an opportunity.
* It's easy to build one. The word `for' is sometimes
pronounced `switch'.
SWITCH: for ($where) {
/In Card Names/ && do { push @flags, '-e'; last; };
/Anywhere/ && do { push @flags, '-h'; last; };
/In Rulings/ && do { last; };
die "unknown value for form variable where: `$where'";
}
* Like a series of C<elsif>s, a switch should B<always> have a default
case, even if the default case `can't happen'.
=Switch by Using do{} Creatively
* Another interesting approach to a switch statement is to arrange for a do
block to return the proper value:
$amode = do {
if ($flag & O_RDONLY) { "r" } # XXX: isn't this 0?
elsif ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
elsif ($flag & O_RDWR) {
if ($flag & O_CREAT) { "w+" }
else { ($flag & O_APPEND) ? "a+" : "r+" }
}
};
=Switch with for via && and ||
* Be careful that the RHS of && is always true.
$dir = 'http://www.wins.uva.nl/~mes/jargon';
for ($ENV{HTTP_USER_AGENT}) {
$page = /Mac/ && 'm/Macintrash.html'
|| /Win(dows )?NT/ && 'e/evilandrude.html'
|| /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
|| /Linux/ && 'l/Linux.html'
|| /HP-UX/ && 'h/HP-SUX.html'
|| /SunOS/ && 's/ScumOS.html'
|| 'a/AppendixB.html';
}
=Switch Using for and do{} Even More Creatively
* Sometimes, aesthetics counts. :-)
for ($^O) {
*struct_flock = do {
/bsd/ && \&bsd_flock
||
/linux/ && \&linux_flock
||
/sunos/ && \&sunos_flock
||
die "unknown operating system $^O, bailing out";
};
}
=The Care and Feeding of Modules
* Document your modules with Pod. Test your pod with I
and with I.
* Use the Carp module's C<carp>, C<croak>, and C<confess>
routines, not C<warn> and C<die>.
* Well written modules seldom require C<::>. Users should
get at module contents using an import or a class method call.
* Traditional modules are fine. Don't jump to full objects just because
it's considered cool. Do it only when the occasion obviously calls
for it.
* Access to objects should only be through object methods.
* Object methods should themselves access class data through pointers
on the object.
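* A small sketch of C<croak> in a module, reached through a class method
call. The Temperature package and its C<to_celsius> method are
hypothetical, defined inline only to keep the example self-contained:

```perl
use strict;
use warnings;

# Hypothetical module, defined inline for the sake of the sketch.
package Temperature;
use Carp qw(croak);

sub to_celsius {
    my ($class, $f) = @_;
    # croak reports the error at the caller's line, not at this one.
    croak "to_celsius: need a numeric argument"
        unless defined $f && $f =~ /^-?\d+(?:\.\d+)?$/;
    return ($f - 32) * 5 / 9;
}

package main;

print Temperature->to_celsius(212), "\n";    # prints 100

my $ok  = eval { Temperature->to_celsius("hot"); 1 };
my $err = $ok ? "" : $@;
print "caught: $err";
```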
=Patches
* When you're working on someone else's code,
try to follow their lead.
* Avoid reformatting text so it doesn't create
spuriously large diffs.
* Converting tabs to spaces may be necessary to cope
with the evil folk who set hardware tabs to something
other than 8, or who have custom tab stops. B
=From Perlstyle (part 1)
* 4-column indent. Make your editor help you! [DEMO]
* Opening curly on same line as keyword, if possible; otherwise line up.
* Space before the opening curly of a multi-line BLOCK.
* One-line BLOCK may be put on one line, including curlies.
* No space before the semicolon.
* Semicolon omitted in `short' one-line BLOCK.
* Space around most operators.
* Space around a `complex' subscript (inside brackets).
* Blank lines between chunks that do different things.
* Uncuddled elses.
=From Perlstyle (part 2)
* No space between function name and its opening parenthesis.
* Space after each comma.
* Long lines broken after an operator (except `and' and `or').
* Space after last parenthesis matching on current line.
* Line up corresponding items vertically.
* Omit redundant punctuation as long as clarity doesn't suffer.
* Use here documents instead of repeated print() statements.
* Line up corresponding things vertically, especially if it'd be too
long to fit on one line anyway.
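* To illustrate the here-document advice, a minimal sketch; the program
name, version, and options are placeholders rather than a real tool:

```perl
use strict;
use warnings;

my ($prog, $version) = ("myprog", "1.0");   # hypothetical values

# One here document instead of four print statements.
my $usage = <<"EOF";
usage: $prog [-v] [-n count] file ...
    -v        verbose output
    -n count  stop after count lines
$prog version $version
EOF

print $usage;
```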