
 

Perl 5.0 Overview

by Tom Christiansen

The last major release of Larry Wall's freely redistributable Perl programming language was three years ago. Since then, Larry's released a few updates, but these have been almost entirely bug fixes and portability enhancements: no significant new functionality was introduced. During this time, Perl has spread from hard-core UNIX shops into regions never dreamt of by its author, from trading firms to chip designers to heavy industry.

Although its origins are firmly rooted in UNIX, Perl now runs more or less happily on operating systems as diverse as MS-DOS, Windows/NT, VMS, the Apple Macintosh, and many others. It has become the language of choice for harried systems administrators who haven't the time for slow, awkward, and unportable shell scripts, nor for inscrutable, tedious, and unportable C programs. Increasing numbers of companies are now shipping Perl and using it internally for their install programs, test suites, database interfaces, and end-user applications.

With its 5.0 release, Perl promises to address an even broader range of systems and applications programs than ever before. For certain applications, programmers will still labor long and painful hours writing in C so that they might squeeze out that last bit of efficiency. Still, many of us find that a good interpreted language is plenty fast enough, especially on today's blazing hardware, which is only getting faster.

The new features in Perl 5.0 will not break existing code except in a few extremely rare cases where the writer was being, to quote one of C's fathers, unreasonably chummy with the compiler. Those cases arise because the internals have changed drastically, yielding a leaner, cleaner, faster, and more predictable program. The grammar is much smaller and simpler, and the language is more forgiving of mistakes and more informative when you do something questionable, instead of silently doing something you might not expect.

For example, you no longer have to remember which things must have, might have, or can't have parentheses, although the cautious programmer will continue to use them in all circumstances to make life easier for software maintainers. Even better, most of the surprise factors causing difficulty for new users, such as context sensitivity, now generate warnings (under the -w switch) when you're doing something that doesn't really make much sense, like multiplying two non-numeric strings together, using a list when you want a single element, and so forth. Running perl without -w would be like running classic C without using lint.

The single most important area which Perl now addresses is that of nested data structures and references. A given element of a list (linear array) or table (associative array) can itself be a reference to another list or table. This allows you to construct all the complex data types which you were longing for, such as binary (or n-way) trees, C-style structures and jump tables, or lists of lists of lists. For example, here's a list containing sublists:

[2, 4, [2, 9], 8,
    [8, 12, [2, 5, 0], 9], 8]
And here's a table containing other tables. (The => is just a glorified comma that visually distinguishes when you're constructing tables.)
{ 
    RED => { CRIMSON => 1, SCARLET => 2 },
    BLUE => { AZURE => 1, INDIGO => 2 }, 
} 
Furthermore, these can be mixed and matched; here's a table, indexed by someone's name, whose values are each lists of names:
{
    John => [ "Mary", "Pat", "Blanch" ],
    Paul => [ "Sally", "Jill", "Jane" ],
    Mark => [ "Ann", "Bob", "Dawn" ],
}
We now have references and referencing and dereferencing operators to go with them. References in Perl are type-safe and type-checked, unlike C's pointers. Furthermore, variables have reference counts on them to control when storage is released.

For postfix dereferencing, you have C's -> operator, allowing you to write things like

# list element 
$r -> [3]     		
# 3-dim array
$r -> [3] -> [4] -> [17] = 3;
# table element 
$r -> {"John"}
# nested table
$r -> {"ru_utime"} -> {"tv_sec"}
The last demonstrates the most straightforward way to represent C-style structures. More elaborate methods are also possible. Multilevel constructs need not be declared: each level is created on the fly if needed, just as assigning something to a scalar variable makes it spring into existence without any previous declaration.
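To make the on-the-fly creation concrete, here is a small sketch using the postfix arrow. The variable names ($usage, $cube) and the stored values are made up for illustration; only the -> syntax comes from the text above.

```perl
use strict;
use warnings;

# A table of lists, like the one shown earlier (names are illustrative).
my $r = {
    John => [ "Mary", "Pat", "Blanch" ],
    Paul => [ "Sally", "Jill", "Jane" ],
};

# Postfix dereferencing: table element first, then list element.
my $first = $r->{"John"}->[0];

# Intermediate levels spring into existence on assignment.
my $usage = {};
$usage->{"ru_utime"}->{"tv_sec"} = 12;   # the inner table is created on the fly

my $cube = [];
$cube->[3]->[4]->[17] = 3;               # two levels of inner lists are created

print "first = $first\n";
print "tv_sec = $usage->{ru_utime}->{tv_sec}\n";
print "cell = $cube->[3]->[4]->[17]\n";
```

Nothing here declares the inner table or the inner lists; assigning through the chain is enough to create them.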

To create a reference to a named variable, we use a backslash where C used an ampersand:

$sref = \$some_var; 
$lref = \@some_list; 
$tref = \%some_table; 
$fref = \&some_func;
For prefix dereferencing, for which C uses a *, you can use any of the four type specifiers shown above: $, @, %, or &. In fact, you can stack more than one of them, because you could have refs to refs, or functions returning refs.
# assign 3 to $some_var
$$sref = 3;
# pop @some_list
pop(@$lref);
# keys %some_table
@key_list = keys(%$tref);
# some_func()
&$fref($parms);
As in the shells, you may always use braces to clarify:
pop( @{ $lref } );
&{ $jump_table{$name} }($keystroke);
${${$refref}} = 1;    # like $$$refref = 1
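The fragments above can be stitched into one runnable sketch. The body of some_func, the contents of the jump table, and all the sample values are invented for illustration; the referencing and dereferencing syntax is as described in the text.

```perl
use strict;
use warnings;

my $some_var   = 0;
my @some_list  = (1, 2, 3);
my %some_table = (RED => 1, BLUE => 2);
sub some_func { my ($n) = @_; return $n * 2 }   # hypothetical function

# Backslash takes a reference to a named variable or function.
my $sref = \$some_var;
my $lref = \@some_list;
my $tref = \%some_table;
my $fref = \&some_func;

$$sref = 3;                           # assigns 3 to $some_var
my $last     = pop(@$lref);           # pops the last element of @some_list
my @key_list = sort keys(%$tref);     # keys of %some_table, sorted
my $doubled  = &$fref(21);            # calls some_func(21)

# A reference to a reference, dereferenced with braces.
my $refref = \$sref;
${${$refref}} = 1;                    # like $$$refref = 1

# A jump table: a table whose values are references to code.
my %jump_table = (
    q => sub { return "quit on $_[0]" },
    h => sub { return "help on $_[0]" },
);
my $name   = "h";
my $result = &{ $jump_table{$name} }("F1");

print "$some_var $last @key_list $doubled $result\n";
```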
Finally, there's a ref() built-in function that returns the type of reference you've got, allowing you to write functions to print out nested data structures without knowing quite what's in them beforehand. Possible return values from ref() are "" if it's not a reference, or SCALAR, ARRAY, HASH, CODE, or REF.
if (ref($r) eq "HASH") {
    dump_hashtable();
}
That's right: you don't need to use the ampersand on a function call anymore if you don't want to.
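As a sketch of the kind of function ref() makes possible, here is one way to walk a nested structure without knowing its shape in advance. The name dump_data and its output format are assumptions made for this example, not anything shipped with Perl.

```perl
use strict;
use warnings;

# Return an indented text rendering of an arbitrarily nested structure.
# dump_data is a hypothetical helper, not a Perl built-in.
sub dump_data {
    my ($r, $depth) = @_;
    $depth = 0 unless defined $depth;
    my $pad  = "  " x $depth;
    my $type = ref($r);
    my $out  = "";

    if ($type eq "HASH") {
        foreach my $key (sort keys %$r) {
            $out .= "$pad$key =>\n" . dump_data($r->{$key}, $depth + 1);
        }
    } elsif ($type eq "ARRAY") {
        foreach my $item (@$r) {
            $out .= dump_data($item, $depth);
        }
    } elsif ($type eq "SCALAR" or $type eq "REF") {
        $out .= dump_data($$r, $depth);     # follow the reference one level
    } elsif ($type eq "CODE") {
        $out .= "${pad}(code)\n";
    } else {
        $out .= "$pad$r\n";                 # not a reference: a plain value
    }
    return $out;
}

my $data = { John => [ "Mary", "Pat" ], Limits => { MAX => 8 } };
print dump_data($data);
```

The function recurses on whatever ref() reports, so tables of lists, lists of tables, and refs to refs all go through the same few branches.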

And yes, you may have user-defined data types as well! Using Perl's package system, you can write code using object-oriented programming strategies. Per-class and per-instance user-defined constructors and destructors are supported (reference counts are essential for knowing when to call the destructor), as are multiple inheritance and class-specific functions and data. For example:

$r->next_seq()
would call the next_seq() method of whatever class (package) $r belongs to (next_seq() could get at $r, its "this" pointer, in C++ parlance). If the class has no such method, a run-time search up the inheritance chain ensues. If none is found, an exception is raised using Perl's exception handling mechanism. If one is found, it's cached so that the search is shorter the next time.
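Here is a minimal sketch of that lookup, written the way released versions of Perl 5 spell it: bless ties a reference to a package, and the @ISA list names the parent classes. The Sequence and EvenSequence classes and the step() method are hypothetical, invented for this example.

```perl
use strict;
use warnings;

# Hypothetical base class: provides next_seq().
package Sequence;
sub new {
    my ($class, $start) = @_;
    my $self = { value => $start };
    return bless $self, $class;       # ties the reference to its class
}
sub next_seq {
    my ($self) = @_;                  # $self plays the role of "this"
    return $self->{value} += $self->step();
}
sub step { return 1 }

# Hypothetical subclass: inherits next_seq(), overrides step().
package EvenSequence;
our @ISA = ("Sequence");              # the inheritance chain searched at run time
sub step { return 2 }

package main;

my $r    = EvenSequence->new(10);
my $next = $r->next_seq();            # found in Sequence, calls EvenSequence::step

# A method that no class in the chain defines raises an exception,
# which eval traps.
my $ok    = eval { $r->no_such_method(); 1 };
my $error = $ok ? "" : $@;

print "next = $next\n";
print "error: $error";
```

EvenSequence never defines next_seq() itself, so the call succeeds only because the search continues into Sequence.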

Here's another interesting possibility. These two are identical, making it slightly nicer to read and write certain constructs:

$new_ob = $old_ob->new();
$new_ob = new $old_ob;

$count = $ob->sizeof(); 
$count = sizeof $ob;
Nifty, eh? Those are user-defined methods, not built-ins. This isn't just gratuitous syntactic sugar: it falls out of the "indirect object" slot such as is found in output statements, sort routines, etc. It's really just like:
print $fh "string\n";
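The two spellings can be tried side by side with a sketch like the following. The Stack class and its add() and sizeof() methods are hypothetical, invented for the example. Recent Perl releases also let you switch this syntax off (it is disabled under `use v5.36`), so the sketch assumes the default settings.

```perl
use strict;
use warnings;

package Stack;                        # hypothetical class
sub new {
    my ($proto) = @_;
    my $class = ref($proto) || $proto;    # accept a class name or an object
    return bless { items => [] }, $class;
}
sub add {
    my ($self, @new) = @_;
    push @{ $self->{items} }, @new;
    return $self;
}
sub sizeof {
    my ($self) = @_;
    return scalar @{ $self->{items} };
}

package main;

my $old_ob = Stack->new();
$old_ob->add("a", "b", "c");

# Arrow form and indirect-object form call the same user-defined method.
my $count_arrow    = $old_ob->sizeof();
my $count_indirect = sizeof $old_ob;

# The same slot works for constructors.
my $new_ob = new $old_ob;

# ...and it is the slot the filehandle occupies in print.
print STDOUT "counts: $count_arrow $count_indirect\n";
```

The indirect form depends on main having no function of its own called sizeof or new; if it did, Perl would treat the line as an ordinary function call.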
The other major set of functionality that will greatly aid programmers is in the area of scoping. Perl now supports both static and dynamic scoping of variables. Originally only dynamic scoping was supported, but so many C and Pascal programmers were unused to it that they found themselves making strange mistakes.

It used to be that variables always sprang into existence when you first used them whether you wanted them to or not. Now, you may optionally enforce variable "declarations" on a per-block basis. For such blocks, all variable references must be either a statically-scoped local or a fully-qualified global. Any stray variable references will be flagged at compile time. This makes it much easier to write safe code that doesn't accidentally alter or create a global variable.
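The difference between the two kinds of scoping, and the compile-time check, can be sketched as follows. This uses the spellings found in released versions of Perl 5 (my for static scoping, local for dynamic scoping, and the strict pragma for enforced declarations), which the article itself does not name. The function and variable names are made up.

```perl
use strict;       # undeclared variables become compile-time errors
use warnings;

our $level = "global";                # a package (global) variable

sub show { return $level }            # always reads the global $level

sub dynamic_demo {
    local $level = "dynamic";         # temporary value, visible to callees
    return show();
}

sub static_demo {
    my $level = "static";             # private to this block, invisible to callees
    return show();
}

my $dyn = dynamic_demo();
my $sta = static_demo();

# A stray, undeclared variable is rejected when the code is compiled.
# The string eval compiles its text under the same strict setting.
my $compiled  = eval q{ $stray = 1; 1 };
my $complaint = $compiled ? "" : $@;

print "dynamic: $dyn\n";
print "static:  $sta\n";
print "after:   $level\n";
print "strict:  $complaint";
```

Under dynamic scoping the callee sees the temporary value; under static scoping it does not, which is the behavior C and Pascal programmers expect.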

There's quite a bit more we don't have time to go over in detail. Here's a partial list of what's already been completed but not yet covered in this article:

  • Nestability of quoted strings
  • Improved exception mechanism
  • Support for BEGIN and END subs on a per-package basis
  • Various bug fixes
And here's a list of some of what's anticipated to be there, but not strictly guaranteed:
  • Embeddability into C and C++: cc prog.c -lperl
  • File handle objects: $STDOUT->flush(1)
  • Separate man pages for all library functions and built-ins
  • Very easy GUI Perl applications using high-level X bindings
  • Many more libraries, including class and struct libs
  • More example code in the eg/ directory
  • A Perl profiler
  • Debugger enhancements
  • Access to POSIX 1003.1 functions
  • Mnemonics for all the "funny" variables (e.g., $ERRNO for $!)
  • Easy extensibility using C functions
  • Various Perl development tools
If you'd like to play with an alpha release of Perl 5.0, you can retrieve it from ftp.com [192.94.48.152] in pub/outgoing/perl5.0/perl5a3.tar.Z. It already contains all the functionality detailed in the long exposition above. It also includes files with more information: check out Changes, Todo, and Wishlist.

This article was originally published in the November/December 1993 issue of ;login:.

 
