Introduction to Perl

PERL (Practical Extraction and Report Language)

Table of Contents

1. Introduction to Perl
2. Scalar Data
3. Array and List Data and Associative Arrays
4. Control Structures
5. Basic I/O
6. Formats
7. Regular Expressions
8. Functions
9. Filehandles and File Tests
10. File and Directory Manipulation
11. Converting Other Languages to Perl
12. Examples
13. Difference between Perl4 and Perl5
14. WWW sites for Perl
15. References

Chap 1 introduces Perl briefly. It covers "What is Perl?", "Who created Perl?", "Perl's license" and so on.

Chap 2 ~ Chap 11 introduce the basic ideas of Perl. These chapters are based on Perl4. However, you will not notice much difference when you apply these basic ideas to Perl5. Please check Chap 14 for the comparison between Perl4 and Perl5. You can refer to the WWW sites for Perl and/or the References for advanced Perl information.

Chap 12 includes links to online man pages and manuals.

Chap 13 includes examples/answers for 2 famous books -- Programming Perl (the Camel book) and Learning Perl (the Llama book).

Chap 14 briefly compares Perl4 and Perl5. Generally speaking, Perl5 adds more features but the compatibility is very high. You can check Metronet Perl5 Info in the WWW sites for Perl for more of the new Perl5 features.

Chap 15 lists some WWW Perl sites for reference. They may have the latest information on Perl. You can also download Perl from the FTP Archive listed in this chapter.

1. Introduction to Perl

1.What is Perl?

Perl is an abbreviation for "Practical Extraction and Report Language". Perl integrates the best features of shell programming, C, and the UNIX utilities -- grep, sed, awk and sh.

2.Who created Perl?

The inventor of Perl is Larry Wall.

3.Perl's license

Perl is distributed under the GNU General Public License (Perl 5 may alternatively be used under the Artistic License). That means it is essentially FREE.

4.Which platforms can Perl run on?

It can be run on various platforms such as UNIX, UNIX-like, Amiga, Macintosh, VMS, OS/2, even MS-DOS and maybe more in the near future.

5.Where can I get Perl's information?

You can get information from the USENET newsgroup "comp.lang.perl". It covers how to obtain Perl, how to solve problems, and so on. Many experts monitor this newsgroup, including the inventor, Larry Wall. That means you might get a response within a minute.

6.How can I begin a perl program?

The simplest way to do it is to include the following line at the beginning of your perl file:

#!/bin/perl (or the path where Perl is located on your system)

Some may have:

#!/usr/local/bin/perl

#!/bin/perl
print "Say Hi to Neon!";

#! is the Unix convention for naming the interpreter that should run a script. All perl statements end with a semicolon (;). The rest of your program follows the #! line.

With this line, the system knows where to look for Perl to run the program.

7.Is it a Perl program or a Perl script?

It's up to you. If you prefer to call it Perl program, then call it program. If you like script more, then call it script.

8.Is Perl difficult to learn?

No, it's not. Many people (including myself) think Perl is easy to learn, for several reasons:

Most of Perl is derived from tools, utilities and programming languages that you may already be familiar with. For example, you will find it pretty easy if you are familiar with C, shell, awk or sed.

You do not need to remember many things to be able to use Perl. For example, a Perl script (program) can be as short as the #! line and the single print statement shown under question 6.

You can get your result right away. You don't have to compile your programs every time you change something.

9.Should I program everything in Perl?

As a matter of fact, you can do anything in Perl. You, however, should not. Why? You should use the most appropriate tool for your job. Some people use Perl for shell programming, some people use Perl to replace some C programs. Perl, however, is not a good choice for very complex data structures.

<> is a special default. It tells perl to look at the calling command line to see if any files are specified. If they are, it reads each file in turn. If no files are specified, it reads from standard input. In either case, when <> is used as the condition of a while loop, each record read is put into the special variable $_. When <> reaches end-of-file, it returns false, which terminates the while loop. $_ is the default operand of many commands. In this case, $_ contains the last record read by <>.
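The behaviour of <> can be sketched with a short loop. This is only an illustration, assuming Perl 5 with "use strict"; the file name diamond_demo.txt is made up, and the script creates and removes it itself so that <> has something to read.

```perl
use strict;
use warnings;

# Hypothetical scratch file, created only to keep the example self-contained.
my $file = "diamond_demo.txt";
open(OUT, ">$file") || die "Cannot create $file: $!";
print OUT "first line\nsecond line\n";
close(OUT);

# Pretend the file was named on the command line.
@ARGV = ($file);

my @records;
while (<>) {              # each record read is placed in $_
    chop;                 # no argument, so chop works on $_
    push(@records, $_);
}
unlink($file);

print scalar(@records), " records read\n";
```

With an empty @ARGV the same loop would read from standard input instead.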

In fairness, perl has some down points.

* There are a number of "gotchas" that are pretty typical Unix characteristics, but would confuse non-programmers. For instance, a number starting with 0 is assumed to be octal.

$num = 010;
print "The number is $num. \n";

will print

The number is 8.
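The octal rule applies to numeric literals in the program text, not to strings that are converted to numbers at run time. The following sketch (assuming Perl 5; the variable names are made up) contrasts the two and shows oct() for parsing an octal string explicitly.

```perl
use strict;
use warnings;

my $literal = 010;          # numeric literal with a leading 0: octal
my $string  = "010" + 0;    # a string converted to a number: decimal
my $parsed  = oct("010");   # oct() interprets the string as octal

print "literal: $literal\n";   # 8
print "string:  $string\n";    # 10
print "parsed:  $parsed\n";    # 8
```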

2. Scalar Data

A scalar is the simplest kind of data that perl manipulates. A scalar can be either a number or a string of characters.

Numbers

Although we can write integers, floating-point numbers and so on, internally Perl computes only with double-precision floating-point values.

Strings

Single-Quoted Strings

A single-quoted string is a sequence of characters enclosed in single quotes. Note that special characters such as newline are not interpreted within a single-quoted string; only \' and \\ are treated as escapes. For example:

'hello' # 5 characters
'don\'t' # 5 characters
'hello\n' # 7 characters
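The character counts can be checked with length(). This is a small sketch, assuming Perl 5; it also shows the double-quoted form of the last string for comparison.

```perl
use strict;
use warnings;

my $single = 'hello\n';   # backslash and n are kept as two characters
my $double = "hello\n";   # \n becomes a single newline character
my $quote  = 'don\'t';    # \' stands for one quote character

print length($single), "\n";   # 7
print length($double), "\n";   # 6
print length($quote), "\n";    # 5
```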

Double-Quoted Strings

A double-quoted string is much like a C string. Backslash escapes work within double-quoted strings. The complete list of double-quoted string escapes is given below:

Escape | Meaning
-------+------------------------------------------
\n     | Newline
\r     | Return
\t     | Tab
\f     | Formfeed
\b     | Backspace
\v     | Vertical tab
\a     | Bell
\e     | Escape
\cC    | CTRL + C
\\     | Backslash
\"     | Double quote
\l     | Lowercase next letter
\L     | Lowercase all following letters until \E
\u     | Uppercase next letter
\U     | Uppercase all following letters until \E
\E     | Terminate \L or \U

The case-changing escapes can also be applied to a variable inside a double-quoted string:

$a = "Big And Little";
$c = "\L$a";
print $c;

prints "big and little".

Another feature of double-quoted strings is that they are variable interpolated. That means that some variable names within the string are replaced by their current values when the strings are used.

Operators

An operator generates a new value from one or more values. In Perl, the operators and expressions are generally a superset of those in most programming languages such as C. One thing you might find interesting is that the comparison operators for numbers and strings are different. The comparison is listed below:

Comparison               | Numeric | String
-------------------------+---------+--------
Equal                    | ==      | eq
Not Equal                | !=      | ne
Less Than                | <       | lt
Greater Than             | >       | gt
Less Than or Equal To    | <=      | le
Greater Than or Equal To | >=      | ge

Strings that do not consist of numbers have a numeric value of zero. For example,

if ("abc" == "def")

is TRUE, because both strings are numerically zero. To compare them as strings, you have to write

if ("abc" eq "def")
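The difference between the two operators can be seen by running both on the same pair of strings. This is a minimal sketch, assuming Perl 5; the numeric warning that "use warnings" would print for == is switched off on purpose inside the block.

```perl
use strict;
use warnings;

my ($x, $y) = ("abc", "def");

my ($numeric_equal, $string_equal);
{
    # Comparing non-numeric strings with == normally triggers a warning;
    # it is silenced here because the mistake is the point of the example.
    no warnings 'numeric';
    $numeric_equal = ($x == $y) ? "yes" : "no";   # both count as 0
}
$string_equal = ($x eq $y) ? "yes" : "no";

print "== says equal: $numeric_equal\n";   # yes
print "eq says equal: $string_equal\n";    # no
```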

There are functions for math including:

log($x) exp($x) sqrt($x) sin($x) cos($x) atan2($y,$x)

The only trig functions are sin, cos, and atan2. However, these can easily be used to compute the others.
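As an illustration, the remaining functions can be built from these three. The helper names tangent, arcsine and arccosine are made up for this sketch (Perl 5 assumed), and the inverse functions are only meant for arguments between -1 and 1.

```perl
use strict;
use warnings;

my $pi = 4 * atan2(1, 1);    # atan2(1, 1) is pi/4

# Hypothetical helpers built only from sin, cos, sqrt and atan2.
sub tangent   { my ($x) = @_; return sin($x) / cos($x); }
sub arcsine   { my ($x) = @_; return atan2($x, sqrt(1 - $x * $x)); }
sub arccosine { my ($x) = @_; return atan2(sqrt(1 - $x * $x), $x); }

printf "pi              = %.5f\n", $pi;
printf "tangent(pi/4)   = %.5f\n", tangent($pi / 4);
printf "arcsine(1)      = %.5f\n", arcsine(1);
printf "arccosine(0)    = %.5f\n", arccosine(0);
```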

Let's talk more about the scalar data from now on.

Scalar Variables

A variable is a name for a container that holds one or more values. A scalar variable holds a single scalar value. A scalar variable name begins with a dollar sign ($), followed by a letter and then possibly more letters, digits or underscores. Be careful: uppercase and lowercase letters are distinct, so $H and $h are different variables.

Operators on Scalar Variables

Assignment operator

The operator we use most commonly is the assignment. The examples are listed below:

$a = 5; # assign 5 to $a
$b = 4; # assign 4 to $b
$c = $a * $b; # assign 20 to $c
$d = "Hello, World"; # assign a string to $d

Binary Assignment Operator

Like C, Perl has a shorthand for the operation of altering a variable -- the binary assignment operator. The following expressions are equivalent:

$a = $a + 2; # Without the Binary Assignment Operator
$a += 2; # With the Binary Assignment Operator

Autoincrement(++) and Autodecrement(--)

Like C, Perl provides the Autoincrement and Autodecrement to simplify the expressions. The following expressions are equivalent:

$a += 1; # With the Binary Assignment Operator
$a++; # With Postfix Autoincrement
++$a; # With Prefix Autoincrement

The prefix and postfix forms differ in the value they return:

$b = 3;
$c = $b++; # $b is 4 and $c is 3 after this expression
$d = ++$b; # both $b and $d are 5 after this expression

The same applies to autodecrement.

chop() Operator

This operator removes the last character from a string variable.

$s = "Hello";
chop($s); # $s becomes "Hell"

It is useful when you read a value from <STDIN>. You can use chop() to remove the trailing newline character, which might otherwise cause problems later.

Interpolation of Scalar into Strings

Let the following examples explain what it means:

$a = "zoo";
$b = "elephants";
$c = "We can see $b in the $a."; # $c is "We can see elephants in the zoo." now

print() Operator

Whenever you want to print something on screen, you can just use the print() operator.

3. Array and List Data and Associative Arrays

An array is an ordered list of scalar data. Each element of the array is a separate scalar variable with the corresponding value.

Array Variables

Array variables hold a single array value (zero or more scalar values). Array variable names are similar to scalar variable names except for the leading character. A scalar variable name begins with a dollar sign ($), but array variable names begin as follows:

List Array
It begins with an at sign (@).

Associative Array
It begins with a percent sign (%).

Operators for a List Array:

Assignment(=)

The assignment operator gives an array variable a value. It is an equal sign (=), like the scalar assignment operator. Perl decides whether an assignment is a scalar or an array assignment according to the variable being assigned to.

@items refers to the entire array items.

$items[1] refers to the scalar value that is the second item in the array items. Linear arrays start with the index 0.

$#items is the index of the last item in @items. Because indexing starts from 0, it is one less than the number of items.

There can be completely separate and unrelated variables $x, @x, %x, and &x, not to mention $X, @X, %X and &X.

There are special variables, the most important of which are $_, @_, and @ARGV.

$_ is the default scalar value. If you do not specify a variable name in a function where a scalar variable goes, the variable $_ will be used. This is a very heavily used feature of perl.

@_ is the list of arguments to a subroutine.

@ARGV is the list of arguments specified on the command line when the program is executed.

Here are some examples:

@test1 = ("Hello", "World"); # It has 2 elements now.
@test2 = @test1; # @test2 is the same as @test1.
@test3 = (@test1, "pal"); # @test3 is ("Hello", "World", "pal")

($t1, $t2) = (6, 12); # assign 6 to $t1, 12 to $t2
($t1, $t2) = ($t2, $t1); # swap $t1 and $t2
@test = (1, 2, 3);
@test = (0, @test); # @test is now (0, 1, 2, 3)
@test = (@test, 4); # @test is now (0, 1, 2, 3, 4)
($tmp, @test) = @test; # @test is now (1, 2, 3, 4), $tmp is 0

$length = @test; # $length is 4 now (the number of elements)
($length) = @test; # $length is 1 now (the first element of @test)

Assigning an array to a scalar gives the number of items in the array.

@items = (10, 20, 30);
$i = @items;
print "$i";

will print "3".
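The different ways of asking an array about its size are easy to mix up, so here is a short side-by-side sketch (assuming Perl 5; the variable names other than @items are made up).

```perl
use strict;
use warnings;

my @items = (10, 20, 30);

my $count      = @items;     # scalar assignment: number of items
my $last_index = $#items;    # index of the last item
my ($first)    = @items;     # list assignment: first item

print "count=$count last_index=$last_index first=$first\n";
print "last item is $items[$#items]\n";
```

Here $count is 3, $last_index is 2 and $first is 10. The parentheses on the left-hand side are what turn the last assignment into a list assignment.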

Element Access

So far we have dealt with the whole array through the assignment operator. What if we want to access some specific elements of the array? To do this, we use the subscripting operator to refer to an array element by an index. The index begins at 0 and increases by 1 for each element. Here are some examples:

@test = (1, 2, 3);
$t1 = $test[2]; # $t1 is 3 now
$test[1] = 6; # @test is (1, 6, 3) now
$test[0]++; # @test is (2, 6, 3) now
$test[0] *= 5; # @test is (10, 6, 3) now

push() and pop()

These two operators treat the array like a stack. push() adds one or more elements to the end of the array, and pop() removes the last element from the array and returns it. Here are some examples:

@test = (1, 3, 5);
push(@test, 2, 4, 6); # @test is (1, 3, 5, 2, 4, 6) now
$last = pop(@test); # @test is (1, 3, 5, 2, 4) and $last is 6 now

Both push() and pop() take an array variable name as the first argument.

shift() and unshift()

Whereas push() and pop() work at the end of the array, unshift() and shift() work at the beginning of the array. Continuing with @test from the previous example:

unshift(@test, 7, 8); # @test is (7, 8, 1, 3, 5, 2, 4)
$first = shift(@test); # @test is (8, 1, 3, 5, 2, 4) and $first is 7

reverse()

This operator returns the elements of the array in reverse order. For example:

@test1 = (1, 2, 3);
@test2 = reverse(@test1); # @test2 is (3, 2, 1)

However, @test1 is unchanged. reverse() works on a copy not the original one. If you want to reverse the array, do the following:

@test1 = reverse(@test1);

sort()

It sorts the elements of the array as single strings in ascending ASCII order without changing the original list. For example:

@test1 = (10, 5, 3, 7, 47, 8);
@test2 = sort(@test1); # @test2 is (10, 3, 47, 5, 7, 8)

Again, if you want to change the original list, do the following:

@test1 = sort(@test1);
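Because the default order is by string, numbers usually need a comparison routine. The sketch below assumes Perl 5; the routine name by_number is arbitrary, and it relies on sort placing the two elements being compared in $a and $b.

```perl
use strict;
use warnings;

# Comparison routine: <=> compares its operands as numbers.
sub by_number { $a <=> $b; }

my @test1 = (10, 5, 3, 7, 47, 8);
my @as_strings = sort @test1;              # (10, 3, 47, 5, 7, 8)
my @as_numbers = sort by_number @test1;    # (3, 5, 7, 8, 10, 47)

print "@as_strings\n";
print "@as_numbers\n";
```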

Associative Arrays

An associative array is just like a list array. The difference between an associative array and a list array is that the list array uses non-negative integers as index values but the associative array uses arbitrary scalars. These scalars, also called keys, are used to retrieve the corresponding values from the associative array.

One thing we have to notice is that there is no particular order for the elements of an associative array. Whenever we want to find some specific values, we use the keys to find them. We do not have to worry how we can find them because Perl has the internal order to do this.

Most of the time, people want to access the elements of the associative array rather than the entire array. To do this, we need the keys. The associative array is represented as:

%test

and an element of an associative array is represented as:

$test{$key} # Notice! The leading character is a dollar sign($) and so is the key.

How do we create and/or update an associative array? Here are some examples:

$test{1} = "Hello"; # creates key 1 and value "Hello"
$test{2} = 100; # creates key 2 and value 100

We can also assign the key-value pairs to a list array.

@test1 = %test; # @test1 is either (1, "Hello", 2, 100) or (2, 100, 1, "Hello")

The order of the key-value pair is arbitrary and cannot be controlled. Perl has its own logic to have more efficient access. Of course, the list array can copy its values to an associative array.

%test2 = @test1; # %test2 is just like %test now.

Operators for Associative Arrays:

keys()

This operator can be used when we want a list of the keys of the associative array. In effect, it returns the odd-numbered (1st, 3rd, 5th, ...) elements of the list you would get by assigning the associative array to a list array.

@test3 = keys(%test); # @test3 is either (1, 2) or (2, 1)

values()

Instead of returning the keys of the associative array, this operator returns its values. That is, it returns the even-numbered (2nd, 4th, 6th, ...) elements of the list you would get by assigning the associative array to a list array.

@test4 = values(%test); # @test4 is either ("Hello", 100) or (100, "Hello")

each()

You can, of course, access both keys and values of the associative array by using each() operator.

$personinfo{"011-88-6257"} = "John";
$personinfo{"323-56-2943"} = "Tom";
$personinfo{"242-54-2489"} = "Sam";
print "SSN \t\t NAME\n";
print "----------------------\n";
while ( ($ssn, $name) = each(%personinfo) ) {
    print "$ssn \t $name\n";
}

The result looks like:

SSN              NAME
----------------------
242-54-2489      Sam
011-88-6257      John
323-56-2943      Tom

As mentioned above, the order of key-value pair is arbitrary. In this case, we can see the result. We created the associative array by "John --> Tom --> Sam" but got "Sam --> John --> Tom".
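When a predictable order is needed, one common approach is to sort the keys and look each value up in turn. The following is a sketch along those lines, assuming Perl 5 and reusing the data from the example.

```perl
use strict;
use warnings;

my %personinfo = (
    "011-88-6257", "John",
    "323-56-2943", "Tom",
    "242-54-2489", "Sam",
);

# Visit the keys in sorted (string) order instead of the internal order.
my @lines;
foreach my $ssn (sort keys %personinfo) {
    push(@lines, "$ssn \t $personinfo{$ssn}");
}
print join("\n", @lines), "\n";
```

The rows now come out as John, Sam, Tom, because the keys are sorted as strings.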

We borrowed the while() statement, which we will discuss later in the section on control structures.

delete

What if we want to remove elements from an associative array? No problem. Perl provides the delete operator to do this. It removes both the key and the value from the associative array. We use the previous example to explain. If we add the following lines to the previous program, we will get the following result:

print %personinfo,"\n";
delete $personinfo{"011-88-6257"};
print %personinfo,"\n";

The result looks like:

SSN              NAME
----------------------
242-54-2489      Sam
011-88-6257      John
323-56-2943      Tom
242-54-2489Sam011-88-6257John323-56-2943Tom
242-54-2489Sam323-56-2943Tom

4. Control Structures

Perl provides several control statements.

if/unless statement

This is like other structured programming languages.

if (expression) {
    statement 1;
    statement 2;
    :
    :
} else {
    statement a;
    statement b;
    :
    :
}

There is also another form of the if statement:

unless (condition) {
    statements executed only when the condition is false
}

One thing we have to know is that the control expression is evaluated as a string value. That means it is left unchanged if it is already a string, and it is converted to a string if it is a number. If the resulting string is empty or is "0", the expression is false; anything else is true.

On the other hand, if you want to test the opposite condition, you can use unless instead of if. The meaning of the unless statement is "If the expression is not true, then ...".
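Which values count as true is easiest to see by testing a few. In Perl the empty string, the string "0" and the number 0 are false; the sketch below (assuming Perl 5, with made-up variable names) checks those and some look-alikes.

```perl
use strict;
use warnings;

my @labels = ('""', '"0"', '0', '"00"', '"0.0"', '1', '"abc"');
my @values = ("",   "0",   0,   "00",   "0.0",   1,   "abc");

my @results;
foreach my $i (0 .. $#values) {
    my $truth = $values[$i] ? "true" : "false";
    push(@results, $truth);
    print "$labels[$i] is $truth\n";
}

unless ($values[0]) {
    print "unless runs its block because the empty string is false\n";
}
```

Note that "00" and "0.0" are true: as strings they are neither empty nor exactly "0".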

You can also apply elsif as many times as you wish to expand your if/else statement.

if (expression 1) {
    :
    :
} elsif (expression 2) {
    :
    :
} else {
    :
    :
}

perl has file test operators like shell scripts. perl has an extended set of tests such as:

-T true if file is text
-B true if file is binary
-M days since file was last modified
-A days since file was last accessed
-C days since the file's inode was last changed

Other forms of the if-command are not common in other computer languages, but can be quite useful. A good example is the postfix if.

next if $var == 1;

A useful form of logic uses || or && in a command:

open (IN,"

Error Messages:

die is used to print an error message and then exit.

warn is used to print an error message, but continue.

This is a Unix-ism, not just a perl-ism, and it is worth explaining in some detail. This works in shell scripts, and is a handy way of writing sh/csh independent scripts.

The || is an "OR". perl needs to find out whether either of the statements is TRUE.

If the open command succeeds, it is TRUE. Since it is TRUE, the OR is automatically true and perl does not execute the second command to see whether it is TRUE or not.

If the open fails, it returns FALSE, and perl has to execute the second command to see whether it is true or false. The result doesn't matter; what matters is that the second command (for example die or warn) has been executed.
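The idiom can be sketched as follows. This is an illustration only, assuming Perl 5: the file name no_such_file.txt is made up and assumed not to exist, and eval is used solely so that the effect of die can be shown without ending the script.

```perl
use strict;
use warnings;

my $file = "no_such_file.txt";   # hypothetical name, assumed not to exist

# open fails, so the right-hand side of || is evaluated and warn runs.
open(IN, "<$file") || warn "Cannot open $file: $!\n";

# With die the program would stop here; eval catches that for the demo.
my $ok = eval {
    open(IN, "<$file") || die "Cannot open $file: $!\n";
    1;
};
print "die was triggered: $@" unless $ok;
```

If the file did exist, neither warn nor die would be executed, because open would already have made the OR true.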

while/until statement

We have seen how to use the while statement in the previous section. We now formally introduce its syntax:

while (expression) {
    statement 1;
    statement 2;
    :
    :
}

The while statement executes the statements inside the braces as long as the expression is true. Sometimes we want the opposite: the until statement executes the statements as long as the expression is false.

until (expression) {
    statement 1;
    statement 2;
    :
    :
}

for statement

This statement is pretty much like C's for statement. Here is the syntax:

for (initial_exp; test_exp; increment_exp) {
    statement 1;
    statement 2;
    :
    :
}

foreach statement

This statement is pretty much like the C-shell's foreach. It takes a list of values, assigns them one at a time to a scalar variable, and executes a block of statements for each. The syntax is:

foreach $s (@list) {
    statement 1;
    statement 2;
    :
    :
}

foreach $i (1 .. 100) {
    commands
}

The (1 .. 100) in the second foreach form is a range. It is equivalent to for ($i = 1; $i <= 100; $i++). If no index variable is specified, the index will be in $_.
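The range form and the C-style form can be compared directly. Note that 1 .. 100 includes 100, so the matching loop test is $i <= 100. This is a short sketch assuming Perl 5; the variable names are made up.

```perl
use strict;
use warnings;

my ($sum_foreach, $sum_for, $sum_default) = (0, 0, 0);

foreach my $i (1 .. 100) {
    $sum_foreach += $i;
}

for (my $i = 1; $i <= 100; $i++) {
    $sum_for += $i;
}

foreach (1 .. 100) {        # no index variable: each value is in $_
    $sum_default += $_;
}

print "$sum_foreach $sum_for $sum_default\n";   # 5050 5050 5050
```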

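The commands next, last, and redo change the flow of control from inside a loop: next skips to the next iteration, last leaves the loop, and redo restarts the current iteration without evaluating the condition again. A continue block, if present, is executed just before the condition is evaluated again.

while (condition) {
    next if $count++ == 1;
    last if $record =~ /^END/;
    commands
}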
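A runnable variant of the same pattern is sketched below, assuming Perl 5. The records in @input are invented for the example; the first one is treated as a header to skip, and a record starting with END stops the loop.

```perl
use strict;
use warnings;

# Made-up records; the first is a header and "END ..." marks the end.
my @input = ("HEADER", "alpha", "beta", "END of data", "gamma");

my $count = 0;
my @kept;
foreach my $record (@input) {
    next if $count++ == 0;          # skip the first record
    last if $record =~ /^END/;      # stop at the END marker
    push(@kept, $record);
}

print "@kept\n";    # alpha beta
```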

5. Basic I/O

Input from <STDIN>

The usage is:

$test1 = <STDIN>;
@test2 = <STDIN>;

The first one reads one line from the standard input, up to and including the newline character. The second one reads all the remaining lines, until you press "CTRL+D" to signal the end of input.

Diamond Operator (<>)

Perl also provides another way to read input: the diamond operator (<>). Unlike <STDIN>, it reads data from the file or files specified on the command line. However, if you don't specify any filename on the command line, it reads data from standard input.

Output to STDOUT

Perl uses two operators to write to standard output:

print

This is the normal output operator. The usage is like:

print "Hello, World.\n";
print 123+456;

printf

This is the formatted output. Again, it is pretty much like C's printf operator. The usage is like:

$name = "John";
$age = 23;
$ssn = "121-66-3214";
printf "My name is %10s. I am %3d years old. My SSN is %11s.", $name, $age, $ssn;

The result looks like:

My name is       John. I am  23 years old. My SSN is 121-66-3214.

6. Formats

Perl also provides the notion of a simple report template which is called format. A format in Perl contains two parts: constant part (the column header, labels, fixed text, ...) and variable part (current data).

Using a format consists of doing three things:

1. Defining a format.
2. Loading up the data to be printed into the variable portions of the format (fields).
3. Invoking the format.

Usually, the first step is done once and the other two are done repeatedly.

Defining a format

You need to use a format definition if you want to have a format. The format definition can be anywhere in the program text. A format definition looks like:

format formatname =
fieldline
value_1, value_2, value_3
fieldline
value_4, value_5
.

Here comes an example for an address label:

format ADDRESSLABEL =
=============================
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$name
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
@<<<<<<<<<<<<, @< @<<<<<<<<<<
$city, $state, $zipcode
=============================
.

A line containing only a period (.) ends a format definition.

Invoking a format

To invoke a format in your Perl program, use the write operator. It takes a filehandle and, by default, uses the format with the same name as that filehandle.

$name = "John Starks";
$address = "1234 Erie Blvd.";
$city = "Syracuse";
$state = "NY";
$zipcode = "13205";
write ADDRESSLABEL;

=============================
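The three steps (define, load, invoke) can be put together in one self-contained script. This is a sketch under some assumptions: it needs Perl 5.8 or later, because it opens the ADDRESSLABEL filehandle on an in-memory string ($label, a name made up here) instead of a real file so that the result can be inspected.

```perl
use strict;
use warnings;

# Variables used by the format, declared first so the format can see them.
our ($name, $address, $city, $state, $zipcode);

format ADDRESSLABEL =
=============================
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$name
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
@<<<<<<<<<<<<, @< @<<<<<<<<<<
$city, $state, $zipcode
=============================
.

# Open a filehandle named ADDRESSLABEL that writes into the string $label.
my $label = "";
open(ADDRESSLABEL, ">", \$label) || die "Cannot open in-memory handle: $!";

($name, $address) = ("John Starks", "1234 Erie Blvd.");
($city, $state, $zipcode) = ("Syracuse", "NY", "13205");
write(ADDRESSLABEL);    # uses the format named after the filehandle
close(ADDRESSLABEL);

print $label;
```

Opening ADDRESSLABEL on a real file or printer device instead would send the label there; the format itself stays the same.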

