Introduction To Perl

          The result looks like:

          =============================
          John Starks
          1234 Erie Blvd.
          Syracuse, NY 13205
          =============================

          Fieldholders

               Text Fields

               You get a left-justified field by writing an at sign followed by left angle brackets (@<<<<). The number of angle brackets matters. The field holds as many characters as there are angle brackets plus one, because the at sign (@) itself counts as one position. So @<<<< is a five-character field.

               Similarly, @>>>> gives a right-justified field and @|||| gives a centered field.
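               This sketch shows the three kinds of text field side by side without setting up a filehandle. It uses Perl's built-in formline(), which fills in a picture line the way write() does and appends the result to the accumulator variable $^A. The helper name fill_picture is made up for this example, following the swrite() idea in the perlform documentation. The square brackets in the pictures are only there to make the padding visible.

```perl
use strict;
use warnings;

# Hypothetical helper: fill one picture line the way write() would.
# formline() appends the filled-in picture to the accumulator $^A.
sub fill_picture {
    my ($picture, @values) = @_;
    $^A = '';                       # start from an empty accumulator
    formline($picture, @values);
    my $line = $^A;                 # keep the result
    $^A = '';                       # leave the accumulator clean
    return $line;
}

my $left      = fill_picture('[@<<<<]', 'ab');        # 5-character field
my $right     = fill_picture('[@>>>>]', 'ab');
my $center    = fill_picture('[@||||]', 'ab');
my $truncated = fill_picture('[@<<<<]', 'abcdefg');   # longer than the field

print "$left\n$right\n$center\n$truncated\n";
```

               A value longer than the field is cut off at the field width, so the last line shows only the first five characters.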

               Numeric Fields

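               There is also a fieldholder for numbers. Instead of <, > or |, the at sign (@) is followed by hash signs (#), with an optional decimal point. The value is right-justified in the field and rounded to the number of # signs after the decimal point. A format definition might look like: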

               format ACCOUNT =
               Deposit: @#####.## Withdraw: @#####.## Balance: @#####.##
               $deposit, $withdraw, $deposit-$withdraw
               .
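               The ACCOUNT picture can be tried without a filehandle. This sketch defines a small hypothetical helper, fill_picture(), around the built-in formline() and $^A, and fills the picture with made-up figures. Each @#####.## field is nine characters wide with two decimal places, so each number should come out the same as sprintf("%9.2f", ...) would format it.

```perl
use strict;
use warnings;

# Hypothetical helper built on formline(), which writes into $^A.
sub fill_picture {
    my ($picture, @values) = @_;
    $^A = '';
    formline($picture, @values);
    my $line = $^A;
    $^A = '';
    return $line;
}

my ($deposit, $withdraw) = (100, 25.5);        # sample figures, made up
my $line = fill_picture(
    'Deposit: @#####.## Withdraw: @#####.## Balance: @#####.##',
    $deposit, $withdraw, $deposit - $withdraw,
);
print "$line\n";
```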
               Multiline Fields

               Perl also provides a multiline field, @*, which prints the whole value even when it contains several lines. Here is an example:

               format TEST =
               =====================
               @*
               $test
               =====================
               .
               $test = "test1\ntest2\ntest3\ntest4\n";
               $~ = "TEST";        # tell write() to use the TEST format
               write;              # writes to the selected filehandle (STDOUT)

               Then, we will get the result:

               =====================
               test1
               test2
               test3
               test4
               =====================

          The Top-of-Page Format

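          Perl also lets you define your own top-of-page format, which is printed at the top of each page. By default, the top-of-page format for a filehandle is the format named after the filehandle with _TOP appended. It may look like this: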

          format ADDRESSLABEL_TOP =
          Address label page @<
          $%
          .

          The variable $% holds the current page number. More precisely, it counts how many times the top-of-page format has been called for the currently selected filehandle.

7.  Regular Expressions

     Regular Expressions

     A regular expression is a pattern, a template to be matched against a string. Regular expressions are used by many UNIX programs, such as awk, ed, emacs, grep, sed, vi, and the various shells. Perl's regular expressions are a semantic superset of those of all these tools: any regular expression that can be written for one of them can also be written in Perl, though not necessarily with exactly the same characters.

     In Perl, you use a string such as test as a regular expression by enclosing it in slashes. The following loop prints every input line that contains test:

         while (<>) {
            if (/test/) {
               print "$_";
            }
         }

     What if we are not sure how many e's appear between "t" and "s"? We can do the following:

         while (<>) {
            if (/te*st/) {
               print "$_";
            }
         } 

     The pattern te*st matches a "t", followed by zero or more "e"s, followed by "st".
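     A quick way to see which strings /te*st/ accepts is to run it over a few sample words with grep. The word list is invented for illustration. Because the pattern is not anchored, it also matches inside a longer word such as contest.

```perl
use strict;
use warnings;

my @words   = ('tst', 'test', 'teeest', 'tast', 'contest');   # sample words
my @matches = grep { /te*st/ } @words;    # keep words containing t, e*, st
print "@matches\n";
```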

     Next, we introduce a simple regular expression operator: substitution. It replaces the part of a string that matches the regular expression with another string. Like the s command in sed, it consists of the letter s, a slash, a regular expression, a slash, a replacement string, and a final slash:

         s/te*st/result/;

     Here again, the pattern is matched against the $_ variable. If it matches, the first matching part of the string is replaced by the replacement string ("result"); otherwise, nothing happens.
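     The following sketch uses made-up sample strings. It shows that s/// changes only the first match in $_, and that its return value tells you whether a replacement happened: 1 here, or an empty (false) string when nothing matched.

```perl
use strict;
use warnings;

$_ = "one teest, two tst";
my $replaced = s/te*st/result/;   # only the first match is replaced
my $after    = $_;

$_ = "nothing to see";
my $none = s/te*st/result/;       # no match: $_ is left alone
print "$after ($replaced)\n";
```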

     Pattern

     A regular expression is a pattern. 

          Single-Character Patterns

          The simplest and most common pattern-matching character in a regular expression is a single character that matches itself. Another common one is the dot (.), which matches any single character except the newline character (\n).

          A character class is written as a list of characters inside a pair of square brackets, and it matches any single one of the listed characters. To include a special character such as ] or - in the list, precede it with a backslash (\). A long run of consecutive digits or letters can be abbreviated with a dash (-):

              [0123456789]        # all digits
              [0-9]               # the same as above
              [0-9\]]             # all digits and right square bracket
              [a-z0-9]            # all lowercase letters and digits
              [a-zA-Z0-9_]        # all letters and digits and underscore

          A negated character class is written with a caret (^) immediately after the left square bracket. It matches any single character that is not in the list. For example:

              [^0-9]              # match any single non-digit
              [^aeiouAEIOU]       # match any single non-vowel
              [^\]]               # match any character except a right square bracket
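          Here the classes above are tried on a sample string invented for this example. The sketch deletes the characters a class selects with s///g, or collects them with a list-context match.

```perl
use strict;
use warnings;

my $sample = "Room ]42-B, floor 7";                  # sample text
(my $digits    = $sample) =~ s/[^0-9]//g;           # remove every non-digit
(my $no_vowels = $sample) =~ s/[aeiouAEIOU]//g;     # remove every vowel
my @picked = $sample =~ /([0-9\]])/g;               # digits and "]"
print "$digits | $no_vowels | @picked\n";
```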

          Typing out these classes every time is tedious, so Perl provides predefined abbreviations for the most common ones. For plain ASCII text they are equivalent to the following classes:

          Construct  | Equivalent Class | Negated Construct | Equivalent Negated Class
          -----------+------------------+-------------------+-------------------------
          \d (digits)| [0-9]            | \D (non-digits)   | [^0-9]
          \w (words) | [a-zA-Z0-9_]     | \W (non-words)    | [^a-zA-Z0-9_]
          \s (space) | [\f\n\r\t]       | \S (non-space)    | [^\f\n\r\t]

          Grouping Patterns

          As we saw before, the asterisk (*) means "zero or more" of the character it follows. Two more multipliers work the same way: the plus sign (+) means "one or more" of the character it follows, and the question mark (?) means "zero or one" of it.

          Sometimes we need to say exactly how many repetitions are allowed. For this, Perl provides a general multiplier: a pair of curly braces containing one or two numbers, as in:

              /a{3,8}/            # 3 to 8 a's in a row
              /a{3,}/             # 3 or more a's
              /a{3}/              # exactly 3 a's
              /a{0,3}/            # 3 or fewer a's (possibly none)

          The three multipliers introduced earlier are shorthands for general multipliers: "*" is the same as {0,}, "+" is the same as {1,}, and "?" is the same as {0,1}.
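          A pattern may match anywhere in the string, so /a{3,8}/ alone does not reject a run of nine a's; it simply matches eight of them. Anchoring the pattern with ^ and $ makes the limits apply to the whole string. The sketch below shows this on sample strings and checks two of the shorthand equivalences.

```perl
use strict;
use warnings;

my %result;
for my $s ('aa', 'aaa', 'aaaaaaaa', 'aaaaaaaaa') {   # 2, 3, 8 and 9 a's
    my $anchored   = ($s =~ /^a{3,8}$/) ? 1 : 0;      # whole string must be 3-8 a's
    my $unanchored = ($s =~ /a{3,8}/)   ? 1 : 0;      # 3-8 a's anywhere is enough
    $result{$s} = [$anchored, $unanchored];
    print "$s: anchored=$anchored unanchored=$unanchored\n";
}

# + is like {1,} and ? is like {0,1}
my $plus  = ('caaat' =~ /^ca+t$/ && 'caaat' =~ /^ca{1,}t$/) ? 1 : 0;
my $quest = ('color' =~ /^colou?r$/ && 'color' =~ /^colou{0,1}r$/) ? 1 : 0;
```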

          Another grouping construct is a pair of parentheses around any part of a pattern. Perl remembers what the parenthesized part matched, and you can refer to it later in the same pattern as \1:

              /a(.)b\1/;          # It can be matched by "axbx"

          If a regular expression contains more than one pair of parentheses, the second pair is referenced as \2, the third as \3, and so on.

             /a(.)b(.)c\1d\2/;    # It can be matched by "axbycxdy"

          The remembered parts can also be used in the replacement string of a substitution, where they are written $1, $2, and so on:

              $_ = "a xxx b yyy c zzz d";
              s/b(.*)c/s$1t/;     # $_ becomes "a xxx s yyy t zzz d"
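          The backreference examples can be checked directly. Inside the pattern a remembered part is written \1, \2, ...; in the replacement it is written $1, $2, .... A variable name that starts with a digit consists only of digits, so "$1t" means $1 followed by the letter t.

```perl
use strict;
use warnings;

my $one = ('axbx' =~ /a(.)b\1/) ? 1 : 0;          # \1 must repeat the "x"
my $bad = ('axby' =~ /a(.)b\1/) ? 1 : 0;          # "y" is not "x": no match
my $two = ('axbycxdy' =~ /a(.)b(.)c\1d\2/) ? 1 : 0;

my $str = "a xxx b yyy c zzz d";
(my $changed = $str) =~ s/b(.*)c/s$1t/;           # $1 holds " yyy "
print "$one $bad $two\n$changed\n";
```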

          Another grouping construct is alternation, written a|b|c, which matches any one of the alternatives. The alternatives can be longer than one character. For single-character alternatives, a character class such as [abc] is the better choice.

              /sony|panasonic/;   # match either sony or panasonic

          Here are some more examples of regular expressions, and the effect of parentheses: 

              abc*                # abc, abcc, abccc and so on
              (abc)*              # abc, abcabc, abcabcabc and so on
              ^a|b                # a at the beginning of a line or b anywhere
              ^(a|b)              # either a or b at the beginning of a line
              (a|b)(c|d)          # ac, ad, bc, or bd
              (red|blue)pen       # redpen or bluepen
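          The difference between ^a|b and ^(a|b) is easy to check with grep over a few invented words:

```perl
use strict;
use warnings;

my @words = ('apple', 'banana', 'cherry', 'crab');   # sample words
my @loose = grep { /^a|b/ } @words;     # "a" at the start, OR "b" anywhere
my @tight = grep { /^(a|b)/ } @words;   # "a" or "b", either one at the start
print "loose: @loose\ntight: @tight\n";
```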

     Selecting a Different Target

     Sometimes the string we want to match is not in the $_ variable. The =~ operator matches a pattern against any other string:

         $test = "Good morning!";
         $test =~ /o*/;              # true (o* can match zero o's)
         $test =~ /^Go+/;            # also true

     Note that when the left operand of =~ is an expression such as <STDIN> (see the example under Ignoring Case), the input line is never stored in a variable, so you cannot match that same line against another pattern later. In practice, a one-shot test like this is often all you need.

     Ignoring Case

     Sometimes we want a pattern to match regardless of case. Some versions of grep provide a -i flag for "ignore case", and Perl has a similar option: append a lowercase i to the closing slash, as in /pattern/i.

         print "Yes!\n" if <STDIN> =~ /^y/i;   # accepts both "Y" and "y"
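     Because the example reads from the keyboard, this sketch applies the same /^y/i test to a made-up list of answers instead:

```perl
use strict;
use warnings;

# Stand-ins for lines a user might type at a yes/no prompt.
my @answers  = ("yes\n", "Y\n", "no\n", "  y\n");
my @accepted = grep { /^y/i } @answers;   # leading spaces make "  y" fail
chomp(my @shown = @accepted);
print "accepted: @shown\n";
```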

     Using a Different Delimiter

     Sometimes a pattern contains slashes:

         $tmp =~ /\/etc\/fstab/;

     To put a slash inside a slash-delimited regular expression, you must precede it with a backslash, which quickly becomes hard to read. Perl lets you choose a different delimiter instead: write the letter m followed by any nonalphanumeric, non-whitespace character, and end the pattern with the same character.

         m@/etc/fstab@               # using @ as a delimiter
         m#/etc/fstab#               # using # as a delimiter
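     This sketch uses an invented string to check that the escaped-slash form and the alternative delimiters all match. Bracketing characters such as { } also work as delimiters; the pattern is then closed by the matching bracket.

```perl
use strict;
use warnings;

my $tmp = "mount points are listed in /etc/fstab";   # sample string
my $escaped = ($tmp =~ /\/etc\/fstab/) ? 1 : 0;   # every slash escaped
my $at      = ($tmp =~ m@/etc/fstab@)  ? 1 : 0;   # @ as the delimiter
my $hash    = ($tmp =~ m#/etc/fstab#)  ? 1 : 0;   # # as the delimiter
my $braces  = ($tmp =~ m{/etc/fstab})  ? 1 : 0;   # brackets close with their partner
print "$escaped $at $hash $braces\n";
```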

     Special Read-Only Variables

     There are three special read-only variables:

          $&    the part of the string that matched the regular expression
          $`    the part of the string before the part that matched
          $'    the part of the string after the part that matched

     For example:

         $_ = "God bless you.";
         /bless/;
         # $` is "God " now
         # $& is "bless" now
         # $' is " you." now
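     Here is a runnable version of the example that copies the three variables right after a successful match. One caveat: in Perl versions before 5.20, mentioning $`, $& or $' anywhere in a program slowed down every regular expression match in it, so older code often avoids them.

```perl
use strict;
use warnings;

my ($before, $match, $after);
$_ = "God bless you.";
if (/bless/) {
    ($before, $match, $after) = ($`, $&, $');   # copy them right after the match
}
print "[$before][$match][$after]\n";
```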

     Substitutions

     We already know the simple form of the substitution operator: s/old_regular_expr/replacement_string/. To replace every match instead of just the first one, append a g to the closing slash.

         $_ = "feet feel sleep";
         s/ee/oo/g;                  # $_ becomes "foot fool sloop"

     You can also use a scalar variable as a replacement string. 

         $_ = "Say Hi to Neon!";
         $new = "Hello";
         s/Hi/$new/;                 # $_ becomes "Say Hello to Neon!"

     You can also specify an alternate target with the =~ operator. 

         $test = "This is a book.";
         $test =~ s/book/desk/;      # $test becomes "This is a desk."
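     The substitution forms above fit in one short sketch. It also shows that with /g the substitution operator returns the number of replacements made.

```perl
use strict;
use warnings;

$_ = "feet feel sleep";
my $count = s/ee/oo/g;            # with /g, s/// returns the number of replacements
my $all   = $_;

(my $first = "feet feel sleep") =~ s/ee/oo/;   # without /g: first match only

my $test = "Say Hi to Neon!";
my $new  = "Hello";
$test =~ s/Hi/$new/;              # $new is interpolated into the replacement
print "$all ($count)\n$first\n$test\n";
```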

     The split() and join() operators

     Perl has two operators for breaking a string apart and putting it back together: split() and join().

          The split() operator

          The split() operator takes a regular expression and a string, looks for all occurrences of the regular expression within the string, and returns the pieces of the string between those occurrences as a list.

              $test1 = "This is a project for CPS600.";
              @test2 = split(/\s+/,$test1);       
              # split $test1 wherever one or more whitespace characters occur
              # @test2 is ("This","is","a","project","for","CPS600.")

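          If the string is in $_, the code gets shorter, because split with no arguments splits $_ on whitespace. Strictly speaking, the default is the special pattern ' ', which acts like /\s+/ except that leading whitespace does not produce an empty first field.

             $_ = "This is a project for CPS600.";
             @test2 = split;              # same as @test2 = split(' ', $_)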
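          The difference between /\s+/ and the default ' ' only shows when the string starts with whitespace, as in this sketch with a made-up string:

```perl
use strict;
use warnings;

my $text = "  This is a project for CPS600.";   # note the leading spaces

my @by_regex = split(/\s+/, $text);   # leading whitespace gives an empty first field
my @by_space = split(' ', $text);     # the special ' ' pattern skips it
$_ = $text;
my @default  = split;                 # same as split(' ', $_)

print scalar(@by_regex), " fields vs ", scalar(@by_space), " fields\n";
```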

          The join() operator

          The join() operator takes a list of values and combines them together with a glue string between each list element. 

              $test1 = join(" ", @test2);

          This gives back the original $test1. Note that the glue string is not a regular expression; it is just an ordinary string of zero or more characters.
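          A few join() calls show that the glue is taken literally, even characters such as "." that are special in a pattern, and that an empty glue string is allowed:

```perl
use strict;
use warnings;

my @test2  = ("This", "is", "a", "project", "for", "CPS600.");
my $test1  = join(" ", @test2);        # glue: a single space
my $csv    = join(",", @test2);        # glue: a comma
my $dotted = join(".", 1, 2, 3);       # "." is a literal dot, not "any character"
my $tight  = join("", "a", "b", "c");  # an empty glue string is allowed
my $round  = join(" ", split(' ', $test1)) eq $test1 ? 1 : 0;   # join undoes split
print "$test1\n$csv\n$dotted\n$tight\n";
```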

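     Inside a loop that reads input, the statement

          next if m/^\s*$/;

     skips blank lines (lines that contain only whitespace).

     Regular expressions can also pull fields out of a record. For instance, if you had the line (beginning with a tab):

          "SeisWorks 3D"   "s3d 2> /dev/null"

     as is found in launcher.dat, you could use the regular
     expression: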

                                                               
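          $_ = <LAUNCHER>;     # LAUNCHER: placeholder name for a filehandle
                               # already opened on launcher.dat
          if ( m/^\t"(.+)"\s*"(\S+)\s+2>\s*(.+)"$/ )
          {
               ($title, $program, $errorFile) = ($1, $2, $3);
          }

     to extract the title, program name, and the error file name.

     The way this works is:

          $_ = <LAUNCHER>;
                         reads a record into $_.  (A bare
                         <LAUNCHER>; statement would read a line
                         and throw it away; the automatic
                         assignment to $_ happens only in a
                         while (<LAUNCHER>) loop condition.)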

          m/.../         matches a regular expression.  Since it
                         doesn't say what variable to use, it uses
                         $_.

          ^              matches the beginning of the line
          \t             matches the initial tab.
          "              matches the first "
          (              starts the first extracted string
          .+             matches one or more of any character
          )              closes the first extraction, placing it
                         in $1
          "              matches the second "
          \s*            matches zero or more spaces or tabs
          "              matches the third "
          (              starts the second extraction
          \S+            matches one or more non-whitespace characters
          )              closes the second extraction, placing it
                         in $2
          \s+            matches one or more spaces or tabs.
          2              matches 2
          >              matches >
          \s*            matches zero or more spaces or tabs
          (              starts the third extraction
          .+             matches one or more characters
          )              closes the third extraction, placing it
                         in $3
          "              matches the fourth "
          $              matches the end of the line

          ($title, $program, $errorFile) = ($1, $2, $3);
                         copies the three extracted strings into
                         named variables.
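     The extraction can be tested without launcher.dat. This sketch builds one record in the layout described above, using the field values from the text and writing the leading tab as \t. It then applies the pattern with a closing quote before $, which matches the fourth " in the walkthrough.

```perl
use strict;
use warnings;

# One record in the launcher.dat layout shown above, with a leading tab.
my $record = qq{\t"SeisWorks 3D"   "s3d 2> /dev/null"\n};
my ($title, $program, $errorFile);

$_ = $record;
if ( m/^\t"(.+)"\s*"(\S+)\s+2>\s*(.+)"$/ ) {   # $ may match before the final newline
    ($title, $program, $errorFile) = ($1, $2, $3);
}
print "title=$title program=$program errorFile=$errorFile\n";
```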




8. Functions

     We have seen some built-in functions such as print, split, join, sort, reverse, and so on. Now let's take a look at user-defined functions.

     Defining a User Function

     A user function, usually called a subroutine or sub, is defined like: 

         sub subname {
             statement 1;
             statement 2;
             statement 3;
             statement 4;
                 :
                 :
                 :
         }

     The subname is the name of the subroutine; it can be any valid Perl identifier. The statements inside the block form the body of the subroutine. When a subroutine is called, its body is executed and the resulting value is returned to the caller. Subroutine definitions can be put anywhere in the program. They are compiled before the program runs and are skipped over during normal execution. Named subroutines are global to their package. If two subroutine definitions have the same name, the later one replaces the earlier one. Perl says nothing about this unless warnings are enabled (use warnings or -w), in which case it reports "Subroutine subname redefined".

     Invoking a User Function

     How do we call a subroutine? Precede the subroutine name with an ampersand (&). In Perl 5 the ampersand is optional when the name is followed by a parenthesized argument list, as in say_hi();.

         &say_hi;

         sub say_hi {
            print "Say Hi to Neon!";
         }

     The result of this call will display "Say Hi to Neon!" on screen.

     A subroutine can call another subroutine, which can call yet another, and so on. The nesting depth is limited only by the available memory.

     Return Values

     As in C, a subroutine call is always part of some expression. The value of the call is called the return value. Unless the subroutine uses an explicit return statement, its return value is the value of the last expression evaluated within its body during that call.

         $a = 5;
         $b = 5;
         $c = &sumab;                # $c is 10
         $d = 5 + &sumab;            # $d is 15

         sub sumab {
             $a + $b;
         }

     A subroutine can also return a list of values when it is called in a list context.

         $a = 3;
         $b = 8;
         @c = &listab;               # @c is (3, 8)

         sub listab {
             ($a, $b);
         }
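     Here is a sketch of the same two subroutines written to take their inputs as arguments (passed through @_, described under Arguments below) instead of reading the globals $a and $b, which Perl's sort uses specially. It also shows what happens when a list-returning subroutine is called in scalar context: the comma operator yields its last value.

```perl
use strict;
use warnings;

sub sumab {
    my ($x, $y) = @_;
    $x + $y;                  # last expression evaluated: the return value
}

sub listab {
    my ($x, $y) = @_;
    ($x, $y);                 # a list in list context
}

my $c    = sumab(5, 5);       # 10
my $d    = 5 + sumab(5, 5);   # 15
my @c    = listab(3, 8);      # (3, 8)
my $last = listab(3, 8);      # scalar context: the comma operator yields 8
print "$c $d @c $last\n";
```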

     Note that "the last expression evaluated" means the last expression actually evaluated when the subroutine runs, not the last expression written in its body. In the following example, the subroutine returns $a if $a > $b, and $b otherwise.

         sub choose_older {
             if ($a > $b) {
                 print "Choose a\n";
                 $a;
             } else {
                 print "Choose b\n";
                 $b;
             }
         }
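     The rule has a common pitfall. If the last statement executed is a print, the subroutine returns print's result (1 on success), not the value that was printed. This sketch uses a hypothetical choose_larger(), an argument-taking variant of choose_older, and a made-up greet():

```perl
use strict;
use warnings;

sub choose_larger {               # variant of choose_older that takes arguments
    my ($x, $y) = @_;
    if ($x > $y) {
        print "Choose first\n";
        $x;                       # evaluated last when $x > $y
    } else {
        print "Choose second\n";
        $y;                       # evaluated last otherwise
    }
}

sub greet {
    my ($name) = @_;
    print "Hello, $name!\n";      # print is evaluated last, so greet returns
}                                 # print's true result, not the name

my $r1 = choose_larger(9, 4);
my $r2 = choose_larger(2, 7);
my $g  = greet("Neon");
```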

     Arguments

     Subroutines become much more useful when we can pass them arguments. If a subroutine call is followed by a list in parentheses, that list is automatically assigned to the special array @_ for the duration of the call. The subroutine can then find out how many arguments it received (scalar(@_)) and what their values are ($_[0], $_[1], and so on).

         &say_hi_to("Neon");         # display "Say Hi to Neon!"
         print &sum(3,8);            # display 11
         $test = &sum(4,9);          # $test is 13

         sub say_hi_to {
             print "Say Hi to $_[0]!\n";
         }

         sub sum {
             $_[0] + $_[1];
         }

     Excess parameters are ignored.
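     This sketch uses made-up subroutine names. It counts arguments with scalar(@_), shows that an extra argument is ignored, and shows one more property of @_: its elements are aliases for the caller's variables, so assigning to $_[0] changes the caller's variable.

```perl
use strict;
use warnings;

sub describe_args {
    my $count = scalar(@_);               # how many arguments were passed
    return "$count argument(s): @_";
}

sub sum2 {
    $_[0] + $_[1];                        # looks only at the first two
}

sub bump {
    $_[0]++;                              # @_ elements alias the caller's variables
}

my $desc  = describe_args("Neon", 42);
my $total = sum2(3, 8, 100);              # 100 is ignored
my $n = 1;
bump($n);                                 # $n is now 2
print "$desc\n$total\n$n\n";
```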

     What if we want to add all of the elements in the list? Here is the example: 

         print &sum(1,2,3,4,5);      # display 15
         print &sum(1,3,5,7,9);      # display 25
         print &sum(1..10);          # display 55 since 1..10 is expanded

         sub sum {
             $total = 0;
             foreach $_ (@_) {
                 $total += $_;
             }
             $total;                 # last expression evaluated
         }
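     The sum above keeps its running total in the global $total, so calling it overwrites any $total the caller was using. In Perl 5 the same routine is usually written with a lexical (my) variable, which exists only inside each call. A minimal sketch:

```perl
use strict;
use warnings;

sub sum {
    my $total = 0;            # lexical: private to this call of sum
    $total += $_ foreach @_;  # add every argument
    return $total;
}

print sum(1, 2, 3, 4, 5), "\n";
print sum(1 .. 10), "\n";
```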

     Local Variables in Functions

     We now know how to use @_ to access the arguments of a subroutine. Next, you may want private versions of some variables inside a subroutine. The local() operator provides them. Here is the sum subroutine with the local() operator:

         sub sum {
             local($total);          # let $total be a local variable
             $total = 0;   
             foreach $_ (@_) {
                 $total += $_;
             }
             $total;                 # last expression evaluated
         }

     When the local($total) statement is executed, the current value of the global variable $total is saved away and a new $total is created with an undef value. When the subroutine exits, Perl discards the local value and restores the previous global value. Until then, any subroutine called from sum also sees the local $total.
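     In Perl 5, my is normally preferred for private variables, and local() is kept for temporarily changing global variables. The difference shows up when the subroutine calls another one. A local value is visible to subroutines called while it is in effect (dynamic scope), but a my variable is not (lexical scope). A small sketch with made-up names:

```perl
use strict;
use warnings;

our $level = "global";             # a package (global) variable

sub show_level { return $level }   # always reads the global $level

sub with_local {
    local $level = "local";        # temporarily replaces the global value
    return show_level();           # called subs see "local"
}

sub with_my {
    my $level = "my";              # new lexical variable, invisible to show_level
    return show_level();           # still sees "global"
}

my @seen = (with_local(), with_my(), show_level());
print "@seen\n";
```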

         sub larger_than {
             local($n, @list);
             ($n, @list) = @_;
             local(@result);
             foreach $_ (@list) {
                 if ($_ > $n) {
                     push(@result, $_);
                 }
             }
             @result;
         }

         @test1 = &larger_than(25, 24, 43, 18, 27, 36);
         # @test1 gets (43,27,36)
         @test2 = &larger_than(12, 22, 33, 44, 11, 55, 3, 8);
         # @test2 gets (22,33,44,55)

     We can also combine the first two lines of the above subroutine. 

         local($n, @list) = @_;

     This is, in fact, common Perl style. One tip about using local(): put all of your local() operators at the beginning of the subroutine definition, before the main body of the subroutine.
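     In Perl 5, larger_than is often written with my and the built-in grep, which returns the list elements for which its block is true. A minimal sketch:

```perl
use strict;
use warnings;

sub larger_than {
    my ($n, @list) = @_;              # first argument is the threshold
    return grep { $_ > $n } @list;    # keep the elements greater than $n
}

my @test1 = larger_than(25, 24, 43, 18, 27, 36);
my @test2 = larger_than(12, 22, 33, 44, 11, 55, 3, 8);
print "@test1\n@test2\n";
```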


9. Filehandles and File Tests


     What is a Filehandle?

     A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. Like block labels, filehandles are written without a special prefix character, so they could be confused with reserved words. For this reason, Larry Wall, the creator of Perl, recommends using all UPPERCASE letters for filehandle names.

     Opening and Closing a Filehandle

          Opening a File

          In Perl, the open() operator opens a filehandle. You can open a file for reading, writing, or appending. Here are examples of each:

              open(FILEHANDLE,"filename");
              # open a file for reading
              open(FILEHANDLE,">outputfile");
              # open a file for writing
              open(FILEHANDLE,">>appendfile");
              # open a file for appending

          Closing a File

          After you finish with a filehandle, you can use close() operator to close the filehandle. For example: 

              close(FILEHANDLE);

     Most of the time, we want to check whether the file was opened successfully. The die() operator reports the failure and ends the program, and the variable $! holds the operating system's reason for the failure. A typical idiom is:

        open(FILEHANDLE,"test") || die "Sorry! Cannot open the file \"test\": $!\n";

     Using Filehandles

     Once a filehandle is open for reading, you can read lines from it just as you read lines from <STDIN>: put the filehandle name inside angle brackets. Here is an example that copies one file to another:

         open(FILE1,$test1) || die "Cannot open $test1 for reading: $!";
         open(FILE2,">$test2") || die "Cannot create $test2: $!";
         while (<FILE1>) {           # read a line from file $test1 into $_
            print FILE2 $_;          # write the line into file $test2
         }
         close(FILE1);
         close(FILE2);
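     Modern Perl code usually writes the same copy with lexical filehandles (my $in) and the three-argument form of open(), which keeps the mode separate from the filename. The sketch is self-contained. It creates a scratch source file with the core File::Temp module, copies it, and reads the copy back; the file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);              # scratch directory, removed at exit
my $test1 = "$dir/source.txt";                  # hypothetical file names
my $test2 = "$dir/copy.txt";

open(my $out, '>', $test1) or die "Cannot create $test1: $!";
print $out "line one\nline two\n";
close($out) or die "Cannot close $test1: $!";

# The copy loop from the text, with lexical filehandles and 3-argument open.
open(my $in,   '<', $test1) or die "Cannot open $test1 for reading: $!";
open(my $copy, '>', $test2) or die "Cannot create $test2: $!";
while (my $line = <$in>) {
    print $copy $line;
}
close($in);
close($copy) or die "Cannot close $test2: $!";

open(my $check, '<', $test2) or die "Cannot reopen $test2: $!";
my $copied = do { local $/; <$check> };         # read the whole file at once
close($check);
print $copied;
```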

     File Tests

     Sometimes we want to know whether the file we are going to process exists, or whether it is readable or writable. File tests answer these questions. Here is a table of the file tests and their meanings:

     File Test | Meaning
     ----------+--------------------------------------------------
         -r    | File or directory is readable by effective user
         -w    | File or directory is writable by effective user
         -x    | File or directory is executable by effective user
         -o    | File or directory is owned by effective user
         -R    | File or directory is readable by real user
         -W    | File or directory is writable by real user
         -X    | File or directory is executable by real user
         -O    | File or directory is owned by real user
         -e    | File or directory exists
         -z    | File exists and has zero size
         -s    | File or directory exists and has nonzero size
         -f    | Entry is a plain file
         -d    | Entry is a directory
         -l    | Entry is a symlink
         -S    | Entry is a socket
         -p    | Entry is a named pipe (a "fifo")
         -b    | Entry is a block-special file (a mountable disk)
         -c    | Entry is a character-special file (an I/O device)
         -u    | File or directory is setuid
         -g    | File or directory is setgid
         -k    | File or directory has the sticky bit set
         -t    | isatty() on the filehandle is true
         -T    | File is "Text"
         -B    | File is "Binary"
         -M    | Modification age in days (relative to script start)
         -A    | Access age in days (relative to script start)
         -C    | Inode-modification age in days (relative to script start)

     You can check a list of filenames to see if they exist by the following method: 

         foreach (@list_of_filenames) {
            print "$_ exists\n" if -e;       # same as -e $_
         }
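     Several file tests can be combined to describe each name in a list. This sketch creates an empty file and a non-empty file in a scratch directory (using the core File::Temp module); the file names are made up. It reports only the size of plain files, because the size reported for a directory depends on the filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);       # scratch directory, removed at exit
my $empty = "$dir/empty.txt";
my $full  = "$dir/full.txt";

for my $pair ([$empty, ""], [$full, "hello\n"]) {
    my ($name, $content) = @$pair;
    open(my $fh, '>', $name) or die "Cannot create $name: $!";
    print $fh $content;
    close($fh) or die "Cannot close $name: $!";
}

my %report;
foreach my $name ($dir, $empty, $full, "$dir/missing.txt") {
    my @flags;
    push @flags, 'exists' if -e $name;
    push @flags, 'dir'    if -d $name;
    if (-f $name) {
        push @flags, 'file';
        push @flags, (-z $name) ? 'empty' : 'size=' . (-s $name);   # -s returns the size
    }
    $report{$name} = join(',', @flags);
    print "$name: $report{$name}\n";
}
```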


10. File and Directory Manipulation

     Removing a File

     Perl uses unlink() to delete files. Here are some examples: 

         unlink("test");             # delete the file "test"
         unlink("test1","test2");    # delete 2 files "test1" and "test2"
         unlink(<*.ps>);             # delete all .ps files, like "rm *.ps" in the shell

     You can also let the user choose which file to delete:

         print "Input the filename you want to delete: ";
         chomp($filename = <STDIN>);  # remove the trailing newline
         unlink($filename);

     Renaming a File

     We use mv to rename files in the shell. In Perl, we use rename($old, $new). For example: 

         $old = "test1";
         $new = "test2";
         rename($old, $new);         # "test1" is changed to "test2"
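     This sketch tries unlink() and rename() in a scratch directory with made-up file names. unlink() returns the number of files it actually removed. The sketch expands the wildcard with bsd_glob() from the core File::Glob module, because the plain <...> and glob() forms split their pattern at spaces, which would break on a directory path containing one.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);

my $dir = tempdir(CLEANUP => 1);          # scratch directory for the demo
for my $name (qw(a.ps b.ps keep.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

my $deleted = unlink(bsd_glob("$dir/*.ps"));    # returns how many it removed
rename("$dir/keep.txt", "$dir/kept.txt") or die "Cannot rename: $!";

opendir(my $dh, $dir) or die "Cannot read $dir: $!";
my @left = sort grep { !/^\./ } readdir($dh);   # skip "." and ".."
closedir($dh);
print "deleted $deleted; left: @left\n";
```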

     Creating Alternate Names for a File (Linking)

          Hard Links

          In the shell, we use ln old new to create a hard link. In Perl, we use link("old", "new"). Hard links have some limitations, however: the old filename cannot be a directory, and the new alias must be on the same filesystem.

          Symbolic Links(Symlinks or soft links)

          In the shell, we use ln -s old new to get a symbolic link. In Perl, we use symlink("old", "new").

          When you invoke ls -l on the directory containing a symbolic link, you get an indication of both the name of the symbolic link and  where the link points. Perl provides the same information by using readlink(). 
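          On a Unix-like system the link operators can be tried in a scratch directory; the file names below are made up. stat() reports the hard-link count of the original file, -l tests for a symbolic link, and readlink() returns where the symlink points.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $old = "$dir/old.txt";
open(my $fh, '>', $old) or die "Cannot create $old: $!";
print $fh "data\n";
close($fh);

link($old, "$dir/hard.txt")    or die "link failed: $!";     # hard link
symlink($old, "$dir/soft.txt") or die "symlink failed: $!";  # symbolic link

my $target  = readlink("$dir/soft.txt");    # where the symlink points
my $is_link = (-l "$dir/soft.txt") ? 1 : 0;
my $nlink   = (stat($old))[3];               # number of hard links to old.txt
print "target=$target link=$is_link nlink=$nlink\n";
```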

     Making and Removing Directories

     In the shell, we use the mkdir command to make a directory. Perl similarly provides the mkdir() operator, which additionally sets the permissions of the new directory. It takes two arguments: the directory name and the permission mode, which is further restricted by the current umask. For example:

         mkdir("test", 0755);        # with the usual umask of 022, "test" gets permissions "drwxr-xr-x"

     rmdir(directory_name) removes a directory, just like rmdir directory_name in the shell. As in the shell, the directory must be empty.

     Modifying Permissions

     Just like the chmod command in the shell, Perl has chmod(). Its first argument is the permission mode, written as an octal number (0644, 0755, and so on), and the remaining arguments are a list of filenames. For example:

         chmod(0644,"test1");        # change the permission of "test1" to "-rw-r--r--"
         chmod(0644,"test1","test2");# change the permission of both files to "-rw-r--r--"
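     To check the effect of mkdir(), chmod() and rmdir(), this sketch works in a scratch directory created with the core File::Temp module. It sets the umask to 022 so the results are predictable and reads the permission bits back with stat(). It assumes a Unix-like system.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir       = tempdir(CLEANUP => 1);
my $old_umask = umask(022);                      # make the result predictable
mkdir("$dir/test", 0755) or die "Cannot mkdir: $!";
my $dir_mode  = (stat("$dir/test"))[2] & 0777;   # permission bits only

open(my $fh, '>', "$dir/test1") or die "Cannot create test1: $!";
close($fh);
my $changed   = chmod(0644, "$dir/test1");       # returns how many files changed
my $file_mode = (stat("$dir/test1"))[2] & 0777;

rmdir("$dir/test") or die "Cannot rmdir: $!";    # works because it is empty
umask($old_umask);
printf "dir %o, file %o, changed %d\n", $dir_mode, $file_mode, $changed;
```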

     Modifying Ownership

     Like chown in the shell, Perl has chown() operation. The chown() operator takes a user ID number(UID), a group ID number(GID) and a  list of filenames. For example: 

         # Assume the user "test" has UID 1234 and its default group has GID 56.

         chown(1234, 56, "test1", "test2");
         # test1 and test2 now belong to user "test" and that user's default group.


11. Converting Other Languages to Perl



     One of the great things about Perl is that there are programs that convert programs written in other languages to Perl.

          Converting awk programs to Perl

          This can be done with the a2p program, which was provided with the Perl distribution (since Perl 5.22 it is distributed separately on CPAN). The usage is:

              $ a2p < awkprog > perlprog

          The resulting Perl program (script) is then ready to run.

          Converting sed programs to Perl

          This is similar to the previous one. Instead of using a2p, you can use s2p to convert sed programs to Perl programs. 

          Converting shell programs to Perl

          Many people ask: "What about shell programs?" Unfortunately, there is no program that converts shell scripts to Perl. The best you can do is understand what the shell script does and rewrite it in Perl. For a quick but dirty translation, you can put the major portions of the original script inside system() calls or backquotes, and then replace some operations with native Perl. For example, replace system("rm test") with unlink("test").


12. Examples

Binary Encoding:

          pack      packs values into a string using a template.

                         $pi = pack("f",3.1415926); 

                    packs pi into the string $pi as a binary
                    single-precision floating-point number.

          unpack    extracts values from a string using a
                    template.

                         $pi2 = unpack("f",$pi);

          There is a long list of templates you can use.  You can
          use more than one template at a time to build up or
          extract binary data from a record.

                    l    long      32 bit signed integer
                    L    long      32 bit unsigned integer
                    s    short     16 bit signed integer
                    S    short     16 bit unsigned integer
                    f    float     32 bit floating point
                    d    double    64 bit floating point
                    A    ASCII     space-padded text string
                    c    char      signed 8-bit integer (one byte)
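          Here is a small sketch of pack and unpack with the templates listed above; the values are arbitrary. It shows that a float loses a little precision in the round trip, that pack adds no alignment padding between fields, and that unpack with A strips the trailing spaces it padded with. The 4-byte size checked for f assumes a platform with the usual single-precision float.

```perl
use strict;
use warnings;

my $pi  = pack("f", 3.1415926);         # native single-precision float
my $pi2 = unpack("f", $pi);             # close to, but not exactly, 3.1415926

my $rec = pack("s l A5", -2, 70000, "hi");           # 2 + 4 + 5 bytes, no padding
my ($short, $long, $text) = unpack("s l A5", $rec);  # "A" strips trailing spaces
printf "float bytes=%d pi2=%.7f record bytes=%d fields=%d,%d,'%s'\n",
    length($pi), $pi2, length($rec), $short, $long, $text;
```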

     System:

          There are many system oriented functions including:

          chmod     change file permissions

          fcntl     sets file control options

          fork      creates an independent sub-process.

          mkdir     make a directory




   
