The result looks like:

    =============================
    John Starks
    1234 Erie Blvd.
    Syracuse, NY 13205
    =============================

Fieldholders

Text Fields

You get a left-justified field by writing an at sign followed by left angle brackets (@<<<<). Does the number of left angle brackets matter? Yes, it does. The field holds as many characters as there are left angle brackets, plus one, because the at sign (@) counts too. Similarly, @>>>> gives a right-justified field and @|||| gives a centered field.

Numeric Fields

There is also a fieldholder for numbers. Instead of (<), (>) or (|), the at sign (@) is followed by hash signs (#), with a period marking the decimal point. The format definition may look like:

    format ACCOUNT =
    Deposit: @#####.## Withdraw: @#####.## Balance: @#####.##
    $deposit, $withdraw, $deposit-$withdraw
    .

Multiline Fields

Perl also provides multiline fields to print a value that spans more than one line. The fieldholder is denoted by @*. Here is an example:

    format TEST =
    =====================
    @*
    $test
    =====================
    .

    $~ = "TEST";    # select TEST as the format for the current output filehandle
    $test = "test1\ntest2\ntest3\ntest4\n";
    write;

Then we get the result:

    =====================
    test1
    test2
    test3
    test4
    =====================

The Top-of-Page Format

Perl also allows you to have your own top-of-page format definition. It may look like this:

    format ADDRESSLABEL_TOP =
    Address label page @<
    $%
    .

The $% variable displays the page number. More precisely, it holds the number of times the top-of-page format has been called for a particular filehandle.

7. Regular Expressions

Regular Expressions

A regular expression is a pattern, a template to be matched against a string. Regular expressions are used by many UNIX programs, such as awk, ed, emacs, grep, sed, vi and the various shells. Perl is a semantic superset of all of these tools. Any regular expression that can be described in one of the UNIX tools can also be written in Perl, but not necessarily using exactly the same characters.
In Perl, we can use the string "test" as a regular expression by enclosing it in slashes.

    while (<>) {
        if (/test/) {
            print "$_";
        }
    }

What if we are not sure how many e's there are between "t" and "s"? We can do the following:

    while (<>) {
        if (/te*st/) {
            print "$_";
        }
    }

This means "t" followed by zero or more e's and then followed by "s" and "t".

We now introduce a simple regular expression operator: substitute. It replaces the part of a string that matches the regular expression with another string. It looks like the s command in sed, consisting of the letter s, a slash, a regular expression, a slash, a replacement string and a final slash:

    s/te*st/result/;

Here, again, the $_ variable is compared with the regular expression. If it matches, the matching part of the string is replaced by the replacement string ("result"). Otherwise, nothing happens.

Pattern

A regular expression is a pattern.

Single-Character Patterns

The simplest and most common pattern-matching character in a regular expression is a single character that matches itself. Another common pattern-matching character is the dot ".". It matches any single character except the newline character (\n).

A pattern-matching character class is represented by a pair of open and close square brackets with a list of characters inside. It matches any single character in the list. If you want to include special characters such as ] or -, you need to precede them with a backslash (\). A long run of consecutive digits or letters can be abbreviated with a dash (-) between the first and last character of the range.

    [0123456789]    # all digits
    [0-9]           # the same as above
    [0-9\]]         # all digits and the right square bracket
    [a-z0-9]        # all lowercase letters and digits
    [a-zA-Z0-9_]    # all letters, digits and the underscore

There is also a negated character class, which is the reverse of the character class. It is written with a caret (^) right after the left square bracket. This negated character class matches any single character that is not in the list.
For example:

    [^0-9]          # match any single non-digit
    [^aeiouAEIOU]   # match any single non-vowel
    [^\]]           # match any character except a right square bracket

Some readers might find it bothersome to type so many characters every time. Is there an abbreviation for digits and other common sets of characters? The answer is "Yes". Perl provides some predefined character classes for your convenience.

    Construct   | Equivalent Class | Negated Construct | Equivalent Negated Class
    ------------+------------------+-------------------+-------------------------
    \d (digits) | [0-9]            | \D (non-digits)   | [^0-9]
    \w (words)  | [a-zA-Z0-9_]     | \W (non-words)    | [^a-zA-Z0-9_]
    \s (space)  | [ \f\n\r\t]      | \S (non-space)    | [^ \f\n\r\t]

Grouping Patterns

As we saw before, the asterisk (*) can be used as a grouping pattern. It means "zero or more" of the character it follows. We now introduce two other grouping patterns. The first one is the plus sign (+), meaning "one or more" of the character it follows. The second one is the question mark (?), meaning "zero or one" of the character it follows.

Sometimes we need to specify the number of characters we want to match. For that we need the general multiplier. The general multiplier consists of a pair of matching curly braces with one or two numbers inside. It looks like:

    /a{3,8}/    # matches 3 to 8 a's
    /a{3,}/     # matches 3 or more a's
    /a{3}/      # matches exactly 3 a's
    /x{0,3}/    # matches 3 or fewer x's

Now think about the three grouping patterns mentioned before. "*" is just like {0,}, "+" is just like {1,} and "?" is like {0,1}.

Another grouping pattern operator is a pair of open and close parentheses around any part of a pattern. The parentheses do not change what the pattern matches, but they make Perl remember the part of the string matched by the enclosed pattern so that it can be referenced later as \1.

    /a(.)b\1/;    # It can be matched by "axbx"

What if there is more than one pair of parentheses in the regular expression? In that case, the second pair of parentheses is referenced as \2, and so on.

    /a(.)b(.)c\1d\2/;    # It can be matched by "axbycxdy"

Another usage is in the replacement string of a substitute command.
    $_ = "a xxx b yyy c zzz d";
    s/b(.*)c/s\1t/;    # $_ becomes "a xxx s yyy t zzz d"

Another grouping pattern is alternation, in the form a|b|c. It can also be used for multiple characters. For single-character alternatives, it is better to use a character class.

    /sony|panasonic/;    # match either sony or panasonic

Here are some more examples of regular expressions, and the effect of parentheses:

    abc*          # ab, abc, abcc, abccc and so on
    (abc)*        # "", abc, abcabc, abcabcabc and so on
    ^a|b          # a at the beginning of a line or b anywhere
    ^(a|b)        # either a or b at the beginning of a line
    (a|b)(c|d)    # ac, ad, bc, or bd
    (red|blue)pen # redpen or bluepen

Selecting a Different Target

Sometimes we do not want to match patterns against the $_ variable. Perl provides the =~ operator for this.

    $test = "Good morning!";
    $test =~ /o*/;     # true
    $test =~ /^Go+/;   # also true

The target of =~ can be any expression that yields a scalar value, for instance <STDIN>. Note that in that case the input is never stored in a variable, so if you want to match the same input again, you won't be able to do so. This is nevertheless a common thing to do.

Ignoring Case

Sometimes we want a pattern to match both uppercase and lowercase letters. As we know, some versions of grep provide the -i flag, meaning "ignore case". Of course, Perl has a similar option. You select the ignore-case option by appending a lowercase "i" to the closing slash, as in /patterns/i.

    <STDIN> =~ /^y/i;    # accepts both "Y" and "y"

Using a Different Delimiter

Sometimes we meet a situation like this:

    $tmp =~ /\/etc\/fstab/;

As we know, if we want to include slash characters in the regular expression, we need to put a backslash in front of each slash character. The result looks odd and unclear. Perl allows you to specify a different delimiter character: precede any nonalphanumeric character with an "m".

    m@/etc/fstab@    # using @ as a delimiter
    m#/etc/fstab#    # using # as a delimiter

Special Read-Only Variables

There are three special read-only variables:

1.
$&, which is the part of the string that matched the regular expression.
2. $`, which is the part of the string before the part that matched.
3. $', which is the part of the string after the part that matched.

For example:

    $_ = "God bless you.";
    /bless/;
    # $` is "God " now
    # $& is "bless" now
    # $' is " you." now

Substitutions

We already know the simple form of the substitution operator: s/old_regular_expr/replacement_string/. We now introduce some variations.

If you want to replace all possible matches instead of just the first match, you can append a g to the closing slash.

    $_ = "feet feel sleep";
    s/ee/oo/g;    # $_ becomes "foot fool sloop"

You can also use a scalar variable as a replacement string.

    $_ = "Say Hi to Neon!";
    $new = "Hello";
    s/Hi/$new/;    # $_ becomes "Say Hello to Neon!"

You can also specify an alternate target with the =~ operator.

    $test = "This is a book.";
    $test =~ s/book/desk/;    # $test becomes "This is a desk."

The split() and join() operators

In Perl, there are two operators used to break apart and combine strings. They are the split() and join() operators.

The split() operator

The split() operator takes a regular expression and a string and looks for all occurrences of the regular expression within the string. The parts of the string that do not match the regular expression are returned, in order, as a list.

    $test1 = "This is a project for CPS600.";
    @test2 = split(/\s+/,$test1);    # split $test1 using whitespace as the delimiter
    # @test2 is ("This","is","a","project","for","CPS600.")

If we store the string in $_ instead of $test1, the code can be shorter, since splitting $_ on whitespace is the default. (The default also skips any leading whitespace.)

    $_ = "This is a project for CPS600.";
    @test2 = split;    # essentially the same as @test2 = split(/\s+/,$_)

The join() operator

The join() operator takes a list of values and combines them together with a glue string between each list element.

    $test1 = join(" ", @test2);

This gives us back the original $test1. One thing to notice is that the glue string is not a regular expression; it is just an ordinary string of zero or more characters.

Inside a loop that reads lines, next if m/^\s*$/; will skip blank lines.
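As a small illustration of the split() and join() operators and of the blank-line test described above, here is a minimal sketch. The variable names $rejoined, $dashed, @lines and @kept are made up for this example and do not come from the tutorial.

```perl
use strict;
use warnings;

my $test1 = "This is a project for CPS600.";

# Break the string apart wherever one or more whitespace characters occur.
my @test2 = split(/\s+/, $test1);

# Glue the pieces back together. The first argument is a plain string,
# not a regular expression.
my $rejoined = join(" ", @test2);

# Any other glue string works the same way.
my $dashed = join("-", @test2);

# Skip blank lines (empty or whitespace only) while looping over input.
# @lines is hypothetical sample data standing in for lines read from a file.
my @lines = ("alpha beta\n", "\n", "   \n", "gamma\n");
my @kept;
foreach (@lines) {
    next if m/^\s*$/;
    push(@kept, $_);
}

print scalar(@test2), " words\n";
print "$rejoined\n";
print "$dashed\n";
print scalar(@kept), " non-blank lines\n";
```

Because the sample string contains only single spaces, joining with " " reproduces it exactly. With runs of several spaces, split(/\s+/, ...) followed by join(" ", ...) would collapse them.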
As a larger example, suppose you had the character string (preceded by a tab):

    "SeisWorks 3D" "s3d 2> /dev/null"

as is found in launcher.dat. You could use the regular expression:

    $_ = <>;
    if ( m/^\t"(.+)"\s*"(\S+)\s+2>\s*(.+)"$/ ) {
        ($title, $program, $errorFile) = ($1, $2, $3);
    }

to extract the title, the program name, and the error file name. The way this works is:

$_ = <>; reads a record and stores it in $_. (A bare <>; outside a while condition would read the record and throw it away.)

m/.../ matches a regular expression. Since it doesn't say what variable to use, it uses $_.

    ^      matches the beginning of the line
    \t     matches the initial tab
    "      matches the first "
    (      starts the first extracted string
    .+     matches one or more of any character
    )      closes the first extraction, placing it in $1
    "      matches the second "
    \s*    matches zero or more whitespace characters (spaces or tabs)
    "      matches the third "
    (      starts the second extraction
    \S+    matches one or more characters that are not whitespace
    )      closes the second extraction, placing it in $2
    \s+    matches one or more whitespace characters (spaces or tabs)
    2      matches 2
    >      matches >
    \s*    matches zero or more whitespace characters (spaces or tabs)
    (      starts the third extraction
    .+     matches one or more characters
    )      closes the third extraction, placing it in $3
    "      matches the fourth "
    $      matches the end of the line

($title, $program, $errorFile) = ($1, $2, $3); copies the values from $1, $2 and $3 into $title, $program and $errorFile.

8. Functions

We have seen some system functions such as print, split, join, sort, reverse, and so on. Let's take a look at user-defined functions.

Defining a User Function

A user function, usually called a subroutine or sub, is defined like:

    sub subname {
        statement 1;
        statement 2;
        statement 3;
        statement 4;
        :
        :
    }

The subname is the name of the subroutine. It can be any name. The statements inside the block are the body of the subroutine. When a subroutine is called, the block of statements is executed and any return value is returned to the caller.

Subroutine definitions can be put anywhere in the program. They are skipped on execution. Subroutine definitions are global; there are no local subroutines.
If you happen to have two subroutine definitions with the same name, the latter one overwrites the former one without warning (unless warnings are enabled).

Invoking a User Function

How can we call a subroutine? In this tutorial we precede the subroutine name with an ampersand (&) when invoking it. (Perl 5 also accepts say_hi(); without the ampersand.)

    &say_hi;

    sub say_hi {
        print "Say Hi to Neon!";
    }

This call displays "Say Hi to Neon!" on the screen.

A subroutine can call another subroutine, and that subroutine can call another, and so on until memory runs out.

Return Values

As in C, a subroutine invocation is always part of some expression. The value of the subroutine invocation is called the return value. The return value of a subroutine is the value of the last expression evaluated within the body of the subroutine on each invocation.

    $a = 5;
    $b = 5;
    $c = &sumab;        # $c is 10
    $d = 5 + &sumab;    # $d is 15

    sub sumab {
        $a + $b;
    }

A subroutine can also return a list of values when evaluated in an array context.

    $a = 3;
    $b = 8;
    @c = &listab;    # @c is (3, 8)

    sub listab {
        ($a, $b);
    }

"The last expression evaluated" means the last expression that is actually evaluated, rather than the last expression written in the subroutine. In the following example, the subroutine returns $a if $a > $b; otherwise, it returns $b.

    sub choose_older {
        if ($a > $b) {
            print "Choose a\n";
            $a;
        } else {
            print "Choose b\n";
            $b;
        }
    }

Arguments

A subroutine is more useful if we can pass arguments to it. In Perl, if the subroutine invocation is followed by a list within parentheses, the list is automatically assigned to a special variable @_ for the duration of the subroutine. The subroutine can determine the number of arguments and the value of those arguments.

    &say_hi_to("Neon");    # display "Say Hi to Neon!"
    print &sum(3,8);       # display 11
    $test = &sum(4,9);     # $test is 13

    sub say_hi_to {
        print "Say Hi to $_[0]!\n";
    }

    sub sum {
        $_[0] + $_[1];
    }

Excess parameters are ignored. What if we want to add all of the elements in the list?
Here is the example:

    print &sum(1,2,3,4,5);    # display 15
    print &sum(1,3,5,7,9);    # display 25
    print &sum(1..10);        # display 55 since 1..10 is expanded

    sub sum {
        $total = 0;
        foreach $_ (@_) {
            $total += $_;
        }
        $total;    # last expression evaluated
    }

Local Variables in Functions

We now know how to use @_ to access the arguments in the subroutine. You may also want to create local versions of a list of variable names in the subroutine. You can do this with the local() operator. Here is the sum subroutine with the local() operator:

    sub sum {
        local($total);    # let $total be a local variable
        $total = 0;
        foreach $_ (@_) {
            $total += $_;
        }
        $total;    # last expression evaluated
    }

When the first body statement is executed, any current value of the global variable $total is saved away and a new variable $total is created with an undef value. When the subroutine exits, Perl discards the local variable and restores the previous global value.

    sub larger_than {
        local($n, @list);
        ($n, @list) = @_;
        local(@result);
        foreach $_ (@list) {
            if ($_ > $n) {
                push(@result, $_);
            }
        }
        @result;
    }

    @test1 = &larger_than(25, 24, 43, 18, 27, 36);
    # @test1 gets (43,27,36)
    @test2 = &larger_than(12, 22, 33, 44, 11, 55, 3, 8);
    # @test2 gets (22,33,44,55)

We can also combine the first two lines of the above subroutine.

    local($n, @list) = @_;

This is a common Perl idiom.

Here is a tip about using the local() operator: put all of your local() operators at the beginning of the subroutine definition, before you get into the main body of the subroutine.

9. Filehandles and File Tests

What is a Filehandle?

A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. Like block labels, filehandles are used without a special prefix character, so they might be confused with reserved words. Therefore, Larry Wall, the inventor of Perl, suggests using all UPPERCASE letters for filehandles.
Opening and Closing a Filehandle

Opening a File

In Perl, we use the open() operator to open a filehandle. You can open a file for reading, writing or appending. Here are examples of each:

    open(FILEHANDLE,"filename");        # open a file for reading
    open(FILEHANDLE,">outputfile");     # open a file for writing
    open(FILEHANDLE,">>appendfile");    # open a file for appending

Closing a File

After you finish with a filehandle, you can use the close() operator to close it. For example:

    close(FILEHANDLE);

Most of the time, we want to make sure that the file was opened successfully. We can use the die() operator to inform us when opening a file fails. Usually, we use the following:

    open(FILEHANDLE,"test") || die "Sorry! Cannot open the file \"test\".\n";

Using Filehandles

Once a filehandle is opened for reading, you can read lines from it just as you read lines from <STDIN>. As with <STDIN>, the newly opened filehandle must be enclosed in angle brackets. Here is an example that copies one file to another:

    open(FILE1,$test1) || die "Cannot open $test1 for reading";
    open(FILE2,">$test2") || die "Cannot create $test2";
    while (<FILE1>) {        # read a line from file $test1 into $_
        print FILE2 $_;      # write the line into file $test2
    }
    close(FILE1);
    close(FILE2);

File Tests

Sometimes we want to know whether the file we are going to process exists, or whether it is readable or writable. The file tests help us here.
Here is a table containing the file tests and their meanings:

    File Test | Meaning
    ----------+--------------------------------------------------
    -r        | File or directory is readable
    -w        | File or directory is writable
    -x        | File or directory is executable
    -o        | File or directory is owned by user
    -R        | File or directory is readable by real user
    -W        | File or directory is writable by real user
    -X        | File or directory is executable by real user
    -O        | File or directory is owned by real user
    -e        | File or directory exists
    -z        | File exists and has zero size
    -s        | File or directory exists and has nonzero size
    -f        | Entry is a plain file
    -d        | Entry is a directory
    -l        | Entry is a symlink
    -S        | Entry is a socket
    -p        | Entry is a named pipe (a "fifo")
    -b        | Entry is a block-special file (a mountable disk)
    -c        | Entry is a character-special file (an I/O device)
    -u        | File or directory is setuid
    -g        | File or directory is setgid
    -k        | File or directory has the sticky bit set
    -t        | isatty() on the filehandle is true
    -T        | File is "Text"
    -B        | File is "Binary"
    -M        | Modification age in days
    -A        | Access age in days
    -C        | Inode-modification age in days

You can check a list of filenames to see whether they exist with the following method:

    foreach (@list_of_filenames) {
        print "$_ exists\n" if -e;    # same as -e $_
    }

10. File and Directory Manipulation

Removing a File

Perl uses unlink() to delete files. Here are some examples:

    unlink("test");             # delete the file "test"
    unlink("test1","test2");    # delete 2 files "test1" and "test2"
    unlink(<*.ps>);             # delete all .ps files like "rm *.ps" in the shell

You can also let the user choose the file to delete.

    print "Input the filename you want to delete: ";
    chop($filename = <STDIN>);
    unlink($filename);

Renaming a File

We use mv to rename files in the shell. In Perl, we use rename($old, $new).
For example:

    $old = "test1";
    $new = "test2";
    rename($old, $new);    # "test1" is changed to "test2"

Creating Alternate Names for a File (Linking)

Hard Links

In the shell, we use ln old new to generate a hard link. In Perl, we use link("old", "new") to do it. However, there are some limitations to hard links. For a hard link, the old filename cannot be a directory and the new alias must be on the same filesystem.

Symbolic Links (Symlinks or soft links)

In the shell, we use ln -s old new to get a symbolic link. In Perl, we use symlink("old", "new"). When you invoke ls -l on the directory containing a symbolic link, you get an indication of both the name of the symbolic link and where the link points. Perl provides the same information through readlink().

Making and Removing Directories

In the shell, we use the mkdir command to make a directory. Perl similarly provides the mkdir() operator. However, Perl's version takes one additional piece of information: it sets the permission at the same time. It takes two arguments, the directory name and the permission. For example:

    mkdir("test", 0755);    # It generates a directory called "test" and its
                            # permission is "drwxr-xr-x" (subject to the current umask)

You can use rmdir(directory_name) to remove the directory, just like rmdir directory_name in the shell.

Modifying Permissions

Just like the chmod command in the shell, Perl has chmod(). Its arguments come in two parts. The first part is the permission number (0644, 0755, ...) and the second part is a list of filenames. For example:

    chmod(0644,"test1");            # change the permission of "test1" to be "-rw-r--r--"
    chmod(0644,"test1","test2");    # change the permission of both files to be "-rw-r--r--"

Modifying Ownership

Like chown in the shell, Perl has the chown() operator. The chown() operator takes a user ID number (UID), a group ID number (GID) and a list of filenames. For example, assume the UID of the user "test" is 1234 and its GID is 56.

    chown(1234, 56, "test1", "test2");    # make test1 and test2 belong to test and its default group
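The operators in this section can be tried out together. The following is a minimal sketch, assuming a Unix-like system (symlink() and permission bits behave differently elsewhere). The scratch directory from File::Temp, the filehandle TMP, the name "alias" and the variables $changed, $mode, $target, $deleted and $removed are all choices made for this illustration, not part of the tutorial.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed automatically when the program exits,
# so that no real files are touched.
my $dir = tempdir(CLEANUP => 1);

# Make a directory; the requested mode is still filtered by the umask.
mkdir("$dir/test", 0755) || die "Cannot mkdir: $!";

# Create an empty file, then rename it.
open(TMP, ">$dir/test/test1") || die "Cannot create test1: $!";
close(TMP);
rename("$dir/test/test1", "$dir/test/test2") || die "Cannot rename: $!";

# chmod() returns the number of files it changed.
my $changed = chmod(0644, "$dir/test/test2");
my $mode    = (stat("$dir/test/test2"))[2] & 0777;    # permission bits only

# Create a symbolic link and ask where it points.
symlink("test2", "$dir/test/alias") || die "Cannot symlink: $!";
my $target = readlink("$dir/test/alias");

# unlink() returns the number of files it deleted.
my $deleted = unlink("$dir/test/alias", "$dir/test/test2");

# rmdir() only succeeds on an empty directory.
my $removed = rmdir("$dir/test");

printf("changed=%d mode=%04o target=%s deleted=%d\n",
       $changed, $mode, $target, $deleted);
```

Checking the return values, as done here with || die and the counts from chmod() and unlink(), is the easiest way to notice that an operation silently failed. chown() is left out because changing ownership normally requires special privileges.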
11. Converting Other Languages to Perl

One of the great things about Perl is that there are programs that convert code from other languages to Perl.

Converting awk programs to Perl

This can be done by the a2p program provided with the Perl distribution. The usage is:

    $ a2p < awkprog > perlprog

Now you have the Perl program (script) ready to run.

Converting sed programs to Perl

This is similar to the previous one. Instead of using a2p, you use s2p to convert sed programs to Perl programs.

Converting shell programs to Perl

Many people may ask: "What about shell programs?" Unfortunately, there is no program that converts shell programs to Perl programs. The best you can do is work out what the shell script does and then rewrite it in Perl. You can, however, make a quick and dirty translation by putting the major portions of the original script inside system() calls or backquotes. You may be able to replace some operations with native Perl. For example, replace system("rm test") with unlink("test").

12. Examples

Binary Encoding:

pack packs values into a string using a template.

    $pi = pack("f",3.1415926);

packs pi into the binary form of a floating point number.

unpack extracts values from a string using a template.

    $pi2 = unpack("f",$pi);

There is a long list of templates you can use. You can use more than one template at a time to build up or extract binary data from a record.

    l    long      32 bit signed integer
    L    long      32 bit unsigned integer
    s    short     16 bit signed integer
    S    short     16 bit unsigned integer
    f    float     32 bit floating point
    d    double    64 bit floating point
    A    ASCII     ASCII string
    c    char      a single byte (character)

System:

There are many system oriented functions including:

    chmod    change file permissions
    fcntl    set file control options
    fork     create an independent sub-process
    mkdir    make a directory

http://ajs.com/perl.
http://www.metronet.com/perlinfo/scripts.
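To tie the pack and unpack templates from the Binary Encoding example together, here is a minimal sketch that combines several templates in one record. The record layout (a long, an unsigned short and an 8-character string) and the variable names are invented for this illustration.

```perl
use strict;
use warnings;

# Build one binary record from three values:
#   l   32 bit signed integer
#   S   16 bit unsigned integer
#   A8  string padded with spaces to 8 bytes
my $record = pack("lSA8", -42, 65535, "perl");
my $size   = length($record);    # 4 + 2 + 8 bytes

# Extract the values again with the same template.
# "A" strips the trailing padding spaces on unpack.
my ($long, $short, $name) = unpack("lSA8", $record);

# A single-precision float keeps only about seven significant digits,
# so the round trip is close to, but not exactly, the original value.
my $pi  = pack("f", 3.1415926);
my $pi2 = unpack("f", $pi);

print "record is $size bytes: $long, $short, '$name'\n";
printf("pi after the round trip: %.7f\n", $pi2);
```

The same template string is used for packing and unpacking, which is the simplest way to keep the two sides in agreement about the record layout.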