Article 9027 of comp.lang.perl:
Xref: feenix.metronet.com comp.lang.perl:9027
Newsgroups: comp.lang.perl
Path: feenix.metronet.com!news.utdallas.edu!corpgate!bnrgate!bnr.co.uk!uknet!pipex!howland.reston.ans.net!agate!boulder!wraeththu.cs.colorado.edu!tchrist
From: Tom Christiansen <tchrist@cs.Colorado.EDU>
Subject: Re: beginning perl user question
Message-ID: <CIAIqn.9FL@Colorado.EDU>
Originator: tchrist@wraeththu.cs.colorado.edu
Sender: news@Colorado.EDU (USENET News System)
Reply-To: tchrist@cs.colorado.edu (Tom Christiansen)
Organization: University of Colorado, Boulder
References: <2eptsd$pps@pegasus.cc.ucf.edu>
Date: Sun, 19 Dec 1993 16:20:46 GMT
Lines: 174

:-> In comp.lang.perl, jim@pegasus.cc.ucf.edu (Jim Ennis) writes:
:  I am a new Unix Systems Administrator, coming from an IBM VM/SP background
:where REXX is the primary tool for creating 'scripts'.  I am running SunOS 4.1.3
:on a Sparc 10/41 with perl.  I would like to know how to do the following
:task (if it is possible with perl, I have the camel book from O'Reilly but
:I could not find what I needed to know).

You might consult the Llama Book and look into pattern matching.
I'll see what I can do.

:I would like to read a list of usernames from a file and build delete
:commands so that I can purge student accounts at the end of the semester.
:I am used to the parse and trim commands from REXX where I can parse an
:input line and then trim off any trailing/leading blanks from a character
:string.  

Where in REXX you'd use PARSE, in Perl, you want to use regular expressions
for pattern matching or splitting up into fields.  For example,

    ($var1, $var2, $var3) = split(' ', $line);	

The vars themselves won't have any whitespace in them because we chose to
split on whitespace;  by using the special pattern ' ' instead of the
more common /\s+/, we skip leading whitespace (as awk does).  Here are
other things to look at:

    ($var1, $var2, $var3) = split(' ');		# split $_ into vars
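    split(' ');					# split $_ into @_ (older perls
						# only; current perls need an
						# explicit array on the left)

    split(/\s+/);				# split $_ into @_ *PLUS*
						# keep a leading null field
    split;					# same as split(' '): leading
						# whitespace is skipped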

    @pw = split(/:/, $pwent,10);		# split on colons, up to 10 
						# fields, leading and trailing
						# null fields preserved
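
To see the difference between these forms, here is a minimal sketch.  The
sample strings are made up, and the arrays are named explicitly rather
than relying on @_:

```perl
use strict;
use warnings;

my $line = "  jim   ennis  staff";    # made-up input with leading blanks

# ' ' is the awk-like special case: leading whitespace is skipped.
my @awk = split(' ', $line);

# /\s+/ is an ordinary pattern: leading whitespace yields an empty first field.
my @re = split(/\s+/, $line);

# A positive limit keeps trailing empty fields.
my $pwent = "jim:x:101:20:::";         # made-up passwd-style entry
my @pw = split(/:/, $pwent, 10);

print scalar(@awk), " fields from split(' ')\n";
print scalar(@re),  " fields from split(/\\s+/), first is '$re[0]'\n";
print scalar(@pw),  " fields from the passwd-style entry\n";
```

Here @awk holds three fields, @re holds four (the first one empty), and
@pw holds seven because the limit keeps the trailing empty fields.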

Sometimes you'd like to split up something that isn't quite so nicely
formatted, meaning there's no separator between the fields that you can
describe easily.  Suppose I want a login-id followed by a phone number
out of the variable $line.

    ($name, $phone) = ( $line =~ /(\w+)\s+(\d+)/ );

But what's a phone number?  Probably we want it to have minuses. We'll
assume the line is in $_ for brevity:

    ($name, $phone) = /(\w+)\s+([\d\-]+)/;

It turns out that a phone number is more complicated than that.  This
works a bit better for odd cases, where we add a + or parens and
intervening whitespace:

    ($name, $phone) = /(\w+)\s+([\d\-()\s\+]+)/;

Now, though, your $phone might have characters you don't want in it.
You can throw out all trailing space this way:

    $phone =~ s/\s+$//;

Or merely all non-digits this way:

    $phone =~ s/\D//g;

Although you can make it run faster at the expense of a bit
of memory by using a transliteration template:

    $phone =~ tr/0-9//dc;
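
Putting those substitutions together gives something close to REXX's
trim.  This is only a sketch: trim() is a hypothetical helper name, not a
Perl builtin, and the sample values are made up.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper, not part of Perl itself.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;    # drop leading whitespace
    $s =~ s/\s+$//;    # drop trailing whitespace
    return $s;
}

my $userid = trim("  jim     ");
print "/${userid}:/d\n";           # the sed command from the question

# Keep only the digits of a made-up phone number.
my $phone = "(303) 555-0100";
(my $digits = $phone) =~ tr/0-9//cd;
print "$digits\n";
```

The copy into $digits leaves $phone itself untouched, which is handy if
you still want to print the original.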

Now what if all I wanted was all the numbers from a line?

    @nums =  ($line =~ /(\d+)/g);
    @nums =  /(\d+)/g;			# on $_

This will pull all the numbers from the line and put them into @nums.  But
if you mean for numbers to be as in C or Perl notation, where you
can have octal or exponential or negative or hex, then you have
to get much fancier.  This one handles a sign, a fraction, and an
exponent (hex would need another alternative):

    @nums = /([+-]?(?:\d+\.?\d*|\.\d+)(?:[Ee][+-]?\d+)?)/g;
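
As a sketch of the fancier case, the following pulls signed decimal
numbers with optional fraction and exponent out of a line.  The pattern
in $num is one possible choice, not the only one; it makes no attempt at
hex, and the input line is made up.

```perl
use strict;
use warnings;

# One possible pattern: optional sign, digits with an optional fraction
# (or a bare fraction), then an optional exponent.  No hex support.
my $num = qr/[+-]?(?:\d+\.?\d*|\.\d+)(?:[Ee][+-]?\d+)?/;

my $line = "temp -1.5e3 at 42 units, ratio .5";    # made-up input
my @nums = $line =~ /($num)/g;

print join(", ", @nums), "\n";
```

With a single capturing group and the /g flag in list context, the match
returns every number found rather than the pieces of just one.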

Now, if you were using fixed width fields, you'd use unpack(), not split().
What I mean is that you know all the columns but not whether the fields
have blanks in them.  This will pull out three fields, the first being
10 bytes long, the second 20, and the last 15.  Remaining bytes on the
line will be ignored, and trailing white space in each variable will be
trimmed:

    ($var1, $var2, $var3) = unpack("A10 A20 A15", $line);
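
A small sketch of that unpack() call at work.  The record is built with
sprintf so the column widths are visible; the field contents are made up.

```perl
use strict;
use warnings;

# Build a made-up fixed-width record: 10 + 20 + 15 columns, then extra bytes.
my $line = sprintf("%-10s%-20s%-15s%s", "jim", "Jim Ennis", "555-0100", "ignored");

# "A" fields strip trailing spaces; the blank inside "Jim Ennis" survives.
my ($var1, $var2, $var3) = unpack("A10 A20 A15", $line);

print "[$var1] [$var2] [$var3]\n";
```

Note that the second field keeps its embedded blank, which is exactly
where split(' ') would have gone wrong.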

:I was using the substr command to select a field from the input record 
:from a file, but the field can be variable length (1-8 characters naturally)
:and I wanted to delete trailing blanks from the userid variable so that
:I could build a sed delete command:
:
:/username/d
:
:Actually I want to append an ':' to the username field as I build the
:delete command, so the final output would be:
:
:/username:/d

You may find that trying to get sed to do this for you is a losing battle.
Depends on how many commands you're constructing.  It would be faster
in most cases to simply code it all in Perl.

    $^I = ".BAK";  # make backup files
    while ( <> ) { # process files as argument
	next if /\buser1name:/;
	next if /\buser2name:/;
	next if /\buser3name:/;
	print;
    } 

Notice I've put a word boundary on your user name.  It would be 
much better, though, if you built up a table of usernames to 
delete from your file.  For example:

    foreach $user ( "user1", "user2", "user3" ) {
	$axed{$user}++;
    } 

    $^I = ".BAK";  # make backup files
    LINE: while ( <> ) { # process files as argument
	USER: while ( /(\w+):/g ) {
	    if ( $axed{$1} ) { next LINE }
	} 
	print;
    } 

Or as concise-is-better-freaks might code it:

    @axlist =  ( "user1", "user2", "user3" );
    @axed{@axlist} = (1) x @axlist;
    $^I = ".BAK";  # make backup files

    LINE: while ( <> ) { # process files as argument
	WORD: while ( /(\w+):/g ) {
	    next LINE if $axed{$1};
	} 
	print;
    } 

Or even:

    LINE: while ( <> ) { # process files as argument
	$axed{$1} && next LINE while /(\w+):/g;
	print;
    } 
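
If the usernames come from a file, the same table idea still applies.
The sketch below is a guess at the task, not a definitive answer:
purge_lines() is a hypothetical helper, and it assumes passwd-style lines
where the login name is the first colon-separated field.  Unlike the
loops above, it checks only that first field instead of every word
followed by a colon.

```perl
use strict;
use warnings;

# purge_lines() is a hypothetical helper: given an array ref of usernames
# (possibly padded with blanks) and an array ref of passwd-style lines,
# return the lines whose login name is not in the list.
sub purge_lines {
    my ($users, $lines) = @_;
    my %axed;
    for my $user (@$users) {
        my ($name) = split(' ', $user);    # trims blanks around the name
        $axed{$name} = 1 if defined $name;
    }
    return grep {
        my ($login) = split(/:/, $_, 2);   # first field only (assumption)
        !(defined $login && $axed{$login});
    } @$lines;
}

my @users  = ("user1   \n", "  user3\n");  # as if read from a username file
my @passwd = (
    "user1:x:101:20::/home/user1:/bin/sh\n",
    "user2:x:102:20::/home/user2:/bin/sh\n",
    "user3:x:103:20::/home/user3:/bin/sh\n",
);
print purge_lines(\@users, \@passwd);
```

In a real script you would fill @users from the username file and feed
the kept lines back through the $^I loop shown earlier.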

:Can anyone tell me if it is possible to do what I want to do in perl?

Sure, it's possible, but you haven't sufficiently described the precise
task for me to give you an exact answer.  That's one reason why I've
tried to show you many approaches.

:I can write this in the REXX language in a few lines (but I have 8+ years
:experience with Rexx and about one week with perl and Unix scripts).
:I could not find any reference to a trim function (which removes blanks
:or other specified characters) in the camel book or the examples that
:were in the book.  Can variable length character strings be handled
:easily with perl?

:If perl cannot do this type of processing, are there any other packages
:that you would recommend.

You might pick up the Llama Book if you can find it.

--tom
-- 
    Tom Christiansen      tchrist@cs.colorado.edu       
      "Will Hack Perl for Fine Food and Fun"
	Boulder Colorado  303-444-3212


