Newsgroups: fj.lang.perl
Path: galaxy.trc.rwcp.or.jp!coconuts.jaist!wnoc-tyo-news!sinfony-news01!news-jp-0.abone.net!np0.iij.ad.jp!nf0.iij.ad.jp!news.iij.ad.jp!othp1.sci.jri.co.jp!othp1!mishima
From: mishima@osa.sci.jri.co.jp (Masahiro Mishima)
Subject: Re: reg exps
In-Reply-To: "Susan Molero"'s message of 3 Feb 1997 21:59:25 GMT
Content-Type: text/plain; charset=US-ASCII
Message-ID: <MISHIMA.97Feb4112126@oppc1>
Sender: news@osa.sci.jri.co.jp (NetNews)
Nntp-Posting-Host: oppc1
Organization: The Japan Research Institute,LTD.,JAPAN
References: <01bc121d$b1e8b220$LocalHost@mihost.adv>
Mime-Version: 1.0 (generated by tm-edit 7.59)
Date: Tue, 4 Feb 1997 02:21:26 GMT
Lines: 72
Xref: galaxy.trc.rwcp.or.jp fj.lang.perl:1695
X-originally-archived-at: http://galaxy.rwcp.or.jp/text/cgi-bin/newsarticle2?ng=fj.lang.perl&nb=1695&hd=a
X-reformat-date: Mon, 18 Oct 2004 15:18:22 +0900
X-reformat-comment: Tabs were expanded into 4 column tabstops by the Galaxy's archiver. See http://katsu.watanabe.name/ancientfj/galaxy-format.html for more info.

Hi Susan,

Looks like you intended to name subpatterns by putting $1 or $2 after
each pair of parentheses.  Actually, each subpattern enclosed by
parentheses will be numbered automatically by its order.

In article <01bc121d$b1e8b220$LocalHost@mihost.adv> "Susan Molero" <susanmolero@adv.es> writes:

>  if (/TI:(.*) $1PD/) {   # $1 should contain everything 
>   # between TI: and PD, i.e. the 
>   # value of field TI.
>   print "\nTI:$1\n\n";
>  }
>  if (/PD:(.*) $1TX/) { #try the same for PD as for TI before
> #but this time doesn't go inside
> #this 'if' 
>   print "\nPD:$1\n\n";
>  }

It's natural.  Because $1 contains the value corresponding to the
pattern enclosed by the 1st pair of parentheses in the last pattern
matching.  You should remove $1 from each pattern.

>  The following should print subfield '1.' of field TX:

>  if (/TX:(.*) $1 OA/) { # get in $1 the whole field TX.
>  # up to here, it works as expected 
>   if ($1=~/1.(.*) $2 2/) {  # get everything between 1.
>      # and 2., i.e., field 1, and
>      # store it in $2. 
>    print "\n2.:$2\n\n";
>   }
>  }

>  But, $2 seems to be empty, instead of having stored the string 'Field 1.'

It's also natural.  Because there were no 2nd pair of parentheses in
the last pattern matching.  Instead, $1 should contain the value of
subfield 1 at the print statement.

How about parsing the whole data at first?
Try this script with the data file name at the first argument.

+++++++++++++++++++++++++++++++++++++++++++++++
#!/usr/local/bin/perl

line:
while(<>){
    # get main field
    if( /^(\w+):(.*)/ ){
$data{$1} = $2;
$field = $1;
next line;
    }
    # get sub field
    if( /^\s+(\d+)\.(.*)/ ){
$data{$field,$1} = $2;
next line;
    }
}

# Now you can refer to each field by saying
# $data{'PD'}, $data{'TX','1'} and so on.
foreach $key ( sort keys %data ){
    print "$key : \"$data{$key}\"\n";
}
-----------------------------------------------

Hope this helps.

Regards,
- mishima
