|
Hi,
My name is Dani. I'm Spanish and my English isn't very good, so I'm sorry if something is hard to understand; just tell me and I'll try to say it in other words.
Here is what I want to do. I have a string with some content, like <div id=id><font color=#000000>foobar</font></div>, and I need the substring <font color=#000000>foobar</font>. To get it I wrote this script:
my $foobar = "<div id=1 name=item><font color=#000000>foobar</font></div>";
$foobar =~ s/(<\/+.+>)?$//i; # first, remove the </div> at the end (the ? should allow only 0 or 1 match)
$foobar =~ s/^(<+.+>)?//i; # then remove the <div> at the beginning (again, only 0 or 1 match)
print $foobar."\n";
The problem is that the result is foobar and not <font color=#000000>foobar</font>. Why does this happen? As I understand it, the ? metacharacter should make the group match only 0 or 1 times. Maybe it happens because Perl matches from the < of the div tag to the > of the font tag... but how do I fix it? The idea is to delete the first HTML tag and keep the text and tags inside it (in other words, get the HTML content of that tag).
Thank you very much for taking the time to help me.
Bye!
|
|
|
my $foobar = "<div id=1 name=item><font color=#000000>foobar</font></div>";
$foobar =~ s/<div[^>]*>//gi; # removes the opening div tag
$foobar =~ s/<\/div>//gi; # removes the closing div tag
|
|
|
The problem is that the tag to delete isn't always a div. I tried s/(<\w[^>]*>)?//g; and s/(<)?\w[^>]*(>)?//g; but neither worked... any other ideas?
|
|
|
# Okay, now I'm going to make one simple assumption:
# that you want to keep everything matching <font something>foobar</font>.
my $foobar = "<div id=1 name=item><font color=#000000>foobar</font></div>";
my @them_foobars = map { /(<font[^>]*>foobar<\/font>)/gi } $foobar;
# if that doesn't work, try this instead:
my $foobar = "<div id=1 name=item><font color=#000000>foobar</font></div>";
$foobar =~ s/.*(<font[^>]*>foobar<\/font>)/$1/gi;
|
|
|
Those didn't work... and any tag can be inside the tag to delete. The string <div id=id><font color=#000000>foobar</font></div> was just an example; it could also be <p id=foo><p id=bar>foobar</p></p>, and the result must be <p id=bar>foobar</p>. The idea is to delete only the first tag (that's why I'm using the ? metacharacter, to match only 0 or 1 times).
|
|
|
# delete only the first tag... that would be this:
my $foobar = "<div id=1 name=item><font color=#000000>foobar</font></div>";
$foobar =~ s/<[^>]+>//i; # note that it's not global, so it stops after the first substitution
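For what it's worth, here is a small sketch of why the negated class [^>]+ helps where the original .+ did not. A greedy .+ is allowed to run across > characters, so the match extends to the last > that still lets the pattern succeed; [^>]+ has to stop at the first one. The variable names ($html, $greedy, $bounded) are only illustrative and not from the posts above.

```perl
use strict;
use warnings;

my $html = '<div id=1 name=item><font color=#000000>foobar</font></div>';

# Greedy: .+ may cross '>' characters, so the match runs to the LAST '>'
# in the string and swallows every tag along the way.
(my $greedy = $html) =~ s/^<.+>//;

# Negated class: [^>]+ cannot cross a '>', so the match ends at the
# first '>' and only the opening <div ...> tag is removed.
(my $bounded = $html) =~ s/^<[^>]+>//;

print "greedy:  '$greedy'\n";
print "bounded: '$bounded'\n";
```

With this input the greedy version should leave an empty string, while the bounded version leaves <font color=#000000>foobar</font></div>.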
|
|
|
Yeah!! That was right!
It works for this case too:
my $foo = "<div id=\"1\" style=\"background: red none repeat scroll 0%; height: 300px; width: 300px; left: 174px; top: 21px; position: absolute; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial; z-index: 1;\">asd<p>asdasda<font color=\"#FFFFFF\">sdas</font>dasdasdasd</p></div>";
$foo =~ s/<[^>]+>//i;
$foo =~ s/<\/[^>]+>$//i;
print $foo."\n";
And the output is:
dani@gentoo-room ~/feina/wmc $ perl reg_exp.pl
asd<p>asdasda<font color="#FFFFFF">sdas</font>dasdasdasd</p>
Thank you very much :).
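
For anyone reusing this, the two substitutions can be wrapped in a small helper. This is only a sketch: the name strip_outer_tag is made up here, and the ^ anchor on the first pattern is an added assumption (the string is expected to start with the tag to remove). It checks both examples from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip the outermost tag pair
# from a string that starts with an opening tag and ends with a closing tag.
sub strip_outer_tag {
    my ($html) = @_;
    $html =~ s/^<[^>]+>//;      # the tag at the very start of the string
    $html =~ s/<\/[^>]+>$//;    # the closing tag at the very end of the string
    return $html;
}

print strip_outer_tag('<div id=id><font color=#000000>foobar</font></div>'), "\n";
print strip_outer_tag('<p id=foo><p id=bar>foobar</p></p>'), "\n";
```

This should print <font color=#000000>foobar</font> and <p id=bar>foobar</p>. It is a regex shortcut rather than real HTML parsing, so a > inside an attribute value would confuse it.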
|
|
|