Hi,
Cryptkiddy wrote (04 Sep 2011 12:51:46 GMT) :
> The problem I am trying to solve here is that the second Regex
> actually is(or can be) appearing one or multiple lines _before_ the
> first regex in the file, but I need the regex in the correct order,
> otherwise I can't compose the second RegEx.
Fair enough.
>> Also, preferably use \s or [[:space:]], or even [[:space:]]+, instead
>> of "\ ". It's both clearer to read, and likely to be more robust on
>> the long run especially when when parsing files you don't control.
>> The "+" I suggest also allows you to remove the space in the character
>> class that follows.
> changed it to [[:space:]] .
> However the space in the following characterclass is still needed, as we might
> encounter something like:
> "$PORT$ ::1,127.0.0.1 -s 7667 ::1,127.0.0.1 ./webapps/"
> ^ ^ ^ ^
> which has multiple additional spaces inside our character class match
Fair enough.
> Arguably the [[:space:]] before and after the character class could
> be dropped but I thought it adds to clarity.
> As thus I did not put a "+" , as the additional spaces will be caught anyway,
> and the extra ones are really only for the sake of the reader.
> (However, whatever you prefere, shall be done :)
> + $regex = '\nclientApp\.' . $client_app_no . '\.args=([0-9]{1,5})[[:space:]]([0-9s:,.-]|[[:space:]])+[[:space:]]\.\/webapps\/[[:space:]]*\n';
> + # 'clientApp.$NUMBER$.args=$PORT$ ::1,127.0.0.1 ./webapps/'
I beg to disagree. This is not critical in any way. Let me explain.
I think adding the "|[[:space:]]" alternative into the mix creates a
confusion between what belongs to field separators, and what belongs
to what looks like a comma-separated list of IPs.
I think expressing in English what must match will reveal this, e.g.
"literal 'args' followed by an equal sign followed, by a port number,
followed by at least one blank character, followed by a
comma-separated list of IPs", etc. I may be wrong and I may be missing
the intent, but then it's probably because it's not expressed clearly
enough by the regexp itself and/or by comments.
BTW, such regexps are starting to be complicated enough to deserve the
use of the extended RE syntax ("x" flag).
I recommend using that extended syntax to write your regexps on
several lines and document their components along the way like I
showed above.
>> Regarding "[\s]*", if you mean "possibly followed by space(s)", I
>> think "[[:space:]]*" is clearer as it does not require the reader to
>> remember if \s is interpreted or not inside a character class.
> I have changed it to "[[:space:]]" however according to my tests it can
> actually be followed by TABs as well.
> (Probably won't. Doesn't seem likely someone mistakenly puts a TAB there)
[[:space:]] includes TAB. See "POSIX Character Classes" section in
perlrecharclass(1). That's why I asked you to use this instead of
forcing the use of the simple horizontal space, when parsing strings
you don't control.
Bye,
--
intrigeri <intrigeri@???>
| GnuPG key @ https://gaffer.ptitcanardnoir.org/intrigeri/intrigeri.asc
| OTR fingerprint @ https://gaffer.ptitcanardnoir.org/intrigeri/otr.asc
| The impossible just takes a bit longer.