Re: [code] More lexer nightmares. :(

From: Mitchell <m.att.foicica.com>
Date: Wed, 20 Nov 2013 13:09:50 -0500 (Eastern Standard Time)

Hi Michael,

On Wed, 20 Nov 2013, Michael Richter wrote:

> In the following samples of code, I'm trying to extract the dot-separated
> letters from the rest:
>
>
> - :(FOO.BAR)
> - :S(FOO.BAR)
> - :F(BAR.BAZ)
> - :S(FOO.BAR)F(BAR.BAZ)
> - :F(BAR.BAZ)S(FOO.BAR)
>
> That is to say in the resulting syntax-coloured code I should be seeing
> FOO.BAR and BAR.BAZ highlighted and nothing else.
>
> I cannot for the life of me get this to work.
>
> I have a lexer pattern "dotted_identifier" which works fine. (It's used
> all over the place in my lexer.) If I use this…
>
> ':(' * dotted_identifier * ')'
>
>
> … I get, as expected, :(FOO.BAR) highlighted. If, however, I do anything
> outside of this, I get inexplicable behaviour.
>
> #':(' * dotted_identifier * #')'
>
>
> This will find *any* instance of dotted_identifier and syntax-colour it
> along with the two characters before it. *Any* two characters before it,
> not just ":(".

That's inexplicable indeed as I cannot reproduce it with this block:

   local label_def = l.starts_line(dotted_identifier)
   local label_ref = #P(':(') * dotted_identifier * #P(')')
   local label = l.token(l.LABEL, label_def + label_ref)

The text

   - :(FOO.BAR)

and even

   ()FOO.BAR()

only highlights FOO.BAR as variables. I'm not seeing the two surrounding
characters highlighted as anything.

Keep in mind that the '#' operator is a lookahead: it tells LPeg to test
for a match without consuming it, so the next pattern is tried against the
same text. In your case, `#P(':(') * dotted_identifier` will always fail
since dotted_identifier is then trying to match starting at ':'.
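
Here is a minimal standalone sketch of that difference, outside of any
lexer. The `dotted_identifier` below is only a hypothetical stand-in
(upper-case words joined by dots); your real definition may well differ.

```lua
local lpeg = require('lpeg')
local P, R = lpeg.P, lpeg.R

-- Hypothetical stand-in for the lexer's dotted_identifier:
-- upper-case words joined by dots, e.g. FOO.BAR.
local word = R('AZ')^1
local dotted_identifier = word * (P('.') * word)^0

-- Lookahead only: ':(' is checked but not consumed, so
-- dotted_identifier is then tried at ':' and fails.
local lookahead_only = #P(':(') * dotted_identifier

-- Consuming match: ':(' is eaten, so dotted_identifier starts at 'F'.
-- The trailing #P(')') checks for ')' without consuming it.
local consuming = P(':(') * dotted_identifier * #P(')')

local r1 = lpeg.match(lookahead_only, ':(FOO.BAR)')
local r2 = lpeg.match(consuming, ':(FOO.BAR)')
print(r1, r2)
```

With this stand-in, the first match fails (nil) and the second returns the
position just past FOO.BAR, with the closing ')' left unconsumed.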

> ':(' * S'SF'^-1 * dotted_identifier * ')'
>
>
> This will match exactly the same as the first pattern I had above. It will
> colour (as expected) :(FOO.BAR), but it will not colour :S(FOO.BAR) or
> :F(BAR.BAZ).

That should probably be

   ':' * S('SF')^-1 * '(' * dotted_identifier * ')'

I don't think you want to match ":(SFOO.BAR)" or ":(FFOO.BAR)".
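
As a rough sketch only, here is one way the prefix could be made optional
(at most one 'S' or 'F') and the pattern extended to the two-target samples
from your list. This is plain LPeg with captures, not the lexer's token
machinery, and `dotted_identifier`, `target`, and `goto_field` are
hypothetical names with an assumed identifier definition.

```lua
local lpeg = require('lpeg')
local P, R, S, C = lpeg.P, lpeg.R, lpeg.S, lpeg.C

-- Assumed definition: upper-case words joined by dots.
local word = R('AZ')^1
local dotted_identifier = word * (P('.') * word)^0

-- One target: an optional S or F, then a parenthesised identifier.
-- Only the identifier itself is captured.
local target = S('SF')^-1 * '(' * C(dotted_identifier) * ')'

-- A goto field: ':' followed by one or two targets.
local goto_field = P(':') * target * target^-1

for _, text in ipairs{':(FOO.BAR)', ':S(FOO.BAR)', ':F(BAR.BAZ)S(FOO.BAR)'} do
  print(text, lpeg.match(goto_field, text))
end
```

Each line prints the sample followed by the identifier(s) captured from it.
In a lexer you would presumably wrap only the identifier part in a token
rather than capture it like this.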

I do see, however, that with your original pattern, the 'S' and 'F' bits
between ':' and '(' were highlighted as variables. It seems that those
characters match your `dotted_identifier` pattern. I'm not sure if that
was intentional or not.

I hope that helps. Sorry if I'm not understanding exactly what you're
trying to achieve though.

Cheers,
Mitchell
