Re: [code] [scintillua] Match patterns between embedded lexer start/end?

From: Mitchell <m.att.foicica.com>
Date: Fri, 4 Oct 2013 08:48:51 -0400 (Eastern Daylight Time)

Hi Claire,

On Fri, 4 Oct 2013, Claire Lewis wrote:

> I’m looking for a way to match patterns between the start and end of an
> embedded lexer; that is where the exact end pattern is dependent on the
> contents of the start pattern.
>
> Is there some way (perhaps using a capture?) to safely propagate state
> between them? If so, where/how could I store such state?

This is a very difficult problem. I can think of two ways, but the first
isn't "safe" and the second is dependent on your use case. The first looks
like the following (untested):

   local state = nil

   local start_patt = --[[LPeg pattern matching range of starts]]
   local start_rule = #start_patt * P(function(text, index)
     local matches = {text:match(--[[Lua pattern with captures]], index)}
     state = --[[process matches to get your "state"]]
     return index -- succeed without consuming input
   end) * start_patt
   local end_rule = P(function(text, index)
     if state == --[[one of your starts]] then
       local _, e = text:find(--[[Lua pattern matching end]], index)
       if e then return e + 1 end
     elseif state == --[[another start]] then
       ...
     end
   end)

Apart from an LPeg match-time capture (which the functions above emulate),
normal LPeg captures will not help you: their values are only produced
after the whole pattern has been processed, which is too late to influence
the end rule.

Due to Scintillua's internals, this technique breaks down when you have
two or more embedded ranges with different states and jump between editing
them. The state may not be updated properly, and your end_rule may then
fail to match correctly.
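
To make the first method concrete, here is a minimal sketch for a made-up
heredoc-like syntax in which a block opens with `<<TAG` and closes with
`TAG>>`. The delimiter syntax and the `block` pattern are assumptions chosen
for illustration only; they are not part of Scintillua or of the original
question, and the sketch uses plain LPeg outside of any lexer.

```lua
local lpeg = require('lpeg')
local P, R = lpeg.P, lpeg.R

-- Shared state: the tag recorded by the most recent start match.
local state = nil

-- Hypothetical start delimiter: "<<" followed by a word-like tag.
local start_patt = P('<<') * (R('az', 'AZ', '09') + '_')^1

-- Record the tag without consuming input, then consume the start itself.
local start_rule = #start_patt * P(function(text, index)
  state = text:match('^<<([%w_]+)', index)
  return index -- succeed at the same position
end) * start_patt

-- Succeed only on the end delimiter that belongs to the recorded tag.
local end_rule = P(function(text, index)
  if not state then return nil end
  local close = state .. '>>'
  if text:sub(index, index + #close - 1) == close then
    return index + #close
  end
end)

-- Whole block: start, any text that is not the end, then the end.
local block = start_rule * (P(1) - end_rule)^0 * end_rule

print(lpeg.match(block, '<<EOT body EOT>>')) -- position after the end
print(lpeg.match(block, '<<A body B>>'))     -- nil: "B>>" does not close "A"
```

Note that `state` outlives the match. If `end_rule` is later tried on text
whose start was not re-scanned, it still tests against the old tag, which is
the failure mode described above.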

The second method depends on how many possibilities there are. If there
are only a few, say `n`, then you could duplicate your embedded lexer `n`
times under `n` different _NAMEs and embed one copy for each possible
start/end rule pair. Any more than a few and you risk hitting the maximum
number of patterns allowed in the final grammar (~32700, imposed by LPeg).
This method is "safe" because it does not rely on state.
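
A rough sketch of the second method, again in plain LPeg: one fixed,
stateless start/end pair is generated per possible tag. The tag list, the
`<<TAG` / `TAG>>` syntax, and the `rules` and `block` names are all
hypothetical. Each entry would then be embedded with its own copy of the
child lexer under its own name; that Scintillua step is left out here.

```lua
local lpeg = require('lpeg')
local P = lpeg.P

-- Hypothetical, closed set of tags that may open an embedded range.
local tags = {'EOT', 'END', 'SQL'}

-- One stateless rule pair per tag; no shared variable is involved.
local rules = {}
for _, tag in ipairs(tags) do
  rules[#rules + 1] = {
    name = 'embedded_' .. tag:lower(), -- distinct name per copy
    start_rule = P('<<' .. tag),
    end_rule = P(tag .. '>>'),
  }
end

-- Helper for checking a single pair in isolation.
local function block(rule)
  return rule.start_rule * (P(1) - rule.end_rule)^0 * rule.end_rule
end

print(lpeg.match(block(rules[1]), '<<EOT body EOT>>')) -- matches
print(lpeg.match(block(rules[2]), '<<EOT body EOT>>')) -- nil: wrong pair
```

Because every pair is a constant pattern, the result does not depend on
which range was lexed last, at the cost of one lexer copy per tag.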

Your problem is a very interesting one, but I'm not sure there is a good
solution for it. I hope this sheds some light on something though.

Cheers,
Mitchell
