Re: [textadept] Re: Lexer

From: mitchell <c....at.caladbolg.net>
Date: Mon, 20 Jun 2011 12:28:38 -0400 (Eastern Daylight Time)

Hi Anton,

On Sat, 18 Jun 2011, anton wrote:

> Thanks Mitchell, but not really.
>
> I don't see any difference between
> local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ata'))
> and
> local preprocessor = token(l.PREPROCESSOR, P('ata'))

I apologize; I did not read your previous example carefully enough. Let me
try again, and I hope I understand your question this time.

You said:

>>> Thanks. But sorry, I still don't get it. Why is this not working?
>>>
>>> local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ta'))

For the text 'sata', and _rules = {
   {'preprocessor',preprocessor},
   {'any_char',l.any_char}
}:

sata
|- search starts here

The first match would be 's' from l.any_char. This advances the search by
one character.

sata
 |

This time the #P('a') part of your preprocessor pattern matches the 'a',
but it does not consume input; that is, it does not advance the search
position.

sata
 |- caret stays here, but preprocessor pattern matches so far

The next part of your pattern says to match 'ta'. However, the character
at the current search position is still 'a', so the pattern fails to
match. LPeg falls back to matching 'a' as l.any_char, and you get no
styling.

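So my solution was to use #P('a') followed by 'ata'. Because # leaves the
search position untouched, whatever follows it must match starting at that
same 'a'. While this has no purpose in real life, it demonstrates how you
would use # in a normal circumstance.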
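
As a minimal sketch of the walk-through above, the two patterns can be tried
directly at position 2 of 'sata'. This assumes the standalone lpeg module is
installed and leaves out the lexer's token() wrapper; the variable names are
only for illustration.

```lua
local lpeg = require('lpeg')
local P = lpeg.P

-- The lookahead #P('a') checks for 'a' without moving the position,
-- so the pattern after it is tried at that same 'a'.
local broken = #P('a') * P('ta')    -- 'ta' cannot match where 'a' is
local working = #P('a') * P('ata')  -- 'ata' starts with that same 'a'

-- lpeg.match(pattern, subject, init) returns the position just past the
-- match, or nil when the pattern fails at init.
local broken_result = lpeg.match(broken, 'sata', 2)
local working_result = lpeg.match(working, 'sata', 2)
local lookahead_only = lpeg.match(#P('a'), 'sata', 2)

print(broken_result)   -- nil: the pattern fails, so any_char takes the 'a'
print(working_result)  -- 5: 'ata' was consumed
print(lookahead_only)  -- 2: matched, but the position did not move
```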

> I obviously looked in other lexers but I couldn't find a situation where
> one needs to lex some text in one style only if it comes before other
> text, whereas this other text should be lexed differently.

I think you're right.

> How can one for instance lex lua comment without lexing the comment
> marker "--"?

I assume you mean styling the '--' differently than the comment text since
you cannot determine what a comment is without lexing its '--' marker
first.

local comment = token(l.DEFAULT, P('--')) * token(l.COMMENT, l.nonnewline^0)

> I also ask about the following thing: can you please explain how this
> works
> local longstring = longstring * P(function(input, index)
>   local level = input:match('^%[(=*)%[', index)
>   if level then
>     local _, stop = input:find(']'..level..']', index, true)
>     return stop and stop + 1 or #input + 1
>   end
> end)

This pattern matches Lua long strings of any level ([==[ string ]==]). If an
opening '[=[' sequence is found with any number of '='s, the function part
of the pattern looks ahead in the input for the closing ']=]' sequence with
the same number of '='s. If that sequence is found, the function returns the
position just past the end of it, indicating a match. Otherwise it returns
the position just past the end of the input, signifying that the rest of the
input should be matched as a longstring.
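
To see those return values concretely, here is a rough sketch that calls the
same function body directly in plain Lua, without LPeg. The name
longstring_end and the sample strings are made up for illustration.

```lua
-- Standalone copy of the function body from the quoted lexer code.
local function longstring_end(input, index)
  -- Capture the run of '=' between the two opening brackets, if any.
  local level = input:match('^%[(=*)%[', index)
  if level then
    -- Plain-text search for the closing bracket of the same level.
    local _, stop = input:find(']' .. level .. ']', index, true)
    return stop and stop + 1 or #input + 1
  end
end

local text = 'x = [==[ a ]] b ]==] y'
local closed = longstring_end(text, 5)    -- position 5 is the opening '['
local no_match = longstring_end(text, 1)  -- position 1 is 'x'

local unterminated = '[=[ never closed'
local open_ended = longstring_end(unterminated, 1)

print(closed)      -- 21: just past ']==]'; the inner ']]' is skipped
print(no_match)    -- nil: no long string starts here
print(open_ended == #unterminated + 1)  -- true: runs to the end of input
```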

> What type of functions does P() accept?
> What are the parameters "input" and "index" and what should this
> function return?

The LPeg documentation is best equipped to explain this. Briefly, any
function passed to P takes two arguments: input and index. Input is the
text being matched. Index is the position LPeg is currently at in the
text. If the function returns nil, there is no match. If it returns a
number, that number becomes the new position in the text, from which LPeg
continues trying to match whatever comes next.
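
A small sketch of such a function, assuming the standalone lpeg module is
installed. The digits pattern is a made-up example, not something taken from
an existing lexer.

```lua
local lpeg = require('lpeg')
local P = lpeg.P

-- Hypothetical function pattern: consume a run of digits starting at index.
local digits = P(function(input, index)
  local _, stop = input:find('^%d+', index)
  if stop then
    return stop + 1  -- new position: just past the last digit
  end
  -- Falling through returns nothing, which LPeg treats as "no match".
end)

-- The position returned by the function is where the next pattern resumes.
local after_digits = lpeg.match(digits, '123abc')
local then_letters = lpeg.match(digits * P('abc'), '123abc')
local failed = lpeg.match(digits, 'abc')

print(after_digits)  -- 4
print(then_letters)  -- 7
print(failed)        -- nil
```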

mitchell

>
> Many thanks,
> anton
>
> On 16 Jun., 04:34, mitchell <c....at.caladbolg.net> wrote:
>> Hi Anton,
>>
>> On Wed, 15 Jun 2011, anton wrote:
>>> Thanks. But sorry, I still don't get it. Why is this not working?
>>
>>> local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ta'))
>>
>> This says "match the character a, but do not consume any input, then match
>> the sequence 'ta'". Naturally this is impossible since text cannot match
>> 'a' and 'ta' at the same position. The solution is to have
>>
>>> local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ata'))
>>
>> Does this help?
>>
>> mitchell
>>
>>> _rules = {
>>>   { 'whitespace', ws },
>>>   { 'preprocessor', preprocessor},
>>>   { 'any_char', l.any_char },
>>> }
>>
>>> on "new sata drives"
>>
>>> The reference says
>>> "The order of the rules is important because of the nature of LPeg.
>>> LPeg tries to apply the first rule to the current position in the text
>>> it is matching. If there is a match, it colors that section
>>> appropriately and moves on. If there is not a match, it tries the next
>>> rule, and so on."
>>> What does
>>> "The '#' operator matches without consuming anything"
>>> mean?
>>> If I have "new sata drives" string
>>> LPeg will apply any_char 3 times, whitespace, any char, then
>>> presumably it should apply
>>> preprocessor to lex ta, and then at what position will it go on?
>>
>>> anton
>>> P.S.: Sorry...
>>
>>> On 15 Jun., 15:16, Robert <ro....at.web.de> wrote:
>>>> On Wed, Jun 15, 2011 at 2:58 PM, mitchell <c....at.caladbolg.net> wrote:
>>>>> Hi Anton,
>>
>>>>> On Wed, 15 Jun 2011, anton wrote:
>>
>>>>>> The main problem is that if I do something wrong in a module, TA
>>>>>> (well, Lua) gives a meaningful
>>>>>> error message. If I do something wrong in a lexer TA simply crashes.
>>
>>>>> Error messages will be printed to STDOUT or STDERR in Linux.
>>
>>>> Also, to quote from the manual:
>>>> "Poorly written lexers have the ability to crash Scintilla, so unsaved
>>>> data might be lost. However, these crashes have only been observed in
>>>> early lexer development, when syntax errors or pattern errors are
>>>> present. Once the lexer actually starts styling text (either correctly
>>>> or incorrectly; it does not matter), no crashes have occurred."
>>
>>>> Robert
>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "textadept" group.
>>> To post to this group, send email to textadept.at.googlegroups.com.
>>> To unsubscribe from this group, send email to textadept+unsubscribe.at.googlegroups.com.
>>> For more options, visit this group at http://groups.google.com/group/textadept?hl=en.
>>
>> mitchell
>

Received on Mon 20 Jun 2011 - 12:28:38 EDT

This archive was generated by hypermail 2.2.0 : Thu 08 Mar 2012 - 12:11:07 EST