Re: Lexer

From: anton <averbit....at.yandex.ru>
Date: Mon, 20 Jun 2011 11:41:34 -0700 (PDT)

THANKS!

On 20 Jun., 18:28, mitchell <c....at.caladbolg.net> wrote:
> Hi Anton,
>
> On Sat, 18 Jun 2011, anton wrote:
> > Thanks Mitchell, but not really.
>
> > I don't see any difference between
> > local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ata'))
> > and
> > local preprocessor = token(l.PREPROCESSOR, P('ata'))
>
> I apologize. I did not read your previous example carefully enough. Let me
> try again; I hope I am understanding your question.
>
> You said:
>
> >>> Thanks. But sorry, I still don't get it. Why is this not working?
>
> >>> local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ta'))
>
> For the text 'sata', and _rules = {
>   {'preprocessor', preprocessor},
>   {'any_char', l.any_char}
> }:
>
> sata
> |- search starts here
>
> The first match would be 's' from l.any_char. This advances the search by
> one character.
>
> sata
>  |
>
> This time your preprocessor pattern matches an 'a', but does not consume
> input, that is, it does not advance the search position.
>
> sata
>  |- caret stays here, but preprocessor pattern matches so far
>
> The next part of your pattern says to match 'ta'. However, the character
> at the current search position is still 'a', not 't', so the pattern fails
> to match. We fall back to matching 'a' as l.any_char and you get no styling.
>
> So my solution was to use #P('a') followed by 'ata'. While this has no
> purpose in real life, it demonstrates how you would use # in a normal
> circumstance.
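
A minimal sketch of the walkthrough above, assuming the standalone lpeg module is installed and can be loaded with require. The names `broken` and `fixed` are illustrative only and do not come from any lexer in this thread. Matching starts at position 2, the first 'a' in 'sata':

```lua
local lpeg = require('lpeg')
local P = lpeg.P

-- #P('a') checks for 'a' without consuming it, so P('ta') is then tried at
-- the same position, where the text has 'a' rather than 't'.
local broken = #P('a') * P('ta')
-- Here the lookahead and the consuming part agree on the first character.
local fixed = #P('a') * P('ata')

-- lpeg.match anchors at the given 1-based start position and returns the
-- position just past the match, or nil on failure.
print(lpeg.match(broken, 'sata', 2))   -- nil
print(lpeg.match(fixed, 'sata', 2))    -- 5
print(lpeg.match(#P('a'), 'sata', 2))  -- 2: matched, nothing consumed
```

The last line shows the "does not consume" part directly: the lookahead succeeds, but the returned position is the same one the match started at.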
>
> > I obviously looked in other lexers but I couldn't find a situation where
> > one needs to lex some text in one style only if it comes before other text,
> > whereas this other text should be lexed differently.
>
> I think you're right.
>
> > How can one for instance lex lua comment without lexing the comment
> > marker "--"?
>
> I assume you mean styling the '--' differently than the comment text since
> you cannot determine what a comment is without lexing its '--' marker
> first.
>
> local comment = token(l.DEFAULT, P('--')) * token(l.COMMENT, l.nonnewline^0)
>
> > I also ask about the following thing: can you please explain how this works
> > local longstring = longstring * P(function(input, index)
> >   local level = input:match('^%[(=*)%[', index)
> >   if level then
> >     local _, stop = input:find(']'..level..']', index, true)
> >     return stop and stop + 1 or #input + 1
> >   end
> > end)
>
> This pattern matches Lua longstrings of any level ([==[ string ]==]). If a
> [=[ sequence is found with any number of '='s, the function part of the
> pattern looks ahead in the string for the ending ']=]' sequence with the
> same number of '='s. If such a sequence is found, the function returns the
> position just past the ']=]' sequence, indicating a match was found.
> Otherwise it returns the end of the input, signifying that the rest of the
> input should be matched as a longstring.
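
To make the return values concrete, here is a small sketch in plain Lua (no LPeg needed). The function body is copied from the snippet quoted above; the name `longstring_end` and the sample strings are hypothetical and only serve the illustration:

```lua
-- Hypothetical standalone copy of the function passed to P() above.
local function longstring_end(input, index)
  -- Capture the run of '=' between the two opening brackets, if any.
  local level = input:match('^%[(=*)%[', index)
  if level then
    -- Plain-text search (4th argument true) for the matching closer.
    local _, stop = input:find(']'..level..']', index, true)
    -- Position just past the closer, or past the end of the input.
    return stop and stop + 1 or #input + 1
  end
end

local text = 'x = [==[ a ]] b ]==] y'
print(longstring_end(text, 5))            -- 21: the inner ']]' is skipped
print(longstring_end('x = [=[ abc', 5))   -- 12: unterminated, rest of input
assert(longstring_end(text, 1) == nil)    -- no '[' at position 1: no match
```

Note that the inner ']]' does not end the string because its level (zero '='s) differs from the opener's level (two '='s).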
>
> > What type of functions does P() accept?
> > What are the parameters "input" and "index" and what should this
> > function return?
>
> The LPeg documentation is best equipped to explain this. Briefly, any
> function passed to P takes two arguments: input and index. Input is the
> text being matched. Index is the position LPeg is currently at in the
> text. If the function returns nil, there is no match. If it returns a
> number, that number becomes the new position in the text, and LPeg
> continues matching from there.
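
As a hedged illustration of that contract, the sketch below wraps a function in P() and combines it with an ordinary pattern. It assumes the standalone lpeg module can be loaded with require; the `digits` matcher is a made-up example and not part of any Textadept lexer:

```lua
local lpeg = require('lpeg')
local P = lpeg.P

-- Hypothetical function pattern: match one or more digits at `index`.
local digits = P(function(input, index)
  local _, stop = input:find('^%d+', index)
  if stop then return stop + 1 end  -- new position: just past the digits
  -- Falling through returns nothing, which LPeg treats as "no match".
end)

-- After the function returns 3, P('px') is tried at position 3.
print(lpeg.match(digits * P('px'), '12px'))  -- 5
print(lpeg.match(digits, 'px'))              -- nil
```

The number returned by the function has to lie between the current index and one past the end of the input, which is why the longstring function above falls back to `#input + 1`.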
>
> mitchell
>
> > Many thanks,
> > anton
>
> > On 16 Jun., 04:34, mitchell <c....at.caladbolg.net> wrote:
> >> Hi Anton,
>
> >> On Wed, 15 Jun 2011, anton wrote:
> >>> Thanks. But sorry, I still don't get it. Why is this not working?
>
> >>> local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ta'))
>
> >> This says "match the character a, but do not consume any input, then match
> >> the sequence 'ta'". Naturally this is impossible since text cannot match
> >> 'a' and 'ta' at the same position. The solution is to have
>
> >>> local preprocessor = token(l.PREPROCESSOR,#P('a') * P('ata'))
>
> >> Does this help?
>
> >> mitchell
>
> >>> _rules = {
> >>>   { 'whitespace', ws },
> >>>   { 'preprocessor', preprocessor},
> >>>   { 'any_char', l.any_char },
> >>> }
>
> >>> on "new sata drives"
>
> >>> The reference says
> >>> "The order of the rules is important because of the nature of LPeg.
> >>> LPeg tries to apply the first rule to the current position in the text
> >>> it is matching. If there is a match, it colors that section
> >>> appropriately and moves on. If there is not a match, it tries the next
> >>> rule, and so on."
> >>> What does
> >>> "The '#' operator matches without consuming anything"
> >>> mean?
> >>> If I have "new sata drives" string
> >>> LPeg will apply any_char 3 times, whitespace, any char, then
> >>> presumably it should apply
> >>> preprocessor to lex ta, and then at what position will it go on?
>
> >>> anton
> >>> P.S.: Sorry...
>
> >>> On 15 Jun., 15:16, Robert <ro....at.web.de> wrote:
> >>>> On Wed, Jun 15, 2011 at 2:58 PM, mitchell <c....at.caladbolg.net> wrote:
> >>>>> Hi Anton,
>
> >>>>> On Wed, 15 Jun 2011, anton wrote:
>
> >>>>>> The main problem is that if I do something wrong in a module, TA
> >>>>>> (well, Lua) gives a meaningful
> >>>>>> error message. If I do something wrong in a lexer TA simply crashes.
>
> >>>>> Error messages will be printed to STDOUT or STDERR in Linux.
>
> >>>> Also, to quote from the manual:
> >>>> "Poorly written lexers have the ability to crash Scintilla, so unsaved
> >>>> data might be lost. However, these crashes have only been observed in
> >>>> early lexer development, when syntax errors or pattern errors are
> >>>> present. Once the lexer actually starts styling text (either correctly
> >>>> or incorrectly; it does not matter), no crashes have occurred."
>
> >>>> Robert
>
> >>> --
> >>> You received this message because you are subscribed to the Google Groups "textadept" group.
> >>> To post to this group, send email to textadept.at.googlegroups.com.
> >>> To unsubscribe from this group, send email to textadept+unsubscribe.at.googlegroups.com.
> >>> For more options, visit this group at http://groups.google.com/group/textadept?hl=en.
>
> >> mitchell
>
>
Received on Mon 20 Jun 2011 - 14:41:34 EDT
