Re: Next generation of lexers

From: mitchell <mforal.n....at.gmail.com>
Date: Wed, 21 Apr 2010 09:51:43 -0700 (PDT)

Robert,

On Apr 19, 8:11 am, Robert <ro....at.web.de> wrote:
> On Sun, Apr 18, 2010 at 11:22 PM, mitchell <mforal.n....at.gmail.com> wrote:
> > Robert,
>
> >> As for the documentation:
> >> One problem I had (in my first attempts writing a lexer) was that I
> >> didn't have an any_char rule. By stripping down other lexers I figured this
> >> out, but maybe this could be made clearer in the documentation. I'm
> >> not sure, but possibly this is mainly a problem with markup languages.
>
> > Do you have a suggestion to replace the current doc? For reference:
>
> > "You might be wondering what that any_char is doing at the bottom of
> > _rules. Its purpose is to match anything not accounted for in the
> > above rules. For example, suppose the ! character is in the input
> > text. It will not be matched by any of the first 9 rules, so without
> > any_char, the text would not match at all, and no coloring would
> > occur. any_char matches one single character and moves on. It may be
> > colored red (indicating a syntax error) if desired because it is a
> > token, not just a pattern."
>
> > Mitchell
>
> I'm not sure. To give you an example, for the LaTeX lexer:
>
> % a comment
> \begin{document}
> Text
> \end{document}
>
> If there is no any_char rule, nothing after "Text" is styled anymore.
> With most lexers this does not seem to be a problem because they have an
> identifier rule (so I see no difference in the lexers I tried when I
> drop the any_char rule).
>
> Maybe both the identifier and the any_char rules should be included in
> the default template, or their importance emphasized. If I don't
> have both, nothing after the first occurrence of something not covered
> by a rule gets styled.

any_char is already in lexers/template.txt.
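To make the any_char point concrete, here is a minimal sketch in plain Lua (no LPeg, so this is not how lexer.lua actually works): ordered rules are tried at the current position, and when none of them matches, lexing simply stops. The rule names and patterns are hypothetical, loosely modeled on the LaTeX example above, and deliberately have no identifier rule. Without a catch-all, lexing gives up at the first plain word; appending an any_char rule that consumes one character keeps it going to the end of the text.

```lua
-- Plain Lua sketch (no LPeg): rules are tried in order at the current
-- position and the first match becomes a token. This only illustrates the
-- idea; it is not Scintillua's actual lexing loop.
local function lex(text, rules)
  local tokens, pos = {}, 1
  while pos <= #text do
    local matched = false
    for _, rule in ipairs(rules) do
      -- A leading '^' anchors the Lua pattern at `pos`.
      local s, e = text:find('^' .. rule[2], pos)
      if s and e >= s then
        tokens[#tokens + 1] = { rule[1], text:sub(s, e) }
        pos = e + 1
        matched = true
        break
      end
    end
    -- No rule matched: lexing stops and the rest of the text stays unstyled.
    if not matched then break end
  end
  return tokens, pos
end

-- Hypothetical rules for a tiny LaTeX-like language (no identifier rule).
local rules = {
  { 'comment', '%%[^\n]*' },
  { 'command', '\\%a+' },
  { 'operator', '[{}]' },
  { 'whitespace', '%s+' },
}

-- The same rules plus a catch-all that consumes exactly one character.
local rules_with_any_char = {
  { 'comment', '%%[^\n]*' },
  { 'command', '\\%a+' },
  { 'operator', '[{}]' },
  { 'whitespace', '%s+' },
  { 'any_char', '.' },
}

local text = '% a comment\n\\begin{document}\nText\n\\end{document}\n'
local _, stopped_at = lex(text, rules)
print(stopped_at, #text)            -- lexing stopped well before the end
local tokens, reached = lex(text, rules_with_any_char)
print(reached == #text + 1)         -- true: every character was tokenized
```

In the first run lexing stops at the "d" of "document"; in the second, unmatched characters become any_char tokens, which a theme could leave plain or color as errors.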

>
> I started writing an NSIS lexer (the installer scripting language for
> Windows, also used for writing Portable Apps launchers :-), based on the
> keywords in SciTE's nsis.properties. I encountered the following
> problem:
> The escape sequence for a double quote within a double-quoted string
> is $\". To make this possible I changed the following in the
> delimited_range function:
> line 982 in lexer.lua
>   else
>     -- old: lpeg.S() splits escape into single characters
>     -- local invalid = lpeg.S(e..f..escape..b)
>     local invalid = lpeg.S(e..f..b) + lpeg.P(escape)
>     range = any - invalid + escape * any
>   end
> Otherwise $ and backslash are each treated as an escape character on
> their own. The same happens in the delimited_range_with_embedded
> function. There probably aren't many languages with multi-character
> escape sequences, but NSIS has gotos, too :-)

Okay, it's in the latest Scintillua hg. Thanks.
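For reference, a plain-Lua sketch (again without LPeg) of what the patched range accepts: the escape is compared as one multi-character sequence, so $\" is skipped as a unit while a lone $ or \ is ordinary text. With the old lpeg.S(e..f..escape..b), every character of the escape went into the invalid set, so a lone $ (as in $5) or \ would stop the match. The function name match_delimited is hypothetical, and the forbidden and balanced characters of the real function are left out.

```lua
-- Plain-Lua sketch (no LPeg) of what the patched pattern accepts:
--   invalid = S(end_char) + P(escape);  body = (any - invalid + escape * any)^0
-- The escape is one multi-character sequence (e.g. NSIS's $\), not a set of
-- single escape characters. Forbidden/balanced characters are left out here.
-- Returns the position of the closing delimiter, or nil if unterminated.
local function match_delimited(text, pos, delim, escape)
  if text:sub(pos, pos) ~= delim then return nil end
  local i = pos + 1
  while i <= #text do
    if text:sub(i, i) == delim then
      return i                        -- closing delimiter
    elseif text:sub(i, i + #escape - 1) == escape then
      i = i + #escape + 1             -- skip the whole escape and the escaped char
    else
      i = i + 1                       -- any other char, including a lone $ or \
    end
  end
  return nil                          -- ran off the end: unterminated string
end

-- NSIS-style string: $\" is an escaped quote; a lone $ or \ is plain text.
local nsis = '"say $\\"hi$\\" for $5 in C:\\Temp"'
print(match_delimited(nsis, 1, '"', '$\\') == #nsis)  -- true
```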

>
> Another idea for the documentation:
> Maybe some information about Textadept could be included at the beginning.
> Something like:
> When using Textadept you can create a `lexers/` directory in your
> `.textadept/` directory. You can copy a lexer from Textadept's
> `lexers/` directory into it and modify it; the copy takes precedence
> over the original lexer. Copying and renaming a lexer is a good way to
> start writing one for a language with similar syntax.

Done. Thanks.

> And a final question:
> I have a variable declaration:
> NSIS example:
> Var Varname
> Or in Lua:
> local varname = true
>
> Is it possible to have a rule to color only the variable name? In NSIS
> they are later easily identified using the leading $ (like in bash).
> For Lua (or other languages) would it be possible to add variables to
> have them colored later on?
> In the NSIS example "Var" is defined as a keyword. I tried different
> combinations (ordering of rules, inclusion of Var as a keyword) but
> wasn't successful. At the moment I have this:
> local var_definition = 'Var' * l.space^1 * l.word * l.newline
> local var = '$' * l.word
> local variable = token('variable', var + var_definition)

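local ws = token('whitespace', l.space^1)
local keyword = token('keyword', word_match { ... 'Var' ... })
-- #P('Var') is a lookahead, so this rule is only tried where the text
-- starts with "Var"; the declared name may be written with or without $.
local variable = #P('Var') * keyword * ws^0 *
  token('variable', P('$')^-1 * l.word)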

_rules = {
  ...
  { 'variable', variable }, -- must come before 'keyword'
  { 'keyword', keyword },
  ...
}
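Why 'variable' has to be listed before 'keyword' can be sketched in plain Lua (no LPeg; the rule functions below are hypothetical, not Scintillua's API): rules are tried in order at each position, so if the plain keyword rule comes first it consumes "Var" and the combined rule never sees the declaration. Accepting an optional leading $ on the declared name, and the tiny keyword list, are assumptions of this sketch.

```lua
-- Plain-Lua sketch (no LPeg) of rule ordering: each rule is a function
-- (text, pos) -> list of {token_name, text}, next_pos; or nil if no match.
local keywords = { Var = true, Section = true }  -- hypothetical subset

local function word_at(text, pos)
  local s, e = text:find('^[%a_][%w_]*', pos)
  if s then return text:sub(s, e), e end
end

local function keyword_rule(text, pos)
  local w, e = word_at(text, pos)
  if w and keywords[w] then return { { 'keyword', w } }, e + 1 end
end

-- 'Var', optional spaces/tabs, then the declared name (optional $ assumed).
local function var_decl_rule(text, pos)
  local w, e = word_at(text, pos)
  if w ~= 'Var' then return nil end
  local ws_s, ws_e = text:find('^[ \t]*', e + 1)
  local ns, ne = text:find('^%$?[%a_][%w_]*', ws_e + 1)
  if not ns then return nil end
  local toks = { { 'keyword', w } }
  if ws_e >= ws_s then toks[#toks + 1] = { 'whitespace', text:sub(ws_s, ws_e) } end
  toks[#toks + 1] = { 'variable', text:sub(ns, ne) }
  return toks, ne + 1
end

local function dollar_var_rule(text, pos)  -- later uses such as $Varname
  local s, e = text:find('^%$[%a_][%w_]*', pos)
  if s then return { { 'variable', text:sub(s, e) } }, e + 1 end
end

local function ws_rule(text, pos)
  local s, e = text:find('^%s+', pos)
  if s then return { { 'whitespace', text:sub(s, e) } }, e + 1 end
end

local function any_char(text, pos)
  return { { 'any_char', text:sub(pos, pos) } }, pos + 1
end

local function lex(text, rules)
  local tokens, pos = {}, 1
  while pos <= #text do
    local toks, next_pos
    for _, rule in ipairs(rules) do
      toks, next_pos = rule(text, pos)
      if toks then break end
    end
    if not toks then break end
    for _, t in ipairs(toks) do tokens[#tokens + 1] = t end
    pos = next_pos
  end
  return tokens
end

-- The combined declaration rule must be tried before the keyword rule.
local good = { var_decl_rule, keyword_rule, dollar_var_rule, ws_rule, any_char }
local bad  = { keyword_rule, var_decl_rule, dollar_var_rule, ws_rule, any_char }

local src = 'Var Varname\n$Varname'
for _, t in ipairs(lex(src, good)) do print(t[1], t[2]) end
```

With the bad ordering, "Varname" in the declaration falls through to any_char one character at a time, and only the later $Varname is colored as a variable.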

Mitchell

>
> Otherwise I can only confirm Russell's opinions, I just can't put it
> so eloquently. (Well, not his opinion about shying away from writing a
> lexer.)
>
> Robert
>
> --
> You received this message because you are subscribed to the Google Groups "textadept" group.
> To post to this group, send email to textadept.at.googlegroups.com.
> To unsubscribe from this group, send email to textadept+unsubscribe.at.googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/textadept?hl=en.

Received on Wed 21 Apr 2010 - 12:51:43 EDT