Re: Next generation of lexers

From: Robert <>
Date: Mon, 19 Apr 2010 14:11:36 +0200

On Sun, Apr 18, 2010 at 11:22 PM, mitchell <> wrote:
> Robert,
>> As for the documentation:
>> One problem I had (in my first attempts writing a lexer) was that I
>> didn't have an any_char rule. Reducing other lexers I figured this
>> out, but maybe this could be made clearer in the documentation. I'm
>> not sure, but possibly this is mainly a problem with markup languages.
> Do you have a suggestion to replace the current doc? For reference:
> "You might be wondering what that any_char is doing at the bottom of
> _rules. Its purpose is to match anything not accounted for in the
> above rules. For example, suppose the ! character is in the input
> text. It will not be matched by any of the first 9 rules, so without
> any_char, the text would not match at all, and no coloring would
> occur. any_char matches one single character and moves on. It may be
> colored red (indicating a syntax error) if desired because it is a
> token, not just a pattern."
> Mitchell
I'm not sure. To give you an example, for the LaTeX lexer:

% a comment

If there is no any_char rule, everything after "Text" is no longer styled.
With most lexers this does not seem to be a problem because they have an
identifier rule (so I see no difference in the lexers I tried
when I leave out the any_char rule).

Maybe both the identifier and the any_char rule should be included in
the default template or their importance be emphasized. If I don't
have both, nothing after the first occurrence of something not covered
by a rule gets styled.
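
To illustrate what I mean, here is a minimal sketch using plain LPeg
rather than the real lexer module. The token() helper, the two rules, and
the input string are made up for the illustration and only assume that
the rules are tried as an ordered choice that is repeated until nothing
matches:

```lua
-- Sketch with plain LPeg; token() here is a hypothetical stand-in,
-- not Textadept's own token function.
local lpeg = require('lpeg')
local P, S, C, Cc, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Ct

-- Capture each token as a {name, text} pair.
local function token(name, patt)
  return Ct(Cc(name) * C(patt))
end

local ws = token('whitespace', S(' \t\r\n')^1)
local comment = token('comment', P('%') * (P(1) - P('\n'))^0)
-- Fallback rule: consume exactly one character that no other rule wants.
local any_char = token('any_char', P(1))

-- Ordered choice of rules, repeated until no rule matches.
local without_fallback = Ct((ws + comment)^0)
local with_fallback = Ct((ws + comment + any_char)^0)

local input = '% a comment\nText % another comment'
local short = without_fallback:match(input)
local full = with_fallback:match(input)

print(#short, #full)   -- number of tokens produced by each variant
print(full[#full][1])  -- name of the last token with the fallback
```

Without the fallback the repetition stops at the "T" of "Text", so the
second comment is never reached. With any_char the letters are consumed
one at a time and the comment after them is tokenized again.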

I started writing an NSIS lexer (NSIS is an installer scripting language
for Windows, also used for writing Portable Apps launchers :-), based on
SciTE's keywords, and encountered the following.
The string escape sequence is $\" for a double quote within double
quotes. To make this possible I changed the following in the
delimited_range function:
Line 982 in lexer.lua:
  -- local invalid = lpeg.S(e..f..escape..b)
  local invalid = lpeg.S(e..f..b) + lpeg.P(escape)
  range = any - invalid + escape * any
Otherwise $ and the backslash are each treated as an escape character on
their own. The same occurs in the delimited_range_with_embedded function.
There probably aren't many languages with multi-character escape
sequences, but NSIS has gotos, too :-)
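
A small sketch of the difference, again with plain LPeg. string_body() is
a hypothetical, simplified stand-in for delimited_range (a single
delimiter, no newline handling), so treat it as an illustration of the
idea and not as the actual code in lexer.lua:

```lua
local lpeg = require('lpeg')
local P, S = lpeg.P, lpeg.S

-- Hypothetical simplification of delimited_range.
-- multi_char = false mimics the old line, true mimics the changed line.
local function string_body(delim, escape, multi_char)
  local any = P(1)
  local invalid
  if multi_char then
    -- The escape is one multi-character pattern.
    invalid = S(delim) + P(escape)
  else
    -- Every character of the escape string is invalid on its own.
    invalid = S(delim .. escape)
  end
  local range = any - invalid + P(escape) * any
  return P(delim) * range^0 * P(delim)
end

local escape = '$\\'  -- the two characters $ and \
local old = string_body('"', escape, false)
local new = string_body('"', escape, true)

-- An NSIS-style string with a lone $ and two $\" escapes.
local input = '"cost: $5 and $\\"quoted$\\""'

print(old:match(input))  -- nil when the pattern does not match
print(new:match(input))  -- position after the match otherwise
```

With the character set, the lone "$" in "$5" can neither be consumed as an
ordinary character nor as the start of an escape, so the old pattern fails
on this input. With lpeg.P(escape), only the full sequence $\ counts as
the escape.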

Another idea for the documentation:
Maybe some info with respect to Textadept could be included at the beginning.
Something like:
When using Textadept you can create a `lexers/` directory in your
`.textadept/` directory. You can copy a lexer from Textadept's
`lexers/` directory to it and change it. It has precedence over the
original lexer. Copying and renaming is a good way to start writing a
lexer with a similar syntax.

And a final question:
I have a variable declaration:
NSIS example:
Var Varname
Or in Lua:
local varname = true

Is it possible to have a rule to color only the variable name? In NSIS
they are later easily identified using the leading $ (like in bash).
For Lua (or other languages) would it be possible to add variables to
have them colored later on?
With the NSIS example "Var" is defined as a keyword. I tried different
combinations (ordering of rules, inclusion of var as a keyword) but
wasn't successful. At the moment I have this:
local var_definition = 'Var' * l.space^1 * l.word * l.newline
local var = '$' * l.word
local variable = token('variable', var + var_definition)
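
One direction that might work is to split the declaration into several
tokens, so that only the name gets the variable style. Below is a sketch
with plain LPeg. The token() helper and the space and word patterns are
hypothetical stand-ins, and whether the real token function can be
chained like this inside one rule is an assumption on my part:

```lua
local lpeg = require('lpeg')
local P, R, S, C, Cc, Ct = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Ct

-- Hypothetical stand-in: capture each token as a {name, text} pair.
local function token(name, patt)
  return Ct(Cc(name) * C(patt))
end

local space = S(' \t')
local word = (R('az', 'AZ') + P('_')) * (R('az', 'AZ', '09') + P('_'))^0

-- "Var Varname" becomes three tokens; only the name is a 'variable'.
local var_definition = token('keyword', P('Var')) *
                       token('whitespace', space^1) *
                       token('variable', word)
-- Later uses are recognized by the leading $.
local var_use = token('variable', P('$') * word)
local ws = token('whitespace', (space + S('\r\n'))^1)
local any_char = token('any_char', P(1))

local lexer = Ct((var_definition + var_use + ws + any_char)^0)
local tokens = lexer:match('Var Varname\n$Varname')

for _, t in ipairs(tokens) do
  print(t[1], t[2])
end
```

This only covers the NSIS case, where uses are marked with $. It does not
remember declared names, so it does not answer the Lua part of the
question.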

Otherwise I can only confirm Russell's opinions, I just can't put it
so eloquently. (Well, not his opinion about shying away from writing a


Received on Mon 19 Apr 2010 - 08:11:36 EDT