# Re: Next generation of lexers

From: Robert <ro....at.web.de>
Date: Mon, 19 Apr 2010 14:11:36 +0200

On Sun, Apr 18, 2010 at 11:22 PM, mitchell <mforal.n....at.gmail.com> wrote:
> Robert,
>
>> As for the documentation:
>> One problem I had (in my first attempts at writing a lexer) was that I
>> didn't have an any_char rule. By stripping down other lexers I figured
>> this out, but maybe this could be made clearer in the documentation. I'm
>> not sure, but possibly this is mainly a problem with markup languages.
>
> Do you have a suggestion to replace the current doc? For reference:
>
> "You might be wondering what that any_char is doing at the bottom of
> _rules. Its purpose is to match anything not accounted for in the
> above rules. For example, suppose the ! character is in the input
> text. It will not be matched by any of the first 9 rules, so without
> any_char, the text would not match at all, and no coloring would
> occur. any_char matches one single character and moves on. It may be
> colored red (indicating a syntax error) if desired because it is a
> token, not just a pattern."
>
> Mitchell
>
I'm not sure. To give you an example, for the LaTeX lexer:

    % a comment
    \begin{document}
    Text
    \end{document}

If there is no any_char rule, nothing after "Text" is styled any more.
With most lexers this does not seem to be a problem, because they have
an identifier rule (I see no difference in the lexers I tried when I
leave out the any_char rule).

Maybe both the identifier and the any_char rules should be included in
the default template, or their importance should be emphasized. If I
don't have both, nothing after the first piece of text not covered by a
rule gets styled.
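To make the any_char point concrete, here is a minimal sketch in plain LPeg (assuming the `lpeg` module is installed). It is not Textadept's lexer.lua API: the `token` helper, the rule names and the simplified LaTeX rules are hypothetical. It lexes the LaTeX example above once with and once without a one-character catch-all rule and counts how many characters end up inside tokens.

```lua
local lpeg = require('lpeg')
local P, S, R, C, Cc, Ct = lpeg.P, lpeg.S, lpeg.R, lpeg.C, lpeg.Cc, lpeg.Ct

-- Hypothetical helper: wrap a match as {token_name, matched_text}.
local function token(name, patt)
  return Ct(Cc(name) * C(patt))
end

-- Simplified LaTeX rules with no identifier/word rule.
local comment    = token('comment', '%' * (1 - P('\n'))^0)
local whitespace = token('whitespace', S(' \t\r\n')^1)
local command    = token('keyword', '\\' * R('az', 'AZ')^1)
local argument   = token('argument', '{' * (1 - S('{}'))^0 * '}')
local rules      = comment + whitespace + command + argument

-- The catch-all: exactly one character of anything.
local any_char = token('any_char', P(1))

-- Apply the ordered choice repeatedly; matching stops at the first
-- position where no alternative matches.
local function lex(patt, text)
  return lpeg.match(Ct(patt^0), text)
end

-- Number of characters that ended up inside some token.
local function covered(tokens)
  local n = 0
  for _, tok in ipairs(tokens) do n = n + #tok[2] end
  return n
end

local text = '% a comment\n\\begin{document}\nText\n\\end{document}\n'
local without = lex(rules, text)
local with_any = lex(rules + any_char, text)
print(covered(without), covered(with_any), #text)  -- prints 29, 49 and 49
```

Without the catch-all, lexing stops at "Text" because no rule matches there, so the rest of the input gets no tokens at all; with it, "Text" is consumed one character at a time and the later rules take over again.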

I started writing an NSIS lexer (NSIS is an installer scripting
language for Windows, also used for writing Portable Apps launchers :-),
based on the keywords in SciTE's nsis.properties. I ran into the
following problem:
The string escape sequence is $\" for a double quote within double
quotes. To make this possible, I changed the following in the
delimited_range function (line 982 in lexer.lua):

    else
      -- local invalid = lpeg.S(e..f..escape..b)
      local invalid = lpeg.S(e..f..b) + lpeg.P(escape)
      range = any - invalid + escape * any
    end

Otherwise, $ and backslash are each treated as an escape character on
their own. The same happens in the delimited_range_with_embedded
function. There probably aren't many languages with multi-character
escape sequences, but NSIS has gotos, too :-)
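The effect of that change can be checked with a minimal LPeg sketch (assuming the `lpeg` module is installed). `range_old` and `range_new` are hypothetical, simplified stand-ins for the else-branch of delimited_range that only take a delimiter and an escape string; the real function in lexer.lua has more parameters. Passing the escape to lpeg.S puts $ and \ into the excluded set individually, so a plain $INSTDIR or a Windows path inside a string breaks the match; with lpeg.P(escape) only the complete $\ sequence is special.

```lua
local lpeg = require('lpeg')
local P, S = lpeg.P, lpeg.S
local any = P(1)

-- Old behaviour (hypothetical stand-in): every character of `escape`
-- goes into the set of characters that may not appear in the string.
local function range_old(delim, escape)
  local invalid = S(delim .. escape)
  local range = any - invalid + P(escape) * any
  return P(delim) * range^0 * P(delim)
end

-- Changed behaviour: only the whole escape sequence is excluded, and it
-- is consumed together with the character it escapes.
local function range_new(delim, escape)
  local invalid = S(delim) + P(escape)
  local range = any - invalid + P(escape) * any
  return P(delim) * range^0 * P(delim)
end

local old = range_old('"', '$\\')
local new = range_new('"', '$\\')

local quoted = [["say $\"hi$\" to $INSTDIR"]]  -- NSIS string with $\" escapes
local path = [["C:\Program Files\App"]]        -- lone backslashes

-- lpeg.match returns the position after the match, or nil on failure.
print(lpeg.match(old, quoted), lpeg.match(new, quoted))
print(lpeg.match(old, path), lpeg.match(new, path))
```

In both cases the old version returns nil, while the new one matches up to and including the closing quote.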

Another idea for the documentation:
Maybe some Textadept-specific information could be included at the
beginning, something like:
When using Textadept, you can create a lexers/ directory in your
.textadept/ directory and copy a lexer from Textadept's own lexers/
directory into it to modify it. The copy takes precedence over the
original lexer. Copying and renaming an existing lexer is a good way to
start writing a lexer for a language with similar syntax.

And a final question:
I have a variable declaration, for example in NSIS:

    Var Varname

or in Lua:

    local varname = true

Is it possible to have a rule that colors only the variable name? In
NSIS, variables are later easily identified by the leading $ (like in
bash). For Lua (or other languages), would it be possible to record
declared variables so that they are colored later on? In the NSIS
example, "Var" is defined as a keyword. I tried different combinations
(ordering of rules, including Var as a keyword) but wasn't successful.
At the moment I have this:

    local var_definition = 'Var' * l.space^1 * l.word * l.newline
    local var = '$' * l.word
    local variable = token('variable', var + var_definition)
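One possible approach is to split the declaration into a sequence of separate tokens, so that Var gets a keyword token and only the following word gets a variable token, and to try that sequence before any generic identifier rule. The sketch below shows the idea in plain LPeg (assuming the `lpeg` module is installed); the `token` helper and rule names are hypothetical, not Textadept's lexer.lua API. It also does not try to remember Lua variable names for later uses, which would need state kept outside the grammar.

```lua
local lpeg = require('lpeg')
local P, S, R, C, Cc, Ct = lpeg.P, lpeg.S, lpeg.R, lpeg.C, lpeg.Cc, lpeg.Ct

-- Hypothetical helper: wrap a match as {token_name, matched_text}.
local function token(name, patt)
  return Ct(Cc(name) * C(patt))
end

local word = (R('az', 'AZ') + '_') * (R('az', 'AZ', '09') + '_')^0

local ws         = token('whitespace', S(' \t\r\n')^1)
local identifier = token('identifier', word)
local any_char   = token('any_char', P(1))

-- Declaration: three tokens in sequence, so only the name is 'variable'.
local var_definition = token('keyword', 'Var')
                     * token('whitespace', S(' \t')^1)
                     * token('variable', word)
-- Later references are recognisable by the leading $.
local var_reference = token('variable', '$' * word)

-- Order matters: the declaration is tried before the generic identifier
-- rule, otherwise "Var" would be swallowed as a plain identifier.
local rules = var_definition + var_reference + ws + identifier + any_char

local function lex(text)
  return lpeg.match(Ct(rules^0), text)
end

local tokens = lex('Var Varname\nStrCpy $Varname hello\n')
for _, tok in ipairs(tokens) do print(tok[1], tok[2]) end
```

Because the sequence either matches as a whole or fails and backtracks, a line like "VarXYZ" (no whitespace after Var) simply falls through to the identifier rule.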

Otherwise, I can only confirm Russell's opinions; I just can't put
them so eloquently. (Well, not his opinion about shying away from
writing a lexer.)

Robert

--