Re: [code] [textadept] Improvements to C and Rust lexers

From: Gabriel Bertilson <>
Date: Fri, 20 Sep 2019 15:08:51 -0500

> I'm thinking of just doing away with the `#lexer.starts_line('#')` bit of the lexer. Off the top of my head I cannot think of a legitimate case of `#` in the middle of a line that would cause an adverse effect. Sure, you have the "stringize" macros, but the likelihood of one being followed by a preproc keyword seems small.
> The performance loss of trying to match a preproc rule first (using a function no less) is too hard to stomach.

Hmm, but the function won't be called very often, because lpeg.Cmt only
calls the function if the pattern in its first argument matches: in
this case, only if optional whitespace plus # is found.
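
A toy model of that argument, written in Python rather than Lua, might
look like the following. It assumes the documented LPeg behaviour that
a match-time capture's function runs only after its pattern has
matched; `cmt`, `check` and `rule` are hypothetical names, and a
compiled regex stands in for the LPeg pattern.

```python
import re
from typing import Callable, List, Optional

# Toy model of LPeg's match-time capture lpeg.Cmt(patt, f): f runs only
# after patt has matched, receiving the subject and the position just
# past the match. Returning a position accepts; returning None rejects.
# A compiled regex stands in for the LPeg pattern here.
Matcher = Callable[[str, int], Optional[int]]


def cmt(patt: re.Pattern, f: Callable[[str, int], Optional[int]]) -> Matcher:
    def match(subject: str, pos: int) -> Optional[int]:
        m = patt.match(subject, pos)
        if m is None:
            return None  # cheap failure: f is never called here
        return f(subject, m.end())
    return match


calls: List[int] = []  # end positions at which the function was invoked


def check(subject: str, end: int) -> Optional[int]:
    calls.append(end)
    return end  # accept; a real lexer would do its extra test here


# Optional spaces/tabs followed by '#', as in the rule discussed above.
rule = cmt(re.compile(r"[ \t]*#"), check)

sample = "int x;\n#include <stdio.h>\n  #define N 1\nreturn x;\n"
hits = [i for i in range(len(sample) + 1) if rule(sample, i) is not None]
print(f"{len(sample) + 1} positions tried, {len(calls)} function calls")
# prints: 51 positions tried, 4 function calls
```

Trying the rule at every position overstates the work a real lexer
does, but it shows the point: the function is reached only at the few
positions where whitespace-then-# actually matches.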

But here's a version that uses lookbehind. It allows preprocessor
directives to be lexed after leading whitespace; in fact, I think the
preprocessor rule can be put anywhere in the rule list, because no
other rule contains #. It calls a function only when # is found. (One
inefficiency in this version is that if # isn't directly preceded by
a newline, it slices the input, creating a new string; unfortunately
string.find takes a start position but no end position, so the search
can't be limited to the text before # without slicing.)

Also, spaces or tabs before # are lexed as whitespace rather than as
"preprocessor". I found out that if the # and the whitespace following
it aren't assigned to a token, they are treated as a comment or
preprocessor token depending on what the next token is, and that can
be exploited to simplify the pattern. But it's probably bad practice
because it's unclear...
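
For illustration only, here is a Python sketch of the lookbehind idea
(the Lua/LPeg version itself isn't reproduced in this archive). The
names `starts_line`, `find_directives` and the `DIRECTIVES` list are
hypothetical. Unlike Lua's string.find, Python's str.rfind and
Pattern.fullmatch accept an end bound, so the check needs no slicing.

```python
import re
from typing import List, Tuple

# Hypothetical directive list, for illustration only; not the actual
# keyword set used by the Textadept C lexer.
DIRECTIVES = ("define", "elif", "else", "endif", "error", "if", "ifdef",
              "ifndef", "include", "line", "pragma", "undef")
_DIRECTIVE = re.compile(r"#[ \t]*(?:" + "|".join(DIRECTIVES) + r")\b")
_INDENT = re.compile(r"[ \t]*")


def starts_line(subject: str, hash_pos: int) -> bool:
    """Look behind the '#' at hash_pos: True if only spaces or tabs lie
    between it and the previous newline (or the start of the input)."""
    line_start = subject.rfind("\n", 0, hash_pos) + 1  # 0 if no newline
    # pos/endpos bound the check without slicing out a new string.
    return _INDENT.fullmatch(subject, line_start, hash_pos) is not None


def find_directives(subject: str) -> List[Tuple[str, str]]:
    """Return (token, text) pairs; leading indentation is emitted as
    'whitespace' and '#' plus the keyword as 'preprocessor'."""
    tokens: List[Tuple[str, str]] = []
    pos = subject.find("#")
    while pos != -1:  # the lookbehind runs only where '#' occurs
        m = _DIRECTIVE.match(subject, pos)
        if m and starts_line(subject, pos):
            line_start = subject.rfind("\n", 0, pos) + 1
            if line_start < pos:
                tokens.append(("whitespace", subject[line_start:pos]))
            tokens.append(("preprocessor", m.group(0)))
        pos = subject.find("#", pos + 1)
    return tokens


src = "#include <stdio.h>\n  #define STR(line) #line\nint x;\n"
print(find_directives(src))
```

The stringize `#line` in the middle of the second line matches the
keyword pattern but fails the lookbehind, so it is not lexed as a
directive, while the indented `#define` is, with its indentation kept
as a separate whitespace token.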

— Gabriel

Received on Fri 20 Sep 2019 - 16:08:51 EDT

This archive was generated by hypermail 2.2.0 : Sat 21 Sep 2019 - 06:49:06 EDT