Re: [code] Re: [textadept] How to use lexer.starts_line?

From: Mitchell
Date: Tue, 19 Aug 2014 09:50:15 -0400 (Eastern Daylight Time)


On Tue, 19 Aug 2014, Joshua Krämer wrote:

> On 2014-08-18, 17:17, Mitchell wrote:
>> I think the `ws` token is too greedy, particularly because it matches
>> newline characters as well as space characters. Consider the string "
>> \n a". `ws` would match " \n ", preventing `keyword` from matching
>> properly. At the moment I see two solutions:
>> local ws = token(l.WHITESPACE, S(' \t\v')^1 + S('\r\n\f')^1)
> Thank you, Mitchell, this works. I also found another solution by
> inspecting other lexers. The following pattern works with the original
> ws definition:
> local ws = token(l.WHITESPACE,^1)
> local keyword = l.starts_line(ws^0 * token(l.KEYWORD, S('abc')))
> How it works is beyond me, though. Also, this solution has the
> following problem: If I create a buffer with content "a\n\ta\n" and
> apply my lexer afterwards, the second a is not highlighted. F5 does
> not help. The same happens if you paste the content while the lexer
> is already active. You have to type something on the second line to
> get the proper highlighting.

I still think `ws` is too greedy (matching "\n\t" before the "a"). The
`l.starts_line()` test simply checks that the character that precedes the
current match position is a newline character (Scintilla makes no
guarantee that text passed to a lexer contains an entire line of text). By
the time `keyword` tries to match "a", the `l.starts_line()` test fails,
since the current position is "a" and its preceding "\t" is not a newline.
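
To make the effect concrete, here is a rough sketch in plain Lua (no LPeg,
no Textadept API). Everything in it is an assumption made for illustration:
`starts_line` and `lex` are hypothetical helpers, the keyword rule is tried
before the whitespace rule, and Lua string patterns stand in for the real
LPeg patterns. It only models the line-start check described above.

```lua
-- Hypothetical model of the line-start test: true if pos is the first
-- position of the text or the previous character is a newline character.
local function starts_line(text, pos)
  if pos == 1 then return true end
  local prev = text:sub(pos - 1, pos - 1)
  return prev == '\n' or prev == '\r' or prev == '\f'
end

-- Toy lexer: tries the keyword rule (optional indent plus one of a, b, c,
-- allowed only at a line start), then each whitespace pattern in order.
-- Returns the token names joined by spaces.
local function lex(text, ws_patterns)
  local names, pos = {}, 1
  while pos <= #text do
    local _, stop, indent = text:find('^([ \t]*)[abc]', pos)
    if stop and starts_line(text, pos) then
      if #indent > 0 then names[#names + 1] = 'whitespace' end
      names[#names + 1] = 'keyword'
    else
      stop = nil
      for i = 1, #ws_patterns do
        local _, last = text:find(ws_patterns[i], pos)
        if last then
          stop = last
          break
        end
      end
      names[#names + 1] = stop and 'whitespace' or 'default'
      stop = stop or pos  -- nothing matched: skip a single character
    end
    pos = stop + 1
  end
  return table.concat(names, ' ')
end

-- Greedy whitespace: one run may span the newline and the indent after it.
local greedy = {'^[ \t\v\r\n\f]+'}
-- Split whitespace: horizontal runs and newline runs are separate tokens.
local split = {'^[ \t\v]+', '^[\r\n\f]+'}

print(lex('a\n\ta', greedy))  -- keyword whitespace default
print(lex('a\n\ta', split))   -- keyword whitespace whitespace keyword
```

With the greedy definition the "\n\t" run is consumed as one token, so the
second "a" is tested at a position preceded by "\t" and is not seen as
starting a line. With the split definition the whitespace token ends right
after "\n", so the next match attempt begins at a line start.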

> As a side note, here is another interesting phenomenon I observed during
> my trials. It happens with the following lines (\n added to the
> pattern):
> local ws = token(l.WHITESPACE,^1)
> local keyword = l.starts_line(S(' \t\n')^0 * token(l.KEYWORD, S('abc')))
> Now, if I type the string "a\n\ta" in a buffer and keep typing "a", the
> second line alternates between being highlighted and being not
> highlighted.

I think this stems from `ws` being greedy. Scintilla internally keeps
track of what text needs highlighting and passes only a subset of the
buffer to a lexer. There's probably some sort of alternation going on
behind the scenes.


Received on Tue 19 Aug 2014 - 09:50:15 EDT