Re: [code] how does scintilla trigger syntax highlighting on less than the whole text?

From: Mitchell <m.att.foicica.com>
Date: Tue, 10 Sep 2013 12:45:07 -0400 (Eastern Daylight Time)

Hi Cosmin,

On Tue, 10 Sep 2013, Cosmin Apreutesei wrote:

> Hi,
>
> I'm trying to integrate the Scintillua lexers into a text editor of my
> own for the purpose of syntax highlighting. I have found that lexing
> the whole text on each key stroke is too slow. But scintilla doesn't
> have this problem so I assume that scintilla calls the lexer only on
> parts of the text and not on the entire text every time. What is the
> logic by which scintilla decides which part of the text needs
> re-lexing when the text changes? Any hints or pointers on where to look
> in the code would be appreciated.

As far as I know, Scintilla keeps track of the last correctly "styled"
position in the text and only asks the lexer to style text from that
point up to the end of what is currently visible. For example, consider
this initial view:

    +-------+
1. |foobar |
2. |barfoo |
    +-------+
3. foobaz <- text outside the view
4. barbaz <- text outside the view

Scintilla calls the lexer to style only the lines visible in the view (#1
and #2), ignoring the last two lines (#3 and #4) in the buffer. Then it
marks the end of line 2 as the last correctly styled position. When the
next line is scrolled into view:

1. foobar
    +-------+
2. |barfoo |
3. |foobaz |
    +-------+
4. barbaz

Scintilla now calls the lexer to style only line 3, since it knows that
everything up to the end of line 2 is already styled and that line 4 is
outside the view. Scintilla then marks the end of line 3 as the last
correctly styled position, and so on.
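To make the bookkeeping concrete, here is a minimal Python model of it. The class `StyledDoc`, its `end_styled` field and its `ensure_styled()` method are hypothetical names, not Scintilla's API, and the "lexer" simply records which ranges it would have been asked to style. Lines are 0-based in the code, so the email's line 2 is line 1 here.

```python
# Hypothetical model of the "last correctly styled position" idea.
# StyledDoc, end_styled and ensure_styled() are illustrative names,
# not Scintilla's API; the "lexer" just records the ranges it gets.

class StyledDoc:
    def __init__(self, text):
        self.text = text
        self.end_styled = 0   # offsets before this are correctly styled
        self.lex_calls = []   # (start, end) ranges handed to the lexer

    def line_end(self, line):
        """Offset just past the newline that ends 0-based `line`."""
        pos = 0
        for _ in range(line + 1):
            nl = self.text.find("\n", pos)
            if nl == -1:
                return len(self.text)
            pos = nl + 1
        return pos

    def ensure_styled(self, last_visible_line):
        """Lex only the gap between end_styled and the end of the view."""
        view_end = self.line_end(last_visible_line)
        if self.end_styled < view_end:
            self.lex_calls.append((self.end_styled, view_end))
            self.end_styled = view_end


doc = StyledDoc("foobar\nbarfoo\nfoobaz\nbarbaz\n")
doc.ensure_styled(1)   # first two lines visible: lexes (0, 14)
doc.ensure_styled(2)   # scroll down one line: lexes only (14, 21)
print(doc.lex_calls)   # [(0, 14), (14, 21)]
```

Calling `ensure_styled()` again without scrolling or editing does nothing, which is why repainting or re-showing the same view costs no lexing at all.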

Now suppose you insert a character, like this:

1. foobar
    +-------+
2. |bar-foo| <- character '-' inserted
3. |foobaz |
    +-------+
4. barbaz

Scintilla marks the 'r' before the inserted character as the last
correctly styled character and calls the lexer to style the rest of the
visible text (the latter half of #2 and all of #3).
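The effect of an edit can be modeled the same way: inserting text at an offset can only invalidate styling from that offset onward, so the mark is pulled back to it. This is a self-contained Python sketch with hypothetical names (`StyledDoc`, `insert()`, `ensure_styled()`), not Scintilla's actual code; it passes the view end as a character offset to keep things short.

```python
# Hypothetical model of how an edit invalidates styling. Names are
# illustrative, not Scintilla's API.

class StyledDoc:
    def __init__(self, text):
        self.text = text
        self.end_styled = 0   # offsets before this are correctly styled
        self.lex_calls = []   # (start, end) ranges handed to the lexer

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        # Text before `pos` did not change, so its styling is still
        # valid; pull the mark back so everything from `pos` is re-lexed.
        self.end_styled = min(self.end_styled, pos)

    def ensure_styled(self, view_end):
        """Lex only the gap between end_styled and offset view_end."""
        if self.end_styled < view_end:
            self.lex_calls.append((self.end_styled, view_end))
            self.end_styled = view_end


doc = StyledDoc("foobar\nbarfoo\nfoobaz\nbarbaz\n")
doc.ensure_styled(21)          # lines 1-3 styled after the scroll
doc.insert(10, "-")            # "barfoo" -> "bar-foo"
view_end = doc.text.index("foobaz\n") + len("foobaz\n")
doc.ensure_styled(view_end)
print(doc.lex_calls[-1])       # (10, 22): from the '-' to the end of line 3
```

Using `min()` matters: an edit past the styled region (for example, off-screen below the view) must not move the mark forward over text that was never styled.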

The caveat here is that Scintillua lexers need to match whole tokens,
not one character at a time. Therefore, Scintillua's Lex() method in
LexLPeg.cxx takes the last correctly styled position into account and
jumps back until it sees a change in styling. At that point it knows it
is at the beginning of a token. (Perhaps you typed a character in the
middle of a string; Scintillua needs to jump back to the start of the
string so the lexer can match it.) Multi-language lexers need more
backtracking: they jump back to a whitespace token, which helps them
infer which language they are in.
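The backtracking can be sketched in Python as below. This is not the actual LexLPeg.cxx code: `token_start()` and `language_start()` are hypothetical helpers, styles are modeled as one style name per character (Scintilla really stores a numeric style per position), and the multi-language part assumes each embedded language has its own whitespace style name such as "css_whitespace".

```python
# Hypothetical sketch of the backtracking described above; not the
# LexLPeg.cxx implementation. `styles` holds one style name per character.

def token_start(styles, end_styled):
    """Return the start of the token containing the last correctly
    styled character (the one just before `end_styled`)."""
    if end_styled == 0:
        return 0
    pos = end_styled - 1
    style = styles[pos]
    # Walk back while the style stays the same; a style change marks
    # the boundary with the previous token.
    while pos > 0 and styles[pos - 1] == style:
        pos -= 1
    return pos


def language_start(styles, end_styled):
    """Multi-language variant: keep backing up token by token until a
    whitespace token is reached. Assumes (hypothetically) per-language
    whitespace style names like "css_whitespace", so the style found
    tells which language to resume lexing in."""
    pos = token_start(styles, end_styled)
    while pos > 0 and not styles[pos].endswith("_whitespace"):
        pos = token_start(styles, pos)
    return pos, styles[pos]


# x = "hello" with a character typed after "he": restart at the quote.
styles = ["identifier", "whitespace", "operator", "whitespace"] + ["string"] * 7
print(token_start(styles, 7))  # 4
```

Two adjacent tokens with the same style (say, two operators) look like a single run, so the walk may stop a little earlier than strictly needed. That start is still a token boundary, so it only costs some extra re-lexing.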

I hope that helps,
Mitchell
