Re: [code] lexer for indentation driven blocks

From: Mitchell <>
Date: Mon, 22 Feb 2016 09:15:05 -0500 (EST)

Hi Carl,

On Sun, 21 Feb 2016, Carl Sturtivant wrote:

> Hello,
> I could use some clues as to how to proceed with some aspects of writing
> an awkward lexer from someone familiar with Lua, Textadept lexers,
> and/or LPEG matching.
> I'm working on a lexer for Diet templates, which are used to generate
> HTML, and can embed CSS, JavaScript, Markdown, HTML, and D, as well as
> plain text, lexers for all of which fortunately already exist.

Wow, coordinating all of them into a single lexer sounds like a nightmare!

> A template has a nested block structure. A nested block starts with a
> word indented more than the indentation of the word starting the
> previous line, and continues up to the first line starting with a less
> indented token.
> What's the best way to pattern match the end of a block when matching a
> block? This is necessary because e.g. embedded JavaScript can make up an
> indented block that ends by the above condition, and I need to switch to
> the JavaScript lexer for exactly that region.

I'm having trouble visualizing this. Could you provide an example? By the
way, the markdown lexer is not a _LEXBYLINE lexer and can match leading
indentation for embedded HTML. That may be worth looking into.

> Also, some ways of embedding other languages continue only until the end
> of the current line. Is there a simple way to color those part lines
> using an existing lexer and not overflow the line? I could turn on
> _LEXBYLINE but I don't know what effect that will have on multi-line
> blocks of JavaScript or other embedded text. Is there a clean technique
> to track state while matching?

I think you can just have a newline be an embedded lexer's end token. The
markdown lexer does this with embedded HTML.
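
Outside of the lexer framework, the idea can be sketched in plain Lua: the
embedded span starts wherever the host lexer hands over and stops just before
the next newline. This is only an illustration of "newline as the end token",
not the markdown lexer's actual code; `span_to_eol` and the sample text are
hypothetical names made up for the sketch.

```lua
-- Returns the first and last positions of the span that starts at
-- `start_pos` and stops just before the next newline (or at the end of
-- the text if there is no newline left).
local function span_to_eol(text, start_pos)
  local nl = string.find(text, "\n", start_pos, true)  -- plain find, no patterns
  local last = nl and nl - 1 or #text
  return start_pos, last
end

-- Hypothetical input: the second line is the part handed to a child lexer.
local text = "p text\nx = 1; y = 2\np more"
local first, last = span_to_eol(text, 8)
local embedded = text:sub(first, last)
```

The child lexer would only ever see `embedded`, so it cannot run past the end
of the line, and the host lexer resumes on the following line.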

Scintilla provides a 32-bit integer per line to track state with. At the
moment, Textadept exposes this via the undocumented
`buffer.line_state[line_num]` property (yes, you can use `buffer` within a
lexer). Usually you'd use bitwise operations to set/check individual bits
for state information. You can find more information here:

Eventually I will add a specific `lexer.line_state[line]` property that
can manipulate line state, because relying on `buffer.line_state` is not
ideal.
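
As a rough sketch of what packing state into that per-line integer could look
like: the flag names, the bit layout, and the local `line_state` table below
are all assumptions made for illustration (the table merely stands in for
`buffer.line_state`). The helpers use plain arithmetic so the sketch runs on
any Lua version; with Lua 5.3 or later the native `&` and `|` operators would
do the same job.

```lua
-- Hypothetical flags; each one is a distinct power of two (one bit each).
local IN_JS_BLOCK  = 1    -- bit 0
local IN_CSS_BLOCK = 2    -- bit 1
local INDENT_SHIFT = 256  -- indentation is stored above the low 8 flag bits

-- True if the bit given by `flag` is set in `state`.
local function has_flag(state, flag)
  return math.floor(state / flag) % 2 == 1
end

-- Returns `state` with the bit given by `flag` set.
local function set_flag(state, flag)
  if has_flag(state, flag) then return state end
  return state + flag
end

-- Returns `state` with the bit given by `flag` cleared.
local function clear_flag(state, flag)
  if has_flag(state, flag) then return state - flag end
  return state
end

-- Replaces the stored indentation while keeping the low flag bits.
local function pack_indent(state, indent)
  return state % INDENT_SHIFT + indent * INDENT_SHIFT
end

-- Reads the stored indentation back out.
local function unpack_indent(state)
  return math.floor(state / INDENT_SHIFT)
end

-- Stand-in for `buffer.line_state`; a real lexer would index that instead.
local line_state = {}

-- Mark line 10 as being inside a JavaScript block opened at indentation 4.
line_state[10] = pack_indent(set_flag(0, IN_JS_BLOCK), 4)
```

When lexing a later line, a lexer could read the previous line's state, compare
its stored indentation with the current line's, and clear the block flag once
the indentation drops below it.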

