Re: [code] [textadept] How to use lexer.starts_line?

From: Mitchell <m.att.foicica.com>
Date: Mon, 18 Aug 2014 17:17:59 -0400 (Eastern Daylight Time)

Hi Joshua,

On Mon, 18 Aug 2014, Joshua Krmer wrote:

> Hi all,
>
> I am trying to create a lexer and cannot figure out how to use the
> lexer.starts_line function. Let us say that I want to match one of the
> characters a, b or c at the beginning of a line or after some whitespace
> characters at the beginning of a line. Attached is a minimal example
> that does not work. I have also tried several variants with #-prefixed
> patterns without success. It would be great if somebody could give me
> a hint.
>
> Thanks and kind regards,
> Joshua
>
>
> local l = require('lexer')
> local token, word_match = l.token, l.word_match
> local P, R, S = lpeg.P, lpeg.R, lpeg.S
>
> local M = {_NAME = 'test_lexer'}
>
> local ws = token(l.WHITESPACE, l.space^1)
> local keyword = l.starts_line(S(' \t')^0 * token(l.KEYWORD, S('abc')))
>
> M._rules = {
> {'keyword', keyword},
> {'whitespace', ws},
> }
>
> return M

I think the `ws` token is too greedy, particularly because it matches
newline characters as well as space characters. Consider the string
" \n a". `ws` would match " \n ", so by the time the lexer reaches "a" it
is already past the start of the line and `keyword` cannot match.
At the moment I see two solutions:

   local ws = token(l.WHITESPACE, S(' \t\v')^1 + S('\r\n\f')^1)

or

   local ws = token(l.WHITESPACE, l.space)

Both would match the " " before "\n", followed by "\n" by itself. Then
`keyword` would be able to properly match " a" at the beginning of the
line.
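
To make the difference concrete, here is a rough plain-Lua model of the
tokenization. It is only a sketch under stated assumptions: `tokenize` and
its string patterns are made up for this illustration and are not part of
the lexer module, and it merely approximates trying the rules in the order
{keyword, whitespace} at each position.

```lua
-- Hypothetical helper, not part of Textadept's lexer module.
-- Models the rule order {keyword, whitespace} with plain Lua string patterns.
local function tokenize(text, ws_patterns)
  local names, pos = {}, 1
  while pos <= #text do
    local name, stop
    -- Rule 1: keyword, tried only when `pos` is at the start of a line.
    local prev = text:sub(pos - 1, pos - 1)
    if pos == 1 or prev == '\n' or prev == '\r' then
      local _, e = text:find('^[ \t]*[abc]', pos)
      if e then name, stop = 'keyword', e end
    end
    -- Rule 2: whitespace, trying each alternative pattern in order.
    if not name then
      for _, pat in ipairs(ws_patterns) do
        local _, e = text:find('^' .. pat, pos)
        if e then name, stop = 'whitespace', e; break end
      end
    end
    -- Fallback: consume a single character that no rule matched.
    if not name then name, stop = 'default', pos end
    names[#names + 1] = name
    pos = stop + 1
  end
  return table.concat(names, ' ')
end

local input = ' \n a'
-- Greedy whitespace, standing in for l.space^1:
print(tokenize(input, {'%s+'}))                    -- whitespace default
-- First suggestion, blanks and newlines as separate runs:
print(tokenize(input, {'[ \t\v]+', '[\r\n\f]+'}))  -- whitespace whitespace keyword
-- Second suggestion, one whitespace character per token:
print(tokenize(input, {'%s'}))                     -- whitespace whitespace keyword
```

With the greedy pattern the "a" ends up as a plain default token, while
either alternative leaves the lexer positioned right after "\n", where the
keyword rule can match " a".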

Note: The latter may be slightly less efficient if the file contains
lots of double spaces or CR+LF newlines, since each whitespace character
then becomes its own token.

I hope this helps.

Cheers,
Mitchell
