Re: [code] partial lexing again

From: Mitchell
Date: Thu, 24 Oct 2013 09:27:14 -0400 (Eastern Daylight Time)

Hi Cosmin,

On Thu, 24 Oct 2013, Cosmin Apreutesei wrote:

>> You're right that for at least the HTML lexer whitespace is not enough when
>> you're inside tag elements. I need to fix this. For simpler lexers,
>> whitespace is always enough. Sorry for the confusion and trouble you've been
>> having :(
> Hi Mitchell, thanks for answering. So, for the hypertext lexer at
> least, I would have to find out if the space is inside a tag and if so
> go back to the "tag" styled position.
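
A minimal sketch of that back-up step, in Python purely for illustration.
It assumes a hypothetical token list of (style, start, end) tuples in which
the opening "<name" of a tag is styled "tag", the closing ">" is styled
"tag_end", and whitespace styles end in "whitespace". None of these names
are taken from the actual lexer output, and restart_position is a made-up
helper.

```python
from typing import List, Tuple

# Hypothetical token model: (style, start, end), with end exclusive.
# Tokens are assumed to be sorted by start offset.
Token = Tuple[str, int, int]


def restart_position(tokens: List[Token], pos: int) -> int:
    """Return an offset at or before pos where re-lexing could begin."""
    # Find the last whitespace token that starts at or before pos.
    ws_index = None
    for i, (style, start, _end) in enumerate(tokens):
        if start > pos:
            break
        if style.endswith("whitespace"):
            ws_index = i
    if ws_index is None:
        return 0  # no whitespace before pos: re-lex from the beginning

    # Walk backwards from that whitespace. Meeting a "tag" token before
    # any "tag_end" means the whitespace sits inside an open tag.
    for style, start, _end in reversed(tokens[:ws_index]):
        if style == "tag_end":
            break
        if style == "tag":
            return start
    return tokens[ws_index][1]


# Tokens for: <a href="x"> hi there
sample = [
    ("tag", 0, 2), ("html_whitespace", 2, 3), ("attribute", 3, 7),
    ("operator", 7, 8), ("string", 8, 11), ("tag_end", 11, 12),
    ("html_whitespace", 12, 13), ("default", 13, 15),
    ("html_whitespace", 15, 16), ("default", 16, 21),
]
```

With the sample tokens, an edit at offset 9 (inside the attribute value)
restarts at 0, the start of the tag, while an edit at offset 18 restarts at
the whitespace at offset 15.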

I committed changes[1] last night that fix this problem for the most

> The problem is that the whitespace tagging trick is not a reliable way
> to know the language of every token. Consider the case below:
> here i'm in hypertext language<script type="text/javascript"> these
> whitespaces tell me I'm in javascript language
> </script><this-is-a-html-tag but for all I know I'm still in
> javascript lang>
> I think the list of tokens returned by the lexer needs to contain
> markers for the beginning and end of each embedded language. This
> could be useful for other purposes as well (e.g. select or highlight
> all javascript code etc).
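
To make the difference concrete, here is a rough Python sketch comparing the
two approaches on a token stream shaped like the example above. The
(style, text) token format, the "begin:<lang>" / "end:<lang>" marker styles,
and both function names are assumptions made up for illustration; the lexer
does not currently emit such markers.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical (style, text) pairs
MARKERS = ("begin:", "end:")


def language_by_whitespace(tokens: List[Token], default: str = "html"):
    """Label each token with the language of the latest whitespace seen."""
    current = default
    labeled = []
    for style, text in tokens:
        if style.startswith(MARKERS):
            continue  # this method does not know about markers
        if style.endswith("_whitespace"):
            current = style[: -len("_whitespace")]
        labeled.append((current, text))
    return labeled


def language_by_markers(tokens: List[Token], default: str = "html"):
    """Label each token using explicit begin/end markers for embedding."""
    stack = [default]
    labeled = []
    for style, text in tokens:
        if style.startswith("begin:"):
            stack.append(style[len("begin:"):])
        elif style.startswith("end:"):
            if len(stack) > 1:
                stack.pop()
        else:
            labeled.append((stack[-1], text))
    return labeled


tokens = [
    ("default", "text"),
    ("html_whitespace", " "),
    ("tag", "<script>"),
    ("begin:javascript", ""),
    ("identifier", "x"),
    ("javascript_whitespace", " "),
    ("identifier", "y"),
    ("end:javascript", ""),
    ("tag", "</script>"),
    ("tag", "<this-is-a-html-tag"),
]
```

On this input the whitespace method labels the final tag as javascript
(and the first script token "x" as html, since no javascript whitespace has
been seen yet), while the marker method labels both correctly.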

Yes, you've identified a shortcoming of the whitespace method. I'm not
sure fixing it is worth the added overhead. Also, lexers are not parsers;
I feel that they are only supposed to approximate the code you are
writing. However, if you're feeling adventurous, feel free to submit a



Received on Thu 24 Oct 2013 - 09:27:14 EDT
