Re: [code] [textadept] iconv bug

From: Mitchell <m.att.foicica.com>
Date: Tue, 30 Oct 2018 20:00:02 -0400 (EDT)

Hi Gabriel,

On Tue, 30 Oct 2018, Erutuon wrote:

> A while ago I discovered a bug in string.iconv. Converting shorter
> strings from UTF-8 to UTF-16 or UTF-32 causes Textadept to crash: for
> instance, string.iconv('a', 'utf-16', 'utf-8') or string.iconv('a',
> 'utf-32', 'utf-8'). For UTF-16, converting strings shorter than 4
> bytes reliably causes the crash, while for UTF-32 the figure is 8
> bytes, though I'm not sure of the exact limits.
>
> It seems to have to do with there needing to be space for the
> byte-order mark (2 bytes in UTF-16, 4 bytes in UTF-32) as well as the
> actual characters of the string, and that the current minimum memory
> allocation for the converted string is the length of the string being
> converted plus one.
>
> I am new to C, but perhaps allocating enough room for the byte-order
> mark as well as one code point and a null terminator would fix the
> problem. For UTF-32 that would be 9 bytes (4 + 4 + 1); for UTF-16, 7
> bytes (2 + 4 + 1, in case the code point needs a surrogate pair). In
> my own binding of win-iconv[1], I had the iconv function allocate
> MB_LEN_MAX * 2 (5 * 2) to start with, and have no problems converting
> shorter strings to UTF-32 and UTF-16.

Thanks so much for the report and for your detailed explanation! I will certainly look into this when I have some time.
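
For reference, the sizing arithmetic you describe could be sketched roughly as below. This is only an illustration under my own assumptions: initial_out_size is a hypothetical helper name, not the code currently in Textadept, and it assumes UTF-8 input, a one-byte terminator, and a byte-order mark written by the converter.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not Textadept's actual code.
 * Returns a starting size for the output buffer: room for a byte-order
 * mark, the converted text, and a one-byte terminator. */
static size_t initial_out_size(size_t in_len, size_t bom_len) {
  /* Assumption: input is UTF-8. One input byte then expands to at most
   * 4 output bytes in either UTF-16 or UTF-32. */
  size_t text = in_len * 4;
  /* Always leave room for at least one code point (4 bytes covers a
   * UTF-32 unit or a UTF-16 surrogate pair). */
  if (text < 4) text = 4;
  return bom_len + text + 1;
}
```

With a 4-byte BOM (UTF-32) a one-byte input gives 4 + 4 + 1 = 9 bytes, and with a 2-byte BOM (UTF-16) it gives 2 + 4 + 1 = 7 bytes, matching your figures. The buffer would likely still need to grow if the converter reports that the output is full.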

Cheers,
Mitchell
