[code] [textadept] iconv bug

From: Erutuon <arboreous.philologist.att.gmail.com>
Date: Tue, 30 Oct 2018 18:37:54 -0500

A while ago I discovered a bug in string.iconv. Converting short
strings from UTF-8 to UTF-16 or UTF-32 crashes Textadept: for
instance, string.iconv('a', 'utf-16', 'utf-8') or string.iconv('a',
'utf-32', 'utf-8'). For UTF-16, converting strings shorter than 4
bytes reliably causes the crash; for UTF-32, strings shorter than 8
bytes do, though I'm not sure of the exact limits.

The cause seems to be that the output buffer needs space for the
byte-order mark (2 bytes in UTF-16, 4 bytes in UTF-32) as well as the
actual characters of the string, but the current minimum memory
allocation for the converted string is only the length of the string
being converted plus one.
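
To make the arithmetic concrete, here is a rough sketch in C that counts
how many output bytes a valid UTF-8 string needs once it is converted
with a byte-order mark. The function names are hypothetical and are not
taken from Textadept or iconv; the sketch assumes well-formed UTF-8
input and ignores the null terminator.

```c
#include <stddef.h>

/* Hypothetical helper: bytes of UTF-16 output, including a 2-byte BOM,
 * for n bytes of well-formed UTF-8 input. */
static size_t utf16_bytes_with_bom(const unsigned char *s, size_t n) {
  size_t out = 2; /* byte-order mark */
  for (size_t i = 0; i < n; i++) {
    if ((s[i] & 0xC0) == 0x80) continue; /* skip continuation bytes */
    /* A 4-byte UTF-8 sequence becomes a surrogate pair (4 bytes);
     * every other code point is one 2-byte code unit. */
    out += (s[i] >= 0xF0) ? 4 : 2;
  }
  return out;
}

/* Hypothetical helper: bytes of UTF-32 output, including a 4-byte BOM. */
static size_t utf32_bytes_with_bom(const unsigned char *s, size_t n) {
  size_t out = 4; /* byte-order mark */
  for (size_t i = 0; i < n; i++) {
    if ((s[i] & 0xC0) == 0x80) continue; /* skip continuation bytes */
    out += 4; /* every code point is one 4-byte code unit */
  }
  return out;
}
```

For the one-byte input 'a', these give 4 bytes for UTF-16 and 8 bytes
for UTF-32, while an allocation of "length plus one" is only 2 bytes.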

I am new to C, but perhaps allocating enough room for the byte-order
mark, one code point, and a null terminator would fix the problem.
For UTF-32 that would be 9 bytes (4 + 4 + 1); for UTF-16, 7 bytes
(2 + 4 + 1, in case the code point needs a surrogate pair). In my own
binding of win-iconv[1], I had the iconv function allocate
MB_LEN_MAX * 2 (5 * 2) bytes to start with, and I have had no problems
converting short strings to UTF-32 and UTF-16.
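
The suggested minimum could be sketched roughly as follows. This is not
the actual Textadept code: the function names are hypothetical, the
charset names are assumed to be spelled in lowercase as in the calls
above, and the fallback for other encodings simply keeps the "length
plus one" behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: smallest buffer that holds a byte-order mark,
 * one code point, and a null terminator for the target encoding. */
static size_t min_outbuf_size(const char *to_charset) {
  if (strcmp(to_charset, "utf-32") == 0) return 4 + 4 + 1; /* BOM + code point + NUL */
  if (strcmp(to_charset, "utf-16") == 0) return 2 + 4 + 1; /* BOM + surrogate pair + NUL */
  return 0; /* assumption: no extra minimum for other encodings */
}

/* Hypothetical helper: initial allocation, never below the minimum. */
static size_t initial_outbuf_size(size_t inbytes, const char *to_charset) {
  size_t size = inbytes + 1; /* "length plus one", as described above */
  size_t min = min_outbuf_size(to_charset);
  return size > min ? size : min;
}
```

With this, converting 'a' would start from 7 bytes for UTF-16 and 9
bytes for UTF-32 instead of 2; longer inputs would still rely on the
buffer being grown when the conversion runs out of room.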

— Gabriel B.

[1] https://github.com/win-iconv/win-iconv

Received on Tue 30 Oct 2018 - 19:37:54 EDT
