Re: [code] [textadept] Encoding and display

From: Qwerky <mr.qwerky.att.gmail.com>
Date: Wed, 24 Jul 2019 15:12:07 -0600

Gabriel,

Thank you so much!  I copied your function to menu.lua, made it a local
function, and added an entry to the Encoding submenu to call it.  When
calling it on my test file, it worked fine, and the file was displayed
properly.  This is a great help!  :-)  Too bad TA doesn't auto-detect
that encoding, as it is very common on my system.

Also, I found out that using the command line (as you did) would work
only if I left the function as global, and not as local; so with a local
function, I'm not sure how I could call it from the command line.
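
A plain-Lua sketch of the scoping involved, assuming the command line compiles what is typed as a separate chunk (much like `load` does here). Such a chunk can see globals but not another file's locals. The names `local_helper`, `global_helper`, and `my_tools` are made up for illustration and are not part of Textadept.

```lua
-- `loadstring` exists in Lua 5.1; in 5.2 and later, `load` accepts a string.
local compile = loadstring or load

local function local_helper() return 'local' end  -- visible only in this chunk
function global_helper() return 'global' end      -- stored in the global table

-- A separately compiled chunk resolves unknown names as globals.
local sees_global = compile('return global_helper ~= nil')()
local sees_local = compile('return local_helper ~= nil')()

-- One possible workaround: keep the function local, but publish it through
-- a single global table (hypothetical name).
my_tools = {reinterpret = local_helper}
local via_table = compile('return my_tools.reinterpret()')()

print(sees_global, sees_local, via_table)
```

With this arrangement only one global name is added, and the function would be called as `my_tools.reinterpret(...)`.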

One question, though:  you said you "opened it with Textadept (at which
point it was displayed in the encoding ISO-8859-1)".  I'm wondering how
one can tell what encoding a file is currently displayed as?

Thanks again,

qwerky

On 2019-07-24 14:19, Gabriel Bertilson wrote:
> I had to work out a similar problem myself when trying to view Dwarf
> Fortress logs in Windows codepage 437. Using `buffer.set_encoding` or
> the buffer menu didn't work. I've forgotten my analysis of the
> problem, but I came up with this function, which changes the encoding
> as desired:
>
> function buffer_reinterpret_encoding(buffer, new_encoding)
>   if buffer.encoding == new_encoding then return end
>   local ok, err = pcall(function ()
>     local text = buffer:get_text()
>     text = text:iconv(buffer.encoding, 'UTF-8')
>     text = text:iconv('UTF-8', new_encoding) -- possible error here
>     buffer.encoding = new_encoding
>     buffer:set_text(text)
>   end)
>   if not ok then
>     ui.print('Error while reinterpreting encoding: ' .. tostring(err))
>   end
> end
>
> The encoding in which the quotation marks are at bytes 0x91-0x94 is CP1252.
>
> I printed the bytes "\x91\x92\x93\x94" to a file, opened it with
> Textadept (at which point it was displayed in the encoding
> ISO-8859-1), entered `buffer_reinterpret_encoding(buffer, 'CP1252')`
> in the command prompt, and the quotation marks showed up as desired.
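
To make the byte values concrete, here is a small stand-alone sketch (Lua 5.3, no Textadept needed) of what those four bytes mean in CP1252. In ISO-8859-1 the same bytes are control characters, which is why nothing useful was displayed. The helper name `cp1252_quotes_to_utf8` is made up for this example, and the table covers only the four bytes discussed here, not all of CP1252.

```lua
-- CP1252 bytes 0x91-0x94 and the Unicode characters they stand for.
local CP1252_QUOTES = {
  [0x91] = utf8.char(0x2018), -- left single quotation mark
  [0x92] = utf8.char(0x2019), -- right single quotation mark
  [0x93] = utf8.char(0x201C), -- left double quotation mark
  [0x94] = utf8.char(0x201D), -- right double quotation mark
}

-- Hypothetical helper: replaces only those four bytes with their UTF-8
-- form. ASCII and every other byte pass through unchanged.
local function cp1252_quotes_to_utf8(bytes)
  return (bytes:gsub('[\x91-\x94]', function(ch)
    return CP1252_QUOTES[ch:byte()]
  end))
end

print(cp1252_quotes_to_utf8('\x93quoted\x94 and it\x92s'))
```

A real conversion should go through iconv, as the function above does. This sketch only shows the mapping.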
>
> — Gabriel
>
>
> On Wed, Jul 24, 2019 at 1:58 PM Qwerky <mr.qwerky.att.gmail.com> wrote:
>> Hi Mitchell,
>>
>> Okay, I hear what you are saying, but I have no clue as to encoding, so here is what I did:
>>
>> Using another editor that lets you choose the encoding when loading a file, I found that it did not display correctly when loaded as UTF-8 (or UTF-16, LE or BE, with or without BOM) or as ISO-8859-1 (or ISO-8859-2), which we already knew. But it did display correctly when loaded as ANSI, or as any of a large number of code pages such as CP1250 (though not with a few others). (This other editor, by the way, displays the file correctly by default, without my forcing a choice.) So I guess we could say that the encoding is ANSI, is that correct?
>>
>> Also, I looked at the file in hex, and those particular characters (opening quotation, etc.) had single-byte codes with values of 0x91, 0x93, and so forth.
>>
>> Does all this tell you what the encoding is?
>>
>> So then, how does one go about setting the encoding? You must remember that this is all very new to me, while to you it is a simple thing. But the only way I could see how to set the encoding, since I couldn't call 'buffer.set_encoding()' directly because I don't know the buffer number, was to edit menu.lua to add another option, which I did. When I tried setting to 'ANSI', I received the message: "invalid encoding(s)"; when setting to 'CP1250', the message "conversion failed".
>>
>> Looking at the documentation, it says "Valid encodings are ones that GNU iconv accepts." That doesn't mean much to me. Does the failure of the two settings that I tried mean that Textadept simply can't handle this file?
>>
>> Any help would be appreciated. Thanks.
>>
>> qwerky
>>
>> On 2019-07-24 11:50, Mitchell wrote:
>>
>>> Hi,
>>>
>>> On Wed, 24 Jul 2019, Qwerky wrote:
>>>
>>>> Hi Mitchell,
>>>>
>>>> Sorry, I should have mentioned that I did try changing the encoding via that
>>>> menu--to ISO-8859-1, to UTF-8, and even to ASCII (the latter of which
>>>> failed). But nothing altered the display?
>>>
>>> Okay, then you'll have to figure out what encoding your file is in and then try calling `buffer:set_encoding()`.
>>>
>>> Cheers,
>>> Mitchell

Received on Wed 24 Jul 2019 - 17:12:07 EDT