Re: [code] [textadept] Encoding and display

From: Gabriel Bertilson <arboreous.philologist.att.gmail.com>
Date: Wed, 24 Jul 2019 15:19:34 -0500

I had to work out a similar problem myself when trying to view Dwarf
Fortress logs in Windows codepage 437. Using `buffer.set_encoding` or
the buffer menu didn't work. I've forgotten my analysis of the
problem, but I came up with this function, which changes the encoding
as desired:

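-- Reinterprets the buffer's original bytes as `new_encoding`.
function buffer_reinterpret_encoding(buffer, new_encoding)
  if buffer.encoding == new_encoding then return end
  local ok, err = pcall(function ()
    local text = buffer:get_text()
    -- string.iconv(text, new, old): turn the UTF-8 text back into the
    -- original bytes, using the encoding the buffer currently assumes.
    text = text:iconv(buffer.encoding, 'UTF-8')
    -- Decode those bytes as new_encoding. This raises an error if the
    -- bytes are not valid in new_encoding.
    text = text:iconv('UTF-8', new_encoding)
    buffer.encoding = new_encoding
    buffer:set_text(text)
  end)
  if not ok then
    ui.print('Error while reinterpreting encoding: ' .. tostring(err))
  end
end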
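
To make the first conversion step less mysterious, here is a rough pure-Lua sketch of what I understand it to be doing, assuming Textadept holds the buffer text as UTF-8 internally. `utf8_to_latin1` is a hypothetical stand-in for `text:iconv('ISO-8859-1', 'UTF-8')`, not a Textadept function, and it needs Lua 5.3 for the `utf8` library.

```lua
-- Hypothetical stand-in for text:iconv('ISO-8859-1', 'UTF-8').
-- Turns UTF-8 text back into the single bytes it was decoded from.
function utf8_to_latin1(s)
  local bytes = {}
  for _, cp in utf8.codes(s) do
    -- ISO-8859-1 only covers code points 0 through 255.
    assert(cp < 256, 'code point not representable in ISO-8859-1')
    bytes[#bytes + 1] = string.char(cp)
  end
  return table.concat(bytes)
end

-- A file byte 0x91 read as ISO-8859-1 becomes U+0091, which is stored
-- in UTF-8 as the two bytes C2 91.
local in_buffer = utf8.char(0x91)
assert(in_buffer == '\xC2\x91')

-- Step one recovers the original single byte, ready to be decoded
-- again under a different encoding.
assert(utf8_to_latin1(in_buffer) == '\x91')
```

Once the original bytes are back, the second `iconv` call only has to decode them under the new encoding.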

The encoding in which the quotation marks are at bytes 0x91-0x94 is CP1252.
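
As a quick illustration of that mapping, here is a small pure-Lua sketch (Lua 5.3) covering only the four quotation-mark bytes. The table and the `decode_cp1252_quotes` helper are made up for this example; a real conversion should go through iconv.

```lua
-- CP1252 bytes for the curly quotation marks and their code points.
cp1252_quotes = {
  [0x91] = 0x2018, -- left single quotation mark
  [0x92] = 0x2019, -- right single quotation mark
  [0x93] = 0x201C, -- left double quotation mark
  [0x94] = 0x201D, -- right double quotation mark
}

-- Hypothetical helper: decodes a byte string to UTF-8, handling only
-- the four bytes above. Every other byte is passed through as if it
-- were ISO-8859-1, which is a simplification: the rest of the
-- 0x80-0x9F range also differs in CP1252.
function decode_cp1252_quotes(bytes)
  local out = {}
  for i = 1, #bytes do
    local b = bytes:byte(i)
    out[#out + 1] = utf8.char(cp1252_quotes[b] or b)
  end
  return table.concat(out)
end
```

Read as ISO-8859-1, those same four bytes land in the C1 control range (U+0091 to U+0094), which would explain why nothing useful is displayed.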

I printed the bytes "\x91\x92\x93\x94" to a file, opened it with
Textadept (at which point it was displayed in the encoding
ISO-8859-1), entered `buffer_reinterpret_encoding(buffer, 'CP1252')`
in the command prompt, and the quotation marks showed up as desired.
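
If anyone wants to repeat the test, something like the following should produce the same four-byte file. The `write_sample` name and any path passed to it are only placeholders.

```lua
-- Hypothetical helper: writes the four CP1252 quotation-mark bytes to
-- `path`. Binary mode keeps Windows from translating any bytes.
function write_sample(path)
  local f = assert(io.open(path, 'wb'))
  f:write('\x91\x92\x93\x94')
  f:close()
end
```

Call it as, say, `write_sample('quotes-test.txt')`, open the resulting file in Textadept, and then run `buffer_reinterpret_encoding(buffer, 'CP1252')` from the command entry as described above.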

— Gabriel

On Wed, Jul 24, 2019 at 1:58 PM Qwerky <mr.qwerky.att.gmail.com> wrote:
>
> Hi Mitchell,
>
> Okay, I hear what you are saying, but I have no clue as to encoding, so here is what I did:
>
> Using another editor that lets you choose an encoding when loading a file, I found that the file did not display correctly when loaded as UTF-8 (or UTF-16, LE or BE, with or without BOM) or as ISO-8859-1 (or ISO-8859-2), which we already knew. But it did display correctly when loaded as ANSI, or as any of a large number of code pages such as CP1250 (though not with a few code pages). (This other editor, by the way, displays the file correctly when loading it by default, without my forcing a choice.) So I guess we could say that the encoding is ANSI, is that correct?
>
> Also, I looked at the file in hex, and those particular characters (opening quotation, etc.) had single-byte codes with values of 0x91, 0x93, and so forth.
>
> Does all this tell you what the encoding is?
>
> So then, how does one go about setting the encoding? You must remember that this is all very new to me, while to you it is a simple thing. But the only way I could see how to set the encoding, since I couldn't call 'buffer.set_encoding()' directly because I don't know the buffer number, was to edit menu.lua to add another option, which I did. When I tried setting to 'ANSI', I received the message: "invalid encoding(s)"; when setting to 'CP1250', the message "conversion failed".
>
> Looking at the documentation, it says "Valid encodings are ones that GNU iconv accepts." That doesn't mean much to me. Does the failure of the two settings that I tried mean that Textadept simply can't handle this file?
>
> Any help would be appreciated. Thanks.
>
> qwerky
>
> On 2019-07-24 11:50, Mitchell wrote:
>
>> Hi,
>>
>> On Wed, 24 Jul 2019, Qwerky wrote:
>>
>>> Hi Mitchell,
>>>
>>> Sorry, I should have mentioned that I did try changing the encoding via that
>>> menu--to ISO-8859-1, to UTF-8, and even to ASCII (the latter of which
>>> failed). But nothing altered the display?
>>
>> Okay, then you'll have to figure out what encoding your file is in and then try calling `buffer:set_encoding()`.
>>
>> Cheers,
>> Mitchell

Received on Wed 24 Jul 2019 - 16:19:34 EDT
