<div class="gmail_quote">On Thu, Jan 24, 2013 at 10:09 PM, Clinton Ebadi <span dir="ltr"><<a href="mailto:clinton@unknownlamer.org" target="_blank">clinton@unknownlamer.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">Zrajm C Akfohg <<a href="mailto:zrajm@klingonska.org">zrajm@klingonska.org</a>> writes:<br>
> On Wed, Jan 23, 2013 at 7:46 AM, Clinton Ebadi <<a href="mailto:clinton@unknownlamer.org">clinton@unknownlamer.org</a>> wrote:<br>> Zrajm C Akfohg <<a href="mailto:zrajm@klingonska.org">zrajm@klingonska.org</a>> writes:<br>
><br>> Using UTF-8 does not seem to work properly when<br>
> I set a variable in the main file (using <!--#set var="X_TITLE"<br>
> value="Innehåll." -->) and then use that variable inside a file<br>
> included with SSI (with <!--#echo var="X_TITLE" -->). <br>
><br>
> An example is here <a href="http://zrajm.org/mat/" target="_blank">http://zrajm.org/mat/</a> (look at the headline -- it's<br>
> supposed to say "Innehåll" = "Contents" in Swedish -- but instead<br>
> there are funny characters instead of the a with a ring above). It is<br>
> as if Apache has decided that the variable has latin-1 content, even<br>
> though addDefaultCharset "utf-8"; is in use).<br>
<br>
</div></div>The document in question has the following meta tag (from .head.shtml):<br>
<br>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><br>
<br>
Which overrides the HTTP Content-Type.<br>
<br>
I think removing that should fix it; if not let me know, hopefully it's<br>
not a deficiency in domtool or our apache rig (but if it is, I want to<br>
fix it naturally).</blockquote><div><br>After removing the <meta> tag I still get the exact same behavior (still on page <a href="http://zrajm.org/mat/">http://zrajm.org/mat/</a>).</div><div><br></div><div><div> All the relevant files are written in UTF-8, as evidenced by:<br>
<br> $ file -i .foot.shtml .head.shtml index.shtml<br> .foot.shtml: text/html; charset=utf-8<br> .head.shtml: text/html; charset=utf-8<br> index.shtml: text/html; charset=utf-8</div></div><div><br></div><div>
Using wget to view the server headers gives me the following:</div>
<div><br><div> $ wget -O- --save-headers '<a href="http://zrajm.org/mat/">http://zrajm.org/mat/</a>'</div><div> --2013-01-25 00:58:49-- <a href="http://zrajm.org/mat/">http://zrajm.org/mat/</a></div><div> Resolving <a href="http://zrajm.org">zrajm.org</a> (<a href="http://zrajm.org">zrajm.org</a>)... 69.90.123.70</div>
<div> Connecting to <a href="http://zrajm.org">zrajm.org</a> (<a href="http://zrajm.org">zrajm.org</a>)|69.90.123.70|:80... connected.</div><div> HTTP request sent, awaiting response... 200 OK</div><div> Length: unspecified [text/html]</div>
<div> Saving to: `STDOUT'</div><div> HTTP/1.1 200 OK</div><div> Date: Thu, 24 Jan 2013 23:58:50 GMT</div><div> Server: Apache/2.2.16 (Debian)</div><div> Accept-Ranges: bytes</div><div> Vary: Accept-Encoding</div>
<div> Keep-Alive: timeout=15, max=100</div><div> Connection: Keep-Alive</div><div> Transfer-Encoding: chunked</div><div> Content-Type: text/html; charset=utf-8</div><div> X-Pad: avoid browser bug</div><div>
<br></div>    ......<br><br></div><div>So it does look like Apache considers the file to be UTF-8. I suspect, however, that Apache has a different opinion about the encoding of environment variables. Might LANG, or the LC_* locale variables, need to be set in Apache's environment?<br>
<br>This thread on Stack Overflow [<a href="http://stackoverflow.com/questions/539661/server-side-includes-and-character-encoding">http://stackoverflow.com/questions/539661/server-side-includes-and-character-encoding</a>] suggests using the Apache setting "AddCharset UTF-8 .shtml", though it is not clear whether this will actually fix the problem.<br>
<br>I seem to recall (from an older discussion where I had UTF-8 problems with mod_autoindex) that "AddCharset" cannot be set from domtool. A hasty googling seems to confirm this, as I can only find mentions of "addDefaultCharset" for domtool (and not "addCharset").<br>
<br>Interestingly, HTML entities seem to work only *partially* as well: &aring; and &#229; work and produce the expected "å", but &ndash; does not work and is inserted literally on the page, and &#8211 [the en dash's numeric entity] is removed completely.</div>
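<div><br>To illustrate the Latin-1 suspicion, here is a small Python sketch. It is only a guess at the mechanism, not something taken from Apache or domtool: it shows what happens when the UTF-8 bytes of the title are read as Latin-1, and that "å" (U+00E5) exists in Latin-1 while the en dash (U+2013) does not, which would be consistent with the entity behaviour described above. The helper name fits_latin1 is made up for this example.</div>

```python
# Hypothetical illustration only; this does not reproduce Apache's SSI code.
title = "Innehåll"

# The bytes as they are stored in the UTF-8 encoded .shtml file.
raw = title.encode("utf-8")

# If some step treats those bytes as Latin-1, every byte becomes one
# character, so the two-byte sequence for "å" shows up as two characters.
misread = raw.decode("latin-1")
print(misread)


def fits_latin1(ch: str) -> bool:
    """Return True if the character can be represented in Latin-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# U+00E5 (a with ring) versus U+2013 (en dash).
print(fits_latin1("\u00e5"), fits_latin1("\u2013"))
```

<div>If the headline on the page shows the same two-character sequence that the first print produces, that would point towards the variable value being handled as Latin-1 somewhere between the #set and the #echo.</div>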
<div><br></div><div>/zrajm</div></div>