eprints_tech messages
[EP-tech] Character encoding of HTML/HTTP Form submissions
From: ePrints Support <support AT eprints.org>
Date: Thu, 24 May 2001 15:42:21 +0100
I've run into a bit of a problem in my attempts to make eprints all lovely and UTF-8.

Background:

The basic plan is to ensure that all the inputs are either UTF-8 or in a clearly identified encoding, so they can easily be turned into UTF-8; then the database can strictly contain UTF-8. When the X(HT)ML is output as a webpage it can be in any encoding wanted (currently UTF-8). This is limited only by the Perl libraries, which are still being improved. All the webpage text in eprints will be loaded from XML files, in ISO-LATIN-1 for English, but in your favorite encoding in theory.

The problem:

I can't find any way to identify what encoding a web browser is using to send back form data. This has never been a problem for me, as I never really went outside ISO-LATIN-1, but now I want to support non-Latin users, such as Greek and Cyrillic. What should I do?

I've poked at the problem with Mozilla and Netscape 4, without reaching a reliable conclusion. As far as I'm concerned it should be as backward-compatible as possible with old browsers, but I don't even have a basis for a solution yet.

Nasty workarounds:

* A hidden field in the form, the return value of which will tell me how the browser encoded the document. Ugh.
* A selection made by the user. (In my experience users will screw this up plenty; 80% won't even know what an encoding scheme is, nor should they need to.)
* Assume that the browser will return a form in the encoding of the page the form was in. This doesn't seem to work, though: I send pages to Moz or N4 in UTF-8 and get the results of the form as ISO-LATIN-1.
* Set a default encoding for an archive, e.g. ISO-8859-5 for Greek. Except there are THREE DIFFERENT encodings for Greek, and a Greek archive should probably have to understand ALL of them (sob). At least the default encoding assumption could be used to complement one of the other methods.

Please, if you have any insight into this problem, let me know.
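The hidden-field and archive-default workarounds above can be combined. The following is only a minimal sketch in Python, not eprints code: the function names, the probe character, and the candidate list are all assumptions made for illustration. The form would carry a hidden field holding a known non-ASCII character; the bytes that come back hint at the encoding the browser used, and the archive default is the last resort.

```python
from typing import Optional, Sequence

# Hypothetical probe: a Greek alpha placed in a hidden form field.
PROBE = "\u03b1"

# Hypothetical candidate list; a real archive would choose its own.
CANDIDATES = ("utf-8", "iso-8859-7", "cp1253", "cp737")


def guess_encoding(probe_bytes: bytes,
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that encodes PROBE to the bytes received."""
    for name in candidates:
        try:
            if PROBE.encode(name) == probe_bytes:
                return name
        except UnicodeEncodeError:
            continue  # this candidate cannot represent the probe at all
    return None  # browser mangled or dropped the probe


def decode_form_value(raw: bytes,
                      declared: Optional[str] = None,
                      archive_default: str = "iso-8859-1") -> str:
    """Decode form bytes, trying the most trustworthy hint first."""
    attempts = []
    if declared:
        attempts.append(declared)      # hint from the hidden field
    attempts.append("utf-8")           # strict UTF-8 rejects most other data
    attempts.append(archive_default)   # per-archive assumption
    for name in attempts:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown charset name
    # Last resort: keep what can be read, mark the rest as replaced.
    return raw.decode("utf-8", errors="replace")
```

Note that some encodings give the same bytes for the probe (ISO-8859-7 and cp1253 both do for alpha), so the guess is a hint rather than a proof, and the first match in the candidate list wins.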
--
Christopher Gutteridge
support AT eprints.org
ePrints Technical Support
+44 23 8059 4833




