[EP-tech] Character encoding of HTML/HTTP Form submissions

From: ePrints Support <support AT eprints.org>
Date: Thu, 24 May 2001 15:42:21 +0100




I've run into a bit of a problem in my attempts to make eprints
all lovely and UTF-8. 

Background:

The basic plan is to ensure that all the inputs are either UTF-8
or clearly encoded so they can be easily turned into UTF-8, then
the database can strictly contain UTF-8.
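
A minimal sketch of that normalisation step, written in Python purely for illustration (eprints itself is Perl). The function name to_utf8 and its parameters are hypothetical; the sketch assumes the encoding of each input is already known, which is exactly the part that is unsolved for form data.

```python
def to_utf8(raw: bytes, declared_encoding: str) -> bytes:
    """Re-encode input bytes as UTF-8, given the encoding they arrived in.

    Raises UnicodeDecodeError if the bytes are not valid in the declared
    encoding, and LookupError if the encoding name is unknown, so that
    nothing but well-formed UTF-8 ever reaches the database.
    """
    text = raw.decode(declared_encoding)  # bytes -> Unicode text
    return text.encode("utf-8")           # Unicode text -> UTF-8 bytes


if __name__ == "__main__":
    latin1_input = "caf\u00e9".encode("iso-8859-1")
    print(to_utf8(latin1_input, "iso-8859-1"))
```

Decoding strictly (rather than replacing bad bytes) is a design choice here: a wrongly declared encoding then fails loudly instead of silently storing garbage.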

When the X(HT)ML is output as a web page, it can be in any
encoding wanted (currently UTF-8). This is only limited by the
Perl libraries, which are still being improved.

All the web page text in eprints will be loaded from XML files,
in ISO-LATIN-1 for English, but in theory in whatever encoding
you prefer.
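
As a rough illustration of why the XML side is the easy part: an XML file declares its own encoding, so the parser can hand back Unicode text whatever the file was written in. The sketch below uses Python's standard xml.etree.ElementTree; the element name "phrase", the attribute "ref" and the sample content are all made up for the example and are not the real eprints phrase-file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical phrase file, stored as ISO-8859-1 bytes (0xE9 is e-acute).
SAMPLE = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b'<phrases><phrase ref="welcome">Caf\xe9 archive</phrase></phrases>'
)


def load_phrases(xml_bytes: bytes) -> dict:
    """Parse a phrase file; the parser honours the declared encoding."""
    root = ET.fromstring(xml_bytes)
    return {p.get("ref"): (p.text or "") for p in root.iter("phrase")}


def render_utf8(phrases: dict, ref: str) -> bytes:
    """Encode one phrase as UTF-8 for output in a web page."""
    return phrases[ref].encode("utf-8")


if __name__ == "__main__":
    phrases = load_phrases(SAMPLE)
    print(render_utf8(phrases, "welcome"))
```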

The problem:

I can't find any way to identify what encoding a web browser is
using to send back form data. This has never been a problem for
me, as I never really went outside ISO-LATIN-1, but now that I want
to support users of non-Latin scripts, such as Greek and Cyrillic,
what should I do? I've poked at the problem with Mozilla and
Netscape 4, without reaching a reliable conclusion.

As far as I'm concerned it should be as backward-compatible as
possible for old browsers, but I don't even have a basis for
a solution yet.

Nasty workarounds:
*	a hidden field in the form, the return value of which
	will tell me how the browser encoded the document. Ugh.

*	a selection made by the user (in my experience users will
	screw this up plenty, 80% won't even know what an encoding
	scheme is, nor should they need to)

*	assume that the browser will return a form in the encoding
	of the page the form was in, but this doesn't seem to work:
	I send pages to Moz or N4 in UTF-8 and get the results
	of the form back as ISO-LATIN-1

*	Set a default encoding for an archive, e.g. ISO-8859-7 for
	Greek. Except there are THREE DIFFERENT encodings for Greek,
	and a Greek archive would probably have to understand ALL
	of them (sob). At least the default encoding assumption
	could be used to complement one of the other methods.
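
To make the first and last workarounds concrete, here is a minimal sketch of how a hidden field and an archive default could complement each other, again in Python for illustration only. The idea is that the form carries a hidden field holding a known non-ASCII character; the server looks at the bytes that come back and tries each candidate encoding until one decodes them to the expected character. Everything named here (SENTINEL, detect_encoding, decode_field, the candidate list) is hypothetical, and how real browsers treat a character they cannot represent in their chosen encoding is an assumption that would need testing.

```python
from typing import Optional, Sequence

# Hypothetical sentinel placed in a hidden form field: Greek small alpha.
SENTINEL = "\u03b1"


def detect_encoding(sentinel_bytes: bytes,
                    candidates: Sequence[str],
                    expected: str = SENTINEL) -> Optional[str]:
    """Return the first candidate encoding that decodes the returned
    hidden-field bytes to the expected sentinel, or None if none does."""
    for enc in candidates:
        try:
            if sentinel_bytes.decode(enc) == expected:
                return enc
        except (UnicodeDecodeError, LookupError):
            continue  # not valid in this encoding, try the next one
    return None


def decode_field(raw: bytes,
                 sentinel_bytes: bytes,
                 candidates: Sequence[str],
                 archive_default: str) -> str:
    """Decode one form field, falling back to the archive's default
    encoding when the sentinel gives no answer."""
    enc = detect_encoding(sentinel_bytes, candidates) or archive_default
    return raw.decode(enc, errors="replace")


if __name__ == "__main__":
    candidates = ["utf-8", "iso-8859-7"]
    field = "\u03b3\u03b5\u03b9\u03b1".encode("iso-8859-7")
    print(decode_field(field, SENTINEL.encode("iso-8859-7"),
                       candidates, "iso-8859-1"))
```

Candidate order matters: encodings that agree on the sentinel byte cannot be told apart this way, so the first match wins and UTF-8 should be tried first, since legacy single-byte data rarely happens to be valid UTF-8.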

Please, if you have any insight into this problem, let me know.


-- 

 Christopher Gutteridge                   support AT eprints.org 
 ePrints Technical Support                +44 23 8059 4833
