This page describes internationalisation aspects you need to be aware of while writing code for Moovida or Moovida plugins.
Encodings
In an ideal world, everybody would use Unicode (in the same version) everywhere, and we wouldn't have any of these issues. This utopia almost exists in recent GNU/Linux distributions, but we want Moovida to be portable beyond them. Also, the libraries we depend on do not provide a consistent way of dealing with these issues and tend to behave differently across operating systems, so programmers have to be very careful when dealing with encodings.
Overall, as long as you don't interact with the file system, things are fairly easy: use unicode objects. If an API you're using doesn't accept unicode objects, fix it; if you can't, encode the string in UTF-8, which should usually work (but do test that carefully).
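A minimal sketch of the "encode in UTF-8 as a last resort" advice, assuming a hypothetical third-party function, legacy_tag_writer(), that only accepts byte strings. The code is kept unicode everywhere except at the single call site that needs bytes. It is written to run under both Python 2.7 and Python 3.

```python
# The unicode type: 'unicode' on Python 2, 'str' on Python 3
TEXT_TYPE = type(u'')


def legacy_tag_writer(tag_value):
    # Hypothetical API that refuses unicode objects and expects bytes;
    # it returns the number of bytes it was given
    if isinstance(tag_value, TEXT_TYPE):
        raise TypeError('unicode objects are not supported')
    return len(tag_value)


def write_tag(tag_value):
    # Keep unicode internally; encode to UTF-8 only when calling
    # the API that cannot handle unicode objects
    return legacy_tag_writer(tag_value.encode('utf-8'))
```

For example, write_tag(u'Caf\u00e9') hands legacy_tag_writer() five bytes, because the accented character takes two bytes in UTF-8.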
The really tricky part is dealing with the file system, where you need to pass around a file name, file path or local file uri. As general advice, and until something better is developed in Moovida's core, you should stick with unicode objects and convert them to the suitable encoding only when needed.
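The following sketch illustrates "stick with unicode, convert when needed". It uses the standard sys.getfilesystemencoding() as a stand-in for whichever locale_helper function fits your case; to_fs_path() and file_exists() are hypothetical names, and the UTF-8 fallback is an assumption. It runs under Python 2.7 and Python 3.

```python
import os
import sys


def to_fs_path(unicode_path):
    # Stand-in for a locale_helper encoding function. On some Python 2
    # setups this returns None, hence the (assumed) UTF-8 fallback.
    encoding = sys.getfilesystemencoding() or 'utf-8'
    return unicode_path.encode(encoding)


def file_exists(unicode_path):
    # The path stays a unicode object everywhere else in the code;
    # it is only encoded right before the operating system sees it.
    return os.path.exists(to_fs_path(unicode_path))
```

The design choice is to have exactly one place where the conversion happens, so that a wrong encoding choice can be fixed there instead of at every call site.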
How to deal with encodings in Python
Basically, there are two things you need to know.
Encode a unicode string into an str in encoding 'X':
x_encoded_string = unicode_string.encode('X')
Decode an str encoded in encoding 'X' into a unicode object:
unicode_string = x_encoded_string.decode('X')
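A worked example of both operations with real encodings in place of 'X'. It shows that the same unicode string yields different bytes in different encodings, and that decoding with the wrong encoding does not necessarily raise an error: it can silently produce garbled text. Runs under Python 2.7 and Python 3.

```python
title = u'Caf\u00e9'

utf8_bytes = title.encode('utf-8')      # b'Caf\xc3\xa9'
latin1_bytes = title.encode('latin-1')  # b'Caf\xe9'

# Decoding with the encoding that was used to encode round-trips cleanly
round_tripped = utf8_bytes.decode('utf-8')

# Decoding with the wrong encoding may not raise at all: it silently
# produces garbled text (u'Caf\u00c3\u00a9')
garbled = utf8_bytes.decode('latin-1')
```

This is why the encoding must be known, not guessed: a wrong guess can pass unnoticed until someone sees the garbled title on screen.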
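At all costs, avoid adventurous things like str(unicode_string) or unicode(str_object). These conversions silently use Python's default encoding (ASCII, unless the site configuration changes it). This might work on your platform with the string you're testing with, but it WILL break for others.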
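To see why ASCII-only test data hides the problem, here is a small sketch. Under Python 2, str(text) behaves roughly like text.encode('ascii') with the default configuration; implicit_conversion() simulates that, so the failure can be reproduced on Python 3 as well. The function names are hypothetical.

```python
def implicit_conversion(text):
    # Roughly what Python 2's str(text) does: encode with the default
    # encoding, which is 'ascii' unless the site configuration changes it
    return text.encode('ascii')


def explicit_conversion(text, encoding):
    # The safe alternative: name the target encoding explicitly
    return text.encode(encoding)


def conversion_fails(text):
    # True if the implicit conversion would raise for this string
    try:
        implicit_conversion(text)
    except UnicodeEncodeError:
        return True
    return False
```

conversion_fails(u'Abba') is False, so a developer testing only with ASCII names never notices anything, while conversion_fails(u'Bj\u00f6rk') is True. explicit_conversion(u'Bj\u00f6rk', 'utf-8') works for both.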
Some types, such as MediaUri, are designed to convert cleanly to unicode, so unicode(my_media_uri) should be fine; str(my_media_uri), however, is still adventurous.
Helpers in Moovida
Thanks to the previous section, you now know how to convert between unicode objects and strings. Two questions remain: when to convert, and to/from which encoding?
elisa.core.utils.locale_helper contains plenty of useful functions that return the encoding expected by various software components interacting with Moovida. Their names and documentation should make it clear when to use them.
As an example, here is how you would pass a uri to gst.element_make_from_uri(), which, at least with the versions of gstreamer and gst-python I've tried, does not accept unicode objects:
 1 import gst
 2 from elisa.core.utils import locale_helper
 3
 4 def create_source_element(uri):
 5     """
 6     Create a source element from a C{elisa.core.media_uri.MediaUri}.
 7     """
 8     unicode_uri = unicode(uri)
 9     gst_uri = unicode_uri.encode(locale_helper.gst_file_encoding())
10     return gst.element_make_from_uri(gst.URI_SRC, gst_uri)
Here, we know that MediaUri objects can safely be converted to unicode (l.8). Unfortunately, gst.element_make_from_uri() does not accept unicode objects, so we have to give it a str object. To find out which encoding gstreamer functions expect, we call gst_file_encoding() from locale_helper (l.9) and pass its result to the encode() method of the unicode object, which gives us a uri suitable for gst.element_make_from_uri().
When should I care
Special care should be given to the format/encoding of strings for:
- any string obtained dynamically (not something hardcoded or obtained through the translation system)
- any string containing the path/file name of something on the file system
Also, if you believe a given function handles unicode objects well, test it with complex non-ASCII characters and on several operating systems. For instance, os.listdir() tends to behave properly on Windows, and on GNU/Linux as long as all file names are ASCII, but on GNU/Linux it can return a mix of unicode objects and strs when the listed directory contains non-ASCII names (when given a unicode path, Python 2 returns names it cannot decode with the file system encoding as str objects).
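One way to guard against that mix is to normalise the result of os.listdir() so callers always get unicode objects. This is a sketch, not a Moovida helper: list_unicode() is a hypothetical name, sys.getfilesystemencoding() stands in for the appropriate locale_helper function, and decoding with 'replace' is lossy, so a name fixed up this way can no longer be used to open the file. A real implementation might keep the original bytes alongside. Runs under Python 2.7 and Python 3.

```python
import os
import sys
import tempfile


def list_unicode(directory):
    # The UTF-8 fallback is an assumption for setups where Python 2
    # reports None as the file system encoding
    encoding = sys.getfilesystemencoding() or 'utf-8'
    names = []
    for name in os.listdir(directory):
        if isinstance(name, bytes):
            # A byte string slipped through (e.g. an undecodable name on
            # Python 2); decoding with 'replace' is lossy but never raises
            name = name.decode(encoding, 'replace')
        names.append(name)
    return names


# Usage: list a freshly created directory
tmp_dir = tempfile.mkdtemp()
open(os.path.join(tmp_dir, 'movie.avi'), 'w').close()
names = list_unicode(tmp_dir)
```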
Noteworthy issues
Our inter-process communication protocol does not support passing unicode objects yet. This is a bug. Until it is fixed, anything going between processes must be passed as str objects, and great care must be taken over the encoding used for them.
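A sketch of one way to keep the encoding under control: convert whole messages at the process boundary with a single, agreed-upon encoding. The function names and the choice of UTF-8 are assumptions for illustration, not part of Moovida's IPC protocol; both processes simply have to use the same pair. Runs under Python 2.7 and Python 3.

```python
# Hypothetical convention: both ends of the connection must agree on it
IPC_ENCODING = 'utf-8'
TEXT_TYPE = type(u'')


def encode_for_ipc(value):
    # Recursively turn unicode objects into byte strings before sending
    if isinstance(value, TEXT_TYPE):
        return value.encode(IPC_ENCODING)
    if isinstance(value, dict):
        return dict((encode_for_ipc(k), encode_for_ipc(v))
                    for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return type(value)(encode_for_ipc(v) for v in value)
    return value


def decode_from_ipc(value):
    # Inverse operation on the receiving side
    if isinstance(value, bytes):
        return value.decode(IPC_ENCODING)
    if isinstance(value, dict):
        return dict((decode_from_ipc(k), decode_from_ipc(v))
                    for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return type(value)(decode_from_ipc(v) for v in value)
    return value
```

Converting at the boundary keeps the rest of each process working with unicode objects, and puts the encoding decision in one place on each side.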
Strings coming from environment variables cannot (AFAIK) be obtained as unicode objects. Most of the time, they *should* be encoded in the encoding returned by locale_helper.system_encoding(), but this is not guaranteed. When possible, we should avoid relying on information obtained from environment variables and prefer more reliable APIs that work with unicode. os.path.expanduser() is a good example of a function that gets its information from environment variables, and therefore cannot be trusted to give us a proper unicode object.
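When an environment variable cannot be avoided, it can at least be decoded in one well-defined place. In this sketch, getenv_unicode() is a hypothetical helper and the standard locale.getpreferredencoding() stands in for locale_helper.system_encoding() so the code is self-contained. The 'replace' error handler is a deliberate choice that trades exactness for never raising. Runs under Python 2.7 and Python 3.

```python
import locale
import os


def getenv_unicode(name, default=None):
    value = os.environ.get(name)
    if value is None:
        return default
    if isinstance(value, bytes):
        # Python 2: os.environ holds byte strings whose encoding is only
        # *probably* the locale's; undecodable bytes become U+FFFD
        encoding = locale.getpreferredencoding() or 'utf-8'
        value = value.decode(encoding, 'replace')
    # On Python 3, os.environ values are already decoded to str
    return value


# Usage with a hypothetical variable name
os.environ['MOOVIDA_EXAMPLE_DIR'] = 'movies'
media_dir = getenv_unicode('MOOVIDA_EXAMPLE_DIR')
```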
