Translations / i18n - General considerations and remarks
Editing documents in languages with multi-byte encodings
Hippo is mostly written in Java, so it uses Unicode internally. The storage system (JCR-Jackrabbit) also uses Unicode, and so does Hippo's delivery tier. When JSP pages or Freemarker templates are rendered, all page output is buffered in Java character buffers (UTF-16), and these are typically aggregated and written to the web page using UTF-8 encoding. Unicode text stored in the repository therefore renders without any problem. This means that you can expect a very smooth experience when using Hippo to edit multilingual documents and when building a site with Hippo's delivery tier. The CMS Document Editor supports any language you choose, and saving and comparing document changes works as expected. Even mixed-language content is supported. Finally, document names support Unicode too: multi-byte encoded node names are supported in JCR out of the box.
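The relationship between Java's internal UTF-16 strings and the UTF-8 bytes sent to the browser can be illustrated with a minimal sketch that uses only the Java standard library. The class and method names (EncodingRoundTrip, toUtf8, fromUtf8) and the sample text are assumptions made for illustration; they are not part of Hippo.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: this class is hypothetical and not part of Hippo.
class EncodingRoundTrip {

    /** Encodes a Java (UTF-16) string into UTF-8 bytes, as happens when text is written to a UTF-8 response. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a Java string. */
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source file encoding does not matter.
        String title = "\u65e5\u672c\u8a9e";
        byte[] encoded = toUtf8(title);

        System.out.println(title.length());                   // 3 (UTF-16 code units)
        System.out.println(encoded.length);                   // 9 (each character takes 3 bytes in UTF-8)
        System.out.println(title.equals(fromUtf8(encoded)));  // true (the round trip is lossless)
    }
}
```

The text survives the round trip unchanged; only the number of bytes per character differs between the two encodings.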
There is a caveat when it comes to configuring the delivery tier. For example, when creating an hst:sitemapitem, you can set the parameter hst:relativecontentpath to point to a document in the CMS by its (node-)path relative to the site's root node. If the name of the document (or of the folder containing it) uses multi-byte encoded characters, that path will contain multi-byte characters as well. While this is valid for the delivery tier, a developer may not be familiar with the encoding, which makes it hard to set up a correct configuration and to pinpoint problems. Also, error messages logged to the server's terminal may not display correctly when they contain multi-byte symbols; this depends on the encoding used by the terminal and the server's operating system.
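One possible way to make such paths easier to compare and log on a terminal with an unknown encoding is to print every non-ASCII character as a Java-style Unicode escape. The following is only a sketch under that assumption: the helper PathDebug.escapeNonAscii and the sample path are hypothetical and not part of Hippo or the delivery tier.

```java
// Illustrative only: this helper is hypothetical and not part of Hippo.
class PathDebug {

    /** Returns the path with every non-ASCII UTF-16 code unit replaced by a four-digit hexadecimal escape. */
    static String escapeNonAscii(String path) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c < 0x80) {
                sb.append(c); // plain ASCII is kept as-is
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical value of hst:relativecontentpath with a Chinese folder and document name.
        String relativeContentPath = "\u65b0\u95fb/\u6587\u7ae0";
        // Prints only ASCII characters, so the output is readable on any terminal.
        System.out.println(escapeNonAscii(relativeContentPath));
    }
}
```

Logging the escaped form next to the configured value makes it possible to check character by character whether a configured path matches the node path in the repository.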
A number of Bloomreach Experience Manager projects that use multilingual content extensively have been built. Examples are http://china-cn.nlembassy.org/ and http://japan-jp.nlembassy.org/.
Using languages with right-to-left orientation
Right-to-left orientation, as used in Arabic for example, is a challenging topic. Let's assume we log into the CMS in English and want to create content in Arabic. We open a document and start editing it; once we change the keyboard language, the browser is smart enough to switch the input field being edited to right-to-left orientation. For text inputs (text/textarea) this works almost perfectly; the drawback is that the cursor is initially at the left side of the form input.
In formatted text and rich text fields, the CKEditor plugin 'bidi' can be enabled to allow a user to switch between left-to-right and right-to-left text orientation. It requires the following configuration:
ckeditor.config.appended.json:
{ extraPlugins: 'bidi', toolbarGroups: [ { name: 'bidi' } ] }
CMS Translations
Most of the text shown in the CMS resides in resource bundles, but there are also some labels and translations found in the configuration part of the repository (bootstrapped there by means of XML files). The CMS itself does not do much string manipulation, with the exception of CKEditor, the rich text editor. There, the CMS modifies links and image elements.
Adding a new CMS translation takes about 2-3 days and is well documented. Several tools help you find missing translations: there is a SONAR check and a shell script that you can use to track down any missing literals. Please take a look at the documentation on this site; the 'See also' section at the right side of this page offers some useful pointers.