bruce's blog

Translation Management Module

ICanLocalize is migrating its proprietary closed translation management system into an open-source Drupal translation management module.

We are aiming at having an initial beta release of the module towards the end of next week if all goes to plan.

What's done:

  • Add/Edit Translators - specifying which users can be translators and what languages they can translate from and to.

Consistent translations with glossary

It’s taken a while to build, but it’s finally ready – a global translation glossary for each client.


Many words can be translated in different ways and it’s important that everything we translate come out the same.

The solution is a glossary.

A glossary helps produce consistent translations, as it shows how phrases were translated before.

Website owners and developers can create entries for important phrases. These entries can include the translation and serve as guidelines for the translators, or remain untranslated, so that translators can suggest the right translation.

As translators work, they too can add glossary entries. These entries will help translate the rest of the project consistently and also serve as reference for other translators who work on the project.

Now it's ready for website translation projects

We've had this working for Software localization and for Instant translation projects for a while now at ICanLocalize. Now users of our Drupal ICanLocalize translator module can use this feature to get consistent translations for their Drupal websites.

Here is a short clip showing how glossaries work in the translation tool. What you'll see is:

  1. Drupal blog post to translate
  2. The translation tool
  3. Existing glossary terms highlighted while translating
  4. Translator creating new glossary entries

Multilingual Sites that Translate Well

Drupal's t() function allows translating texts from one language to others. In order to work right, you must create texts that translate well. We’ll show you frequent mistakes and how to correct them.

Localization 101

Once you've created a site using the t() function, Drupal will scan the texts and send them to translation. The String translation mechanism allows replacing the texts in the original language with texts in other languages.

Translators cannot change anything else in the site or in the HTML. They only translate the texts that you give them.

Some of the things that translators cannot do:

  • Change the order in which texts appear
  • Merge or split texts
  • Translate texts that are not inside the t() function

Give your translators complete sentences

When translators get single words to translate, it's practically impossible to translate them correctly. They need to see full sentences in order to translate with meaning.

A common case is the login message:

You must be logged in to post a comment.

Too often, folks create this message using this code:

<?php print t('You must be') ?> <a href="<?php print url('user/login') ?>">
<?php print t('logged in ') ?></a> <?php print t('to post a comment.') ?>

This means that the translator now sees three strings to translate:

You must be
logged in
to post a comment.

Each of these strings makes little sense.

Did you know that the English word "be" has several meanings in Spanish?

Be can mean what you are (ser) or where you are (estar). Translators cannot tell which one you mean when they just see "you must be".

To fix this and create a sentence that translates well, we'll merge all these strings into a single sentence, as it's supposed to be in the first place.

We'll use placeholders to insert values from other functions into the sentence.

<?php print t('You must be <a !link>logged in</a> to post a comment.',
 array('!link' => 'href="' . url('user/login') . '"')); ?>

Now, it's crystal clear. The translator sees one sentence, from start to finish. If needed, the translator can swap between parts of the sentence and write it correctly in any language.

Beyond complete sentences

When we create multilingual sites, we need to remember that translation is not everything. Localization means adapting the site to a different language, country and conventions.

We need to create interface strings that allow adjusting things like:

  • Number formats
  • Date formats
  • Units
  • Phone numbers (adding country codes)
  • Addresses (adding the country)
  • and many others...

The first and most important step is understanding. Once we understand that localization only begins with adding t() functions, we'll create much better websites that read naturally in any language.

Working with the Domain Access module

One of our clients at ICanLocalize has a number of sites that rely heavily on the Domain Access module. We thought that we had tested our ICanLocalize Translator module thoroughly with this module but nothing replaces a real live test.

For those that are not aware:

The Domain Access project is a suite of modules that provide tools for running a group of affiliated sites from one Drupal installation and a single shared database.

Problems found

1. ICanLocalize returned translations to the wrong domain.

What should happen here is that all the translations of a node should be published in the same domain as the original node. What we found was that the domain information was not being saved at all for the translated nodes. After tracing through the code I found that all the domain information for a node was set correctly when calling node_save() but was not being saved to the domain access tables. It turns out that the Domain Access module was caching some of its data and the cached values were not always correct due to the way our module was accessing the saved node at a later stage. See http://drupal.org/node/752570 for details.

We managed to fix this with the changes kindly made by the maintainer of the Domain Access module and by changing the order of things in our ICanLocalize Translator module.

2. Duplicate translated nodes were being created.

What was happening was that when an updated translation was being saved by the ICanLocalize Translator module, it was creating a completely new node instead of updating the content of the translated node. It seemed that our module could not determine that the node was already translated. With the help of the client we checked what was in the database and what was returned by the function translation_node_get_translations. All appeared to be OK, so why was it going wrong? Our module calls translation_node_get_translations to find the translated nodes. When the client ran it, it returned the translated nodes, but when it was called from the ICanLocalize Translator module, it didn't return any translations.

After a bit of head scratching it dawned on me that maybe it was another Domain Access issue. When we send back translations from the ICanLocalize server they get sent back via XML-RPC. When the XML-RPC function gets called it runs in the default domain so calling translation_node_get_translations returns none because the Domain Access module determines the original node was in a different domain.

To fix this I created our own version of the "translation_node_get_translations" that bypasses Domain Access filtering of nodes.

Conclusion

ICanLocalize Translator should now work much better with the Domain Access module thanks to our client and the maintainer of the Domain Access module.

Domain Access is a powerful module and we intend to use it to deliver other client sites that require complex configurations. It's great to know that things work together now.

Internationalization at DrupalCon 2010

Gábor Hojtsy and Robert Douglass will be talking about internationalization and localization at DrupalCon SF. Parlez vous Internet? Ignore the rest of the world at your own risk.

If you are going to the conference I hope you can make it. It should be very interesting.

<self-promotion>

Managing a multilingual site

This site (Drupal-translation.com) is in English, German and Spanish with English as the default language. All the content is written in English and then professionally translated to German and Spanish using the ICanLocalize translator module.

Unfortunately I don't understand German or Spanish, so what happens when someone comments on a page that has been translated to German?

Running a test server for translations

A common question we get asked at ICanLocalize is:

What's the best way of translating our Drupal site with minimum downtime and minimum disruption to our existing users?

There are a number of answers; this is one method.

Under construction

Create a test (Quality Assurance) server

Here I'll show you how to set up a test site that's a copy of your original site, so that you can use it to build the multilingual site. On this test site you can add the required multilingual modules, enable your languages and translate your content.

1) Create a copy of your Drupal source directory:
  e.g. my directory is /home/bruce/drupal-6.15, so I copy it to /home/bruce/drupal-6.15-QA

cp -Rpv /home/bruce/drupal-6.15 /home/bruce/drupal-6.15-QA

2) Copy your database:
  e.g. mine is db_drupal; I did an SQL dump and reloaded it into a new database, db_drupal_QA
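
The dump and reload in step 2 can be done with the standard MySQL command-line tools. This is only a minimal sketch, assuming MySQL, a user that is allowed to create databases, and the example names db_drupal and db_drupal_QA. The function name copy_drupal_db is made up for this illustration.

```shell
# Copy one MySQL database into a new, empty one (sketch, not a polished script).
# Usage: copy_drupal_db SOURCE_DB TARGET_DB MYSQL_USER
copy_drupal_db() {
  src="$1"
  dst="$2"
  user="$3"
  dump_file="${src}.sql"

  # 1. Dump the live database to a file (-p prompts for the password).
  mysqldump -u "$user" -p "$src" > "$dump_file" || return 1

  # 2. Create the empty QA database.
  mysqladmin -u "$user" -p create "$dst" || return 1

  # 3. Load the dump into the QA database.
  mysql -u "$user" -p "$dst" < "$dump_file"
}
```

For the names used here you would run: copy_drupal_db db_drupal db_drupal_QA username. Each of the three tools asks for the password separately.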

3) Edit your copied settings.php file so it uses the new database.

$db_url = 'mysqli://username:password@localhost/db_drupal_QA';

4) Edit your copied settings.php file and give it a different base url.

$base_url = 'http://localhost/drupal-QA';

5) Edit your Apache configuration file to point to the new Drupal site:

<VirtualHost *:80>
  DocumentRoot /home/bruce/drupal-6.15-QA
  ServerName localhost
 
  <Directory /home/bruce/drupal-6.15-QA>
    AllowOverride All
  </Directory>
</VirtualHost>

6) Restart Apache

apache2ctl restart

Now you should have a complete duplicate of your site running at:
http://localhost/drupal-QA

You can now go ahead and set up your translated site.

  • Add languages
  • Add modules (i18n, ICanLocalize, etc.)
  • Enable language switch block
  • Translate the content
  • etc

See our how-to guide for further info: Setup for a multilingual site

Under construction

Once you have finished setting up your multilingual site, all you need to do is follow the copy steps above in the other direction to move your translated site back to your production site.

Note for ICanLocalize users: When moving the ICanLocalize settings to the production server you will need to ask the ICanLocalize team to change the settings on the ICanLocalize server so further translations are sent back to the correct server.

Handy SQL for translations

Here are a few handy SQL snippets for dealing with translated content on a Drupal site. These will be helpful if you are trying to translate a site and need to work out what content hasn't been translated yet.

1. Find nodes that haven't been translated to a given language (Spanish, 'es', in this example).

SELECT 
  nid, title, language 
FROM node 
WHERE nid=tnid AND nid NOT IN (SELECT tnid FROM node WHERE language='es')
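
If you need to run the first query for several languages, a small shell helper can generate it for each language code. This is just a sketch. The function name untranslated_nodes_sql is made up for this example, and it rejects anything other than letters and hyphens so that the code can be dropped into the query safely.

```shell
# Print the "nodes not yet translated" query for one language code (sketch).
# Usage: untranslated_nodes_sql LANGUAGE_CODE
untranslated_nodes_sql() {
  case "$1" in
    # Refuse empty codes and anything that is not a letter or a hyphen.
    ''|*[!a-zA-Z-]*) echo "invalid language code: $1" >&2; return 1 ;;
  esac
  printf "SELECT nid, title, language FROM node WHERE nid = tnid AND nid NOT IN (SELECT tnid FROM node WHERE language = '%s');\n" "$1"
}
```

The output can be piped straight into the mysql client, for example: untranslated_nodes_sql de | mysql -u username -p db_drupal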

2. Find blocks that haven't been translated.

SELECT *
FROM locales_source
WHERE source IN (SELECT body FROM boxes) 
  AND lid NOT IN (SELECT lid FROM locales_target) 

3. Find blocks that are not in the locales_source table. These will be blocks that haven't been viewed in a translated language.

SELECT * 
FROM boxes 
WHERE body NOT IN (SELECT source FROM locales_source)

HTML validation

A client recently had a problem where content that was sent for translation wasn't translated fully. The returned translation was only half done and appeared to be truncated. After a quick investigation I discovered there was an error in the HTML for a node.

Instead of:

<a href='...'>link text</a>

they had:

<ahref='...'>link text</a>

(a space was missing between 'a' and 'href')

Many times, browsers manage to display pages with broken HTML, so it's difficult to notice there's a problem. In this case the link is missing but the text is all there.

Broken HTML leads to problems

This is what ICanLocalize does when you send content for translation:

  1. The node title and body text are sent to the ICanLocalize server
  2. The ICanLocalize server parses the HTML
  3. The HTML parser extracts the text for translation

The HTML parser extracts the text in such a way that translators only have to edit text and not HTML tags. While translators are editing, a preview panel shows them how the translated document would appear.

Our parser is fairly robust but obviously in this situation it failed.

Remember that the ICanLocalize server is not the only computer to process your pages. Search engine crawlers (e.g. Googlebot) read your pages and try to make sense of them. When they encounter broken HTML, they get confused. Parts of a page, or even entire pages, can be lost if search engines cannot process them and cannot follow links.

Make sure your HTML is valid

Before sending content for translation, it's always a good idea to make sure the HTML in your nodes is valid. I've done a quick search for Drupal markup validation modules and this one looks useful: http://drupal.org/project/w3c_validator

You can also use the W3C validator directly: http://validator.w3.org/

My personal favorite is the Firefox HTML validator. It's pretty simple. Green means GO, red means no-go and yellow means check. You get instant validation for entire pages as soon as you save the first draft.
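
For the specific mistake described above (a tag name running straight into its attribute), a crude command-line check can catch it before you reach for a validator. This sketch assumes you have saved the node body to a file. The function name find_glued_anchor_tags and the list of attributes are my own choices for illustration, and it is no substitute for real validation.

```shell
# Print lines (with line numbers) where an anchor tag name is glued to an
# attribute, e.g. <ahref='...'>. A pattern check only, not an HTML validator.
# Usage: find_glued_anchor_tags FILE...
find_glued_anchor_tags() {
  grep -inE '<a(href|name|id|class|title|target|rel)=' "$@"
}
```

Running find_glued_anchor_tags on a saved node body prints each offending line with its line number, and prints nothing when the pattern is not found.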

Using different images in translated content

I've been looking at how we can include different images in translated content. Translating the text in content is fairly straightforward, but keeping track of hundreds of different images in many languages quickly becomes a serious problem.

We have had an inquiry from a client that has an online help system using Drupal. Many of the help pages include screenshots, and these screenshots need to be different in each language. We're talking about thousands of images, translated into at least six languages.
