Translating XML documents with XLIFF
Some time ago I found this nice article about XML in localisation. Now I want to experiment with XLIFF a bit, because I could use it to translate an XML document I am writing.
I know that XLIFF is not the only way to translate an XML document, nor is its use limited to XML documents; it's just that using an XML document format to translate an XML document seems so natural!
The translation problem
Some very basic nomenclature for the translation problem (applicable to computer-aided translation, CAT, as well):
- original document: the document to translate
- original format: the format of the original document
- intermediate format: a document format to submit to the translator; it holds the contents of the original document but is independent of the original format
- destination document: the translated document
- source language: the language of the original document
- target language: the language of the destination document
- translation-unit (TU): the element of the content meant to be translated independently of the others. For practical reasons the sentence or the paragraph is typically chosen, but in theory words and letters, the whole document, or any other element of a specific document format can be taken as a TU.
- segmentation: the process of splitting the original document into translation-units
- translation: more formally, the equivalence between the source language and the target language for a translation-unit
- translation-memory: a database of translations
Using an intermediate format makes the document's content independent of the document's format. This way the translator does not need to care about the actual document format, and he/she can use specialized tools to ease the translation process.
An XLIFF workflow to solve the problem is basically as follows:
- Create a representation of the original document using XLIFF as the intermediate format; the segmentation process takes place at this stage as well. The original format can be stored in a separate skeleton file without actual content, or it can be embedded directly in the XLIFF file.
- Translate the content into the target language using an XLIFF editor (or even a text editor, you know). Now that the content is split into TUs, it is easy to see how translating them one by one makes the contents of the original document converge to the contents of the destination document. This is an obvious concept, but I wanted to stress it.
- Create the destination document by converting the XLIFF representation back to the original format.
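The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the XLIFF tooling used later in this post: a plain dictionary stands in for the XLIFF file, and the sample document, the `MEMORY` translation memory and the function names are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical original document; each <p> is treated as one translation unit.
ORIGINAL = "<doc><p>Hello</p><p>End of page</p></doc>"

# Hypothetical translation memory: source text -> target text.
MEMORY = {"Hello": "Ciao", "End of page": "Fine della pagina"}


def segment(root):
    """Step 1: split the document into translation units, keyed by an id."""
    return {f"p-{i}": p.text for i, p in enumerate(root.iter("p"))}


def translate(units, memory):
    """Step 2: translate each unit independently; unknown units stay unchanged."""
    return {uid: memory.get(text, text) for uid, text in units.items()}


def merge(root, translated):
    """Step 3: put the translated units back into the original structure."""
    for i, p in enumerate(root.iter("p")):
        p.text = translated[f"p-{i}"]
    return ET.tostring(root, encoding="unicode")


root = ET.fromstring(ORIGINAL)
units = segment(root)
result = merge(root, translate(units, MEMORY))
print(result)
```

The point of the sketch is that `translate` never sees the document structure, only the units; that separation is what the intermediate format buys you.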
An example for XML documents
The tools we are going to use:
- xsltproc; you can find it packaged for any Unix system, I think.
- xml2xliff.xsl and xliff2xml.xsl from xliffRoundTrip Tool.
- Transolution has a nice XLIFF editor called xliffeditor :). The latest version can be found in their SVN repository. I've also tried Virtaal, but it does not handle nested elements in XLIFF very well.
Our original document is test-xliff-en.html:
<?xml version="1.0" encoding="utf-8"?>
<!--
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
-->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd"
      xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Test page for XLIFF</title>
    <style type="text/css" media="screen, projection">
      .styled { font-size: .8em; color: red; font-face: sans-serif; }
    </style>
  </head>
  <body>
    <p>A paragraph with some <em>styles</em> just see how <strong>nested elements</strong> are handled.</p>
    <ul>
      <li>Item with a <a title="test link" href="#">link</a> in it.</li>
      <li>Item with an <span class="styled">CSS style</span>.</li>
    </ul>
    <p>End of page</p>
  </body>
</html>
Please note the commented-out DOCTYPE declaration: commenting it out is needed in order not to lose it, as XSLT doesn't know how to handle it properly. It has to be uncommented when we are done.
DOCTYPE is not in the XSL and XPath data model (this article outlines a possible solution) and neither is CDATA, so if you need to handle these constructs you may want to rely on some preprocessing before converting to XLIFF, and some postprocessing after converting back to the original format.
Also, xml2xliff.xsl does not support elements with a namespace yet. That would be great to have.
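The DOCTYPE preprocessing and postprocessing mentioned above could look roughly like this in Python. It is a minimal sketch and not part of xliffRoundTrip Tool: the function names are hypothetical, and the regular expressions assume a DOCTYPE without an internal subset (no `[ ... ]` block) and without `--` inside it, since that sequence is not allowed in an XML comment.

```python
import re

# Assumption: the DOCTYPE has no internal subset, so it ends at the first ">".
DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>")
HIDDEN = re.compile(r"<!--\s*(<!DOCTYPE[^>]*>)\s*-->")


def hide_doctype(text):
    """Preprocess: wrap the DOCTYPE declaration in an XML comment."""
    return DOCTYPE.sub(lambda m: f"<!-- {m.group(0)} -->", text, count=1)


def restore_doctype(text):
    """Postprocess: strip the comment markers around the DOCTYPE again."""
    return HIDDEN.sub(lambda m: m.group(1), text, count=1)


sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
    "<html/>"
)
hidden = hide_doctype(sample)
restored = restore_doctype(hidden)
print(hidden)
```

`hide_doctype` would run before the conversion to XLIFF and `restore_doctype` on the final translated file. CDATA sections would need a similar but separate treatment, which this sketch does not attempt.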
Convert the document to XLIFF:
xsltproc xml2xliff.xsl test-xliff-en.html > english-english.xlf
The XLIFF specification supports only bilingual translation for now; this is not a big limitation, as the translation usually goes from one official language into the others.
The result of the conversion to XLIFF before the translation:
<?xml version="1.0" encoding="utf-8"?>
<!--
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
-->
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:xmrk="urn:xmarker"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd urn:xmarker xmarker.xsd"
       version="1.2">
  <file datatype="plaintext" source-language="en" original="html">
    <header>
      <xmrk:nest>
        <xmrk:html xmarker_idref="html-0" xsi:schemaLocation="http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd" xml:lang="en">
          <xmrk:head xmarker_idref="head-1">
            <xmrk:meta xmarker_idref="meta-2" http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
            <xmrk:title xmarker_idref="title-3"/>
            <xmrk:style xmarker_idref="style-4" type="text/css" media="screen, projection"/>
          </xmrk:head>
          <xmrk:body xmarker_idref="body-5">
            <xmrk:p xmarker_idref="p-6">
              <xmrk:em xmarker_idref="em-7"/>
              <xmrk:strong xmarker_idref="strong-8"/>
            </xmrk:p>
            <xmrk:ul xmarker_idref="ul-9">
              <xmrk:li xmarker_idref="li-10">
                <xmrk:a xmarker_idref="a-11" title="test link" href="#"/>
              </xmrk:li>
              <xmrk:li xmarker_idref="li-12">
                <xmrk:span xmarker_idref="span-13" class="styled"/>
              </xmrk:li>
            </xmrk:ul>
            <xmrk:p xmarker_idref="p-14"/>
          </xmrk:body>
        </xmrk:html>
      </xmrk:nest>
    </header>
    <body>
      <group id="id2297750axmarkhtml-0">
        <group id="id2297759bxmarkhead-1">
          <group id="id2297761exmarkmeta-2"/>
          <trans-unit id="title-3">
            <source>Test page for XLIFF</source>
            <target>Test page for XLIFF</target>
          </trans-unit>
          <trans-unit id="style-4">
            <source> .styled { font-size: .8em; color: red; font-face: sans-serif; } </source>
            <target> .styled { font-size: .8em; color: red; font-face: sans-serif; } </target>
          </trans-unit>
        </group>
        <group id="id2298359bxmarkbody-5">
          <trans-unit id="p-6">
            <source>A paragraph with some <g id="em-7">styles</g> just see how <g id="strong-8">nested elements</g> are handled.</source>
            <target>A paragraph with some <g id="em-7">styles</g> just see how <g id="strong-8">nested elements</g> are handled.</target>
          </trans-unit>
          <group id="id2298372bxmarkul-9">
            <trans-unit id="li-10">
              <source>Item with a <g id="a-11">link</g> in it.</source>
              <target>Item with a <g id="a-11">link</g> in it.</target>
            </trans-unit>
            <trans-unit id="li-12">
              <source>Item with an <g id="span-13">CSS style</g>.</source>
              <target>Item with an <g id="span-13">CSS style</g>.</target>
            </trans-unit>
          </group>
          <trans-unit id="p-14">
            <source>End of page</source>
            <target>End of page</target>
          </trans-unit>
        </group>
      </group>
    </body>
  </file>
</xliff>
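Since the converter copies every source into its target, a quick way to see what is still left to translate is to list the trans-units whose target is identical to the source. Here is a small sketch using only Python's standard library; the embedded XLIFF is a trimmed-down, hypothetical file in the same shape as the one above, and `untranslated_units` is a name I made up.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2, as declared on the <xliff> root element.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical, shortened XLIFF file with one translated and one pending unit.
XLIFF = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file datatype="plaintext" source-language="en" original="html">
    <body>
      <trans-unit id="title-3">
        <source>Test page for XLIFF</source>
        <target>Pagina di prova per XLIFF</target>
      </trans-unit>
      <trans-unit id="li-10">
        <source>Item with a <g id="a-11">link</g> in it.</source>
        <target>Item with a <g id="a-11">link</g> in it.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def flat_text(element):
    """Text of an element, including the text inside inline <g> children."""
    return "".join(element.itertext())


def untranslated_units(root):
    """Ids of trans-units whose target is missing or still equals the source."""
    pending = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        if target is None or flat_text(source) == flat_text(target):
            pending.append(unit.get("id"))
    return pending


pending = untranslated_units(ET.fromstring(XLIFF))
print(pending)
```

Keep in mind that some units are expected to stay identical, such as the CSS in `style-4`, so a list like this is a hint for the translator rather than an error report.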
Translate the content using xliffeditor and save it to english-italian.xlf.
[Screenshot of xliffeditor, from its author]
Convert back to the original XML format:
xsltproc xliff2xml.xsl english-italian.xlf > test-xliff-it.html
Our destination document:
<?xml version="1.0" encoding="utf-8"?>
<!--
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
-->
<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xml:lang="en"
      xsi:schemaLocation="http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd">
  <head>
    <meta content="application/xhtml+xml; charset=utf-8" http-equiv="Content-Type"/>
    <title>Pagina di prova per XLIFF</title>
    <style media="screen, projection" type="text/css">
      .styled { font-size: .8em; color: red; font-face: sans-serif; }
    </style>
  </head>
  <body>
    <p>Un paragrafo con degli <em>stili</em> solo per vedere come sono gestiti gli <strong>elementi innestati</strong>.</p>
    <ul>
      <li>Voce contenente un <a href="#" title="test link">collegamento</a>.</li>
      <li>Voce con uno <span class="styled">stile CSS</span>.</li>
    </ul>
    <p>Fine della pagina</p>
  </body>
</html>
The only thing missing now is the right value for the xml:lang attribute, which still says en instead of it, but this can be handled in a postprocessing script as well.
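Such a postprocessing step could be as simple as a text substitution. The sketch below is just one possible way to do it, with a hypothetical function name; it assumes that the first xml:lang attribute in the file is the one on the root element and that its value is double-quoted.

```python
import re


def set_root_lang(text, lang):
    """Replace the value of the first xml:lang attribute found in the text."""
    return re.sub(
        r'xml:lang="[^"]*"',
        lambda m: f'xml:lang="{lang}"',
        text,
        count=1,
    )


page = (
    '<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xml:lang="en"><body/></html>'
)
fixed = set_root_lang(page, "it")
print(fixed)
```

Working on the text rather than on a parsed tree leaves everything else in the file, including the commented-out DOCTYPE, untouched.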
Comments
You could try this translation tool to help you out with the xliffs: https://poeditor.com. It has a really easy to use interface.
Hi,
I came across your post while searching for information about the XLIFF file format.
A website translation tool delivers the content of a website in this file format, which is very useful for me as a translator. But I am trying to find out whether it is possible to convert this format to a Word document. If not, would it be useful to send the translated website content to the customer in XLIFF format, so that they can easily use it? Or does it need to be converted to XML format in order to publish the translated website?
Maybe you could help? Thank you.
Hi An,
XLIFF is just a format to represent the original document split into translation units.
You can go to and from the original format in different ways: going XML -> XLIFF -> XML is easier because XSLT can be used; for other formats you have to find ad-hoc solutions.
Look explicitly for a Word to XLIFF converter, or maybe use the Open Document Format (e.g. ODT) in LibreOffice and rely on odf2xliff: http://translate-toolkit.readthedocs.org/en/latest/commands/odf2xliff.html
Ciao, Antonio