Preparing half-translated bilingual XML for Trados Studio – with XSLT

More and more translation clients, especially in the web industry but also in application I18N/L10N, use the versatile XML standard for translation purposes. The market-leading computer-aided translation (CAT) tool, SDL’s Trados Studio, lets you translate XML with an “Any XML” input filter, which includes an assistant for choosing which XML tags and attributes will be visible in the editor as “translatables”. Unfortunately, this means that the source strings will be overwritten with the translation, which is a bad idea if the source file is already bilingual XML that contains source and target language strings in matched tags.

If the target strings are empty, you can easily copy the content over and translate right away. But if the file is already partly translated, things get trickier, since you don’t want to overwrite existing translations. Worse, if the client happily announces that the source of some of the translated strings has changed, things get more than just a bit tricky. Let’s have a look at how to prepare those files with XSLT!

Alright, you have seen XML already, haven’t you? Right. It looks similar to HTML, but you get to define the valid structure and tags yourself, for example in your own DTD. This basically means that while HTML is mainly used to display structured information to the human user, XML’s primary purpose is to contain structured information of any kind for humans and machines alike, and let separate stylesheets worry about how it will be displayed (e.g., as XHTML, PDF, LaTeX, CSV tables, plain text, you name it). If you want to know more, have a look at the XML and XSLT pages over at W3Schools.

1. The XML file

Let’s first have a look at the file we want to translate with Trados (or the free/open source OmegaT plus Okapi Rainbow combo, or any other CAT tool):

<?xml version="1.0" encoding="UTF-8"?>
<uistrings text="de-DE" translation="en-US">
  <string id="001">
    <text>Die Verbindung konnte nicht aufgebaut werden: {0}.</text>
    <translation>Couldn't establish the connection: {0}.</translation>
  </string>
  <string id="002">
    <text>Falsches Datum {0} im Feld "{1}": Geben Sie ein Datum nach dem {2} an.</text>
    <translation></translation>
  </string>
  <string id="003">
    <text>Ihre Eingaben werden von der Heisenberg &amp; Söhne GmbH verarbeitet.</text>
    <translation>Your entries will be processed by Heisenberg &amp; Planck GmbH.</translation>
  </string>
</uistrings>

Sometimes, clients will wrap HTML into those elements as character data (<![CDATA[ ]]>), which means you will see every HTML tag in the translation environment as plain text. Be careful with those tags! Dear clients: Using CDATA may lead to messed-up code during translation. Please try to use namespaces instead to enclose HTML in XML; the tags will then be correctly parsed and displayed as immutable tags, and the translator is less likely to forget or mangle one somewhere.
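
To illustrate the difference, here is a small sketch of the two ways of embedding HTML. The string 004 and its content are made up for this example and are not part of the sample file; the prefix h is an arbitrary choice, only the XHTML namespace URI is fixed.

```xml
<!-- Variant A: HTML hidden in CDATA; the parser (and the CAT tool) sees "<b>" as plain text -->
<string id="004">
  <text><![CDATA[Klicken Sie auf <b>Speichern</b>.]]></text>
  <translation></translation>
</string>

<!-- Variant B: the same HTML as namespaced elements; "h:b" is real markup -->
<string id="004" xmlns:h="http://www.w3.org/1999/xhtml">
  <text>Klicken Sie auf <h:b>Speichern</h:b>.</text>
  <translation></translation>
</string>
```

In variant B, an XML-aware filter can treat h:b as an inline tag instead of showing angle brackets that a translator might accidentally edit.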

The file starts with the XML declaration including version and encoding. The mandatory “root element” uistrings encompasses all other elements and also holds the source and target languages as attributes. Inside, we can see three string elements with their IDs as attributes, each with one text and one translation element holding the actual source and target content. Attention: Ampersands always have to be written as the entity &amp;, otherwise the parser throws an error. And if the file is saved as ANSI instead of UTF-8, the umlaut will throw parsing errors as well and should be replaced with a character reference!
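
As a small illustration of that replacement, string 003 could be written as follows: the ampersand becomes the predefined entity &amp;amp; and the umlaut becomes a numeric character reference (&amp;#246; stands for ö). Written like this, the element contains only ASCII characters and no longer depends on the encoding the file was saved in.

```xml
<string id="003">
  <!-- &amp; is the escaped ampersand, &#246; is the character reference for ö -->
  <text>Ihre Eingaben werden von der Heisenberg &amp; S&#246;hne GmbH verarbeitet.</text>
  <translation>Your entries will be processed by Heisenberg &amp; Planck GmbH.</translation>
</string>
```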

I have inserted three use cases: The string is already accurately translated, the string is untranslated (empty translation tag) and the string is translated but the translation doesn’t match the text (here: the company name has changed). Unfortunately, our virtual client has not marked that string as modified, for example by setting something like a new or modified="yes" attribute on the string or text element.

So, we have already translated strings, empty strings and strings that need to be edited. Usually, you would want to write your translations into the translation elements. However, telling Trados to parse the translation elements as translatables will lead to English text in TagEditor’s German source column for strings 001 and 003, and you won’t get to see string 002 at all, because it’s empty and nobody would ever need to translate “nothing”, right? And on top of that, you won’t ever get to see the German source text.

2. File preparation

So apparently, what we need to do before translating the translation elements is to copy the source text into them, preferably without destroying existing translations. One way to achieve this is to use a text editor with regular expression search-and-replace to turn the whole XML file into a tab-separated table, save it as .TXT, read and translate it with Trados’ text table input filter, and turn it back into an XML document with another regex. I have been there (and described it in an article in German): it works quite nicely, and you automatically get the source text in TagEditor’s source column and any existing translations in the target column. But let’s try using only XML this time, shall we?
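
For the simplest case, where empty translation elements just need the source text copied over, a minimal XSLT 1.0 sketch could look like this. It assumes the element names text and translation from the sample file and is only one possible way to do it, built on the common "identity transform" pattern.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8"/>

<!-- Identity template: copy every node and attribute unchanged -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- Empty or whitespace-only translation elements receive a copy of the sibling text element's content -->
<xsl:template match="translation[not(normalize-space())]">
  <translation>
    <xsl:copy-of select="../text/node()"/>
  </translation>
</xsl:template>

</xsl:stylesheet>
```

Applied to the sample file, string 002 would end up with the German text in its translation element, while strings 001 and 003 keep their existing translations. Note that this cannot detect outdated translations such as string 003, because nothing in the file marks them as changed.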

XML and XSL are like HTML and CSS on steroids. Not only can XSL present XML data in a number of other languages, it also lets you convert one XML file into another, use variables, copy and move elements, and even use control structures such as if. One (good) use is to convert our XML file into an HTML file showing us three columns: ID, text and translation – and tell Trados in the file type options to use that .xsl stylesheet to display the preview window. Trados will even mark the currently edited segment with a red box in that preview, and we have our source and target sitting nicely side by side instead of having to stare at XML code. Example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Sample XSL Stylesheet to display the above XML file as a HTML table in Trados' preview window -->

<xsl:output method="html" indent="yes" encoding="utf-8" doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"/>
<!-- What follows is regular HTML, except in tags with the namespace prefix xsl: -->

<xsl:template match="/uistrings"> <!-- XML root element of our file -->
<html>
  <head>
    <title>Preview</title>
  </head>
  <body>
    <table width="100%" style="border:1px solid #999;" cellpadding="2px" cellspacing="0px">
       <tr style="background-color:black;">
          <th style="color:white;" width="15%">ID</th>
          <th style="color:white;">Source</th>
          <th style="color:white;">Target</th>
       </tr>
      <xsl:for-each select="string"> <!-- Loop through all string elements in XML file and create table rows: -->
       <tr>
          <td style="color:blue;" width="15%"><xsl:value-of select="./@id"/></td>
          <td><xsl:value-of select="./text"/></td>
          <td><xsl:value-of select="./translation"/></td>
       </tr>
      </xsl:for-each>
    </table>
  </body>
</html>
</xsl:template>
</xsl:stylesheet>

But it gets even better: As I have said already, such an XSL sheet can also transform one XML file into another XML file, and that’s where we can make the whole CAT translation workflow work, because Trados supports a special XML file type that is bilingual and that it reads, displays and edits correctly: XLIFF, the XML Localization Interchange File Format, which is used by Trados and almost all other major CAT tools (as an import/export format if not natively). XLIFF is for bilingual texts what TMX is for translation memories.

This is how XLIFF can be generated by XSLT from our XML file:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" exclude-result-prefixes="xliff">
<!--This XSL Stylesheet will output XLIFF, XLIFF prefixes shall not appear in the resulting file -->
<xsl:output method="xml" indent="yes"/>

<xsl:template match="/uistrings"> <!-- Start with our root element and create XLIFF: -->
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
<file source-language="de-DE" datatype="plaintext" target-language="en-US"> <!-- datatype could also be "html"! -->
  <body>
    <xsl:for-each select="string">
      <trans-unit id="{./@id}">
         <source xml:lang="de-DE"><xsl:value-of select="./text"/></source>
         <target xml:lang="en-US"><xsl:value-of select="./translation"/></target>
      </trans-unit>
    </xsl:for-each>
  </body>
</file>
</xliff>
</xsl:template>

</xsl:stylesheet>

How it works: We begin with our usual XML declaration, followed by the declaration that this is going to be an xsl:stylesheet, including the XSL version and the namespaces for XSL and XLIFF. With exclude-result-prefixes, we also state that the xliff prefix should not leak into the output file. Then we proceed to the desired output, which is going to be XML and shall be indented for better readability. To define what our XLIFF file should look like, we begin our xsl:template at the uistrings root element (the one which holds all other elements) of our sample XML file.

The first line written into the new file is its root element (xliff), together with its namespace and version, followed by a file element that carries the source and target languages and the data type, and then a body element. Next, we begin iterating through the strings from the XML file: for each string, we write one trans-unit carrying the id as its attribute. Each one contains one source and one target element with the content of the original text and translation elements. Then we end the loop, neatly close our body, file and xliff tags and end the xsl:template. And that’s also the end of our xsl:stylesheet. Easy, isn’t it? You just need to know what your desired file must look like and insert the content into the corresponding places; the for-each statement does the rest.
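
The stylesheet above hard-codes the two languages. As a variation, here is a sketch that reads them from the text and translation attributes of the uistrings root element instead, using attribute value templates (the curly braces). The value test.xml for the original attribute is a placeholder; as far as I can tell, the XLIFF specification lists original as a required attribute of file, so the sketch adds it, although the stylesheet above worked in Trados without it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<xsl:template match="/uistrings">
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
<!-- {@text} and {@translation} read the language codes from the root element's attributes -->
<!-- original="test.xml" is a placeholder for the name of the source file -->
<file original="test.xml" source-language="{@text}" target-language="{@translation}" datatype="plaintext">
  <body>
    <xsl:for-each select="string">
      <trans-unit id="{@id}">
         <!-- Inside the loop the context is a string element, so ../ steps back up to uistrings -->
         <source xml:lang="{../@text}"><xsl:value-of select="text"/></source>
         <target xml:lang="{../@translation}"><xsl:value-of select="translation"/></target>
      </trans-unit>
    </xsl:for-each>
  </body>
</file>
</xliff>
</xsl:template>

</xsl:stylesheet>
```

This way the same stylesheet can be reused for files with other language pairs, provided they follow the same structure as the sample.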

3. Convert to XLIFF, translate, convert back

Now let’s see if it works! You can download Apache’s Xalan XSLT processor either as a C++ binary or as a Java app (Xalan-C / Xalan-J). Personally, I find that Xalan-C is less hassle: You download the xalan-comb-… package for your system (usually x86-windows or amd64-windows) from the download page, extract the archive, drop the contents of the Xerces directory into the Xalan directory (merging the folders bin, include and lib) and there you go. There are also other XSLT processors, but Xalan is open source, free, libre and easy to work with.

Once you are done extracting (no real “installation” required), copy the above XML file code (the first code box) and paste it into an empty text file. Save that as test.xml. Likewise, copy the code from the last sample XSL sheet and save that as test2xliff.xsl. From the Xalan “bin” directory, run:

xalan.exe -o testoutput.xlf test.xml test2xliff.xsl

Be sure to include the full path to where you saved the test files, e.g.:

xalan.exe -o C:\Users\Me\Documents\XMLtest\testoutput.xlf C:\Users\Me\Documents\XMLtest\test.xml C:\Users\Me\Documents\XMLtest\test2xliff.xsl

Subsequently, you can open the .XLF file (short form of .XLIFF) with Trados’ File/Open command and translate it. For me, it worked without a hitch.

Wham!

Now you know how to write an XSL transformation into XLIFF. Will you be able to write a similar XSL transformation to convert the XLIFF back into the original XML file format? Try and tell me!
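
If you want to compare notes, here is one possible answer as a minimal sketch. It assumes that the translated file still uses the XLIFF 1.1 namespace and the flat file/body/trans-unit structure generated above; a CAT tool may add markup of its own on export, so treat this as a starting point rather than a finished solution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" exclude-result-prefixes="xliff">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<!-- XLIFF elements live in a namespace, so every step in a path needs the xliff: prefix -->
<xsl:template match="/xliff:xliff">
<!-- The language codes go back into the root element's attributes -->
<uistrings text="{xliff:file/@source-language}" translation="{xliff:file/@target-language}">
  <xsl:for-each select="xliff:file/xliff:body/xliff:trans-unit">
    <string id="{@id}">
      <text><xsl:value-of select="xliff:source"/></text>
      <translation><xsl:value-of select="xliff:target"/></translation>
    </string>
  </xsl:for-each>
</uistrings>
</xsl:template>

</xsl:stylesheet>
```

Because xsl:value-of only takes the text content, any inline elements inside source or target would be flattened to plain text.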
Cheers,
Christopher Köbel from DeFrEnT

Christopher Köbel

IT / IT-Marketing / Tech in DE / FR / EN defrent.de | XING Profil

9 comments on “Preparing half-translated bilingual XML for Trados Studio – with XSLT”
  1. Sven says:

    Hello Christopher,

    First of all, hats off: this is very helpful and basically exactly what I need! Before tackling my own bilingual XML, I first reproduced and understood your example as a test. I loaded the first file into Okapi Rainbow as XML and registered the second one as the XSLT. The software then converts to XLIFF, much like Apache does. Unfortunately, I always get the same error: even after several attempts, the same segment is shown in both English and German instead of the translation. Could something be wrong with the code example, or do you have an idea offhand what is going wrong?

    Short example from the generated XLIFF:

    <text>Die Verbindung konnte nicht aufgebaut werden: {0}.</text>
    <text>Die Verbindung konnte nicht aufgebaut werden: {0}.</text>

    Best regards
    Sven

    • Hello Sven,

      glad you like the approach! Apparently two typos had crept in, which Xalan seems to have ignored silently, but which throw validation errors in, for example, Notepad++ with the XML Tools plugin enabled:

      • In the sample XML file, the & characters need to be escaped as the entity &amp; (and if the file was accidentally saved as ANSI instead of UTF-8, the umlaut in line 12 causes trouble as well).
      • The XML-to-HTML XSLT has an even worse problem in line 20: the opening comment tag is wrong; the ! of course belongs before the two hyphens!
      • The XML-to-XLIFF XSLT is correct, though.

      I have now incorporated the corrections into the article. With these two changes, Xalan-C gives me an HTML file and an XLIFF file that look as expected in Notepad++ and that I could also translate in Trados Studio 2015/2017 without problems.

      However, when I load the same file into OmegaT 4.1.1 without further treatment (using the XLIFF filter), OT oddly shows me the target strings (EN) for translation instead of the source strings (DE). If I then translate these English strings into German and look at the exported file in Notepad++ again, the DE strings (source!) are unchanged and the EN strings (target) are filled with German text. That is odd, to put it mildly, and defeats the purpose of transforming to XLIFF. (This counterintuitive behaviour of OT in many places is unfortunately one of the reasons why I only play around with OT and rely on the market leader for production work… :-/ ) According to the excellent OmegaT blog Velior.ru, however, it might help to use the Okapi (SDL)XLIFF filter instead of the native XLIFF filter.

      The other solution that comes to mind would be this: transform the partly translated XML file to TMX instead of XLIFF to obtain a TM of the existing translations, then transform the XML file into an XLIFF in which all target segments also contain the source text. Then translate the strings in the <target> tags in OmegaT with the help of the TMX so that the existing translations can be reused. After that comes the conversion back.
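
A minimal sketch of the TMX step described above, assuming the structure of the sample file from the article. The header values other than the languages (creationtool, o-tmf and so on) are placeholders chosen for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<xsl:template match="/uistrings">
<tmx version="1.4">
  <!-- Header values other than the source language are placeholders -->
  <header creationtool="xslt-sketch" creationtoolversion="1.0" segtype="paragraph" o-tmf="uistrings" adminlang="en-US" srclang="{@text}" datatype="plaintext"/>
  <body>
    <!-- Only strings that already have a non-empty translation become translation units -->
    <xsl:for-each select="string[normalize-space(translation)]">
      <tu tuid="{@id}">
        <tuv xml:lang="{../@text}"><seg><xsl:value-of select="text"/></seg></tuv>
        <tuv xml:lang="{../@translation}"><seg><xsl:value-of select="translation"/></seg></tuv>
      </tu>
    </xsl:for-each>
  </body>
</tmx>
</xsl:template>

</xsl:stylesheet>
```

One caveat: string 003 would also land in the TM with its outdated translation, since nothing in the file marks it as changed, so such entries still need a manual check.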

      Or, as you intended, you use Okapi and may have to create a suitable custom configuration for the XML filter with ITS rules so that all translatable content ends up in <target> tags, everything not to be translated stays out, and all inline tags are presented as OT tags.

      I hope these thoughts give you a nudge in the right direction. Let me know how it turned out!

      Best regards
      Christopher

      • Sven says:

        Hello Christopher,

        many thanks for your knowledgeable explanations! And for the pointer to the XML filter in Okapi Rainbow. I tried it right away, and it helps a lot in making sure that only relevant XML content appears in the resulting XLIFF. Unfortunately, Rainbow shows exactly the problem you describe for OmegaT: for whatever reason, the segments in source and target are always filled with the same language, while the translation in turn shows up, again in the same language in both source and target, in the next item, and so on.

        I also tried Apache again, but the software does not work for me under Win 7 or 8 on two PCs; I always get a corresponding error message on the command line, no matter whether I use -j or -c. Can you tell me which version you are using? Maybe I picked one that is unsuitable for my systems…

        Best regards and have a nice weekend
        Sven

  2. Dark says:

    Christopher, if you’re interested in alternative solutions to manage the translation of XML files, try the software localization platform https://poeditor.com/

    The interface is great, simple and uncluttered, and the features it provides for l10n management are quite flexible, so you can adapt it to your own workflow.

    • Hello Dark(ness my old friend?),

      I was just short of not allowing that comment because it’s obviously an advertisement… but since it is actually a relevant ad that might interest some of the readers here, I’ll allow it. This time.
      As to your proposed SaaS cloud solution, well, I am like most Germans are nowadays – we’re wary of cloud solutions. Not only do they require a ‘net connection to work at all (clear plus for locally installed software: it only needs a power plug or, in the case of laptops, enough battery power left), but it also raises confidentiality issues: I have to trust the cloud service provider not to scoop off any client data and all the intermediate server operators and the SSL protocol which doesn’t look as safe as it once did, and that means I have come to like working offline a lot more in those last few years. So that’s why I won’t use it, as good as it might be on the UI/UX and functionality sides. Just as I will never use a Dropbox or other file hoster to transfer data from/to clients. And wherever I can, I’m nudging my clients to get themselves E-Mail certificates to close down that gaping security hole, too. For others who don’t mind, PoEditor dot com might work very well, so good luck!

      • Dark says:

        Hello Christopher,

        I’ve come to talk with you again. 🙂

        What you described above is the general situation on the Internet.
        When this is not something holding you back, the use of an online platform is ideal for team work.

        There are projects that do require collaborative work, and not everyone handling such projects is happy to work offline, with clients.

        Cloud solutions can improve a translation and localization workflow, because they allow devs and translators to each do their part of the process in real time.

        Thanks for taking the time to reply. I wish you the best of luck with your projects!

  3. Hello Tommi, glad you like it. I haven’t worked with Qt Linguist .ts files yet, but if https://qt.gitorious.org/qt/webkit/source/7ea2c96fbd394fe930dc59b611b125ad269ec0ab:translations/linguist_de.ts is any indication, the file format should be easily convertible back and forth. You would have to preserve the context “name” and “location” for each “message”/“translation” pair to be able to reconstruct the correct structure of the original file, but fortunately, XLIFF allows you to store this kind of extra information, see http://www.oasis-open.org/committees/xliff/documents/cs-xliff-core-1.1-20031031.htm#Struct_Extension (item 2.5 and following). Good luck!

  4. Tommi Nieminen says:

    That’s a good trick Christopher. I’m looking for a solution for importing Qt Linguist XML files (.ts) into Studio, and so far the best option seems to be making a new filetype with the Studio SDK, which is quite a lot of work. However, that way the whole process of conversion is invisible to the person preparing the files. I just wish it was possible to create custom bilingual XML file types inside Studio in the same way as normal XML file types.

1 ping/trackback for "Preparing half-translated bilingual XML for Trados Studio – with XSLT"
  1. […] Somewhat later, a new, much better workflow of mine using XSLT (article in English) […]
