How to extract the IATE glossary

The IATE glossary is a terminology resource maintained by the European Union that collects specialized terms from a wide range of disciplines, as they are used in official EU documents. It’s very useful when translating texts about international politics, environmental issues, and other topics that fall under EU jurisdiction. The project started in 2004 and soon became the biggest terminology resource available for free on the Internet. This also meant that its file size became unmanageable for most professionals: many terminology tools (MultiTerm, to name just one) would crash when attempting to extract terms from it, while OmegaT would freeze when opening a project the glossary had been imported into, forcing the translator to remove the TBX file from the project in order to resume working.

IATE finally tackled this issue a couple of years ago by developing a Java application named IATExtract, which extracts subsets of the main termbase into smaller, topic-based termbases.

Preliminary steps

Install JRE

To run JAR files, you will need to install the Java Runtime Environment (JRE), which can be downloaded from Java’s website. Java is so ubiquitous nowadays that you probably already have it installed. You can check by opening a Terminal window on Linux or macOS and typing

which java

The path to Java on macOS Sierra.

On Windows, use the where command instead:

where java

The output of where java on Windows 10.

If Java is installed, the command will output the path to its executable. Otherwise, which will typically print nothing, while where will report that it could not find any files matching the pattern.
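If you prefer to script this check (for example, to run it on several machines), here is a minimal Python sketch that performs the same PATH lookup as which and where, then asks Java for its version. It assumes Python 3.7 or later; the function names are my own, purely for illustration.

```python
import shutil
import subprocess


def find_executable(name="java"):
    """Return the full path of `name` if it is on the PATH, else None.

    shutil.which does the same PATH search as `which` (Linux/macOS)
    and `where` (Windows); on Windows it also tries PATHEXT suffixes such as .exe.
    """
    return shutil.which(name)


def java_version_line(java_path):
    """Return the first line printed by `java -version`, or "" if there is none.

    `java -version` traditionally writes to stderr rather than stdout,
    so both streams are captured and stderr is checked first.
    """
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True, check=False
    )
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_executable("java")
    if path is None:
        print("Java was not found on the PATH; install the JRE first.")
    else:
        print("Java found at:", path)
        print(java_version_line(path))
```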

If the JRE is not installed, go to Java’s website and click the Accept License Agreement radio button to enable the download. Select the version appropriate for your operating system and processor architecture; if in doubt, refer to this guide on how to determine your processor architecture. Once the installer has finished downloading, double-click it and follow the installation steps.
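As a quick alternative to the guide, Python’s standard platform module reports the machine type the operating system sees. The mapping to readable labels below is my own approximation, not an official table from the Java download page, so double-check the installer names there.

```python
import platform

# Approximate mapping from platform.machine() values to readable labels.
# These labels are illustrative; the download page may name builds differently.
_ARCH_LABELS = {
    "x86_64": "64-bit x86 (x64)",
    "amd64": "64-bit x86 (x64)",
    "i386": "32-bit x86",
    "i686": "32-bit x86",
    "x86": "32-bit x86",
    "arm64": "64-bit ARM (AArch64)",
    "aarch64": "64-bit ARM (AArch64)",
}


def architecture_label(machine=None):
    """Map a machine string (default: this computer's) to a readable label."""
    if machine is None:
        machine = platform.machine()
    return _ARCH_LABELS.get(machine.lower(), f"unknown ({machine})")


if __name__ == "__main__":
    print(platform.system(), "-", architecture_label())
```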

Download the IATE glossary

It goes without saying that you will need to have downloaded the IATE glossary to your computer in order to extract terms from it. You can get it at this address. It’s a zip archive about 130 MB in size.
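Since the archive is large, it can be worth confirming that the download completed before going further. A minimal sketch using Python’s standard zipfile module: an interrupted download usually lacks the zip’s central directory, so is_zipfile returns False, and testzip re-checks each member’s CRC. The function name is hypothetical.

```python
import zipfile


def check_archive(source):
    """Return (name, uncompressed size in bytes) for each file in a zip archive.

    `source` can be a path or a binary file object. Raises ValueError if the
    file is missing, is not a zip archive, or contains a corrupt member.
    """
    if not zipfile.is_zipfile(source):
        raise ValueError("missing or not a valid zip archive; try downloading it again")
    with zipfile.ZipFile(source) as archive:
        # testzip() reads every member and returns the first bad name, or None.
        bad = archive.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

For example, check_archive("IATE_export.zip") would list the files inside the archive; the file name here is hypothetical, so use whatever name your browser saved. Checking every CRC reads the whole archive, so expect it to take a little while.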

Extracting terms from IATE

Run IATExtract

Once the Java Runtime Environment is installed, download IATExtract and run it by double-clicking its icon. This window will open:

The IATExtract main window.
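If double-clicking the JAR does nothing (this can happen when .jar files aren’t associated with Java), you can start it from a terminal with java -jar followed by the file name. The Python sketch below only builds and runs that command. The file name IATExtract.jar is an assumption (use the name of the file you downloaded), and whether a larger -Xmx value (the JVM’s maximum heap size) helps with the big export is a guess, not something IATE documents here.

```python
import subprocess


def build_launch_command(jar_path, max_heap=None, java="java"):
    """Build the command line that starts a JAR file with the given JVM.

    max_heap is an optional -Xmx value such as "2g"; it caps how much
    memory the Java heap may use.
    """
    command = [java]
    if max_heap:
        command.append(f"-Xmx{max_heap}")
    command += ["-jar", jar_path]
    return command


def launch(jar_path, max_heap=None):
    """Start the JAR and wait until the application exits; return its exit code."""
    return subprocess.run(build_launch_command(jar_path, max_heap), check=False).returncode
```

For example, launch("IATExtract.jar", max_heap="2g") runs java -Xmx2g -jar IATExtract.jar.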

Use the Select IATE Export File button to choose the zip archive you downloaded earlier; there’s no need to unpack it beforehand. The Set Extract Output Folder button lets you define the folder the terms or domains will be extracted to. You can choose which languages to extract by checking their ISO codes in the Choose languages section.

The two radio buttons below the languages section determine which terms are extracted. Extract ALL selected languages extracts a term only if it has an entry in every selected language; otherwise, the term is skipped. Extract ANY selected language extracts a term regardless of how many of the selected languages it is available in. Remember to use the second option if you intend to extract many languages at once; otherwise, you risk ending up with an empty termbase.
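To make the difference between the two options concrete, here is a small Python sketch of how I understand the selection rule, using a made-up in-memory format where each entry maps ISO language codes to lists of terms. It illustrates the rule only; it is not IATExtract’s actual code.

```python
def select_entries(entries, languages, mode):
    """Keep entries according to the ALL/ANY rule and drop unselected languages.

    entries:   list of dicts mapping an ISO language code to a list of terms
    languages: the language codes checked in "Choose languages"
    mode:      "ALL" (entry must cover every selected language) or
               "ANY" (entry must cover at least one selected language)
    """
    if mode not in ("ALL", "ANY"):
        raise ValueError("mode must be 'ALL' or 'ANY'")
    wanted = set(languages)
    selected = []
    for entry in entries:
        present = {lang for lang, terms in entry.items() if terms}
        if mode == "ALL":
            keep = wanted <= present  # every selected language is present
        else:
            keep = bool(wanted & present)  # at least one selected language
        if keep:
            selected.append({lang: entry[lang] for lang in languages if lang in present})
    return selected


entries = [
    {"en": ["carbon tax"], "it": ["tassa sul carbonio"], "de": ["CO2-Steuer"]},
    {"en": ["acquis"], "fr": ["acquis"]},
    {"fi": ["hiilivero"]},
]
print(len(select_entries(entries, ["en", "it", "de"], "ALL")))  # 1
print(len(select_entries(entries, ["en", "it", "de"], "ANY")))  # 2
```

Adding "fi" to the ALL selection in this toy data leaves no entry at all, which is exactly how extracting many languages with the first option can produce an empty termbase.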

The drop-down menu Choose a domain to filter on allows you to select a specific domain to extract terms from. This is very useful when it comes to importing the IATE glossary into your CAT tool, because

  1. most CAT tools will crash when attempting to import a termbase that is too large, and
  2. you’ll get fewer irrelevant hits when translating, thanks to a narrowed-down selection of terminology.
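The effect of filtering by domain is easy to see in a toy example. In this Python sketch, each entry’s domains are stored as paths such as ("environment", "climate change"); this representation and the domain names are hypothetical. Selecting a top-level domain keeps every entry filed under any of its subdomains.

```python
def in_domain(entry_domains, selected):
    """True if any of the entry's domain paths starts with the selected path."""
    depth = len(selected)
    return any(tuple(path[:depth]) == tuple(selected) for path in entry_domains)


def filter_by_domain(entries, selected):
    """Return only the entries classified under `selected` or one of its subdomains."""
    return [entry for entry in entries if in_domain(entry["domains"], selected)]


entries = [
    {"term": "carbon tax", "domains": [("environment", "climate change")]},
    {"term": "landfill", "domains": [("environment", "waste management")]},
    {"term": "qualified majority", "domains": [("politics", "EU institutions")]},
]
environment = filter_by_domain(entries, ("environment",))
print([e["term"] for e in environment])  # ['carbon tax', 'landfill']
```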

The search box below it lets you search for domains and subdomains, so you don’t have to scroll through the drop-down menu.
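A search box like this presumably performs a case-insensitive substring match over domain and subdomain names. Here is a minimal Python sketch over a hypothetical nested domain tree (not IATE’s real domain list) that returns the full path of every match, so subdomains appear together with their parent domain.

```python
def search_domains(tree, query, parent=()):
    """Return the paths of all domains or subdomains whose name contains `query`.

    tree maps a domain name to a dict of its subdomains (empty if none).
    Matching is case-insensitive; results keep the tree's order.
    """
    needle = query.casefold()
    matches = []
    for name, children in tree.items():
        path = parent + (name,)
        if needle in name.casefold():
            matches.append(path)
        matches.extend(search_domains(children, query, path))
    return matches


domains = {
    "environment": {"climate change": {}, "waste management": {}},
    "politics": {"EU institutions": {}, "electoral procedure": {}},
}
print(search_domains(domains, "ment"))
# [('environment',), ('environment', 'waste management')]
```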

Once you’re done tweaking the settings, click Start to begin the extraction procedure. Depending on the number of languages and the domain, this may take some time.
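Extraction takes a while because the whole export has to be read. IATExtract’s internals aren’t described here, so the following is only a rough Python illustration of the general technique: reading the TBX file straight out of the zip archive without unpacking it, and parsing it incrementally with ElementTree.iterparse, discarding each term entry once it has been handled so that memory use stays low. It assumes a TBX 2008-style layout (termEntry, langSet with xml:lang, term, and descrip type="subjectField" for the domain) without an XML namespace; the real IATE export may be structured differently, and domain values may be codes rather than names.

```python
import zipfile
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_entries(source):
    """Yield one dict per termEntry in the first .tbx file of a zip archive.

    `source` may be a path or a binary file object. Assumes un-namespaced
    TBX 2008-style element names; adjust the tags if the export differs.
    """
    with zipfile.ZipFile(source) as archive:
        tbx_names = [n for n in archive.namelist() if n.lower().endswith(".tbx")]
        if not tbx_names:
            raise ValueError("no .tbx file found in the archive")
        with archive.open(tbx_names[0]) as stream:
            # Only "end" events: an element is complete when it is reported.
            for _event, elem in ET.iterparse(stream, events=("end",)):
                if elem.tag != "termEntry":
                    continue
                domains = [
                    d.text.strip()
                    for d in elem.iter("descrip")
                    if d.get("type") == "subjectField" and d.text
                ]
                terms = {}
                for lang_set in elem.iter("langSet"):
                    lang = lang_set.get(XML_LANG)
                    terms[lang] = [t.text for t in lang_set.iter("term") if t.text]
                yield {"id": elem.get("id"), "domains": domains, "terms": terms}
                # Drop the entry's children to keep memory use low.
                elem.clear()
```

A loop such as for entry in iter_entries("IATE_export.zip") (hypothetical file name) would then let you filter entries by language or domain as they stream past, without ever holding the whole termbase in memory.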

Download the IATE glossary as individual files

If you’d rather not extract terms yourself, you can download the glossary already split into domains. Head over to this page to start downloading.

About Andrea Luciano Damico
Andrea Luciano Damico is a freelance translator from Italy. Among his interests are linguistics, technology, video games, and generally being a chill guy. He runs Let's Translate.it and Tech4Freelancers.net.