PDF (Portable Document Format) was intended to ensure that visually oriented documents are presented to users with the same appearance, irrespective of the software, hardware and operating system used. In theory, this platform independence should be good for Linux. Adobe, which originally developed the format, also produced a Linux version of its ubiquitous Adobe Reader PDF viewer. Sadly, the company's commitment to Linux waned, and the last version of Adobe Reader for Linux is now several years old (see below).

PDF files present translators with particular problems. The function of the file format is that of an electronic printout; it was not intended for editing. Producing a translation of a PDF file by editing the file therefore presents difficulties. The professional way to translate a PDF file is to translate the document from which it originated, then re-export it to PDF.

This is the point at which problems arise for the translator (regardless of the operating system they are using). Often, the original file is not available, because the party commissioning the translation is not the party that produced the original layout. Or the original file may be available, but final editorial changes were made at the pre-print stage in the PDF file itself, which is now the only definitive version.

Where PDF files are received for translation, the following three scenarios must be distinguished:

Viewing PDF files

Viewing PDF files on Linux is fairly straightforward. Mainstream Linux distributions include their own PDF viewer, often in the form of a generic viewer that supports PDF among a range of other formats. Okular, usually the default viewer on KDE systems, is a good example.

Although a Linux version of Adobe Reader (Version 9) exists and can still be downloaded from some third-party sites, it has not been updated since 2015 and is now considered a major security risk. The use of an alternative is recommended: either a native Linux viewer such as Okular, or a more recent version of Adobe Reader running on a compatibility layer.

Okular viewer showing selection of text with the mouse, by text flow (top left) and rectangular area selection (bottom right). In both cases, the selected text is copied to the clipboard.

Extracting text from PDF files/converting PDF files to other formats

Numerous tools exist for extracting text from a PDF file or converting it to a format more suitable for translation. These can be grouped broadly as follows:
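
By way of illustration, one widely used command-line route is pdftotext from the Poppler utilities, which may or may not be installed on a given system. The following is a minimal sketch, not a recommendation of this tool over others: the function names and file names are hypothetical, and the script simply wraps the pdftotext options -enc and -layout.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for a pdftotext call (Poppler utilities)."""
    # -enc UTF-8 requests UTF-8 output, which most CAT tools accept
    command = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical layout of the page
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def extract_text(pdf_path, txt_path, keep_layout=True):
    """Run pdftotext if it is installed and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install the Poppler utilities")
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout),
        check=True,
    )
    return Path(txt_path).read_text(encoding="utf-8")
```

Calling extract_text("brochure.pdf", "brochure.txt") would write the text to brochure.txt and return it as a string. Text extracted in this way loses the original formatting, so it is better suited to word counts and reference than to producing a finished layout.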

Editing PDF files directly

Affordable applications with which PDF files can be edited directly (essentially "word processors for PDF files") have now been available for some years.

Particularly worthy of mention here is Iceni Infix. Although Infix is not available in a native Linux version, the creators of this excellent application were among the first to cater for the needs of both groups of users considered here, i.e. professional translators and Linux users. Infix supports exporting the text from a PDF file in a form suitable for insertion into a CAT tool, and re-importing the finished translation. In addition, care has been taken to ensure that it runs properly on Linux through a Windows compatibility layer.

Editing a PDF file, whether directly in the editor or by re-insertion of extracted and translated text, has its limitations, however. One is that whilst substituting text is fairly trivial, a PDF editor may not offer the full functionality of a DTP system. Such functionality may be required, for example, when the translation is longer than the original and the formatting must be adjusted to allow for this.
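
The length problem can be estimated before any text is re-inserted. The sketch below is a rough aid only: it compares character counts of source and translated segments and lists those that have grown beyond a threshold. The function names and the threshold of 1.2 are assumptions chosen for illustration, and character count is only an approximation of the space the text will occupy on the page.

```python
def expansion_ratio(source_text, translated_text):
    """Return the length of the translation relative to the source."""
    if not source_text:
        raise ValueError("source segment is empty")
    return len(translated_text) / len(source_text)


def segments_needing_layout_review(pairs, threshold=1.2):
    """Return the (source, translation) pairs that exceed the threshold.

    The threshold is an arbitrary illustrative value, not a standard.
    """
    return [
        (source, translation)
        for source, translation in pairs
        if expansion_ratio(source, translation) > threshold
    ]
```

Segments returned by segments_needing_layout_review are the ones most likely to overflow their text box and to require the formatting adjustments described above.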

Another issue is that of fonts. PDF files normally embed the fonts required for the text they contain. Indeed, this is very much a part of their purpose, i.e. to be displayed flawlessly on any platform, including those on which the font used in the file is not installed. However, PDF files often do not contain the full font, but only a subset comprising the characters used in the particular file, and the translation may require further characters. In this case, the translator must either install the font required, or use a different font containing all the necessary characters. Such solutions may be adequate in some scenarios, but hardly constitute professional DTP work – which, of course, is in any case not the translator's role.
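
A quick way to anticipate the font problem is to compare the characters in the translation with those in the source text. The sketch below rests on a simplifying assumption: that the embedded font subset contains exactly the characters occurring in the text extracted from the PDF file. Real files may embed more or fewer glyphs and may use several fonts, so the result is a hint rather than a guarantee. The function name is hypothetical.

```python
def characters_possibly_missing(source_text, translated_text):
    """Return characters used in the translation but absent from the source.

    Assumes the embedded font subset matches the characters of the
    source text; whitespace is ignored.
    """
    available = set(source_text)
    needed = {ch for ch in translated_text if not ch.isspace()}
    return sorted(needed - available)
```

If the returned list is empty, the translation uses only characters already present in the source and is less likely to run into missing glyphs. Any characters listed (typically accented letters or capitals that did not occur in the original) indicate that the full font, or a substitute font, will probably be needed.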

Annotating PDF files

Okular can be used to annotate PDF files with its Tools > Review function.

Creating PDF files

Most word processors running on Linux are now able to produce PDF files directly by way of an "Export to PDF" option.
