Linux for Translators: PDF

The PDF format was intended to ensure that visually oriented documents are presented to users with the same appearance, irrespective of the software, hardware and operating system used. In theory, this platform independence should be good for Linux. Adobe, which originally developed the format, also produced a Linux version of its ubiquitous Adobe Reader PDF viewer. Sadly, the company's commitment to Linux waned and the last version of Adobe Reader for Linux is now several years old (see below).

PDF files present translators with particular problems. The function of the file format is that of an electronic printout; it was not intended for editing. Producing a translation of a PDF file by editing the file therefore presents difficulties. The professional way to translate a PDF file is to translate the document from which it originated, then re-export it to PDF.

This is the point at which problems arise for the translator (regardless of the operating system they are using). Often, the original file is not available, because the party commissioning the translation is not the party that produced the original layout. Or the original file may be available, but final editorial changes were made at the pre-print stage in the PDF file itself, which is now the only definitive version.

Where PDF files are received for translation, the following three scenarios must be distinguished:

Delivery of the translated text only, possibly with some level of formatting to assist the layout artist, who then lays the text out again
Delivery of a PDF file with formatting broadly resembling the original, for use "for information"
Delivery of the translated text formatted exactly as the original PDF file (this may require some level of DTP skill and the corresponding software)

Viewing PDF files

Viewing PDF files on Linux is fairly straightforward. Mainstream Linux distributions include their own PDF viewer, often in the form of a generic viewer that supports PDF among a range of other formats. Okular, usually the default viewer on KDE systems, is a good example.

Although a Linux version of Adobe Reader (Version 9) exists and can still be downloaded from some third-party sites, it has not been updated since 2015, and is now considered a major security risk. The use of an alternative PDF viewer is recommended, such as the native Linux viewer Okular, or more recent versions of Adobe Reader running on a compatibility layer.

Extracting text from PDF files/converting PDF files to other formats

Numerous tools exist for extracting text from a PDF file or converting it to a format more suitable for translation. These can be grouped broadly as follows:

Native Linux command-line tools: these may produce plain text or in some cases also replicate the file's formatting. The pdftotext and pdftohtml utilities are classic examples.
Although command-line utilities, they need not necessarily be launched from the command line: if pdftotext (for example) is associated in the operating system with PDF as a file type, right-clicking on the file's icon in the file manager and then selecting pdftotext will also convert the file.
Native PDF viewers with a "Save as" function
The "Save as" function of Adobe Reader for Windows, running on a compatibility layer (see Running Windows applications on Linux)
Using a PDF editor (see below) to save the entire file in a more suitable format, such as DOCX (as opposed to editing the PDF file itself)
Office software: PDFs can be opened for example in LibreOffice Draw or Microsoft Word (2013 and later), modified in these applications' formats, then re-exported to PDF. This process is clumsy (you may have to translate one line at a time), and the layout may not be preserved perfectly. I occasionally found this technique useful in the past when a lightly captioned, graphics-heavy text required translation for information only, but better methods have now appeared.
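
The command-line route in the list above can also be scripted. The following is a minimal Python sketch, assuming that pdftotext is installed and on the PATH (on many distributions it is part of a package named poppler-utils, which is an assumption here). The function names are illustrative only. The -layout and -enc options are standard pdftotext options; whether the layout option helps depends on the document.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Return the argument list for a pdftotext call (nothing is executed)."""
    command = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical arrangement of the text
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def extract_text(pdf_path, keep_layout=False):
    """Convert pdf_path to a .txt file beside it and return the new path."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on the PATH")
    txt_path = Path(pdf_path).with_suffix(".txt")
    command = build_pdftotext_command(pdf_path, txt_path, keep_layout)
    subprocess.run(command, check=True)  # raises if pdftotext reports an error
    return txt_path
```

Calling extract_text("source.pdf") with a placeholder file name would write source.txt next to the PDF. Plain text obtained in this way still needs checking for broken lines and hyphenation before it is loaded into a CAT tool.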

Editing PDF files directly

Affordable applications with which PDF files can be edited directly (essentially "word processors for PDF files") have now been available for some years.

Particularly worthy of mention here is Iceni Infix. Although not available in a native Linux version, the creators of this excellent application were among the first to cater for the needs of both groups of users considered here, i.e. professional translators and Linux users. Infix supports exporting of the text from a PDF file in a form suitable for insertion into a CAT tool and re-importing the finished translation. In addition, care has been taken to ensure that it runs properly on Linux through a Windows compatibility layer.

Editing a PDF file, whether directly in the editor or by re-insertion of extracted and translated text, has its limitations, however. One is that whilst substituting text is fairly trivial, a PDF editor may not offer the full functionality of a DTP system. Such functionality may be required, for example, when the translation is longer than the original and the formatting must be adjusted to allow for this.

Another issue is that of fonts. PDF files usually embed the fonts required for the text they contain. Indeed, this is very much a part of their purpose, i.e. to be displayed flawlessly on any platform, including those on which the font of the file is not installed. However, the embedded font is typically only a subset: it contains the characters used in the particular file rather than the full font set, and the translation may require further characters. In this case, the translator must either install the font required, or use a different font containing all the necessary characters. Such solutions may be adequate in some scenarios, but hardly constitute professional DTP work, which is in any case not the translator's role.
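
Whether a file's fonts are subsets can be checked before work begins. In PDF files, the name of a subset font carries a tag of six uppercase letters followed by a plus sign (for example ABCDEF+Helvetica). The Python sketch below applies this rule to the text printed by the pdffonts command-line tool. It assumes that the output begins with a header line and a dashed rule and that each font name contains no spaces. The function names are illustrative.

```python
import re

# Subset fonts are named like "ABCDEF+Helvetica": six capitals, then "+"
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def is_subset_font(font_name):
    """Return True if the font name carries a subset tag."""
    return bool(SUBSET_TAG.match(font_name))


def subset_fonts(pdffonts_output):
    """List the subset font names found in the text printed by pdffonts."""
    # Assumption: the first two lines are the column header and a dashed rule
    lines = pdffonts_output.splitlines()[2:]
    # Assumption: the font name is the first whitespace-separated field
    names = [line.split()[0] for line in lines if line.strip()]
    return [name for name in names if is_subset_font(name)]
```

If subset fonts are reported, any character that does not occur in the source text may be missing from the file, and the full font will be needed on the translator's system.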

Annotating PDF files

Okular can be used to annotate PDF files with its Tools > Review function.

Creating PDF files

Most word processors running on Linux are now able to produce PDF files directly by way of an "Export to PDF" option.
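
Where many documents have to be exported, the same result can be obtained without opening the word processor's window. The following Python sketch assumes that LibreOffice is installed and that its launcher is available as soffice (on some systems it may be named differently). It uses LibreOffice's headless conversion options, and the function names are illustrative.

```python
import shutil
import subprocess


def build_export_command(document_path, output_dir="."):
    """Return the argument list for a headless LibreOffice PDF export."""
    return [
        "soffice",
        "--headless",            # run without opening a window
        "--convert-to", "pdf",   # target format
        "--outdir", str(output_dir),
        str(document_path),
    ]


def export_to_pdf(document_path, output_dir="."):
    """Export a word-processor document to PDF in output_dir."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice was not found on the PATH")
    subprocess.run(build_export_command(document_path, output_dir), check=True)
```

The exported file should still be checked visually, since fonts missing on the system are substituted during conversion.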
