#HOWTO: Convert .docx files to .pdf files without losing formatting

Recently I had to convert generated .docx files to .pdf files for more convenient distribution. The Word documents contained some custom formatting and additional pictures. I tried several Java libraries for this job (Docx4j, XDocReport and Apache POI), but none of them could reproduce the output I got from manually converting the .docx files with Microsoft Word’s native export functionality. On GitHub, I found a nice command-line tool for converting the documents to PDF files: OfficeToPDF. In this blog post I’ll show you a quick example of how to use this CLI tool from Java.

First off, there are some technical requirements you need to fulfill:

  • .NET Framework 4
  • Office 2016, 2013, 2010 or Office 2007

These requirements mean that, in practice, this solution has to run on a Windows machine with Office installed. To try the following example yourself, download the .exe from the GitHub project site and have a .docx file at hand for the conversion.
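Because the tool only works in this environment, it can be worth failing fast before launching it. The following is a minimal sketch, not part of OfficeToPDF: the class name ConversionPreconditions and the method findProblem are made up for illustration, and the paths are the sample paths used in this post. It only checks the operating system name and that the files exist; it cannot tell whether .NET or Office is actually installed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Optional;

public class ConversionPreconditions {

  // True if the given os.name value looks like a Windows system.
  static boolean isWindows(String osName) {
    return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
  }

  // Returns a description of the first problem found, or empty if none.
  // Note: this does NOT verify that .NET or Office is installed.
  static Optional<String> findProblem(String osName, Path exe, Path docx) {
    if (!isWindows(osName)) {
      return Optional.of("Not a Windows system: " + osName);
    }
    if (!Files.isRegularFile(exe)) {
      return Optional.of("Executable not found: " + exe);
    }
    if (!Files.isRegularFile(docx)) {
      return Optional.of("Input document not found: " + docx);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Sample locations, assumed for this example only.
    Path exe = Paths.get("C:\\Temp\\OfficeToPDF.exe");
    Path docx = Paths.get("C:\\Temp\\TEMPLATE.docx");

    Optional<String> problem = findProblem(System.getProperty("os.name"), exe, docx);
    System.out.println(problem.orElse("Preconditions look fine"));
  }
}
```

Passing the OS name in as a parameter, instead of reading it inside the method, keeps the check easy to try out on a non-Windows development machine.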

Calling the .exe from Java might look like the following (the CLI expects just two parameters: the path of the .docx file and the path of the generated .pdf):

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class SampleDocxConversion {

  private static final String PATH_TO_EXE = "C:\\Temp\\OfficeToPDF.exe";
  private static final String PATH_TO_TEMPLATE = "C:\\Temp\\TEMPLATE.docx";
  private static final String PATH_TO_OUTPUT = "C:\\Temp\\OUTPUT.pdf";

  public static void main(String[] args) throws IOException, InterruptedException {

    // Start the CLI with the input .docx and the target .pdf as arguments
    Process process = new ProcessBuilder(PATH_TO_EXE, PATH_TO_TEMPLATE, PATH_TO_OUTPUT).start();

    // Block until the conversion process has terminated
    process.waitFor();

    System.out.println("Result of Office processing: " + process.exitValue());

    // Read the generated PDF into memory
    File file = new File(PATH_TO_OUTPUT);
    byte[] fileContent = Files.readAllBytes(file.toPath());

    // Print the size of the PDF in bytes
    System.out.println(fileContent.length);
  }

}

The code above calls the CLI with its two parameters, waits until the process has finished, and then reads the content of the generated file into a byte array.
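The sample prints the exit value but reads the output file regardless of it, so a failed conversion only shows up as an exception from readAllBytes (or as a stale file from an earlier run). A slightly more defensive variant could look like the sketch below. The class and method names are made up for illustration, the timeout value is arbitrary, and treating exit code 0 as success is an assumption based on the usual convention; check the tool’s documentation for the exact codes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class CheckedDocxConversion {

  // Command line in the order described above: executable, .docx, .pdf.
  static List<String> buildCommand(Path exe, Path docx, Path pdf) {
    return Arrays.asList(exe.toString(), docx.toString(), pdf.toString());
  }

  // True if the file exists and is not empty.
  static boolean hasContent(Path file) throws IOException {
    return Files.isRegularFile(file) && Files.size(file) > 0;
  }

  // Runs the converter and fails loudly instead of reading a missing file.
  static byte[] convert(Path exe, Path docx, Path pdf, long timeoutSeconds)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(buildCommand(exe, docx, pdf))
        .inheritIO() // forward the tool's console output to this process
        .start();

    // Give up if the conversion takes too long
    if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
      process.destroyForcibly();
      throw new IOException("Conversion timed out after " + timeoutSeconds + " s");
    }

    // Assumption: exit code 0 means success, anything else is an error.
    int exitCode = process.exitValue();
    if (exitCode != 0) {
      throw new IOException("Conversion failed with exit code " + exitCode);
    }
    if (!hasContent(pdf)) {
      throw new IOException("No PDF was written to " + pdf);
    }
    return Files.readAllBytes(pdf);
  }

  public static void main(String[] args) throws IOException, InterruptedException {
    if (args.length != 3) {
      System.out.println("Usage: CheckedDocxConversion <exe> <docx> <pdf>");
      return;
    }
    byte[] pdf = convert(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]), 120);
    System.out.println("Generated PDF size in bytes: " + pdf.length);
  }
}
```

With the paths from the sample, this could be started as: java CheckedDocxConversion C:\Temp\OfficeToPDF.exe C:\Temp\TEMPLATE.docx C:\Temp\OUTPUT.pdf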

This approach may be overkill if you only plan to convert plain, text-based .docx files, but if your files contain custom formatting, it might help you. For regular text-based files I would prefer the conversion functionality of the Java libraries.

See you in the next blog post,

Phil.
