Recently I had to convert generated .docx files to .pdf files for more convenient distribution. The Word documents contained some custom formatting and additional pictures. I tried several Java libraries for this job (Docx4j, XDocReport and Apache POI), but none of them could reproduce the output I got from manually converting the .docx files with Microsoft Word's native export functionality. On GitHub, I found a nice command-line tool for converting the documents to PDF files: OfficeToPDF. In this blog post, I'll show you a quick example of how to call this CLI tool from Java to convert .docx to .pdf without losing formatting.
First off, there are some technical requirements you need to fulfill:
- .NET Framework 4
- Office 2016, 2013, 2010 or Office 2007
These requirements effectively tie this solution to a Windows machine. To try the following example yourself, download the .exe from the GitHub project site and have a .docx file at hand for the conversion.
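Because the tool only works on a machine that meets these requirements, it can be worth failing fast before starting any process. The following is a minimal sketch of such a check. The class and method names (`ConversionPreconditions`, `isWindows`, `findProblems`) are made up for this illustration and are not part of OfficeToPDF, and the sketch only looks at the OS name and the two files; it does not detect whether Office or the .NET Framework is installed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of OfficeToPDF.
class ConversionPreconditions {

    // True if the given "os.name" value looks like a Windows system.
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Collects human-readable problems; an empty list means the basic checks passed.
    static List<String> findProblems(String osName, Path exe, Path docx) {
        List<String> problems = new ArrayList<>();
        if (!isWindows(osName)) {
            problems.add("Not a Windows machine: " + osName);
        }
        if (!Files.isRegularFile(exe)) {
            problems.add("Executable not found: " + exe);
        }
        if (!Files.isRegularFile(docx)) {
            problems.add("Input document not found: " + docx);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Paths taken from the example in this post; adjust them to your setup.
        List<String> problems = findProblems(
                System.getProperty("os.name"),
                Paths.get("C:\\Temp\\OfficeToPDF.exe"),
                Paths.get("C:\\Temp\\TEMPLATE.docx"));
        if (problems.isEmpty()) {
            System.out.println("Basic checks passed.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Passing the OS name in as a parameter instead of reading it inside the method keeps the check easy to try out on any machine.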
Calling the .exe from Java might look like the following (the CLI expects just two parameters: the path of the .docx file and the path of the generated .pdf):
```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class SampleDocxConversion {

    private static final String PATH_TO_EXE = "C:\\Temp\\OfficeToPDF.exe";
    private static final String PATH_TO_TEMPLATE = "C:\\Temp\\TEMPLATE.docx";
    private static final String PATH_TO_OUTPUT = "C:\\Temp\\OUTPUT.pdf";

    public static void main(String[] args) throws IOException, InterruptedException {
        // Start OfficeToPDF with the input and output paths as its two arguments
        Process process = new ProcessBuilder(PATH_TO_EXE, PATH_TO_TEMPLATE, PATH_TO_OUTPUT).start();

        // Block until the conversion has finished
        process.waitFor();
        System.out.println("Result of Office processing: " + process.exitValue());

        // Read the generated PDF into memory
        File file = new File(PATH_TO_OUTPUT);
        byte[] fileContent = Files.readAllBytes(file.toPath());
        System.out.println(fileContent.length);
    }
}
```
The code above calls the CLI with its two parameters. It waits until the process has finished and then reads the content of the generated file into a byte array.
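If you want to use this in more than a quick test, you may want to guard against a hanging Office process and a failed conversion. Here is a hedged sketch of how that could look. The class `DocxToPdfConverter` and its methods are hypothetical names for this illustration, the timeout value is up to you, and treating any non-zero exit code as a failure is an assumption based on the usual CLI convention, so check the tool's documentation for its actual exit codes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper for illustration; not part of OfficeToPDF.
class DocxToPdfConverter {

    // Builds the two-argument command line described in this post.
    static List<String> buildCommand(Path exe, Path docx, Path pdf) {
        return List.of(exe.toString(), docx.toString(), pdf.toString());
    }

    // Runs the conversion and returns the bytes of the generated PDF.
    static byte[] convert(Path exe, Path docx, Path pdf, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(exe, docx, pdf))
                .inheritIO() // show the tool's console output, avoids a full pipe buffer
                .start();

        // Give up if the process does not finish in time
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Conversion timed out after " + timeoutSeconds + " s");
        }

        // Assumption: a non-zero exit code means the conversion failed
        int exitCode = process.exitValue();
        if (exitCode != 0) {
            throw new IOException("Conversion failed with exit code " + exitCode);
        }
        if (!Files.isRegularFile(pdf)) {
            throw new IOException("No output file was written: " + pdf);
        }
        return Files.readAllBytes(pdf);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 3) {
            System.out.println("Usage: DocxToPdfConverter <exe> <docx> <pdf>");
            return;
        }
        byte[] pdf = convert(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]), 60);
        System.out.println("Generated PDF size in bytes: " + pdf.length);
    }
}
```

With the paths from the example above, you would run it with `C:\Temp\OfficeToPDF.exe C:\Temp\TEMPLATE.docx C:\Temp\OUTPUT.pdf` as arguments.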
This approach may be overkill if you only plan to convert plain, text-based .docx files, but if your files contain custom formatting, it might help you. For regular text-based files, I would prefer to use the conversion functionality of the Java libraries.
For further examples about file handling with Java, have a look at the following overview page.
Have fun converting docx to pdf with Java,
Phil.