Recently, I had to generate Word documents from a template and fill them dynamically with data. For this task, I compared the following Java libraries: Apache POI, iText PDF, Docx4j, and XDocReport. I evaluated them against the following criteria: support for replacing variables, number of additional dependencies, lines of code required to generate the document, and number of releases in the last year. The winning library was Docx4j. In this blog post, I'll show you a simple way to generate Word documents from a Word template with Docx4j using Java EE on WildFly. The application accepts data through a JAX-RS endpoint, populates the document, and sends the generated document back as the response.
Project Setup for using Docx4j on WildFly
The pom.xml of the project looks like the following:
    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns="http://maven.apache.org/POM/4.0.0"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>

      <groupId>de.rieckpil.blog</groupId>
      <artifactId>word-from-template-generation</artifactId>
      <version>1.0-SNAPSHOT</version>
      <packaging>war</packaging>

      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <failOnMissingWebXml>false</failOnMissingWebXml>
      </properties>

      <dependencies>
        <dependency>
          <groupId>javax</groupId>
          <artifactId>javaee-api</artifactId>
          <version>8.0</version>
          <scope>provided</scope>
        </dependency>
        <dependency>
          <groupId>org.docx4j</groupId>
          <artifactId>docx4j</artifactId>
          <version>6.1.2</version>
        </dependency>
        <dependency>
          <groupId>com.googlecode.jaxb-namespaceprefixmapper-interfaces</groupId>
          <artifactId>JAXBNamespacePrefixMapper</artifactId>
          <version>2.2.4</version>
          <scope>runtime</scope>
        </dependency>
      </dependencies>

      <build>
        <finalName>word-generation</finalName>
      </build>
    </project>
I am using Docx4j in version 6.1.2. As this library relies on JAXB and its namespace prefix mapping, I am adding an additional library (JAXBNamespacePrefixMapper) to be able to use the JAXB reference implementation that ships with WildFly.
The Docx4j and the JAXBNamespacePrefixMapper dependencies are packed into the .war, which is okay for this sample project, but for a thinner .war approach you can also add them to the WildFly libs and mark them as provided in the pom.xml.
To be able to use the JAXB implementation of WildFly, we need the following jboss-deployment-structure.xml under /src/main/webapp/WEB-INF:
    <jboss-deployment-structure>
      <deployment>
        <dependencies>
          <module name="com.sun.xml.bind"/>
        </dependencies>
      </deployment>
    </jboss-deployment-structure>
Converting a Word Template with Docx4j
The following POJO will be used for replacing the variables in the Word document with some data:
    import javax.validation.constraints.NotEmpty;
    import javax.validation.constraints.Size;

    public class UserInformation {

        @NotEmpty
        private String firstName;

        @NotEmpty
        private String lastName;

        // The message is limited to 500 characters
        @NotEmpty
        @Size(max = 500)
        private String message;

        @NotEmpty
        private String salutation;

        // getter & setter
    }
The JAX-RS endpoint looks like this:
    import javax.inject.Inject;
    import javax.validation.Valid;
    import javax.validation.constraints.NotNull;
    import javax.ws.rs.Consumes;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    @Path("messages")
    public class MessagesResource {

        @Inject
        DocxGenerator docxGenerator;

        @POST
        @Produces(MediaType.APPLICATION_OCTET_STREAM)
        @Consumes(MediaType.APPLICATION_JSON)
        public Response createNewDocxMessage(@Valid @NotNull UserInformation userInformation) {

            byte[] result;

            try {
                result = docxGenerator.generateDocxFileFromTemplate(userInformation);
            } catch (Exception e) {
                // Any failure during generation results in an HTTP 500
                e.printStackTrace();
                return Response.serverError().build();
            }

            // Return the raw bytes and suggest a file name to the client
            return Response.ok(result, MediaType.APPLICATION_OCTET_STREAM)
                    .header("Content-Disposition", "attachment; filename=\"message.docx\"")
                    .build();
        }
    }
We validate the incoming UserInformation object using Bean Validation and pass it to the DocxGenerator for replacing the variables. This endpoint returns the media type APPLICATION_OCTET_STREAM as we are sending the raw bytes to the client. In addition, I am adding the Content-Disposition header to inform the client about the attached document.
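From the client's point of view, the Content-Disposition header carries the suggested file name. The following is a minimal sketch of a plain-JDK client that posts the JSON payload and stores the response under that name. The class name MessageClient and its helper methods are made up for this illustration and are not part of the sample project; the URL assumes the application is reachable at http://localhost:8080/resources/messages.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client, only for illustration
public class MessageClient {

    // Matches the quoted file name in: attachment; filename="message.docx"
    private static final Pattern FILENAME = Pattern.compile("filename=\"([^\"]+)\"");

    /** Returns the quoted file name of a Content-Disposition header, or the fallback. */
    public static String filenameFrom(String contentDisposition, String fallback) {
        if (contentDisposition == null) {
            return fallback;
        }
        Matcher matcher = FILENAME.matcher(contentDisposition);
        return matcher.find() ? matcher.group(1) : fallback;
    }

    /** Posts the JSON body and stores the returned document in the working directory. */
    public static Path download(String endpoint, String jsonBody) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(endpoint).openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type", "application/json");
        connection.setDoOutput(true);

        try (OutputStream body = connection.getOutputStream()) {
            body.write(jsonBody.getBytes(StandardCharsets.UTF_8));
        }

        if (connection.getResponseCode() != HttpURLConnection.HTTP_OK) {
            throw new IOException("Unexpected HTTP status " + connection.getResponseCode());
        }

        String name = filenameFrom(connection.getHeaderField("Content-Disposition"), "result.docx");
        // Keep only the last path segment so a header value cannot point outside the working directory
        Path target = Paths.get(name).getFileName();

        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"lastName\": \"Duke\", \"firstName\": \"Tom\", "
                + "\"salutation\": \"Mr.\", \"message\": \"Hello World\"}";
        System.out.println("Saved " + download("http://localhost:8080/resources/messages", json));
    }
}
```

If the payload violates one of the Bean Validation constraints, the server does not answer with HTTP 200, so this sketch throws an IOException instead of writing a file.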
The required code for loading the document and replacing the variables is also rather simple:
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.util.HashMap;
    import javax.ejb.Stateless;
    import org.docx4j.model.datastorage.migration.VariablePrepare;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

    @Stateless
    public class DocxGenerator {

        private static final String TEMPLATE_NAME = "template.docx";

        public byte[] generateDocxFileFromTemplate(UserInformation userInformation) throws Exception {

            // Load the template from the classpath and close the stream once it has been read
            try (InputStream templateInputStream =
                    this.getClass().getClassLoader().getResourceAsStream(TEMPLATE_NAME)) {

                WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
                MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

                // Prepare the document for the variable replacement
                VariablePrepare.prepare(wordMLPackage);

                // Key: variable name used in the template, value: content to insert
                HashMap<String, String> variables = new HashMap<>();
                variables.put("firstName", userInformation.getFirstName());
                variables.put("lastName", userInformation.getLastName());
                variables.put("salutation", userInformation.getSalutation());
                variables.put("message", userInformation.getMessage());

                documentPart.variableReplace(variables);

                // Write the populated document to memory and return the raw bytes
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                wordMLPackage.save(outputStream);

                return outputStream.toByteArray();
            }
        }
    }
The template.docx file is loaded into a Docx4j internal object and prepared for the variable replacement. For replacing the variables, I created a HashMap where the key is the name of the variable and the value is the content. Docx4j supports three approaches for replacing variables within a .docx file:
- Marking variables with ${} in the document
- Using the Word field format MERGEFIELD
- Using XPath to replace the content
In this example, I am using the first approach, which is the simplest one. The template.docx is placed under /src/main/resources and contains the ${firstName}, ${lastName}, ${salutation}, and ${message} placeholders that the code above fills in.
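To make the ${} approach more tangible, here is a minimal plain-Java sketch of the idea. It is not Docx4j's implementation; the class PlaceholderSketch and its methods are invented for this illustration. It also shows why a placeholder has to sit inside one continuous piece of text: Word may store a single placeholder in several runs (for example, after formatting one character differently), and a replacement that looks at each run in isolation cannot match it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch only, NOT how Docx4j is implemented
public class PlaceholderSketch {

    // Matches ${name} and captures the name
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${key} in the text with its value; unknown keys stay untouched. */
    public static String replace(String text, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = variables.get(matcher.group(1));
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    /** Applies the replacement to each run separately, as a naive per-run approach would. */
    public static String replacePerRun(List<String> runs, Map<String, String> variables) {
        StringBuilder result = new StringBuilder();
        for (String run : runs) {
            result.append(replace(run, variables));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("firstName", "Tom");

        // Placeholder in one piece: prints "Dear Tom,"
        System.out.println(replace("Dear ${firstName},", variables));

        // Placeholder split across two runs: prints "Dear ${firstName},"
        System.out.println(replacePerRun(Arrays.asList("Dear ${first", "Name},"), variables));
    }
}
```

In the generator above, VariablePrepare.prepare(wordMLPackage) is called before variableReplace to tidy up the document for exactly this kind of situation. If a placeholder still is not replaced, retyping it in the template in one go without changing the formatting in between is a simple thing to try.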
Downloading the Word Template
To download a generated document you can now use a REST client like Postman and store the response to the filesystem or use this cURL command:
    curl -XPOST -o result.docx -H 'Content-Type: application/json' -d '{"lastName": "Duke", "firstName": "Tom", "salutation" : "Mr.", "message": "Hello World from Wildfly 14"}' http://localhost:8080/resources/messages
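If the downloaded file does not open in Word or still shows placeholders, a quick check with the JDK alone can help to narrow down the problem, because a .docx file is a zip archive. The sketch below assumes that the main document part is stored as word/document.xml, which is the usual location for documents created with Word. The class DocxSanityCheck is hypothetical and not part of the sample project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, only for illustration
public class DocxSanityCheck {

    // Assumption: the main document part uses the usual name
    private static final String MAIN_PART = "word/document.xml";

    /** Returns the XML of the main document part, or null if the entry is missing. */
    public static String readMainDocumentXml(byte[] docxBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (MAIN_PART.equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int read;
                    while ((read = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    /** True if the main document part still contains the start of a ${...} placeholder. */
    public static boolean hasUnresolvedPlaceholders(byte[] docxBytes) throws IOException {
        String xml = readMainDocumentXml(docxBytes);
        if (xml == null) {
            throw new IOException(MAIN_PART + " not found, the file is probably not a Word document");
        }
        return xml.contains("${");
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "result.docx";
        byte[] docx = Files.readAllBytes(Paths.get(file));
        System.out.println(hasUnresolvedPlaceholders(docx)
                ? "Found unresolved placeholders in " + file
                : "No unresolved placeholders found in " + file);
    }
}
```

This is only a heuristic: a placeholder whose "$" and "{" characters ended up in different runs would not be detected by the simple contains check.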
For a simple deployment on your machine, I created the following Dockerfile:
    FROM jboss/wildfly:20.0.0.Final

    ENV JAVA_OPTS="-Xms64m -Xmx1024m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true"

    ADD target/word-generation.war /opt/jboss/wildfly/standalone/deployments/ROOT.war
With the JAVA_OPTS environment variable, I am adding some memory restrictions for the JVM and with the parameter -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true I am optimizing the performance and reducing the amount of required heap space for JAXB processing. You can find a detailed explanation for this here.
You can find the whole source code on GitHub and a detailed explanation for starting this application in the README.md.
Have fun generating documents from a Word template with Docx4j on WildFly,
Phil.
This method doesn’t work properly. I mean replacing variables with:
documentPart.variableReplace(variables);
since a variable such as ${firstname} may be split across multiple runs in the Word document. The ‘variableReplace’ method cannot find variables that are split across multiple runs.
Hi Navid,
thanks for reaching out. I did a quick test and added a second page to the template.docx with the ${firstName} variable in addition to page one and it worked. You can also have a look at the implementation of the .variableReplace() method and see what’s happening under the hood.
If you want, I can share the example with multiple variable replacements with you.
Kind regards,
Phil
Hi Rieckpil,
thanks for responding. Try this scenario:
in the “${firstName}” variable in the word file, change the format of one character, for example, bold the letter N in ${firstName} and check whether this time it is replaced with the value or not.
Hey Navid,
I tried it with the character ‘N’ and the whole variable in bold and it worked for me. With docx4j you have three different possible ways of replacing variables (${} placeholders, MERGEFIELD fields, and XPath).
Maybe you can try a different one.
Hi Navid,
I tried to replace the variable with a collection, but I can’t achieve what I want to do.
Hey Sathish,
if you upload your example to e.g. GitHub and give me access to the repository, I can have a look at it. Otherwise you can find the source code for this example on GitHub.
Kind regards,
Phil
I moved your code example to a GitHub issue as it was quite large. You can follow it here
ok thank you Rieckpil
I’ve answered your question on GitHub, please have a look.
Hey Rieckpil,
I referred to your code on GitHub. I want to iterate over a list, item by item, and insert the items into my template. Please help me to solve this.
Unfortunately, I am not the author of Docx4j and have never done this. Try asking the question on Stack Overflow.
Hello,
While using this I encountered an error like:
Caused by: java.lang.NoClassDefFoundError: org/docx4j/openpackaging/packages/WordprocessingMLPackage
I added the dependencies in pom.xml.
I can’t figure out why it’s not working.
Hi,
did you use the source code from GitHub (https://github.com/rieckpil/blog-tutorials/tree/master/generate-documents-from-word-templates-with-docx4j-on-wildfly14) or did you include it into your own project? If so, please create an issue on GitHub and provide me more information about your setup.
Kind regards,
Phil
I included it in my own project!
then please provide more information with an issue on GitHub otherwise it’s hard to help
What do you need me to show you??
The best would be to upload your project to GitHub and share it with me. Otherwise, the pom.xml and your Java method where you use Docx4j would help.
I have it on BitBucket, will it do?
https://goncalojtpint[email protected]/goncalojtpinto/catequese.capuchinhos-program.git
This is also okay. Once I find the time, I’ll have a look
Unfortunately I can’t access this repository as it seems I have missing permissions.
Can you try it again? Thank you.
https://goncalojtpint[email protected]/goncalojtpinto/catequese.capuchinhos-program.git
Thanks, now it works. I am able to check out your project and build it locally. I can reproduce the java.lang.NoClassDefFoundError. Unfortunately, I have no experience in properly packaging a JavaFX project, but to me it looks like your current build process does not bundle the dependencies correctly (as the resulting .jar is really small). The Maven Assembly plugin might help you here. Otherwise, please create a question on Stack Overflow, as I can’t really help here.
Kind regards,
Phil
Hey Rieckpil,
Great example!
I want to inject an object into my template file like ${gender.title}, so I use documentPart.addObject(gender);
but I get an error like “Caused by: javax.xml.bind.JAXBException: class com.xxx.xxx.dto.Gender nor any of its super class is known to this context.”
Hey Tuo,
can you open a GitHub issue (https://github.com/rieckpil/blog-tutorials/issues) with more information about your setup?
Thanks in advance,
Phil
Hello I have followed your tutorial and I am running into the problem that it results in a corrupted word file. Can you help with this issue?
Hi Marlon,
sure. Please either create a GitHub issue or a question on Stack Overflow with a detailed description of your setup and a minimal reproducible example (if possible).
Kind regards,
Philip
Hi there. Great blog post.
I notice that you have a section titled “Generate the PDF file on WildFly”, but am I right in saying that your code doesn’t actually generate a PDF (it generates a docx)?
Did you have a requirement to generate PDFs and if so, what did you use? I am using docx4j, but I’m finding PDF conversion quite tricky as all the open source libraries I have used so far have issues with formatting and layout (i.e the generated pdf, doesn’t look like the input docx).
Thanks
Hi Lutin,
thanks for the feedback and the hint. Indeed, the headline is misleading and should state “Document” and not PDF.
Some years ago I also struggled to create nice-looking PDF files from a Word document as a template. I ended up with the following solution. You might use this as a proof of concept and build a more robust solution for it.
Kind regards,
Philip
Thank you so much for the reply!
I am knee-deep in researching the best solution for: docx template + data -> docx -> pdf.
I think I’ve settled on either docx4j (or docx-stamper) for the first part, but the programmatic conversion to PDF (as you have found out), is trickier than the average joe would assume (“just open the file in word and save as PDF!”).
Docx4j – may require a word plug-in for content control in order to support conditional and repeating elements in a doc. This plug-in (opendope) is only available for windows (mac doesn’t support content control). In addition, this approach may require XML binding.
Docx-stamper – uses variable replacement and then comments (yes, comments!) for conditional and repeating elements. This means templates could be authored on mac (and word online). Downside to this is that it’s less mature than docx4j and may not support advanced use cases (variable replacement inside text boxes).
For the conversion to PDF, I was hoping to use open source tools, but all the libs I’ve tried only support very simple docx formatting (as you also found out). LibreOffice conversion works very well (and is free to use, unlike MS), so I might leverage that. I’m thinking of either running it on an aws server (ec2) integrated as a web app, or as an aws lambda (libreoffice can be loaded as a lambda layer).
EC2 Server – not great for scaling but good performance in terms of generation time.
Lambda – good scaling, but cold starts (fully loading the libreoffice binaries) may be a dealbreaker.
Anyway, thank you so much for your time.
Hi Lutin,
thanks for coming back and adding the outcome of your investigations. If there’s a LibreOffice Lambda layer available and you don’t need responses within milliseconds, then the AWS Lambda solution sounds like a good plan.
Kind regards,
Philip
Hello Phil, Thanks for this great post, I am using this in my project and I’m able to successfully replace the MERGEFIELD.
However, I also have conditional MERGEFIELDs in my Word template. Do you have any suggestion for how I can handle a conditional MERGEFIELD, e.g. {IF{MERGEFIELD sreasonPoorQuality}= “true” 254 168} a f Wingdings ?
Thanks
Hi Muhammad,
thanks for your feedback. Oh, I’ve no idea how to handle Wingdings here. Maybe Stack Overflow is a better place to ask the question. You can use the docx4j tag https://stackoverflow.com/questions/tagged/docx4j
Kind regards,
Philip