# HOWTO: Generate documents from Word templates with Docx4j on Wildfly 14

Recently, I needed to generate Word documents from specific templates and fill them dynamically with data. For this task, I compared the following Java libraries: Apache POI, iText PDF, Docx4j and XDocReport. I evaluated them on the following criteria: support for replacing variables, number of additional dependencies, lines of code needed to generate the document and number of releases in the last year. The winning library was Docx4j.

In this blog post, I’ll show you a simple way to generate Word documents from Word templates with Docx4j using Java EE 8 and running on the recently published Wildfly 14.0.0.Final. The application will accept data through a JAX-RS endpoint, populate the document and send the generated document back as a result.

The pom.xml of the project looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>de.rieckpil.blog</groupId>
    <artifactId>word-from-template-generation</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>war</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <failOnMissingWebXml>false</failOnMissingWebXml>
    </properties>

    <dependencies>
        <dependency>
            <groupId>javax</groupId>
            <artifactId>javaee-api</artifactId>
            <version>8.0</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j</artifactId>
            <version>6.0.1</version>
        </dependency>

        <dependency>
            <groupId>com.googlecode.jaxb-namespaceprefixmapper-interfaces</groupId>
            <artifactId>JAXBNamespacePrefixMapper</artifactId>
            <version>2.2.4</version>
            <scope>runtime</scope>
        </dependency>

    </dependencies>

    <build>
        <finalName>ROOT</finalName>
    </build>

</project>

I am using Docx4j in its latest version, 6.0.1. As this library relies on JAXB and on the JAXB namespace prefix mapping, I am adding an additional library to be able to use the JAXB reference implementation of Wildfly. The Docx4j and JAXBNamespacePrefixMapper dependencies are packaged into the .war, which is fine for this sample project. For a thinner .war, you can also add them to the Wildfly libs and mark them as provided in the pom.xml.

To be able to use the JAXB implementation of Wildfly, we need the following jboss-deployment-structure.xml under /src/main/webapp/WEB-INF:

<jboss-deployment-structure>
    <deployment>
        <dependencies>
            <module name="com.sun.xml.bind"/>
        </dependencies>
    </deployment>
</jboss-deployment-structure>

The following POJO will be used for replacing the variables in the Word document with some data:

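import javax.validation.constraints.NotEmpty;
import javax.validation.constraints.Size;

public class UserInformation {

    @NotEmpty
    private String firstName;

    @NotEmpty
    private String lastName;

    // The message must be present and at most 500 characters long
    @NotEmpty
    @Size(max = 500)
    private String message;

    @NotEmpty
    private String salutation;

    // getter & setter
}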

The JAX-RS endpoint looks like this:

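import javax.inject.Inject;
import javax.validation.Valid;
import javax.validation.constraints.NotNull;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("messages")
public class MessagesResource {

    @Inject
    DocxGenerator docxGenerator;

    @POST
    @Produces(MediaType.APPLICATION_OCTET_STREAM)
    @Consumes(MediaType.APPLICATION_JSON)
    public Response createNewDocxMessage(@Valid @NotNull UserInformation userInformation) {

        byte[] result;

        try {
            // Fill the template with the validated request data
            result = docxGenerator.generateDocxFileFromTemplate(userInformation);
        } catch (Exception e) {
            e.printStackTrace();
            return Response.serverError().build();
        }

        // Return the raw bytes and mark them as a downloadable attachment
        return Response.ok(result, MediaType.APPLICATION_OCTET_STREAM)
                .header("Content-Disposition", "attachment; filename=\"message.docx\"")
                .build();
    }
}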

The incoming UserInformation object is validated with Bean Validation and passed to the EJB DocxGenerator for replacing the variables. This endpoint returns the media type APPLICATION_OCTET_STREAM as I am sending the raw bytes to the client. In addition, I am adding the Content-Disposition header to inform the client about the attached document.
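To make the Content-Disposition part a bit more tangible, here is a minimal sketch of how such a header value could be built for an arbitrary file name. The class ContentDispositionSketch and its method attachment are hypothetical helpers for illustration and are not part of the sample project or of JAX-RS:

```java
public final class ContentDispositionSketch {

    private ContentDispositionSketch() {
    }

    /**
     * Builds an "attachment" Content-Disposition value with a quoted file name.
     * Backslashes and double quotes are escaped so the quoted string stays valid.
     * Hypothetical helper, only meant to illustrate the header format.
     */
    public static String attachment(String fileName) {
        String escaped = fileName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "attachment; filename=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Same value as the one hard-coded in the endpoint above
        System.out.println(attachment("message.docx"));
    }
}
```

For a fixed file name like message.docx, the hard-coded string in the endpoint is perfectly fine. A helper like this only becomes interesting once the file name depends on user input.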

The required code for loading the document and replacing the variables is also rather simple:

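import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.HashMap;

import javax.ejb.Stateless;

import org.docx4j.model.datastorage.migration.VariablePrepare;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

@Stateless
public class DocxGenerator {

    private static final String TEMPLATE_NAME = "template.docx";

    public byte[] generateDocxFileFromTemplate(UserInformation userInformation) throws Exception {

        // try-with-resources closes the template stream after loading
        try (InputStream templateInputStream =
                this.getClass().getClassLoader().getResourceAsStream(TEMPLATE_NAME)) {

            if (templateInputStream == null) {
                throw new IllegalStateException("Template not found on classpath: " + TEMPLATE_NAME);
            }

            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
            MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

            // Prepare the document for the variable replacement
            VariablePrepare.prepare(wordMLPackage);

            // Key: name of the variable in the template, value: content to insert
            HashMap<String, String> variables = new HashMap<>();
            variables.put("firstName", userInformation.getFirstName());
            variables.put("lastName", userInformation.getLastName());
            variables.put("salutation", userInformation.getSalutation());
            variables.put("message", userInformation.getMessage());

            documentPart.variableReplace(variables);

            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            wordMLPackage.save(outputStream);

            return outputStream.toByteArray();
        }
    }

}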

The template.docx file is loaded into a Docx4j internal object and prepared for the variable replacement. For replacing the variables, I created a HashMap where the key is the name of the variable and the value is the content to insert. Docx4j supports three approaches for replacing variables within a .docx file:

  1. Marking variables with ${} in the document
  2. Using the Word field format MERGEFIELD
  3. Using XPath to replace the content
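To illustrate the idea behind the ${} markers from the first approach, here is a minimal sketch that performs the same kind of replacement on a plain string. It is not how Docx4j works internally (Docx4j operates on the XML of the document), and the class PlaceholderSketch is a hypothetical name used only for this illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PlaceholderSketch {

    // Matches ${name} and captures the name between the braces
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    private PlaceholderSketch() {
    }

    public static String replace(String text, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            // Unknown placeholders are left untouched (a choice made for this sketch)
            String value = variables.getOrDefault(key, matcher.group());
            // quoteReplacement prevents '$' and '\' in the value from being interpreted
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("salutation", "Mr.");
        variables.put("firstName", "Tom");
        variables.put("lastName", "Duke");
        // prints "Dear Mr. Tom Duke,"
        System.out.println(replace("Dear ${salutation} ${firstName} ${lastName},", variables));
    }
}
```

The keys of the map correspond to the names between ${ and }, just like the keys of the HashMap passed to variableReplace in the DocxGenerator.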

In this example, I am using the first approach, which is the simplest one. The template.docx is placed under /src/main/resources and marks each variable with ${}, for example ${firstName}.

To download a generated document, you can now use a REST client like Postman and store the response on the filesystem, or use this cURL command:

curl -XPOST -o result.docx -H 'Content-Type: application/json' -d '{"lastName": "Duke", "firstName": "Tom", "salutation" : "Mr.", "message": "Hello World from Wildfly 14"}' http://localhost:8080/resources/messages
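If you want a quick sanity check of the downloaded file, the following sketch may help. A .docx file is a ZIP archive, and Word documents conventionally store their main part under word/document.xml, so the check simply looks for that entry. The class DocxCheckSketch is a hypothetical helper, and the entry name is an assumption based on that convention rather than a complete validation of the file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public final class DocxCheckSketch {

    private DocxCheckSketch() {
    }

    /** Returns true if the bytes form a ZIP archive containing word/document.xml. */
    public static boolean looksLikeDocx(byte[] bytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the cURL command
        String path = args.length > 0 ? args[0] : "result.docx";
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        System.out.println(looksLikeDocx(bytes) ? "looks like a .docx" : "not a .docx");
    }
}
```

If the endpoint answered with an error instead of a document, for example because Bean Validation rejected the payload, result.docx will not be a ZIP archive and the check returns false.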

For a simple deployment on your machine, I created the following Dockerfile:

FROM jboss/wildfly:14.0.0.Final

ENV JAVA_OPTS="-Xms64m -Xmx1024m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true"

ADD target/ROOT.war /opt/jboss/wildfly/standalone/deployments/

With the JAVA_OPTS environment variable, I am setting some memory limits for the JVM. With the parameter -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true, I am switching off the ClassTailor bytecode optimization of JAXB, which improves the performance and reduces the amount of heap space required for the JAXB processing. You can find a detailed explanation for this here.

You can find the whole source code on GitHub and a detailed explanation for starting this application in the README.md.

Have fun with generating documents,

Phil.
