Repair an XML with invalid characters in a node

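Some characters cannot be used literally inside the value of a node or attribute within an XML file and must be escaped instead. The < and & characters are never allowed unescaped, while the quote characters only need escaping inside an attribute value delimited by the same quote character.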

 

The characters are:

 

Original character    Escaped character
"                     &quot;
'                     &apos;
<                     &lt;
>                     &gt;
&                     &amp;

 

The table above shows the escape sequence (right column) that you should use in place of each original character (left column).
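
If the values are produced by your own program, the escaping in the table can be applied before the XML is written. The following is a minimal Python sketch, not part of InterFormNG2; the function name escape_xml_value is only an illustration. It relies on the standard library function xml.sax.saxutils.escape, which handles &, < and > by default and accepts extra replacements for the two quote characters.

```python
from xml.sax.saxutils import escape

# escape() already replaces &, < and >; the two quote characters are added here.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_value(value: str) -> str:
    """Return value with the five special characters replaced by their entities."""
    return escape(value, QUOTE_ENTITIES)


print(escape_xml_value("InterForm & Kim A/S"))  # InterForm &amp; Kim A/S
```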

 

Another way to get around the limitation of the invalid characters is to include the value within a CDATA sequence like so:

 

Start sequence:

<![CDATA[

 

End sequence:

]]>

 

This means e.g. that this node is valid:

 

<Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name>

 

This holds even though the & character is not escaped as described above.
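
One way to check this claim outside InterFormNG2 is to run both variants of the node through an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

with_cdata = "<Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name>"
without_cdata = "<Company_Name>InterForm & Kim A/S</Company_Name>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(with_cdata))      # True
print(is_well_formed(without_cdata))   # False
print(ET.fromstring(with_cdata).text)  # InterForm & Kim A/S
```

The parser hands back the original text, including the & character, so nothing downstream needs to know that a CDATA section was used.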

 

This section explains how you can repair an invalid input XML file and make it valid even though one or more invalid characters might be found in a value and the <![CDATA[ sequence mentioned above is not used.

 

A prerequisite for this is that you can list the nodes that might have this problem.

 

In this example we get an XML file with this node:

<Company_Name>InterForm & Kim A/S</Company_Name>

This node makes the XML file invalid because of the unescaped & character.

 

What we need to do is search for the string <Company_Name> and replace it with <Company_Name><![CDATA[, and also search for the string </Company_Name> and replace it with ]]></Company_Name>.

 

This can be done with a Velocity template, which can be used as a mail-template in InterFormNG2.

 

The Velocity template can have these contents:

 

#set( $out = $payload.replace("<Company_Name>", '<Company_Name><![CDATA[').replace("</Company_Name>", ']]></Company_Name>') )

${out}
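
The replacement logic of the template above can be tried outside InterFormNG2 before it is uploaded. The following Python sketch mirrors the two replace calls and extends them to a list of node names. The function name wrap_in_cdata and the surrounding <Order> element are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(payload: str, node_names: list[str]) -> str:
    """Wrap the value of each listed node in a CDATA section by plain string replacement."""
    for name in node_names:
        payload = payload.replace(f"<{name}>", f"<{name}><![CDATA[")
        payload = payload.replace(f"</{name}>", f"]]></{name}>")
    return payload


# Hypothetical input: the <Order> root element is not from the original example.
broken = "<Order><Company_Name>InterForm & Kim A/S</Company_Name></Order>"
repaired = wrap_in_cdata(broken, ["Company_Name"])

print(repaired)
# <Order><Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name></Order>

# Raises xml.etree.ElementTree.ParseError if the result is still not well-formed.
ET.fromstring(repaired)
```

Because this is a literal string match, it only finds start tags written exactly as <Company_Name>, so a start tag that carries attributes is left untouched. A value that itself contains the sequence ]]> would also end the CDATA section too early.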

 

A prerequisite for this is that a variable named payload has been defined and filled with the contents of the (invalid) input XML file before this mail-template is called. So the first step is to copy the two template lines above into a plain text file with the extension .vm and upload it as a mail-template in InterFormNG2.

 

This mail-template can now be used in a workflow like the one shown below:

 

[Screenshot: NG2RepairXML01]

 

The workflow consists of these components:

 

Read from file

This is the workflow's input, which in this case monitors an input directory for input XML files.

 

Payload to workflow variable

This copies the payload (the invalid XML file) into a workflow variable named payload.

 

Create Email message text from a template

This calls the mail-template with the contents shown above. The ${out} line in the template overwrites the payload with the repaired XML file.

 

To filesystem

Here we save the corrected XML file for verification, but the XML file could of course also be used as input for a merge into print, PDF, email or other output.

 
