Repair an XML with invalid characters in a node


Some characters cannot be used inside the value of a node or attribute within an XML file.

 

The characters and their escaped sequences are:

Original character    Escaped character
"                     &quot;
'                     &apos;
<                     &lt;
>                     &gt;
&                     &amp;

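In a value, use the escaped sequence on the right instead of the original character on the left. Strictly speaking, only & and < are always forbidden in a value. The characters >, " and ' cause problems only in certain positions, for example a quote character inside an attribute value that is delimited by the same quote. Escaping all five is always safe.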
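Here is a minimal Python sketch of this escaping. It uses the standard library function xml.sax.saxutils.escape, which replaces &, < and > by default and takes a dictionary of extra replacements, used here for the two quote characters. The name escape_xml_value is illustrative only.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the two quote characters are
# passed in as extra entities so that all five characters are covered.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_value(text: str) -> str:
    """Return text with the five special XML characters replaced by entities."""
    return escape(text, QUOTE_ENTITIES)


print(escape_xml_value("InterForm & Kim A/S"))  # InterForm &amp; Kim A/S
print(escape_xml_value("<\"'>&"))              # &lt;&quot;&apos;&gt;&amp;
```

Because escape() replaces & before it applies the extra entities, the & in &quot; and &apos; is not escaped a second time.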

Another way around this restriction is to place the value inside a CDATA section, which is delimited like this:

  1. Start sequence:
    <![CDATA[
  2. End sequence:
    ]]>

For example, this node is valid:

<Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name>

The node is valid even though the & character is not escaped.
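You can check this with a short Python sketch that parses both variants of the node using the standard library module xml.etree.ElementTree. The helper is_well_formed is illustrative. Note that the content of a CDATA section must not itself contain the sequence ]]>, because that sequence ends the section.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


with_cdata = "<Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name>"
bare_ampersand = "<Company_Name>InterForm & Kim A/S</Company_Name>"

print(is_well_formed(with_cdata))      # True
print(is_well_formed(bare_ampersand))  # False
# Inside CDATA the & is plain text, so the value comes back unchanged.
print(ET.fromstring(with_cdata).text)  # InterForm & Kim A/S
```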

 

This section explains how to repair an input XML file that is invalid because one or more values contain unescaped special characters and are not wrapped in a <![CDATA[ section.

A prerequisite is that you can list the nodes that may contain such values.

 

In this example, the input XML file contains this node:

<Company_Name>InterForm & Kim A/S</Company_Name>

The unescaped & makes the XML file invalid.

 

To repair it, search for the string <Company_Name> and replace it with <Company_Name><![CDATA[. Then search for the string </Company_Name> and replace it with ]]></Company_Name>.
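The two search-and-replace steps can be sketched in Python for a single node. Like the Velocity template, the sketch uses plain string replacement. It therefore assumes that the start tag is written exactly as <Company_Name> with no attributes, and that the value does not already contain a CDATA section or the sequence ]]>. The function name wrap_in_cdata is illustrative.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(xml_text: str, node_name: str) -> str:
    """Wrap the value of every <node_name>...</node_name> in a CDATA section.

    Only start tags written exactly as <node_name> (no attributes) are matched.
    """
    start, end = f"<{node_name}>", f"</{node_name}>"
    return (xml_text
            .replace(start, start + "<![CDATA[")
            .replace(end, "]]>" + end))


broken = "<Customer><Company_Name>InterForm & Kim A/S</Company_Name></Customer>"
repaired = wrap_in_cdata(broken, "Company_Name")
print(repaired)
# <Customer><Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name></Customer>
print(ET.fromstring(repaired).find("Company_Name").text)  # InterForm & Kim A/S
```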

 

In InterFormNG2, this search and replace can be done with a Velocity template that is used as a mail template.

 

Use the following as the contents of the Velocity template:

#set( $out = $payload.replace("<Company_Name>", '<Company_Name><![CDATA[').replace("</Company_Name>", ']]></Company_Name>') )
${out}

This requires a workflow variable named payload, filled with the contents of the (invalid) input XML file before the mail template is called. To create the mail template, copy the two lines above into a plain text file with the extension .vm and upload it as a mail template in InterFormNG2.

 

The mail template can then be used in a workflow like the one below:

[Screenshot of the workflow: NG2RepairXML01]

 

The workflow consists of these components:

  1. Read from file
    The start of the workflow, which in this case monitors an input directory for XML files.
  2. Payload to workflow variable
    Copies the payload (the invalid XML file) into a workflow variable named payload.
  3. Create Email message text from a template
    Calls the mail template with the contents shown above. The template output, written by ${out}, replaces the payload with the repaired XML file.
  4. To filesystem
    Saves the corrected XML file so that it can be verified. The repaired XML file could of course also be used as input for a merge to print, PDF, email or other output.
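To try the same steps outside InterFormNG2, here is a rough Python equivalent of the workflow. It is not how InterFormNG2 implements it. The NODES_TO_WRAP list stands in for the list of nodes with the potential problem. That list, the function names and the directory layout are hypothetical, and the files are assumed to be UTF-8. The numbered comments map to the workflow components above. A file that still fails to parse after the repair is reported instead of saved.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical list of nodes whose values may contain unescaped characters.
NODES_TO_WRAP = ["Company_Name"]


def repair_payload(payload: str, nodes: list[str]) -> str:
    """Wrap the value of each listed node in a CDATA section (plain string replace)."""
    for name in nodes:
        start, end = f"<{name}>", f"</{name}>"
        payload = payload.replace(start, start + "<![CDATA[")
        payload = payload.replace(end, "]]>" + end)
    return payload


def process_directory(input_dir: Path, output_dir: Path) -> list[Path]:
    """Repair every *.xml file in input_dir and save the result in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.glob("*.xml")):
        # 1. Read from file, 2. Payload to workflow variable
        payload = source.read_text(encoding="utf-8")
        # 3. Replace the payload with the repaired XML (the template's ${out})
        payload = repair_payload(payload, NODES_TO_WRAP)
        try:
            ET.fromstring(payload)  # verify that the result is well-formed
        except ET.ParseError as err:
            print(f"{source.name}: still invalid ({err})")
            continue
        # 4. To filesystem
        target = output_dir / source.name
        target.write_text(payload, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Small demo in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        in_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
        in_dir.mkdir()
        (in_dir / "order.xml").write_text(
            "<Order><Company_Name>InterForm & Kim A/S</Company_Name></Order>",
            encoding="utf-8",
        )
        for path in process_directory(in_dir, out_dir):
            print(path.read_text(encoding="utf-8"))
```

Checking the result with a parser before saving it catches values that the node list did not cover, for example an unescaped & in a node that is not in NODES_TO_WRAP.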
