Repair an XML with invalid characters in a node

Some characters cannot be used inside the value of a node or attribute within an XML file.

The characters are:

Original character    Escaped character
"                     &quot;
'                     &apos;
<                     &lt;
>                     &gt;
&                     &amp;

The table above shows that you should write the escaped sequence on the right instead of the original character on the left.
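If the values are produced by a program, the escaping can be done before the XML is written. As a minimal sketch in Python (not part of InterFormNG2), the standard library function xml.sax.saxutils.escape always escapes &, < and >, and accepts a dictionary for the two quote characters. The helper name escape_xml_value is only for illustration:

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and > (& first, so nothing is escaped twice).
# The two quote characters from the table are passed in as extra entities.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_value(value: str) -> str:
    """Hypothetical helper: escape the five characters from the table."""
    return escape(value, QUOTE_ENTITIES)


print(escape_xml_value("InterForm & Kim A/S"))
```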

Another way to get around the invalid characters is to wrap the value in a CDATA section, which begins and ends with these sequences:

Start sequence:

<![CDATA[

End sequence:

]]>

For example, this node is valid, even though the & sign is not escaped as in the table above:

<Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name>
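To check that a CDATA section is accepted, the node above can be parsed with Python's built-in xml.etree.ElementTree. This is only an illustration outside InterFormNG2; the parser returns the value with the & sign intact:

```python
import xml.etree.ElementTree as ET

# The node from the example, with the value wrapped in a CDATA section.
node = "<Company_Name><![CDATA[InterForm & Kim A/S]]></Company_Name>"

# The parser accepts the unescaped & inside CDATA and returns the plain text.
element = ET.fromstring(node)
print(element.text)
```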

This section explains how to repair an invalid input XML file in which one or more values contain invalid characters that are neither escaped nor wrapped in a <![CDATA[ section as described above.

A prerequisite is that you can list the nodes that may have this problem.

In this example we receive an XML file with this node:

<Company_Name>InterForm & Kim A/S</Company_Name>

The unescaped & sign makes the XML file invalid.
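The problem can be reproduced with any XML parser. In this Python sketch, xml.etree.ElementTree rejects the node with a ParseError, because a bare & sign is not allowed in a value:

```python
import xml.etree.ElementTree as ET

invalid = "<Company_Name>InterForm & Kim A/S</Company_Name>"

try:
    ET.fromstring(invalid)
    is_valid = True
except ET.ParseError as error:
    # The parser stops at the unescaped & and reports the position.
    is_valid = False
    print("Not well-formed:", error)
```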

What we need to do is search for the string <Company_Name> and replace it with <Company_Name><![CDATA[, and also search for the string </Company_Name> and replace it with ]]></Company_Name>.
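The same two replacements can be tried out in Python before the template is set up. This sketch uses plain string replacement, just as the Velocity template does, and then parses the result to confirm that it is now valid:

```python
import xml.etree.ElementTree as ET

invalid = "<Company_Name>InterForm & Kim A/S</Company_Name>"

# Wrap the node value in a CDATA section by replacing the start and end tags.
repaired = invalid.replace(
    "<Company_Name>", "<Company_Name><![CDATA["
).replace("</Company_Name>", "]]></Company_Name>")

print(repaired)
# The repaired node parses, and the value comes back unchanged.
print(ET.fromstring(repaired).text)
```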

This can be done with a Velocity template, which in InterFormNG2 can be used as a mail-template.

The Velocity template can be set up with this content:

#set( $out = $payload.replace("<Company_Name>", '<Company_Name><![CDATA[').replace("</Company_Name>", ']]></Company_Name>') )

${out}

This requires that a variable, payload, has been defined and filled with the contents of the (invalid) input XML file before this mail-template is called. To set it up, copy the two lines above into a plain text file with the extension .vm and upload that file as a mail-template in InterFormNG2.
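If more than one node may contain invalid characters, the template needs one pair of replace calls per node. The Python sketch below shows the same idea for a list of node names; the names in NODES_TO_WRAP and the sample data are hypothetical. Like the template, it is a plain text replacement, so it only matches start tags without attributes, and a value that already contains ]]> or is already wrapped in CDATA would not be handled correctly:

```python
import xml.etree.ElementTree as ET

# Hypothetical list of nodes whose values may contain unescaped characters.
NODES_TO_WRAP = ["Company_Name", "Street", "City"]


def wrap_in_cdata(xml_text: str, node_names: list[str]) -> str:
    """Wrap the value of each listed node in a CDATA section.

    Plain text replacement, like the Velocity template: it matches <City>
    but not a start tag with attributes such as <City id="1">.
    """
    for name in node_names:
        xml_text = xml_text.replace(f"<{name}>", f"<{name}><![CDATA[")
        xml_text = xml_text.replace(f"</{name}>", f"]]></{name}>")
    return xml_text


# Hypothetical sample payload; Street is listed but absent, which is harmless.
payload = (
    "<Customer>"
    "<Company_Name>InterForm & Kim A/S</Company_Name>"
    "<City>Sample City</City>"
    "</Customer>"
)
repaired = wrap_in_cdata(payload, NODES_TO_WRAP)
root = ET.fromstring(repaired)
print(root.findtext("Company_Name"))
```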

This mail-template can now be used in a workflow like this:

[Figure: NG2RepairXML01, the repair workflow]

The workflow consists of these components:

Read from file

This is the start of the workflow, which in this case monitors an input directory for input XML files.

Payload to workflow variable

This copies the payload (the invalid XML file) into the workflow variable payload.

Create Email message text from a template

This calls the mail-template described above, which overwrites the payload with the repaired XML file (the output of ${out}).

To filesystem

Here the repaired XML file is saved for verification, but it could just as well be used as input for a merge to print, PDF, email or another output.
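Outside InterFormNG2, the four components can be approximated in a short Python script, which may be handy for testing the replacement on sample files. This is a sketch under assumptions: the directory layout and the function names are hypothetical and are not part of InterFormNG2. Unlike the workflow above, it also parses each repaired file, so a file that is still invalid raises an error instead of being saved:

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def repair_payload(payload: str) -> str:
    """Same replacement as the mail-template: wrap Company_Name in CDATA."""
    return payload.replace("<Company_Name>", "<Company_Name><![CDATA[").replace(
        "</Company_Name>", "]]></Company_Name>"
    )


def process_directory(input_dir: Path, output_dir: Path) -> list[Path]:
    """Hypothetical stand-in for the workflow: repair every *.xml file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.glob("*.xml")):  # Read from file
        payload = source.read_text(encoding="utf-8")  # Payload to workflow variable
        repaired = repair_payload(payload)  # Create Email message text from a template
        ET.fromstring(repaired)  # Extra check: raises ET.ParseError if still invalid
        target = output_dir / source.name
        target.write_text(repaired, encoding="utf-8")  # To filesystem
        written.append(target)
    return written


# Demo: temporary directories stand in for the monitored input directory
# and the output directory.
with tempfile.TemporaryDirectory() as tmp:
    in_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
    in_dir.mkdir()
    (in_dir / "order.xml").write_text(
        "<Order><Company_Name>InterForm & Kim A/S</Company_Name></Order>",
        encoding="utf-8",
    )
    results = process_directory(in_dir, out_dir)
    saved = results[0].read_text(encoding="utf-8")
    print(saved)
```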
