String Manipulation


XPath includes powerful functions for processing text within XML elements.

This section explores how to concatenate strings, extract substrings, search within text, replace characters, and tokenize strings into lists.

Adding preceding/leading or trailing zeroes/blanks to a string

Concatenating multiple strings (concat)

Extracting part of a string using start position and length (substring)

Extracting text before or after a certain value (substring-before, substring-after)

Checking if a string contains another string (contains)

Replacing occurrences of string1 with string2 (replace)

Removing or replacing specific characters in a string (translate)

Tokenize: Convert a string into a list and use index for references to this list


Adding preceding/leading or trailing zeroes/blanks to a string

If you have a string of variable length and want to pad it with leading or trailing blanks or zeroes up to a fixed length, you can do so as shown below. In the examples, $input is the variable-length string:

 

Add leading/preceding blanks up to the length of 10:

substring(concat('          ',$input),string-length($input) + 1,10)


Add leading/preceding zeroes up to the length of 10:

substring(concat('0000000000',$input),string-length($input) + 1,10)

 

If the $input variable is numeric, you can also use this simpler function:

ng:numberFormat($input,'us','0000000000')

 

Add trailing blanks up to the length of 10:

substring(concat($input,'          '),1,10)

 

Add trailing zeroes up to the length of 10:

substring(concat($input,'0000000000'),1,10)

 

If the $input variable is not a string, you can cast it to a string with the string() function, like below:

substring(concat('0000000000',string($input)),string-length(string($input)) + 1,10)
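
The arithmetic behind the padding expressions can be checked outside XPath. The sketch below mirrors them in Python, purely for illustration; pad_left and pad_right are hypothetical helper names and are not part of XPath or InterFormNG2.

```python
def pad_left(value: str, fill: str, width: int) -> str:
    """Mirror substring(concat(fill repeated, value), string-length(value) + 1, width)."""
    padded = fill * width + value
    start = len(value) + 1  # XPath positions are 1-based
    return padded[start - 1:start - 1 + width]


def pad_right(value: str, fill: str, width: int) -> str:
    """Mirror substring(concat(value, fill repeated), 1, width)."""
    return (value + fill * width)[:width]


print(pad_left("abc", "0", 10))           # 0000000abc
print(pad_right("abc", "0", 10))          # abc0000000
print(pad_left("123456789012", "0", 10))  # 3456789012
```

The last line shows a side effect worth knowing: when the input is longer than the target length, the leading-pad expression keeps the last 10 characters, while the trailing-pad expression keeps the first 10.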
Concatenating multiple strings (concat)


If you want to concatenate constants with data from the XML file you can use the concat function e.g. like this:
concat('ABCD',/Data/Header/Type,'EFGH')


This inserts the XML data between the constants: ‘ABCDVDA4902 EFGH’.

Notice the blank between the XML data and the trailing constant in the result above; it comes from the XML data itself. You can remove it by stripping leading and trailing blanks from the XML data like so:

concat('ABCD',normalize-space(/Data/Header/Type),'EFGH')
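
As a rough illustration of what normalize-space does, here is a Python sketch; the helper name normalize_space is hypothetical. Besides trimming both ends, normalize-space also collapses each run of whitespace inside the string to a single blank, which the last line demonstrates.

```python
import re


def normalize_space(text: str) -> str:
    """Trim both ends and collapse inner runs of space, tab, CR and LF."""
    parts = re.split(r"[ \t\r\n]+", text)
    return " ".join(part for part in parts if part)


header_type = "VDA4902 "  # the value from the example, including its trailing blank

print("ABCD" + header_type + "EFGH")                   # ABCDVDA4902 EFGH
print("ABCD" + normalize_space(header_type) + "EFGH")  # ABCDVDA4902EFGH
print(normalize_space("  VDA   4902 "))                # VDA 4902
```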
Extracting part of a string using start position and length (substring)


You can also extract part of a string like this:
substring(/Data/Header/Type, 2,3)


The first parameter of substring is the string to extract from, the second is the start position (counting from 1) and the third is the length. So if /Data/Header/Type contains the string ‘VDA4902’, the result is ‘DA4’.
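
Because substring counts positions from 1, it is easy to be off by one when coming from a zero-based language. The Python sketch below emulates the position rule for whole-number arguments only; xpath_substring is a hypothetical helper name, not an existing library function.

```python
def xpath_substring(text: str, start: int, length: int) -> str:
    """Return the characters at 1-based positions start .. start + length - 1."""
    first = max(start, 1)  # positions below 1 do not exist
    end = start + length   # first position that is NOT included
    if end <= first:
        return ""
    return text[first - 1:end - 1]


print(xpath_substring("VDA4902", 2, 3))   # DA4
print(xpath_substring("VDA4902", 0, 3))   # VD
print(xpath_substring("VDA4902", 6, 10))  # 02
```

The second call shows that a start position of 0 still counts toward the length, so only two characters are returned. The third shows that a length reaching past the end of the string simply stops at the end.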

     

Extracting text before or after a certain value (substring-before, substring-after)

Other string functions are:

substring-before(expression1 ,expression2)

substring-after(expression1 ,expression2)

 

substring-before searches for the text expression2 inside expression1 and returns the part of expression1 that precedes the position where it was found.

This expression returns ‘c’:
substring-before("c:\dir", ":\")

substring-after searches for the text expression2 inside expression1 and returns the part of expression1 that follows the position where it was found.

This returns ‘dir’:
substring-after("c:\dir", ":\")

If substring-before or substring-after cannot find the text, an empty string is returned.
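
The behavior of the two functions, including the empty result when the text is not found, can be sketched in Python as follows. The helper names are hypothetical, and the handling of an empty search text reflects the standard XPath definitions rather than anything specific to InterFormNG2.

```python
def substring_before(text: str, marker: str) -> str:
    """Part of text before the first occurrence of marker, or '' if not found."""
    if marker == "":
        return ""  # an empty marker is found at the very start
    head, found, _ = text.partition(marker)
    return head if found else ""


def substring_after(text: str, marker: str) -> str:
    """Part of text after the first occurrence of marker, or '' if not found."""
    if marker == "":
        return text
    _, found, tail = text.partition(marker)
    return tail if found else ""


path = "c:\\dir"  # the string c:\dir

print(substring_before(path, ":\\"))     # c
print(substring_after(path, ":\\"))      # dir
print(repr(substring_after(path, "/")))  # ''
```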
Checking if a string contains another string (contains)

The function contains() returns true or false depending on whether one string is found inside another.

Example: contains('abcdefg','de') is true, as the string 'de' is found in the string 'abcdefg'.

Suppose you want to test whether a string (here a variable called $mystring) matches one of the strings in a list, like so:
if ($mystring = 'abc' or $mystring = 'def' or $mystring = 'ghi') then ...

Then you can consider a shorter test like the one below. Note that contains() is also true when $mystring is only a part of a list entry, such as 'bc', so this shortcut is safe only when such partial values cannot occur:
contains('abc,def,ghi',$mystring)
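
The difference between the exact comparison and the contains() shortcut can be shown with a small Python sketch; the function names are hypothetical. The third variant wraps both the list and the value in delimiters, which corresponds to the XPath expression contains(',abc,def,ghi,', concat(',', $mystring, ',')) and assumes the value itself never contains a comma.

```python
def matches_by_contains(value: str) -> bool:
    """Like contains('abc,def,ghi', $mystring): a plain substring test."""
    return value in "abc,def,ghi"


def matches_exactly(value: str) -> bool:
    """Like the chain of $mystring = '...' comparisons."""
    return value in ("abc", "def", "ghi")


def matches_delimited(value: str) -> bool:
    """Substring test with the delimiters included, to rule out partial hits."""
    return f",{value}," in ",abc,def,ghi,"


for candidate in ("def", "bc", ""):
    print(repr(candidate),
          matches_by_contains(candidate),
          matches_exactly(candidate),
          matches_delimited(candidate))
# 'def' True True True
# 'bc' True False False
# '' True False False
```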
Replacing occurrences of string1 with string2 (replace)


The function replace() replaces every occurrence of expression2 in expression1 with expression3.


So the expression
replace('abc@def','@','123')
results in the string ‘abc123def’.
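
One point to watch, assuming replace() here is the standard XPath 2.0 function: its second argument is a regular expression, not a literal string. Python's re.sub behaves the same way for simple patterns, so the sketch below uses it to show the effect. The wrapper name xpath_like_replace is hypothetical, and the two regular expression dialects differ in their details.

```python
import re


def xpath_like_replace(text: str, pattern: str, replacement: str) -> str:
    """Replace every match of the regular expression pattern in text."""
    return re.sub(pattern, replacement, text)


print(xpath_like_replace("abc@def", "@", "123"))  # abc123def
print(xpath_like_replace("a.b", ".", "-"))        # ---
print(xpath_like_replace("a.b", r"\.", "-"))      # a-b
```

In the second call the dot matches any character, so everything is replaced. Escaping it as \. restricts the match to a literal dot.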
Removing or replacing specific characters in a string (translate)

The function translate() replaces individual characters with other characters. It searches the first string for any character found in the second string and replaces it with the character at the same position in the third string.

This will e.g. replace every comma with a dot in the variable $in:
translate($in, ",", ".")

Here $in is the first string (S1), "," is the second (S2) and "." is the third (S3).

 

If the third string is shorter than the second, characters without a counterpart are removed. With an empty third string, every listed character is removed:

translate($in, ",", "")

(commas are removed)


You can also convert from lower case to upper case in this way (here only for the letters a to e):

translate($in, "abcde", "ABCDE")
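
The position-by-position mapping of translate() can be sketched in Python with a translation table. xpath_translate is a hypothetical helper name; it follows the standard XPath rules that a character with no counterpart in the third string is removed and that the first occurrence of a repeated character in the second string wins.

```python
def xpath_translate(text: str, source: str, target: str) -> str:
    """Map each character of source to the character at the same index in target."""
    table = {}
    for index, char in enumerate(source):
        if ord(char) in table:
            continue  # the first occurrence in source decides
        # None tells str.translate to delete the character
        table[ord(char)] = target[index] if index < len(target) else None
    return text.translate(table)


print(xpath_translate("1,234,5", ",", "."))        # 1.234.5
print(xpath_translate("1,234,5", ",", ""))         # 12345
print(xpath_translate("decade", "abcde", "ABCDE")) # DECADE
print(xpath_translate("a-b_c", "-_", "+"))         # a+bc
```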


Tokenize: Convert a string into a list and use index for references to this list

If you have a string that contains a list of elements and you would like to refer to these elements by index, consider the standard XPath function tokenize().

 

The tokenize() function splits a string into a list at each occurrence of a delimiter, e.g. to extract each word from a sentence.

 

For this example we can consider this variable:

  

 

We want to convert this into a list of 4 elements while using comma (,) as the delimiter between each element.

 

That can be done in this way:

tokenize($list,',')


So the first parameter of tokenize is the input string that we want to convert into a list, and the second parameter is the delimiter.

Now we can use a normal index to extract a specific element from the list, like below.

This expression returns '34' as this is the second element:

tokenize($list,',')[2]
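
The indexing can be illustrated with a short Python sketch. The list value '12,34,56,78' is made up for this illustration (only its second element, '34', is taken from the text above), and tokenize and item are hypothetical helper names. Note that XPath indexes start at 1, while Python indexes start at 0.

```python
import re


def tokenize(text: str, pattern: str) -> list:
    """Split text at every match of the regular expression pattern."""
    if text == "":
        return []  # an empty input yields an empty list
    return re.split(pattern, text)


def item(tokens: list, index: int) -> str:
    """Return the element at a 1-based index, as in tokenize(...)[index]."""
    return tokens[index - 1]


elements = tokenize("12,34,56,78", ",")  # hypothetical value of $list

print(elements)           # ['12', '34', '56', '78']
print(item(elements, 2))  # 34
```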

 

We can also print out all elements in the list, like below:

  [Image: NG2Tokenize0002]

This results in this list:

  [Image: NG2Tokenize0003]

 

Extract words from a sentence

You can also use tokenize to extract each word from a string, like below.

If the variable list is defined like so:

  


Then we might want to extract a specific word (where the words are delimited by a normal space).

In this case we use a space as the delimiter, as below.

This extracts the second word (which is 'had'):

tokenize($list,' ')[2]
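
A single space as delimiter works as long as the words are separated by exactly one blank. The Python sketch below uses a made-up sentence (not necessarily the one used in this article) to show what happens with a double blank; split_words is a hypothetical helper name. It relies on the delimiter being a regular expression, which is also how the standard XPath tokenize() treats its second argument.

```python
import re


def split_words(text: str, pattern: str) -> list:
    """Split text at every match of the regular expression pattern."""
    return re.split(pattern, text)


print(split_words("Mary had a little lamb", " ")[1])  # had
print(split_words("Mary  had a lamb", " "))           # ['Mary', '', 'had', 'a', 'lamb']
print(split_words("Mary  had a lamb", r"\s+"))        # ['Mary', 'had', 'a', 'lamb']
```

With the double blank, the single-space delimiter produces an empty element, so the second element is no longer 'had'. A pattern such as \s+ treats any run of whitespace as one delimiter.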
 

This repeat outputs each word on a separate output line:

  [Image: NG2Tokenize0005]

 
