WeeklyAlteryxTips#66 Solve the mystery of invisible characters


Data obtained from the Internet sometimes has problems. Visible problems are easy to deal with, but the data may also contain invisible characters.

For example, “Blog­” looks like a normal word, but it actually contains an invisible character. Let’s confirm this on a website that reveals invisible characters.

The Unicode code point “U+00AD” is the character “SOFT HYPHEN”. In this post, let’s find it with Alteryx Designer.

Checking for invisible characters in Alteryx

You can use the Tokenize option of the RegEx tool to split a word into individual characters and then find the invisible ones. The RegEx tool settings are as follows.

The important points are to set the Regular Expression to “.”, select Tokenize as the Output Method, and choose “Split to Rows”.

Input data is as follows.

The result is as follows.

Indeed, the 4th row looks blank.

Let’s calculate the Unicode code point. You can use the CharToInt function to convert a character to its numeric character code and then convert that number to hexadecimal. Putting the resulting value into the common “U+” notation gives the following expression:

"U+"+PadLeft(IntToHex(CharToInt([Field1])),4,"0")

Actual output is as follows.

The invisible character turns out to be Unicode “U+00AD”, which is “SOFT HYPHEN”. If you would rather just use a conversion website, you can use the ones below.
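For readers who want to cross-check the result outside Alteryx, the same idea (split a string into characters, then show each code point in “U+” notation) can be sketched in Python. This is only an illustration and not part of the workflow. The function name `describe_characters` and the sample string are assumptions made for this example.

```python
import unicodedata


def describe_characters(text: str) -> list[tuple[str, str, str]]:
    """Return (character, 'U+XXXX' code point, Unicode name) for each character."""
    rows = []
    for ch in text:
        # ord() gives the code point; format it as at least 4 uppercase hex digits.
        code_point = f"U+{ord(ch):04X}"
        # Some characters (e.g. control characters) have no name, so give a fallback.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, code_point, name))
    return rows


# Hypothetical sample: "Blog" followed by a soft hyphen (U+00AD).
sample = "Blog\u00ad"
for ch, code_point, name in describe_characters(sample):
    # repr() makes the invisible character visible as an escape sequence.
    print(repr(ch), code_point, name)
```

Printing with `repr()` is the key design choice here: it shows an escape sequence instead of a seemingly empty cell, so the invisible character cannot be overlooked.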

Removing invisible characters in Alteryx

Invisible characters should be removed because they serve no purpose. If all you need are numbers and letters, you can use the RegEx tool to remove the unwanted characters.

For example, the RegEx tool setting that leaves only numbers and letters is as follows.

[^a-zA-Z0-9]

The important point is to add “^” right after the opening bracket. This makes the expression match any character other than those listed inside the brackets.
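To see what the negated character class above does, here is a minimal Python sketch that applies the same pattern and replaces every match with an empty string. It is an illustration outside Alteryx. The function name `keep_ascii_alnum` is an assumption made for this example.

```python
import re

# Same pattern as above: match anything that is NOT an ASCII letter or digit.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def keep_ascii_alnum(text: str) -> str:
    """Delete every character other than a-z, A-Z and 0-9."""
    return NON_ALNUM.sub("", text)


# Hypothetical sample: "Blog" followed by a soft hyphen (U+00AD).
print(repr(keep_ascii_alnum("Blog\u00ad")))
```

Note that this pattern also removes spaces, punctuation and non-ASCII letters, so it only fits data that should consist purely of ASCII letters and digits.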

Alternatively, if you only want to remove invisible characters, you can use the Find Replace tool. However, it is a bit tedious because you have to build a list of invisible characters.
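As a point of comparison outside Alteryx, a hand-built list can sometimes be avoided by relying on Unicode general categories. The Python sketch below removes characters in the “Cf” (format) category, which includes SOFT HYPHEN (U+00AD) and ZERO WIDTH SPACE (U+200B). Treating “Cf” as “invisible” is an assumption of this example. It does not cover every invisible character (a non-breaking space, for instance, is in a different category), and it also removes joiners that some scripts and emoji sequences need. The function name `strip_format_characters` is hypothetical.

```python
import unicodedata


def strip_format_characters(text: str) -> str:
    """Drop characters whose Unicode general category is Cf (format)."""
    # Assumption: "Cf" is used here as an approximation of "invisible".
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical sample: a soft hyphen and a zero width space hidden in text.
print(repr(strip_format_characters("Blog\u00ad post\u200b")))
```

Unlike the letters-and-digits pattern, this keeps spaces, punctuation and non-ASCII letters intact.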

Conclusion

  • I explained how to find invisible characters with Alteryx Designer and how to remove them.
  • This article was inspired by the Weekly Challenge “Challenge #443: Mystery of the Unjoined Records”. This Weekly Challenge was posted just as I was wondering whether to write this article.

Sample Workflow download

The next post will be…

The next post will be data prep in the Result window.
