I would like to explain how to incorporate randomness into an Alteryx workflow. This post is inspired by Weekly Challenge #409.
When and where should you use randomness?
In many Alteryx use cases, you collect data and summarize it for your organization. In such cases, you should eliminate randomness.
On the other hand, randomness is necessary when you simulate something or predict the future, for example random sampling for surveys, extracting data to train machine learning models, or simulating dice rolls.
How to use randomness in Alteryx
Alteryx provides several ways to handle randomness. Broadly, they fall into two kinds.
- Generate random values
- Extract values from data randomly
Generate random values
To generate random values, you can use the following functions in the Formula tool.
- RAND()
- RANDINT()
- ST_RandomPoint()
The RAND function returns a random decimal value between 0 and 1. The RANDINT function returns a random integer between 0 and the integer you specify. In many environments, random functions accept a seed value, but these two do not.
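As a rough illustration of what these two functions return, here is a Python analog (not Alteryx code). It assumes RAND() behaves like a uniform draw in the 0 to 1 range and RANDINT(n) like a uniform integer from 0 to n. The helper names `rand`, `rand_int`, and `roll_die` are hypothetical. Under those assumptions, a dice roll corresponds to RANDINT(5) + 1.

```python
import random

def rand(rng):
    """Analog of RAND(): a uniform float in [0, 1)."""
    return rng.random()

def rand_int(rng, n):
    """Analog of RANDINT(n): a uniform integer from 0 to n inclusive (assumed)."""
    return rng.randint(0, n)

def roll_die(rng):
    """Simulate one six-sided die by shifting 0..5 up to 1..6."""
    return rand_int(rng, 5) + 1

rng = random.Random()  # unseeded, so the results differ on every run
rolls = [roll_die(rng) for _ in range(10)]
print(rolls)
```

Because the generator is unseeded, the printed rolls change from run to run, which mirrors the behavior described above for functions that do not take a seed.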
ST_RandomPoint is a special function for spatial analysis. It returns a random point object within a polygon or polyline object. For example, you can create random points based on population from population polygons. I covered this in the article “WeeklyAlteryxTips#32 Create a random point in the polygon” on this blog.
Extract values from data randomly
For extraction, Alteryx provides several tools. Many of them let you specify a seed value.
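To make the idea of a random point inside a polygon concrete, the following Python sketch uses rejection sampling: draw points in the bounding box and keep the first one that falls inside. This only illustrates the concept. It is not how ST_RandomPoint is implemented, and the function names and the L-shaped polygon are hypothetical.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in_polygon(polygon, rng):
    """Draw points in the bounding box until one falls inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            return (x, y)

# Hypothetical L-shaped polygon given as (x, y) vertices
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
rng = random.Random()
print(random_point_in_polygon(l_shape, rng))
```

Rejection sampling is simple and keeps the points uniform over the polygon, at the cost of wasted draws when the polygon fills only a small part of its bounding box.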
- Random % Sample tool
Use this tool when you want to extract records randomly from the data. It extracts the number of records, or the percentage of the total, that you specify. The selection changes every time you run the workflow.
- Sample tool
Extracts each record with a probability that you specify. The decision to extract or not is made independently for each record. The selection changes every time you run the workflow, and you can’t specify a seed value.
- Create Samples tool
Divides the data stream into three streams by the percentages you specify, for example 60%, 20%, and 20%. The records are assigned randomly, but as long as you don’t change the seed value, the assignment stays fixed. This tool is mainly for preparing training and validation data sets for machine learning.
Alternatively, you can use the RAND() or RANDINT() functions to generate a random selection.
Random % Sample tool
When you want to extract some records randomly from data, you can use the Random % Sample tool.
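The two modes of this kind of sampling, a fixed number of records or a fixed percentage, can be sketched in Python as follows. This is an analog for illustration only, not the tool's actual implementation. The function names are hypothetical, and rounding the percentage down is an assumption.

```python
import random

def sample_n(records, n, rng):
    """Pick exactly n records at random, without replacement."""
    return rng.sample(records, min(n, len(records)))

def sample_percent(records, percent, rng):
    """Pick a fixed share of the records; the count is rounded down (assumed)."""
    n = int(len(records) * percent / 100)
    return rng.sample(records, n)

records = list(range(1, 101))
rng = random.Random()  # unseeded: a different selection on every run
print(sample_n(records, 5, rng))
print(len(sample_percent(records, 20, rng)))
```

In both modes the number of returned records is fixed in advance. Only which records are chosen is random.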
Sample tool
When you want to extract records with a specified probability, you can use the Sample tool with the option “1 in N chance to include each row”. However, the minimum value of N is 2, so you cannot specify a probability greater than 50 percent.
When you want to specify the probability freely, you can use the RANDINT() or RAND() function in your workflow instead.
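The logic of selecting each record with a freely chosen probability can be sketched in Python as below. It is an analog, not Alteryx code, and the function names are hypothetical. Assuming RANDINT(99) returns an integer from 0 to 99, the corresponding conditions would be along the lines of RAND() < 0.7 or RANDINT(99) < 70 for a 70 percent chance.

```python
import random

def keep_with_probability(records, p, rng):
    """Keep each record independently with probability p (0 <= p <= 1)."""
    return [r for r in records if rng.random() < p]

def keep_with_percent(records, percent, rng):
    """Integer variant: draw 0..99 and keep the record if the draw is below percent."""
    return [r for r in records if rng.randint(0, 99) < percent]

records = list(range(1000))
rng = random.Random()
kept = keep_with_probability(records, 0.7, rng)
print(len(kept))  # close to 700, but it varies from run to run
```

Unlike sampling a fixed count, the number of kept records here is itself random, because every record is decided independently.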
Create Samples tool
The Create Samples tool divides the data stream for machine learning training and validation. You can set a seed value, and Designer changes the output data when you change it. In other words, Designer produces the same output data until you change the seed value.
However, the Random % Sample tool and the Sample tool with the “1 in N chance to include each row” option change the output data on every run. This is useful for checking the randomness, but it can be an obstacle when you validate your workflow. In such a case, you have to save the sampled data and use it in place of those tools.
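The effect of a seed on a three-way split can be sketched in Python like this. It is a minimal analog under the assumption that the split works like a seeded shuffle followed by two cuts. The tool's real algorithm is not described in this post, and the function and variable names are hypothetical.

```python
import random

def create_samples(records, est_pct, val_pct, seed):
    """Shuffle with a fixed seed, then cut into estimation, validation, and holdout."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_est = len(shuffled) * est_pct // 100
    n_val = len(shuffled) * val_pct // 100
    estimation = shuffled[:n_est]
    validation = shuffled[n_est:n_est + n_val]
    holdout = shuffled[n_est + n_val:]  # whatever remains, 20% here
    return estimation, validation, holdout

records = list(range(100))
first = create_samples(records, 60, 20, seed=1)
second = create_samples(records, 60, 20, seed=1)
other = create_samples(records, 60, 20, seed=2)
print(first == second)  # True: same seed, same split
print(first == other)   # a different seed gives a different split in practice
```

Keeping the seed fixed makes the split repeatable, which is what allows a model trained today to be compared with one trained tomorrow on the same rows.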
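The "save the data and reuse it" approach can be sketched in Python as follows. This is an illustration of the idea rather than an Alteryx workflow. The function name and the file name `sample.json` are hypothetical.

```python
import json
import os
import random
import tempfile

def load_or_create_sample(records, n, path):
    """Reuse the saved sample if it exists; otherwise draw one and save it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    sample = random.sample(records, n)
    with open(path, "w") as f:
        json.dump(sample, f)
    return sample

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")  # hypothetical file name
    first = load_or_create_sample(list(range(100)), 10, path)
    second = load_or_create_sample(list(range(100)), 10, path)
    print(first == second)  # True: the second call reads the saved file
```

The random draw happens only once. Every later run works on the saved records, so changes in the output can be traced to changes in the workflow rather than to a new sample.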
Conclusion
- This post explained how to handle randomness in Alteryx.
- Please try Weekly Challenge #409, which deals with randomness.
Sample workflow download
The next blog post is…
The next post will be about error detection.