WeeklyAlteryxTips#38 Load the data to the Database by PreSQL

Alteryx

This post is about loading the data to the database.

When you adopt the Alteryx Server, you can use schedule function. In this case, you can make and maintain the data pipeline, so you can use Alteryx as ETL tool. As you know, Alteryx can get the data from complex Excel sheets and transform the data into the clean data, so you can use such Excel files as part of a data pipeline.

Speaking of the making the data pipeline, you will connect to vary databases. But the Alteryx can’t use bulk insert at some databases (for example, MySQL). And Alteryx can’t load the Spatial Objects into some databases by Output Data tool. In these case, you want to use SQL in Alteryx.

Basically, to use SQL directly, you need to use the Pre/Post SQL Statement function on the Input Data tool in Alteryx. I posted the article “WeeklyAlteryxTips#21 Call the stored procedure in PostgreSQL” about Pre SQL Statement function before.

What is the Pre/Post SQL Statement?

This function’s name is very long, so I shorten it to “PreSQL” or “PostSQL”.

This is a function that run the SQL you write in this option. Regarding the Input Data tool or Output Data tool, they read/write the data from/to database using SQL, but you can only one sentence of SQL. But you sometimes want to use some sentences for setting. In this case, you can use Pre/Post SQL statement. In other words, “Pre/Post SQL statement” function is to use databases flexible.

Try it out!

Let’s inser the data to PostgreSQL. Here is the data to insert the database.

I would like to add one row. The SQL is as follows.

INSERT INTO public."SampleTablewithUniqueKey"("Category","Item","No","UnitPrice") VALUES('meat','beef',6,200);

This is a simple SQL. You can also add this data by Output Data tool. But this time, I would like to use Pre SQL statement function on the Output Data tool.

First, Open the setting of the Output Data tool. In this case, you need to uncheck the “Run PreSQL on tool configuration” because every time you open the configuration of this Output tool, it will run the statement.

Aftar that, click the “Pre SQL Statement”. And then, the “Configure Pre SQL Statements and Stored Procedures” dialogue opens and please click the “SQL Editor” tab and paste that SQL.

Click on the OK button.

Next, this is the Input Data tool, so you need to set the SELECT statement to Query option on the Input Data tool. If you need to know the record count to confirm normality, you can set the count statement or the SQL “SELECT * FROM xxxx”. But it will take much time depends on the database system, so in this case I want to set the dummy SQL.

select 1 as "dummy"

The setting is over. Once again, surround the parts that need to be set with a red frame.

After the running and you can confirm the data below. The inserted data is row 6.

However, this is only the fixed SQL sentence, so if you want to insert huge data, you need to remake this into the batch macro. In other words, you need to rewrite the PreSQL Statement option by the Control Parameter tool.

Batch Macronize

Batch Macronize is very easy. You set the Control Parameter tool to update the PreSQL statement on the Input Data tool. And also, for making the generic macro, rewrite the Connection ID by Text Input and Action tool at the DCM mode.

The setting of the Action tool connected to the Control Parameter tool is as follows. Select the “PreSQL – value”.

The setting tool of the Action tool connected to the Text Box tool is as follows. Select the “Connection – value”.

Careful Point

  • When you do bulk insert on MySQL or handle the spatial objects, the SQL is sometimes too long. In this case, the data is over Alteryx’s limitation. So, you need to devide into some SQLs.
  • When do multiple INSERT INTO in Pre SQL:

If the error occurs durning multiple Insert, unexecuted INSERT INTO will be skipped. But executed INSERT INTO is commited. If you don’t want to write the all data, you need to do transaction control(add “begin – commit” to the PreSQL). The following SQL is a sample. In this SQL, 2th INSERT INTO occurs “Unique Key violation” error, so the “commit” is skipped and this SQL write nothing in fact.

begin;
INSERT INTO public."SampleTablewithUniqueKey"("Category","Item","No","UnitPrice") VALUES('meat','chikin',8,100);
INSERT INTO public."SampleTablewithUniqueKey"("Category","Item","No","UnitPrice") VALUES('meat','beef',6,200);
commit;

If you don’t add “begin – commit”, when the error occurs, remaining SQL sentences will be skipped. When you execute the following SQL, 1st row occurs “Unique Key violation” error, so remaining SQL sentenses will be skipped.

INSERT INTO public."SampleTablewithUniqueKey"("Category","Item","No","UnitPrice") VALUES('meat','beef',6,200);
INSERT INTO public."SampleTablewithUniqueKey"("Category","Item","No","UnitPrice") VALUES('meat','chikin',8,100);
  • Please try to do what you want and check that database and drivers are working properly.

Check the another way to run the SQL in Alteryx

Alteryx has some tools which runs SQL.

  • “Connect In-DB” in In-DB category
  • “Dynamic Input In-DB” in In-DB category
  • “Input Data” in In/Out category
  • Python tool
  • R tool

However, you can not do all SQL command in the tools above. “Input Data” tool, “Connect In-DB” tool and “Dynamic Input In-DB” only can execute “SELECT” command. If you write “INSERT INTO” or other SQL command, Designer automatically add “SELECT” to the beginning of the sentence and the error occurs.

Actually, you can only execute SQL freely in “Pre/Post SQL Statement” function of the Input Data tool and the Output Data tool.

And also, the tools that multiple SQL statement can accept are …

  • “Pre/Post SQL Statement” function of “Input Data”/”Output Data” in In/Out category

Of course, you can connect any database by installing any packages in Python/R tool. This is final way in the point of No Code view. However you can adopt another way. Designer can run command line tool of any databases. I think they are faster than R/Python tool.

Conclusion

  • You can execute any SQL command in the Pre/Post SQL Statement option in the Input/Output Data tool.
  • Please try to do what you want and check that database and drivers are working properly. I sometimes faced bugs.
  • You can adopt Python/R tool, too.
  • If you want to do bulk load to MySQL, Using InFile is the fastest way.

Sample Workflow Download

Contents of the next blog

Next post will be Make Columns tool. It was in the Laboratory(beta tool) category before. But it is now official tool from 2024.1.

コメント

Copied title and URL