Original working title: How the Python Tool lets you do any damn thing you want.
Want an easy way to either read or write Parquet files in Alteryx? Use Apache Arrow (more specifically PyArrow) and the Python Tool.
With just a couple of lines of code (literally), you’re on your way. Here, I’m loading a CSV file full of addresses and writing it out as Parquet:
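from ayx import Package
from ayx import Alteryx

# Install PyArrow first, so the imports below succeed on a fresh machine
Package.installPackages(['pyarrow'])

from pyarrow import csv
import pyarrow.parquet as pq

# Read the CSV into an Arrow table, then write it out as Parquet
source = 'C:/addresses.csv'
table = csv.read_csv(source)
pq.write_table(table, 'C:/addresses.parquet')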
The view from Designer:
By the way, see that Alteryx menu item in the menu bar? You can actually pull a Jupyter notebook right in so you don’t even have to TYPE anything. Sheesh, we make it easy. Want it? Here.
And of course, you’ll want to go the other way. Maybe you have a parquet file, and you want to read it into Alteryx. Perhaps you want to dump it out as a CSV. No big whoop:
Want this notebook too? But of course.
Here’s the workflow and the addresses.csv so you can play with it yourself.
A couple things to keep in mind:
- To install the PyArrow package, you may need to run Designer via “Run as Administrator” IF you did an admin install of Designer itself. Otherwise, you’ll get errors.
- In the examples above, I did have Designer running in admin mode, which is what allowed me to read and write from the root of C:/. If I hadn’t done this, Windows would probably have thrown a permissions error.
Where does the Apache Arrow download need to be stored in order for Alteryx to be able to load it?
Thanks!