AlteryxCloud

Alteryx Connect: Loading Data

So, we have Connect up and running. It’s now time to ingest data. I chose several different data sources to go after in order to get a feel for how Connect works:

  • TPC-H data from the sample Snowflake TPCH_SF1 database schema
  • Five different databases I have sitting on an instance of PostgreSQL running under AWS RDS
  • A (file system) folder full of random CSV files
  • The contents of two sites on my personal Tableau Server

Ingest? Connect isn’t storing my sensitive data, is it?

No.

When I use phrases like “load”, and “ingest”, I don’t mean we’re actually storing any user data inside Connect.

Alteryx Connect does generate some simple statistics about your data, but that’s it: For example, we’ll figure out the Percent of unique values coming from a column (like “zipcode”) in a CSV file we crawl and save that for you…but we don’t store the column values (like “10024”) themselves.

We’re going after the data-about-your-data, not the data itself:

  • Column and Table Names
  • Information about data types
  • Who owns what
  • How things are related to each other
  • If your assets are tagged, and what are those tags are
  • Plain-english descriptions of your asssets that exist in the source systems

Fine. How do I load it up?

Connect can load data from a bunch of different data sources, and for each data source you want to ingest, you’ll execute a Loader. The Loader is nothing more than an Alteryx Analytic App that knows how to connect to the data source in question, extract metadata from that source, and then push it into Connect. Loaders can be run manually (which is what I did) or scheduled.

You may recall in the previous blog post we spent a bit of time plugging in location and login details for our Alteryx Server – this was so we can schedule and execute Loaders right from Connect if we want to.

You’ll start by downloading and installing a set of Alteryx Loaders on your Alteryx Server. I sort of wish these had just been part of the standard Server install. That said, I understand that some folks without Connect might not want those extra apps placed on their Server.

The payoff for downloading these suckers was big for me, however. I’m not all that good at building Alteryx Workflows, you see. The workflows encapsulated in these loaders are pretty complex and demonstrate some smart ways of doing things like:

  • Hitting REST endpoints and marshaling requests and responses
  • Parsing XML and JSON
  • Dumping and leveraging schemas from different data systems

For example, here is some of the logic we use to pull metadata from the Alteryx Gallery:

I plan to spend many hours reverse engineering these things and I suspect I’ll learn a lot about how to solve difficult prep problems.

Anyway, you’ll need to (initially) launch the loader you want to use from Designer:

Let’s load a Snowflake Data Warehouse

…then plug in connection details for the server (Snowflake, Tableau, whatever) you want to load from:

…and finally, tell the analytic app where it should load data TO (your Connect machine):

Click Finish, and grab a drink. By the time you get back, your Connect box will be populated with assets from the source system like object names, parsed connect strings, tags, people (owners), and more.

Reports:

I can see Tableau Vizzes and Alteryx Insights dashboards & charts. This is where your Qlik / Power BI assets would show up, too.

Alteryx Workflows:

Your Workflows, workflow descriptions, and so on. If I take a look at a particular asset, I’ll get more details – like the database table(s) it uses, where that database is, etc:

I haven’t done much here.

Data Sources:

You’re going to have a bunch of stuff related to Data Sources.

In the screen above, I might want to drill down into the folder of Files I ingested, or maybe look at my Database Servers:

..and the databases on a specific server:

…and of course Tables, and Columns. (You get the idea. I’m not going to drill anymore):

IF your Tables, Columns, Vizzes have associated descriptions, Connect captures and displays them. For example, here are column comments from PostgreSQL:

Search and Nexus

All of this stuff is searchable, of course. if I query on a keyword like “Address”, I get the following results:

  • parquet_to_csv consumes a file with the word “address” in it’s filename
  • Address is a column in a CSV file we crawled
  • seattle_park…addresses.csv IS a file we crawled
  • and it looks like we have three instances of a database column named “address” spread around the tables I’ve ingested

If I search for “sith”, I can see that people have been talking about the all-powerful Sith on the Alteryx Community AND that some of the Tableau vizzes I crawled were tagged with “sith” on Tableau Server:

Connect is also great at showing relationships between the assets I imported via a tool called Nexus. This is cool. Very cool:

I’ll start with a Tableau workbook called WWReach Sales Reporting. I can see that it lives in the Default project of a Tableau Server Site. Which Site? Let’s find out:

…looks like the Default site – the folder to the left of where I’m hovering

The workbook has two sheets in it, and it leverages one of my database tables named microbrewery_licensees. The workbook is tagged as “sith” and the Microbrewery Licensees viz is tagged as “sith” (from Tableau) and “beer” (from Connect)

I can see all the columns in that database table, and the fact that “I” own both the Tableau Viz and the Database Table.

I could also come at this from the database table direction too: Searching for microbrewery_licenses gives me not only the column names in the table, but lets me know that the Tableau WWReach Sales Reporting workbook depends on it – Impact analysis, baby!

There’s some interesting inference going on here – Connect figured out that the Database Table is related to the Tableau viz because it recognizes a hostname / port / database name that look “the same” in both the connection string of the Tableau workbook and the Database in which the table lives.

That “Russell Christopher” user you see is a Connect user. Connect noted the email address of the person who owns the viz in Tableau and sees that it matches the email address of a user it manages – so it infers ownership (you can change this of course).

Do you see all the relationships between your tables? Yep. We display relationships that exist in the data source AND additional relationships your users might choose to add. That’s table stakes:

Business Glossary

The Glossary is, well, a glossary. It lets you define terms and frame meaning in your business.

Of course, we also let you create relationships between these definitions and other assets (for example, a database table) that you are managing.

Let’s say that I want to tie the term “Asset” above to a particular table…

Easy! The sampledata_customers table is defined by the term “Asset”, and “Asset” is a part of my “Glossary”.

People

This area of Connect is where you manage groups, users, and permissions. Since this is more-or-less a virgin server, things look pretty bare.

Summary

There you go. We loaded data and you even got a really lightweight tour of the product. There’s a lot more here, but it gets a bit propeller-head.

Speaking of propeller-head: Next up is Promote! Linux! Yay!

Leave a Reply

Old as dirt posts

Contact me.