Auditing is an important part of governance, and the combination of Unifi Software and Tableau has you covered.

Of course, what you’re about to see can also be accomplished with Unifi plus pretty much any analytics tool – but Tableau is typically going to get you there faster. (Note: I am an ex-Tabloid, and therefore a Tableau bigot. Deal with it.)

What does Unifi give you in terms of Auditing?

You can use Unifi to monitor the entire process of operationalizing your data. From discovery and ingestion, through prep and stewardship, Unifi tracks what your data engineers, stewards, and analysts do.  When the data is viewed or downloaded, Unifi knows and remembers. You can get pretty nasty-ass forensic about things.

Don’t get me wrong either – it’s not all big brother and risk mitigation we’re talking about. You can use the information Unifi automatically records for you to do things like:

  • Identify hidden SMEs and budding data rock stars on your staff that you had no idea existed
  • Understand which cross-functional datasets (weather, income, etc.) are adding value to the data culture and where your gaps are

Whether you want to “lock it down” or “open it up”, this information can help you.

Where is the audit trail stored?

Unifi’s PostgreSQL instance.

By default, our PostgreSQL instance listens on * (set in postgresql.conf), but pg_hba.conf only allows connections via local and 127.0.0.1/32. Out of the box, that means you can reach the audit tables only by creating a data source and datasets inside Unifi.

If you want, edit /usr/local/pgsql/data/pg_hba.conf and loosen the rules a bit. Then you can connect directly to the tables in question with your favorite analytics/BI tool.
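To give a feel for what "loosening the rules" might look like, here is a minimal Python sketch that builds a single pg_hba.conf host line for a trusted subnet. The database name, role name, subnet, and auth method are all placeholders I made up for illustration; swap in whatever your install actually uses.

```python
import ipaddress


def hba_rule(database: str, user: str, cidr: str, method: str = "md5") -> str:
    """Return one pg_hba.conf 'host' line letting `cidr` reach `database`.

    The field order (type, database, user, address, auth method) follows the
    pg_hba.conf format. All values passed in below are hypothetical.
    """
    # strict=True rejects addresses such as 10.0.0.5/24 that have host bits set,
    # which catches a common typo before it reaches the config file.
    network = ipaddress.ip_network(cidr, strict=True)
    return f"host    {database}    {user}    {network}    {method}"


# Hypothetical example: let a BI subnet connect as a read-only role.
print(hba_rule("unifi_db", "tableau_reader", "10.0.0.0/24"))
```

PostgreSQL only picks up pg_hba.conf changes after a reload, so plan for that once the line is added. Keeping the allowed range as narrow as possible is probably wise, given what lives in these tables.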

What’s there to see?

Event Categories

At the highest level, Unifi tracks the following for you:

  • Events related to authentication and session creation/destruction
  • Permission events
  • Object access events
  • Create, update, and delete events related to objects

Authentication

Want to know who’s logging in and out? You want the uf_audit_auth_event table.

It’s going to give you some broad categories (login, logout) as well as some explicit text around what happened:

Who’s logging in?

I like the fact that we’re capturing the user’s IP address. You can also clearly see where I fat-fingered the password for a newly added user over and over again at around 3 p.m. on December 16. Useful.
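If you would rather catch that kind of repeated failure without a dashboard, a rough Python sketch like the one below could tally failed logins per user and IP address. It works on rows already pulled from uf_audit_auth_event; the column names (username, ip_address, event_type, success) are my assumptions, not the documented schema.

```python
from collections import Counter


def failed_logins(rows):
    """Count failed login attempts per (username, ip_address) pair.

    `rows` is an iterable of dicts. The keys used here are hypothetical
    stand-ins for whatever uf_audit_auth_event actually calls its columns.
    """
    counts = Counter()
    for row in rows:
        if row["event_type"] == "login" and not row["success"]:
            counts[(row["username"], row["ip_address"])] += 1
    return counts


# Made-up sample rows for illustration only.
sample = [
    {"username": "newuser", "ip_address": "10.0.0.12", "event_type": "login", "success": False},
    {"username": "newuser", "ip_address": "10.0.0.12", "event_type": "login", "success": False},
    {"username": "unifi", "ip_address": "10.0.0.7", "event_type": "login", "success": True},
]

for (user, ip), n in failed_logins(sample).most_common():
    print(user, ip, n)
```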

Session creation and destruction

Unifi lets you see how long people are working, as well. Sessions are monitored via uf_audit_active_session. The columns session_start and session_end are your friends.

In the screenshot below, it’s pretty clear that the unifi user (that’s me!) is in the system the most often. In this case, I wasn’t actually working all the time. I simply left my browser open and the session timeout on Unifi was high enough that it allowed me to stay on the system for reasonably long periods of time. (Note to self: Lower ACTIVE_SESSION_IDLE_TIME, dummy!)

How long are people working?
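The same question can be answered outside of Tableau. Here is a small Python sketch that adds up session time per user from session_start and session_end, the two columns named above. The username key and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_per_user(rows):
    """Sum (session_end - session_start) per user.

    `username` is a hypothetical column name; session_start and session_end
    come from uf_audit_active_session. Sessions with no end are skipped.
    """
    totals = defaultdict(timedelta)
    for row in rows:
        if row["session_end"] is None:
            continue  # session still open, so no duration to add yet
        totals[row["username"]] += row["session_end"] - row["session_start"]
    return dict(totals)


# Made-up sample rows for illustration only.
sessions = [
    {"username": "unifi", "session_start": datetime(2016, 12, 16, 9, 0),
     "session_end": datetime(2016, 12, 16, 11, 0)},
    {"username": "unifi", "session_start": datetime(2016, 12, 16, 13, 0),
     "session_end": datetime(2016, 12, 16, 14, 0)},
    {"username": "darlene", "session_start": datetime(2016, 12, 16, 10, 0),
     "session_end": None},
]

print(time_per_user(sessions))
```

As with my own always-open browser, long totals here may say more about the idle timeout than about actual work, so treat the numbers as an upper bound.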

Object Access Events

The uf_audit_access_event table records the lion’s share of information generated when users work with data-related objects in Unifi. Unifi tracks what users touch even when the object doesn’t live inside Unifi itself – more on that in a moment.

First, let’s look at the schema.

 

Next, I can see the various types of actions related to accessing, or “using”, objects. These are distinct from the tasks of creating, updating, and deleting objects.

Below, I can see that I (user = unifi) touched a dataset named uf_audit_active_session coming from a data source named Unifi Metadata Database – that’s the PostgreSQL database. This action was quite literally the result of me clicking the dataset in the dataset list, and the dataset opening to preview data.

 

If the next thing I had done was choose to do some data prep using this data source, I would have seen another access against the same object, plus many other related objects… Here’s an example of this in action:

I open up factDailyResponse….

Find and open factDailyResponse

…and decide to create a Data Prep job. Unifi immediately suggests all the other datasets I might want to join factDailyResponse to:

Join me! Join me!

If I look at my audit table at this point, I’ll literally see each of the tables above. Unifi has “exposed” them to me, so we record that fact. I may or may not choose to use them, but it doesn’t matter. I’ve seen them and know of their existence. We record that.

Objects I encountered during the Join Datasets process…

 

What else?

I can also see people viewing the output of a data prep job (essentially eyeballing the results in a web page to make sure they’re correct). I’m able to watch them downloading the data, too. 

Downloading Results

 

In fact, any time a user sees data, you’re going to know. Here’s an interesting edge case scenario:

I accessed a file on HDFS and previewed its contents before I created a dataset from the file. Even though this file “isn’t a part” of Unifi yet, I looked at it…so Unifi surfaces that information. If I chose not to create a dataset out of the file in question (maybe it contains lame, dirty data), Unifi would still report that I peeked at it.
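To turn that into a quick "who has laid eyes on this thing?" report, something like the Python sketch below could work on rows pulled from uf_audit_access_event. The keys (username, object_name, action), the action labels, and the file path are all hypothetical; check them against the real schema before relying on this.

```python
from collections import defaultdict


def who_saw(rows, object_name):
    """Map each user to the sorted list of actions they took on `object_name`.

    Column names and action labels are assumptions for illustration.
    """
    seen = defaultdict(set)
    for row in rows:
        if row["object_name"] == object_name:
            seen[row["username"]].add(row["action"])
    return {user: sorted(actions) for user, actions in seen.items()}


# Made-up sample rows for illustration only.
events = [
    {"username": "unifi", "object_name": "/data/raw/survey.csv", "action": "preview"},
    {"username": "denny", "object_name": "factDailyResponse", "action": "download"},
    {"username": "denny", "object_name": "factDailyResponse", "action": "preview"},
]

print(who_saw(events, "factDailyResponse"))
```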

 

Finally, we stitch all this information together and have a decent little status panel that I can use to understand what’s going on inside my system:

A Basic Dashboard

Create – Update – Delete Events

Next up: events that focus on the life cycle of my objects (creation, modification, “retirement”). This is arguably the most complex schema, and at its center you’ll find uf_audit_createupdate_delete_event:

 

Rather than attempt to explain what all these suckers do, it’s easier to SHOW you. Here’s a dashboard which consumes the dataset in question:

Ch-ch-ch-changes.

 

As you can see, there’s a lot going on here:

  • Data Prep creation & modification events
  • Workflow tasks stringing together Prep jobs are created and modified
  • Schedules to execute the Workflow and Prep jobs are created
  • Data Adapters are added to the system by an admin
  • Data Stores and Data Sources are added and modified on the system
  • Datasets and associated dataset columns are created and updated

By highlighting the various categories of work, it’s pretty easy to see which users do which type of work most often, so spotting outliers from generally known behavior would be trivial. You could probably run this data through some sort of ML (maybe a decision tree) and build complementary analytics that surface the actions which are out of the ordinary, based on who the user is and what they normally play with.
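Before reaching for ML, a plain frequency baseline gets you surprisingly far. The Python sketch below is not a decision tree; it simply flags new events whose category makes up less than a threshold share of that user's history. The (user, category) pair shape, the category labels, and the 5% threshold are all assumptions of mine.

```python
from collections import Counter, defaultdict


def unusual_events(history, new_events, min_share=0.05):
    """Flag new events whose category is rare for that user in `history`.

    `history` and `new_events` are iterables of (user, category) pairs.
    Returns a list of (user, category, share) for events under `min_share`.
    A user with no history at all is flagged for everything.
    """
    per_user = defaultdict(Counter)
    for user, category in history:
        per_user[user][category] += 1

    flagged = []
    for user, category in new_events:
        total = sum(per_user[user].values())
        share = per_user[user][category] / total if total else 0.0
        if share < min_share:
            flagged.append((user, category, share))
    return flagged


# Made-up history: denny almost always works on datasets.
history = [("denny", "dataset")] * 19 + [("denny", "schedule")]
new_events = [("denny", "dataset"), ("denny", "data_adapter")]

print(unusual_events(history, new_events))
```

A baseline like this also gives you something to compare a fancier model against later.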

Permissions!

Are your people buttoning down your data or opening it up? I’m an “open it up” guy myself, but I recognize the importance of making sure PII and other sensitive data doesn’t escape into the ether. uf_audit_permisssion_event will help you understand which users are changing permissions, and how.

Authorization and Permissions

In the (very poor) dataviz below, I’m interested in:

  • Who is changing permissions most often?
  • Which users are the target of all this permission changing?
  • Are the changes making the system more or less restrictive overall?
  • Are there critical columns that are being “opened” or “closed” to the general data-consuming public?

Who set the damn S3 bucket to Public?!

This poor little dashboard communicates the following:

  • Users Denny and Darlene are likely creating datasets and giving blanket GRANTs to all users on the system.
  • User unifi (me!) made sure to lock down a sensitive column (“Password”) in his dataset. He’s such a good boy.
  • User unifi also noted that Darlene had inadvertently granted permissions on a security-related dataset to everyone. He REVOKED permissions on that dataset from everyone, including Darlene.
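The "more or less restrictive overall?" question boils down to simple arithmetic. Here is a hedged Python sketch that nets GRANTs against REVOKEs per acting user. The keys (actor, action) and the exact GRANT/REVOKE labels are assumptions about how the permission event table encodes things.

```python
from collections import Counter


def net_openness(rows):
    """Return grants minus revokes per acting user.

    Positive means the user is mostly opening data up; negative means
    mostly locking it down. Column names and labels are hypothetical.
    """
    net = Counter()
    for row in rows:
        if row["action"] == "GRANT":
            net[row["actor"]] += 1
        elif row["action"] == "REVOKE":
            net[row["actor"]] -= 1
    return dict(net)


# Made-up sample rows for illustration only.
permission_events = [
    {"actor": "denny", "action": "GRANT"},
    {"actor": "darlene", "action": "GRANT"},
    {"actor": "darlene", "action": "GRANT"},
    {"actor": "unifi", "action": "REVOKE"},
    {"actor": "unifi", "action": "REVOKE"},
]

print(net_openness(permission_events))
```

Summing the values across all users gives a single number for whether the system as a whole is drifting open or closed.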

Summary

Unifi + Tableau will give you deep insight into how your data is (or isn’t!) secured. It’s an automatic flight data recorder for security that you can use to avoid nasty problems.
