What is this all about? Start here, silly.
For those of you who spend a lot of time standing up environments in the cloud, or for those of you who just don’t care, this post will be “meh”.
But for everyone else, you’ll probably be able to pick up a hint or two. Some of what I’ll point out is conventional wisdom, there are a few hard-earned lessons from yours truly, and there’s some very opinionated personal preference thrown in, too.
As I mentioned previously, I’d suggest you use both a public and a private subnet for this work. Granted, doing so does add complexity. BUT it makes it much harder for bad guys to get at your goods, even if you happen to make a mistake or two elsewhere.
While it can be fun and instructive to create the VPC and subnets yourself, I’d recommend you simply use the VPC Wizard. Doing so will allow you to ignore certain complexities like Routing Tables, Internet Gateways, etc.
Newbie docs: What’s a VPC?
When running this wizard, you have two choices for how your private subnet “gets to the internet”: a NAT gateway or an “old-fashioned” NAT instance.
TL;DR: I generally like NAT instances, except for big projects where there will be lots of data flying around.
Newbie docs: What’s a NAT?
The NAT gateway runs 24 x 7, and you get charged for both the time it runs and the amount of data that goes through it. NAT gateways are “set it and forget it”: they scale for you automatically as network traffic increases, but you pay for them running all the time. The NAT instance, by contrast, is a single Linux machine that handles network address translation.
I generally choose the NAT instance, except for big projects.
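If you want to put rough numbers on that trade-off, here’s a minimal sketch in Python. The rates are made-up placeholders (not real AWS prices), and I’m assuming the NAT instance is billed like any other machine: only for the hours it is actually running. Plug in the current numbers for your region before you trust the result.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

# Placeholder rates, NOT real AWS pricing. Look up current prices for your region.
GATEWAY_HOURLY_RATE = 0.05   # hypothetical $ per hour the gateway exists
GATEWAY_PER_GB_RATE = 0.05   # hypothetical $ per GB pushed through the gateway
INSTANCE_HOURLY_RATE = 0.01  # hypothetical $ per hour for a small Linux box


def nat_gateway_monthly_cost(gb_processed):
    """The gateway runs 24 x 7, so you pay for every hour plus every GB."""
    return HOURS_PER_MONTH * GATEWAY_HOURLY_RATE + gb_processed * GATEWAY_PER_GB_RATE


def nat_instance_monthly_cost(hours_running):
    """Assumption: the instance only costs money while it is running."""
    return hours_running * INSTANCE_HOURLY_RATE


if __name__ == "__main__":
    # A sandbox that moves 50 GB a month and is only up during working hours.
    print("gateway: ", round(nat_gateway_monthly_cost(50), 2))
    print("instance:", round(nat_instance_monthly_cost(8 * 22), 2))
```

The point isn’t the exact dollars. The gateway has a fixed floor you pay even when nothing is happening, which is why it only starts to look attractive once there is a lot of traffic.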
So, this is sort of a lie:
…I’m using a NAT Instance which also acts as a Bastion Host for me.
TL;DR: If you create Security Rules which allow lots of traffic from 0.0.0.0/0, you’re begging for trouble.
After you create your VPC and subnets via the VPC Wizard, you’ll need to create a security group for machines in your private and public subnets.
Newbie docs: What’s an AWS Security Group?
Please don’t use the “default” group that was generated during VPC creation. Please don’t start opening up tons of ports in the default group either. Some other person who knows less than you is going to choose this thing in the future and get themselves into a world of hurt.
True Story: So, I’m temporarily working with this company that is pretty “wild west” with their cloud infrastructure: Everyone is an admin. The problem is that many of these folks are in a hurry and simply add rules like “All Traffic” allowed from 0.0.0.0/0 (“the world”)... or “8088” (a Hadoop YARN port) allowed from 0.0.0.0/0. They also put their Hadoop clusters in public subnets because it was otherwise a pain in the ass to access HUE, Resource Manager, etc. in the browser over the public internet.
Result? Their Hadoop clusters were owned by crypto-mining hackers over and over again. And again... and OMG make it stop, again. It got to the point where I wrote a document on “how to clean up” and had it ready to forward when I got that “My Hadoop cluster is really slow and all nodes are at 100% CPU” email. Don’t be these people.
Create a few, simple security groups that don’t allow much traffic through, and use them religiously.
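If you inherit a “wild west” account, it helps to have a quick way to spot the scary rules. Below is a small, hedged sketch in Python that flags anything open to the whole world. The rule format (a dict with `description`, `port` and `sources`) is something I made up for illustration; it is not the shape any AWS tool returns, so you would have to map real rules into it yourself.

```python
import ipaddress


def world_open_rules(rules):
    """Return the rules whose source list includes a 'whole internet' CIDR."""
    flagged = []
    for rule in rules:
        for cidr in rule["sources"]:
            # A prefix length of 0 (e.g. 0.0.0.0/0) matches every address.
            if ipaddress.ip_network(cidr).prefixlen == 0:
                flagged.append(rule)
                break
    return flagged


# Hypothetical rules, loosely based on the true story above.
EXAMPLE_RULES = [
    {"description": "All traffic", "port": "all", "sources": ["0.0.0.0/0"]},
    {"description": "YARN", "port": 8088, "sources": ["0.0.0.0/0"]},
    {"description": "SSH from the office", "port": 22, "sources": ["198.51.100.0/24"]},
]

if __name__ == "__main__":
    for rule in world_open_rules(EXAMPLE_RULES):
        print("open to the world:", rule["description"], rule["port"])
```

Not everything it flags is a mistake (80 and 443 are often open on purpose), but every hit deserves a second look.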
I have a security group for machines (Guacamole, NAT Instance) in the public subnet, and one for the machines (Server, Connect, Promote) in the private network.
In my sandbox:
- Each security group allows all traffic which originates from “inside itself”, so that machines in the same group can talk.
- With the exception of Promote, most of the work we do is over HTTP/HTTPS (80/443) and sometimes 27018 (MongoDB). We also might want to connect to an external SMTP server (587, 25, 2525, 465). When we want to really lock things down, I might add rules to allow ONLY these ports for inter-subnet chit-chat and remove the “all traffic” rule.
- I’ll be accessing machines via RDP (3389) and SSH (22) from “the world” (0.0.0.0/0). However, I’m using Guacamole in the public subnet for RDP (all the time) and SSH (much of the time) connections – so in my private network, I might choose not to trust 3389 and 22 from 0.0.0.0/0, but instead only allow connections on those ports from the private IP address of my Guacamole machine. Up to you.
Here are the rules from my public security group:
- All traffic allowed from machines in the public and private security groups
- Only HTTP and HTTPS allowed from “the world”
- SSH and traffic on ports 0-65535 allowed only from my home and/or work
And the private group:
- HTTP and HTTPS traffic allowed from “the world”. When I finally get around to accessing Server and/or Connect from the browser, I’ll be doing so across my application load balancer – so the security group will “see” my “outside world” IP address rather than that of a machine in the public subnet.
- All Traffic from the public or private subnet: Again, I will clean this up and make things more restrictive after I have everything up and working properly.
- If I need to SSH into one of the Promote hosts without Guacamole, I’ll be doing so via my NAT/Bastion. Therefore, it’ll look like I’m coming from the public subnet vs. “The world” to the machine I connect to.
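To make the “locked down” version of the private group a bit more concrete, here’s a rough Python model of how I think about it: a security group is an allow-list, and anything that doesn’t match a rule is dropped. The addresses are invented for the example (10.0.0.25 stands in for the Guacamole machine’s private IP, 203.0.113.7 for some random machine out in the world), and this is only a toy model, not how AWS evaluates rules internally.

```python
import ipaddress

# (description, first_port, last_port, source_cidr) -- all values hypothetical
PRIVATE_GROUP_RULES = [
    ("HTTP from the world", 80, 80, "0.0.0.0/0"),
    ("HTTPS from the world", 443, 443, "0.0.0.0/0"),
    ("SSH from Guacamole only", 22, 22, "10.0.0.25/32"),
    ("RDP from Guacamole only", 3389, 3389, "10.0.0.25/32"),
]


def is_allowed(rules, source_ip, port):
    """True if any rule covers this port and this source address."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        first <= port <= last and ip in ipaddress.ip_network(cidr)
        for _description, first, last, cidr in rules
    )


if __name__ == "__main__":
    print(is_allowed(PRIVATE_GROUP_RULES, "203.0.113.7", 443))  # browser traffic
    print(is_allowed(PRIVATE_GROUP_RULES, "203.0.113.7", 22))   # SSH from the world
    print(is_allowed(PRIVATE_GROUP_RULES, "10.0.0.25", 3389))   # RDP via Guacamole
```

Writing the rules down like this before clicking around in the console makes it easier to see which “temporary” all-traffic rules you still need to remove.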
Now that we’ve covered the network, let’s talk about the non-Alteryx stuff I’m running.
Guacamole is Linux-based open source software that allows me to easily RDP or SSH into any of the machines in my private network. There are lots of more “manual” ways to do this, by the way — but I (often) find it easier to use this software.
For example, if I need help from a 3rd party troubleshooting something, it’s super easy to add another user account to Guacamole, grant that user permissions to certain connections in my network, and then be done. All the usernames and passwords for the resources in my network are already saved in Guacamole (and can’t be “seen” by the helper), so it makes things much simpler. Later, I can disable the user account I created.
Another reason I use Guacamole… it’s fun and cool. I’m showing off just a little bit. You totally don’t need Guac, actually. You could use another Windows machine in your public subnet instead. Create some RDP connections on the Desktop to the machines in the private subnet, and you’re ready to go.
In case you want to set up Guacamole, you’ll find several good tutorials on the internet: like this one and this one.
A bastion is generally a Linux box that does nothing more than give you a jumping off point into your network. Use something cheap.
It is common to put a security rule on the Bastion that only allows traffic from certain
Application Load Balancer
The goal of this entire exercise is to allow you and others to access Alteryx services from anywhere. So they need to be able to connect to resources in your private network via the public internet. That’s where an ALB comes in.
Newbie docs: What’s an ALB?
Typically, one comes into a load balancer, and Rules direct traffic like so:
- “If the user is asking for gallery.alteryx.com, forward to <some place> in the private network”
- “If the user is asking for connect.alteryx.com, forward to <some other place> in the private network”
- “If the path the user is asking for contains /gallery in the URL, forward to <some place> in the private network”
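The rules above boil down to “check each rule in order, first match wins, otherwise fall through to a default”. Here’s a tiny Python sketch of that idea. The rule format and the target names (`gallery-targets`, `connect-targets`, `default-targets`) are invented for illustration; a real ALB is configured in the console or through the API, not with dicts like these.

```python
# Hypothetical rules, checked top to bottom. A key set to None means "don't care".
RULES = [
    {"host": "gallery.alteryx.com", "path_contains": None, "target": "gallery-targets"},
    {"host": "connect.alteryx.com", "path_contains": None, "target": "connect-targets"},
    {"host": None, "path_contains": "/gallery", "target": "gallery-targets"},
]


def pick_target(rules, host, path, default_target="default-targets"):
    """Return the target of the first rule that matches the request."""
    for rule in rules:
        host_ok = rule["host"] is None or rule["host"] == host
        path_ok = rule["path_contains"] is None or rule["path_contains"] in path
        if host_ok and path_ok:
            return rule["target"]
    return default_target


if __name__ == "__main__":
    print(pick_target(RULES, "connect.alteryx.com", "/"))
    print(pick_target(RULES, "example.com", "/gallery/app"))
    print(pick_target(RULES, "example.com", "/something-else"))
```

Order matters here: put the most specific rules first and let everything else fall through to the default.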
Because I want to be fancy, I have three domain names in AWS Route 53 which all point to the same load balancer.
I want the ability to at least
Simple AD (part of AWS Directory Service) is not “real” AD, but it’s good enough to handle things like adding users, joining machines to a domain, etc. It’s also “set and forget”, which is good.
You’ll give your directory a name like “foo.com” and assign it to the VPC you created earlier. The directory should be available in each subnet you might put machines in which need to join the domain.
Even though I don’t think I’ll ever need any domain-joined machines in the public network, I dropped Directory Services into BOTH the private and public subnets, just in case.
As a result, you’ll see two subnets listed, as well as two DNS Server IP addresses: one for each subnet.
Cloud. Good. Fast. Get things done. Alteryx.
Up next, we’ll install Server and a Worker. It’s easy to do and I found the hardest bit was making sure the Windows boxes joined my domain correctly 🙂