Automating Your DevOps Infrastructure: Starting from Scratch with Ansible

Dariusz Pasciak

September 19, 2016

You've heard of it, but you haven't done it. You know it's where you want to be, but it's not where you are. You know there exist many ways of doing it, but you don't know where to start.

So how do you go about automating your infrastructure?

A pretty simple yet powerful tool to get you started is Ansible. There are lots of other tools out there for this, but here we will only focus on Ansible.

There are several characteristics of Ansible that make it easy to get started:

  1. Only the machine you run it from needs Ansible installed. The servers you manage don't need any extra agent software, just SSH access and Python, which most Linux distributions already include.
  2. There are lots of modules to get you started, so there is a very good chance that Ansible already has a solution to begin automating some of the things you are doing manually today.
  3. Because it's been around for a while and is widely used, there are lots of examples out there that you can lean on to help you automate just about anything. In addition, the official Ansible documentation is an excellent resource to get you going.

OK, that sounds nice, but where do you really begin?

Well, I said you can automate just about anything with this tool, so you can start by taking notes (mental or digital) of the things that you do today that are not automated. And sorry, but writing code, fixing bugs, and doing laundry don't count.

The biggest candidates for automation are tasks that require you to SSH onto a server and "do that thing." A few examples:

  • restart a dying but not-quite-dead process
  • git pull and "redeploy"
  • add someone's public key to a list of authorized_keys so that someone else can then SSH onto that server and manually do the things you do
  • delete old log files from an archive folder because the server is running out of space

Do some of those sound familiar?

Great! Let's get started.

To get our feet wet, let's try to automate the process of deleting unnecessary log files. This is a good task to start with because deleting files on a server somewhere, especially in an automated fashion we're not yet familiar with, should produce an adequate level of adrenaline in our bloodstream and keep us vigilant for the duration of this post. As a nice side effect, we may walk away retaining more knowledge than we otherwise would have.

The first step is, obviously, to get Ansible installed, so go ahead and do that now.

But seriously, go and get Ansible installed. You will need it.

Wonderful. Now that you've done that, we have two paths we can take from here. We can start by either executing a series of one-off ansible commands directly on the command line, or we can create a playbook, which is essentially a file that contains groups of Ansible commands to be run in sequence.

Although we can achieve the same results using either approach, for the purpose of this example we'll choose to create a playbook. Doing so will give us something tangible that we can later move into source control, share with others, and continuously improve upon.

Fire up your favorite editor and create a file called delete-logs.yml. Put the following snippet of YAML into that file:


- name: "Delete logs that we for sure 100% won't ever need to look at"
  hosts: localhost
  tasks:
    - name: "Create file"
      file: path=my-ansible-file state=touch

Now before we run this, let's break it down line by line to make sure we understand what's going on.

The first line, beginning with - name: "Delete ..., has a strictly informative purpose: it is a short phrase that communicates to the user what this playbook does. You can change "Delete logs that ..." to any text you'd like. It should clearly communicate what this specific playbook is used for.

The line after that, hosts: localhost, specifies which machine(s) the playbook should target. We are using localhost because we haven't set up any remote machines to operate on yet. I will show you how to operate on remote machines shortly.

Line number three, tasks:, simply declares that everything below this line will be a list of tasks (or commands) that should be run on a given machine.

The next line, - name: "Create file", is the beginning of the one and only task in this playbook. Like the first line, it is informative: it communicates the intent of this specific task. When the playbook runs, this string is printed to the console as the task begins.

Finally, the last line, file: path=my-ansible-file state=touch, is where the "interesting" stuff happens. This line instructs Ansible to use the file module to create (touch) an empty file called my-ansible-file. Note that because we are targeting localhost, the file will be created in the directory from which you will run the playbook.
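The key=value shorthand above works fine, but you will often see the same task written with its arguments as a YAML mapping, which is the style the current Ansible documentation generally uses. Here is a sketch of that equivalent form. The ansible.builtin.file name is the fully qualified name of the same file module and assumes Ansible 2.10 or later; on older versions, plain file does the same job.

```yaml
- name: "Delete logs that we for sure 100% won't ever need to look at"
  hosts: localhost
  tasks:
    - name: "Create file"
      # Fully qualified name of the same file module (Ansible 2.10+)
      ansible.builtin.file:
        path: my-ansible-file   # relative path, as in the one-line version
        state: touch            # create the file if missing, update its timestamps if present
```

Both forms do the same thing; the mapping style just scales better once a task has more than a couple of arguments.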

Now I know that you have gigabytes of juicy data logs that you're anxious to delete, and so far this playbook does not delete anything, but hold on. Let's take small steps so we don't get lost.

In your terminal, in the directory where you saved that file, tell Ansible to run the playbook by using the following command:

ansible-playbook delete-logs.yml

You may see some warnings about optional dependencies or a missing hosts file; don't worry about those for now. The important thing is that somewhere amid that output you see something like:


TASK [Create file] *************************************************************
changed: [localhost]

What that means is that your "Create file" task completed successfully. You can confirm that the file was created by running ls my-ansible-file and checking that it shows up. It will be empty because we haven't written anything to it, but it will be there.

Let's just take this one step further and delete the file. Modify your playbook so that it now looks like this:


- name: "Delete logs that we for sure 100% won't ever need to look at"
  hosts: localhost
  tasks:
    - name: "Create file"
      file: path=my-ansible-file state=touch
    - name: "Delete that file we just created. Why? I don't know."
      file: path=my-ansible-file state=absent

The only change is the two lines at the end. What this gives us is a playbook with two tasks: one that creates a file, and one that deletes it immediately afterward.

Notice that we're using the same file module for the second task. The only difference between the two tasks is the state parameter. The more you use Ansible, the more you'll notice that state=absent/present/something-else is a common parameter across many modules. It typically translates to "make sure this resource does not exist, and if it does, delete it" or "make sure this resource exists, and if it doesn't, create it." The word "resource" here can mean anything from a server somewhere in the cloud, to a piece of software installed on some server, to a file as we see here.
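To make the pattern concrete, here is a sketch of two tasks (they would sit under a play's tasks: key) that use state with two different modules. The directory name my-ansible-dir and the htop package are arbitrary examples, and the apt task assumes a Debian or Ubuntu target where your user is allowed to use sudo.

```yaml
# Tasks only: place these under the "tasks:" key of a play.
- name: "Make sure my-ansible-dir exists as a directory (create it if missing)"
  ansible.builtin.file:
    path: my-ansible-dir
    state: directory

- name: "Make sure the htop package is installed (install it if missing)"
  ansible.builtin.apt:   # Debian/Ubuntu package manager module
    name: htop
    state: present
  become: true           # installing packages requires root privileges
```

Running these a second time should report ok rather than changed, because each resource is already in the requested state. Flipping either state to absent reverses the task.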

Make sure to save the playbook file and rerun the playbook (ansible-playbook delete-logs.yml).

This time there should be a little bit more output. Specifically, we should see something like:

TASK [Create file] ********************************************
changed: [localhost]

TASK [Delete that file we just created. Why? I don't know.] ***
changed: [localhost]

This means that both tasks ran successfully and that the file we created should now be gone.

Pretty cool, huh? Surely, much more interesting than running a simple bare-bones touch my-ansible-file && rm my-ansible-file on the command line.

So how do you run this on a server somewhere?

Before we continue, note the prerequisite: you already have a server up and running somewhere, and you can connect to it over SSH using a public/private key pair instead of a password. This post will not show you how to set that up.

Assuming you have that set up, let's continue. Here is where things get a liiiittle bit more complicated. We'll have to do a few things in order to run those same tasks on a remote server:

  1. Add one line in the delete-logs.yml playbook to specify which user we will be operating as on the server.
  2. Change one line in the delete-logs.yml playbook so that we operate on "all" machines and not just localhost.
  3. Add parameters to our ansible-playbook command that specify which machine we want to operate on and which private key file should be used for connecting to that machine.

1. Specify user in playbook

Open up the playbook again and, right underneath the first name key and right above the hosts key, add a user key with the name of a user that you can use to log in to your server.

2. Change "localhost" to "all"

Modify the playbook and change the value of the hosts key from localhost to all.

After this step, the top of your playbook should look almost identical to:


- name: "Delete logs that we for sure 100% won't ever need to look at"
  user: dariusz # <- here is where your username goes
  hosts: all
  tasks:
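Putting steps 1 and 2 together, the whole playbook might look like the sketch below. One thing to flag: it uses remote_user, which is the name the Ansible documentation gives this play keyword; user is an older alias that Ansible still accepts but treats as deprecated. Either spelling should work, just don't set both in the same play.

```yaml
- name: "Delete logs that we for sure 100% won't ever need to look at"
  remote_user: dariusz  # <- your username; "user" is the older spelling of this key
  hosts: all            # every host passed in with -i
  tasks:
    - name: "Create file"
      file: path=my-ansible-file state=touch
    - name: "Delete that file we just created. Why? I don't know."
      file: path=my-ansible-file state=absent
```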

3. Add parameters to ansible-playbook command

The -i argument can take a list of hosts to operate on or the name of a file that contains a list of hosts to operate on. We will provide the name of our host directly on the command line.

The --private-key argument takes the path to a private key file, which can be used to connect to the remote machine as the user you specified in step 1 above.

As an example, if my machine's IP address is 55.66.77.88 and my private key is located in /super-secret, then my new playbook command would look like:

ansible-playbook -i "55.66.77.88," --private-key=/super-secret delete-logs.yml

Note the comma after the IP address. The comma signals to ansible-playbook that what you provided is a list of hosts, and not a path to a file that contains a list of hosts. Without the comma, ansible-playbook will attempt to read a file called 55.66.77.88 and fail.

One other thing to note is that the --private-key argument is actually optional if you use your default key (typically located in ~/.ssh/id_rsa) to connect to the machine.
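Typing the IP address and key path on every run gets old quickly, so -i can point at an inventory file instead. Here is a minimal sketch in Ansible's YAML inventory format. The file name hosts.yml is a hypothetical choice, and ansible_ssh_private_key_file is the inventory variable that plays the role of --private-key.

```yaml
# hosts.yml (hypothetical name): a YAML inventory describing one server
all:
  hosts:
    55.66.77.88:
      # Equivalent to passing --private-key=/super-secret on the command line
      ansible_ssh_private_key_file: /super-secret
```

With that file in place, the command shrinks to ansible-playbook -i hosts.yml delete-logs.yml, and there is no trailing comma to remember because you are now passing a file rather than a list of hosts.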

Run that command, and you should see output like:

TASK [Create file] *************************************************************
changed: [55.66.77.88]

TASK [Delete that file we just created. Why? I don't know.] ********************
changed: [55.66.77.88]

If you don't see this, then you'll probably see an error message telling you that Ansible failed to connect to the host you specified. If that's the case, double-check the path to your SSH key, the username you entered in the playbook file, and the IP address you entered on the command line. Make sure all of these are correct.

If that still doesn't work, then double-check again. This will result in what is commonly known as a triple-check.

Run the command one last time, and you should now see the expected output indicating success. Feel free to manually ssh onto the server and behold that ... nothing actually changed. That's because the same file we created with the first task in our playbook was immediately deleted by the second task.

At this point, you should be able to modify the playbook yourself to create or delete any files you'd like on your server. Go ahead and play around with that, and try to write a playbook that removes those space-consuming log files you were hoping to get rid of earlier [1].
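If you want a starting point, here is one possible sketch. It assumes your old logs live in a hypothetical /var/log/myapp/archive directory, end in .log, and can go once they are at least 30 days old; adjust all three to match your server. It uses the find module to collect matching files, then the same file module with state: absent to delete each one.

```yaml
- name: "Delete logs that we for sure 100% won't ever need to look at"
  remote_user: dariusz   # hypothetical; use your own username
  hosts: all
  tasks:
    - name: "Find archived log files at least 30 days old"
      ansible.builtin.find:
        paths: /var/log/myapp/archive   # hypothetical log archive location
        patterns: "*.log"
        age: 30d                        # files 30 days old or older
      register: old_logs                # results land in old_logs.files

    - name: "Delete each file the previous task found"
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ old_logs.files }}"
      # Add "become: true" here if the logs are owned by another user, such as root.
```

Running ansible-playbook with --check first does a dry run: it reports which files would be deleted without actually removing anything.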

Now someone might say that this is stupid, and that you really should have a cron job for maintenance chores like deleting files and freeing space. Thanks for the suggestion, Johnny. That's really not a bad idea. But how would you configure that cron job? Would you just SSH onto the server and update the crontabs?

This really sounds like something you could use Ansible for. Your crontabs can live in the same repo as your playbooks, and you can use Ansible to get the crontabs onto your servers. What you get here is a repeatable, automated way to deploy crontabs to your server or many servers, and you get source-controlled crontabs at no additional cost. Go ahead and try that as an exercise.
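As a hint for that exercise, here is one possible sketch using the cron module, which manages individual crontab entries rather than copying whole crontab files. The schedule, log path, and retention period are hypothetical placeholders. The name parameter is how Ansible recognizes the entry on later runs, so rerunning the playbook updates the existing entry instead of adding a duplicate.

```yaml
- name: "Schedule nightly cleanup of archived logs"
  remote_user: dariusz   # hypothetical; use your own username
  hosts: all
  tasks:
    - name: "Install a cron entry that deletes old archived logs at 3:00 every night"
      ansible.builtin.cron:
        name: "delete old archived logs"   # identifies this entry on later runs
        minute: "0"
        hour: "3"
        # Goes into the crontab of the user Ansible connects as, since no "user" is given
        job: "find /var/log/myapp/archive -name '*.log' -mtime +30 -delete"
```

If you would rather keep literal crontab files in your repo, the copy module can place them in /etc/cron.d instead.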

Now what?

I'm glad you stuck with me through this short and simple exercise on Ansible. By no means was this intended to be an in-depth intro to the tool, but rather a motivation for actually picking it up and starting to use it. Remember, automating your entire infrastructure is not something that happens overnight, or even over a one-week span. It is something that gradually evolves over time to meet the needs of your team and organization.

One thing is certain: if it's done correctly and well maintained, an automated infrastructure will save you and your team a great deal of time. It doesn't come for free, since it needs to be built up and will change as your organization changes, but it certainly beats pages of documentation and manual clicking around when it comes time to "install this piece of software on such and such machine" or "provision a new AWS instance that can be used for this and that."

To leave you with a few words of advice, I recommend that you start small. If you're just getting into infrastructure automation, don't try to tackle the toughest of problems first. Take small, manual tasks that are easy to automate and automate them. This way, not only are you moving forward, but you are also getting more and more comfortable with the tool. The more you use it, the more you will understand it and the easier it will be to tackle bigger and bigger automation tasks. Keep at it, and before long you'll be surprised at how much you have automated!

Second, take a peek at Ansible's playbook best practices page. Don't worry about following every recommendation there exactly, especially in the beginning. Focus on getting things working first. As your repository grows and you see that your team is actually using it, you can start worrying about making it "proper."

Cheers!


[1] I highly advise against doing this "playing around" or testing of playbooks on a production server. Be sure to test your playbooks thoroughly on servers where you can't cause any damage.