Provisioning a LAMP stack with Chef, Vagrant, and EC2 (2 of 3)

This is the second article in a three-part series on managing LAMP environments with Chef, Vagrant, and EC2. This article shows how to use Chef to install and configure a baseline LAMP stack.

The infrastructure created in this tutorial is pretty basic, with the production environment running initially on a single server instance. This is a useful setup for rapid prototyping of web apps, sharing a single server among multiple applications to minimize cost and complexity. Once you’re comfortable with this basic configuration, it’s relatively simple to scale it out, separating the roles onto multiple server instances.

Articles in this series:

Part 1: Introduction
Part 2: Provisioning a LAMP stack (this article)
Part 3: Configuring and deploying a web application

Set up Git, Chef, EC2, and Vagrant

There’s a bunch of stuff to set up before we can really get cooking. Fortunately it’s all pretty straightforward, and there’s a lot of good documentation available for this.

Set up Git

Version control is a key part of configuration management. This tutorial uses Git for version control, in part because the Chef admin tool knife is particularly well integrated with Git. Using knife and Git together makes it easy to use third-party cookbooks from the Chef community, customize them with your own local changes, and still be able to incorporate upstream improvements by automatically merging them together. See Git workflow in the Chef manual for details.

If you don’t have Git installed on your workstation, see the documentation on setting up Git from Github. Create a Github account too, while you’re at it; we’ll be using it later.

Set up Chef server and workstation

The easiest way to get started with a Chef server is to use Opscode’s Hosted Chef–it’s free for up to 5 nodes. Once you exceed 5 nodes you can decide whether to pay a monthly fee or install your own Chef server.

Follow the Chef Fast Start Guide to set up a hosted chef account and a management workstation. You only need to go through about half of that guide–stop before “Step 5: Configure the workstation as a client”. In this tutorial, the clients will run on Vagrant and EC2 instead of on the management workstation.

After setting up the management workstation according to the Fast Start Guide, we’ll make a few small changes:

Move the Chef config directory from ~/chef-repo/.chef to ~/.chef. (This lets you execute knife commands from anywhere on the system.)
Add the cookbook_copyright, cookbook_email, and cookbook_license properties to the ~/.chef/knife.rb config file, so knife will automatically populate those values when creating new cookbooks. See the Knife config docs for details. Here’s mine:
```
cookbook_copyright       "Jason Grimes"
cookbook_email           "jason@grimesit.com"
cookbook_license         "apachev2"
```

Set up EC2

You’ll need an Amazon Web Services (AWS) account in order to proceed. If you don’t already have one, sign up for an AWS account here.

To manage EC2 instances with Chef, you’ll need to install the Knife-EC2 plugin, and then configure knife to authenticate to AWS.

Follow the instructions in the first two sections of the Chef EC2 Fast Start Guide to set up knife and EC2. (Stop before “Step 3: Bootstrap the EC2 instance.”)

Set up Vagrant

Follow the instructions in the Vagrant getting started guide to install Vagrant.

This tutorial uses the latest 64-bit long-term stable version of Ubuntu: 12.04, “Precise Pangolin”. After installing Vagrant, download the “precise64” Vagrant base box:

vagrant box add precise64 http://files.vagrantup.com/precise64.box

Get cookbooks

Now that the essential tools are set up, we’re ready to download some Chef cookbooks.

You’ll need a cookbook for each software package you want to install with Chef. Cookbooks for many common packages are available for download from the Chef community site. If you can’t find a cookbook you need there, you can often find one hosted on Github or elsewhere with a quick Google search. Otherwise, you’ll need to write your own from scratch (or more often, using an existing close-but-not-quite cookbook as an example).

When using cookbooks from the Chef community, we can use Chef’s command-line tool knife to fetch the cookbook and merge it into our local repository. All the third-party cookbooks used in this tutorial can be retrieved this way:

knife cookbook site install apt
knife cookbook site install git
knife cookbook site install mysql
knife cookbook site install apache2
knife cookbook site install php 
knife cookbook site install build-essential
knife cookbook site install users
knife cookbook site install sudo
knife cookbook site install vim

This causes knife to download each cookbook (and its dependencies) as a tarball, extract it, create a new branch in our local Git repository, tag it, and merge it back into our master branch. If we already had custom changes to a cookbook, this magic would help merge in the upstream changes while tracking them in separate branches. See Git workflow in the Chef manual for details.

Run git log to review the changes.

Now upload the cookbooks from your workstation to your Chef server:

knife cookbook upload --all

Create Chef environments for development and production

Often you’ll want your development and production environments to have slightly different configurations–for example, the development environment might include xdebug, which wouldn’t be installed in production. To support this, Chef lets you define different environments and assign a node to a particular environment. To start with, we’ll create placeholders for a “dev” and a “prod” environment. Later on we can add custom configuration to these environments.

Create environments/dev.rb, with the following contents:

name "dev"
description "The development environment"

Create environments/prod.rb, with the following contents:

name "prod"
description "The production environment"

Commit the environment files to version control:

git add environments
git commit -m 'Add development and production environments.'

Upload the environments to the Chef server:

knife environment from file dev.rb
knife environment from file prod.rb

Create Chef roles for webserver and database

Chef roles provide a way to define a group of recipes and attributes that should be applied to all nodes that perform a particular function. For our LAMP stack, we’ll start by defining three roles: a “base” role with configuration common to all nodes, a “webserver” role, and a master database role called “db_master”.

Create roles/base.rb with the following contents:

name "base"
description "Base role applied to all nodes."
run_list(
  "recipe[users::sysadmins]",
  "recipe[sudo]",
  "recipe[apt]",
  "recipe[git]",
  "recipe[build-essential]",
  "recipe[vim]"
) 
override_attributes(
  :authorization => {
    :sudo => {
      :users => ["ubuntu", "vagrant"],
      :passwordless => true
    }
  }
)

The run_list method defines a list of recipes to be applied to nodes that have this role. The override_attributes method lets us override the default attributes used by various recipes–in this case, we’re overriding attributes used by the sudo cookbook so the “vagrant” and “ubuntu” users can run sudo without entering a password.
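
To make the override idea concrete, here’s a rough plain-Ruby sketch of how a role’s override attributes end up on top of a cookbook’s defaults. This is a simplified illustration and not Chef’s actual merge code; the deep_merge helper and the default values are hypothetical, made up for this example.

```ruby
# Simplified illustration of attribute precedence. This is NOT Chef's real
# merge logic, and the default values below are hypothetical.

# Recursively merge two hashes; values from `override` win on conflict.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Hypothetical defaults a sudo cookbook might ship with.
cookbook_defaults = {
  "authorization" => {
    "sudo" => { "users" => [], "groups" => ["sysadmin"], "passwordless" => false }
  }
}

# The override_attributes from roles/base.rb, written with string keys.
role_overrides = {
  "authorization" => {
    "sudo" => { "users" => ["ubuntu", "vagrant"], "passwordless" => true }
  }
}

effective = deep_merge(cookbook_defaults, role_overrides)
puts effective["authorization"]["sudo"].inspect
```

The keys the role doesn’t mention (here, the hypothetical "groups" default) survive the merge, while "users" and "passwordless" take the role’s values.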

Create roles/webserver.rb with the following contents:

name "webserver"
description "Web server role"
all_env = [ 
  "role[base]",
  "recipe[php]",
  "recipe[php::module_mysql]",
  "recipe[apache2]",
  "recipe[apache2::mod_php5]",
  "recipe[apache2::mod_rewrite]",
]

run_list(all_env)

env_run_lists(
  "_default" => all_env, 
  "prod" => all_env,
  #"dev" => all_env + ["recipe[php::module_xdebug]"],
  "dev" => all_env,
)

This shows how to use the env_run_lists method in a role to define different run lists for different environments. To simplify things we create an all_env array to define the common run list for all environments, and then merge in any additional run list items unique to each environment.
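
If the selection logic isn’t obvious, here’s a small plain-Ruby sketch of the idea: look up the run list for the node’s environment, and fall back to the default when that environment has no entry. It’s only a model of the behavior, not Chef internals; the run_list_for helper and the "recipe[xdebug]" name are hypothetical, used just for illustration.

```ruby
# Plain-Ruby model of per-environment run list selection (not Chef internals).
all_env = ["role[base]", "recipe[php]", "recipe[apache2]"]

env_run_lists = {
  "_default" => all_env,
  "prod"     => all_env,
  # "recipe[xdebug]" is a hypothetical recipe name used only for illustration.
  "dev"      => all_env + ["recipe[xdebug]"]
}

# Return the run list for an environment, falling back to "_default"
# when the environment has no entry of its own.
def run_list_for(env_run_lists, environment)
  env_run_lists.fetch(environment) { env_run_lists.fetch("_default") }
end

puts run_list_for(env_run_lists, "dev").inspect
puts run_list_for(env_run_lists, "staging").inspect
```

Here "dev" gets the common list plus one extra item, while an environment with no entry ("staging" in this sketch) just gets the common list.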

Create roles/db_master.rb with the following contents:

name "db_master"
description "Master database server"

all_env = [
  "role[base]", 
  "recipe[mysql::server]"
] 

run_list(all_env)

env_run_lists(
  "_default" => all_env,
  "prod" => all_env,
  "dev" => all_env,
)

Commit the role files to version control:

git add roles
git commit -m 'Add LAMP roles.'

Upload the roles to the Chef server:

knife role from file roles/base.rb
knife role from file roles/webserver.rb
knife role from file roles/db_master.rb

Set up a sysadmin user account

Define a user account for yourself to be created with sysadmin privileges on every node. This is done by defining a data bag for the users cookbook, with attributes that describe the user account to create. (A data bag is just a JSON data structure which defines attributes that are stored on the Chef server for searching and retrieval by a Chef client.)

I recommend creating a user with the same name and SSH keypair as your account on your workstation, so you can SSH in to the nodes without having to specify a username or password.

First, create a new data bag file named after the user you want to create:

mkdir -p data_bags/users
vim data_bags/users/$USER.json

Add something like the following to the $USER.json file (called “jkg.json” in this example):

{
    "id": "jkg",
    "ssh_keys": "ssh-rsa ...LoNgGobBleDyGookSshPuBl1cKey... jason@grimesit.com",
    "groups": [ "sysadmin", "dba", "devops" ],
    "uid": 2001,
    "shell": "\/bin\/bash"
}

The value of ssh_keys should be your SSH public key. This is often found in ~/.ssh/id_rsa.pub.
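
If you’d rather generate the file than type it by hand, a small Ruby script along these lines should do it. This is just a sketch: the user_item helper is my own invention, and the login, key, and uid passed in are placeholders to replace with your own values.

```ruby
require "json"

# Build a users data bag item as a Hash. The values passed in below are
# placeholders; substitute your own login, public key, and uid.
def user_item(login, public_key, uid)
  {
    "id"       => login,
    "ssh_keys" => public_key,
    "groups"   => ["sysadmin", "dba", "devops"],
    "uid"      => uid,
    "shell"    => "/bin/bash"
  }
end

item = user_item("jkg", "ssh-rsa ...placeholder-public-key... jkg@example.com", 2001)
json = JSON.pretty_generate(item)
puts json
```

Redirect the output to data_bags/users/jkg.json (or whatever your login is). Note that the generated file writes the shell as "/bin/bash" rather than "\/bin\/bash"; both forms parse to the same string in JSON.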

Commit the data bag file to version control:

git add data_bags
git commit -m 'Add sysadmin user data bag item.'

Upload the data bag to the Chef server:

knife data bag create users
knife data bag from file users $USER.json

Create an encrypted data bag for passwords and other secrets

Use an encrypted data bag to store secrets like passwords and encryption keys.
See Encrypted data bags in the Chef manual for details.

Create an encryption key:

openssl rand -base64 512 | tr -d '\r\n' > ~/.chef/encrypted_data_bag_secret
chmod 600 ~/.chef/encrypted_data_bag_secret

Add the following line to ~/.chef/knife.rb, to cause the encryption key to be copied automatically to Chef clients so they can use it for decryption.

encrypted_data_bag_secret "#{current_dir}/encrypted_data_bag_secret"

Make sure the $EDITOR environment variable is set to your preferred editor (e.g. by adding something like export EDITOR=vim to ~/.bash_profile). The knife data bag command will launch this editor to let you edit the data bag contents.

Create a new encrypted data bag item for storing MySQL passwords:

knife data bag create --secret-file ~/.chef/encrypted_data_bag_secret secrets mysql

Enter the following in the editor that opens:

{
  "id": "mysql",
  "prod": {
    "root": "my-awesome-root-password",
    "repl": "my-awesome-replication-user-password",
    "debian": "my-awesome-debian-admin-script-password"
  },
  "dev": {
    "root": "my-dev-root-password",
    "repl": "my-dev-replication-password",
    "debian": "my-dev-debian-password"
  }
}

Note that you’re setting different passwords in the production and development environments. After saving and closing the editor, knife will automatically encrypt the data bag item and upload it to the Chef server.

Next you should dump the encrypted data bag item to a file that can be stored in version control. We’ll dump it in JSON format with the -Fj argument.

mkdir -p data_bags/secrets
knife data bag show secrets mysql -Fj > data_bags/secrets/mysql.json

Then store it in your Git repository. This gives you the benefits of version control while keeping your secrets encrypted to protect from prying eyes:

git add data_bags
git commit -m 'Add encrypted data bag for MySQL secrets.'

If you want to decrypt and display the data bag item, just include the --secret-file argument:

knife data bag show secrets mysql --secret-file ~/.chef/encrypted_data_bag_secret

To edit the encrypted data bag file later, use the knife data bag edit command:

knife data bag edit --secret-file ~/.chef/encrypted_data_bag_secret secrets mysql

After editing, make sure to store the changes in Git, as above.

To make the mysql server recipe use the settings from the encrypted data bag, we need to make a small modification to the recipe.

Add the following to the top of cookbooks/mysql/recipes/server.rb:

# Customization: get passwords from encrypted data bag
secrets = Chef::EncryptedDataBagItem.load("secrets", "mysql")
if secrets && mysql_passwords = secrets[node.chef_environment] 
  node['mysql']['server_root_password'] = mysql_passwords['root']
  node['mysql']['server_debian_password'] = mysql_passwords['debian']
  node['mysql']['server_repl_password'] = mysql_passwords['repl']
end
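
To see what that lookup does without running Chef, here’s a plain-Ruby sketch that uses an ordinary hash as a stand-in for the decrypted data bag item. The mysql_password_attributes helper and the password values are hypothetical; only the key names come from the data bag and recipe above.

```ruby
# Stand-in for the decrypted data bag item. In the real recipe this comes from
# Chef::EncryptedDataBagItem.load("secrets", "mysql"); the values are made up.
secrets = {
  "id"   => "mysql",
  "prod" => { "root" => "prod-root", "repl" => "prod-repl", "debian" => "prod-debian" },
  "dev"  => { "root" => "dev-root",  "repl" => "dev-repl",  "debian" => "dev-debian" }
}

# Map the per-environment passwords onto the attribute names the recipe sets.
# Returns an empty hash when the environment has no entry in the data bag.
def mysql_password_attributes(secrets, environment)
  passwords = secrets[environment]
  return {} unless passwords

  {
    "server_root_password"   => passwords["root"],
    "server_debian_password" => passwords["debian"],
    "server_repl_password"   => passwords["repl"]
  }
end

puts mysql_password_attributes(secrets, "dev").inspect
puts mysql_password_attributes(secrets, "staging").inspect
```

An environment that isn’t listed in the data bag ("staging" here) yields nothing, which mirrors the recipe leaving the cookbook’s own password handling untouched in that case.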

Commit changes to version control.

git add cookbooks/mysql
git commit -m 'Read MySQL passwords from encrypted data bag.'

Upload the modified MySQL cookbook to the Chef server.

knife cookbook upload mysql

Provision EC2

Now that our initial LAMP configuration is in place, we’re ready to try provisioning it onto some servers.

It’s often best to do initial provisioning trials using EC2 rather than Vagrant, because EC2 tends to be faster to install. We’ll use the Knife-EC2 plugin to spin up a new EC2 instance, bootstrap it, and provision it with our LAMP stack. See the Chef EC2 Bootstrap Fast Start Guide for details.

Use the following command to provision an EC2 instance:

knife ec2 server create \
    -S aws -i ~/.ssh/aws.pem \
    -G webserver,default \
    -x ubuntu \
    -d ubuntu12.04-gems \
    -E prod \
    -I ami-a29943cb \
    -f m1.small \
    -r "role[base],role[db_master],role[webserver]"

Some things to note:

-S names your EC2 key pair and -i points to its private key file. Adjust both if yours differ from aws and ~/.ssh/aws.pem.
-G specifies the AWS security group. If you just used the “default” security group, you can omit this argument.
-E tells knife which environment to use for this node. This tutorial uses EC2 instances as the production environment.
The specified AMI is a 64-bit Ubuntu 12.04 EBS image in the us-east-1 region, provided by Alestic (because there were no official AWS AMIs for Ubuntu 12.04 as of this writing). If you want to use a different EC2 region, select a similar AMI in your desired region.
You must specify the db_master role before the webserver role if you want to use Chef to deploy the application as described in part 3 of this tutorial.

Once provisioning has completed, knife will spit out some data about your new EC2 instance, including the public IP address and the instance ID (which will also serve as the Chef node and client ID). You can also find this information later by running knife ec2 server list.

If you run into any errors during provisioning, you can tweak the Chef configuration, upload it to the Chef server, and then re-run the Chef client on the EC2 instance by SSHing in and running:

sudo chef-client

That’s the fastest way to make adjustments, because Chef won’t re-install things that are already installed the way you want them (that’s what is meant by the term “idempotence”).
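
If idempotence is a new term, this tiny plain-Ruby sketch shows the general idea: an operation that checks the current state first, so running it a second time changes nothing. It’s only an analogy for how Chef resources behave, not Chef code; the ensure_line helper is hypothetical.

```ruby
# Idempotent "make sure this line is present" operation on an in-memory file.
def ensure_line(lines, wanted)
  return lines if lines.include?(wanted) # already in the desired state: do nothing
  lines + [wanted]
end

hosts  = ["127.0.0.1 localhost"]
first  = ensure_line(hosts, "10.0.0.23 lamp-vm")  # first run adds the line
second = ensure_line(first, "10.0.0.23 lamp-vm")  # second run is a no-op

puts first.inspect
puts first == second
```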

But if you ever do need to wipe out the whole thing and start over, that’s easy to do with knife. You just have to make sure to delete the Chef “node” and “client” in addition to the EC2 instance. (If you forget, Chef will throw errors saying that a client or node with that ID already exists.)

To wipe out an instance, first determine the instance ID (which also serves as the node and client ID):

knife ec2 server list

Then delete the server instance, node, and client using knife:

INSTANCE=i-4e63f492
knife ec2 server delete $INSTANCE
knife node delete $INSTANCE
knife client delete $INSTANCE

Once provisioning has completed successfully, test the new instance:

SSH into the instance. You shouldn’t have to enter a username or password.
From within the instance, connect to MySQL as root with the password you defined for the prod environment in the encrypted data bag.
Connect to the instance via HTTP and verify that you see the Apache “It works!” message.

Provision the VM

Once everything works on the EC2 instance, try it with Vagrant. We’ll set the VM up as a development environment, in contrast to the production environment provisioned on EC2.

First create a directory on your workstation for your LAMP VM. This folder will be shared with the VM, and all of your web applications will live in subdirectories within it.

VMDIR=~/dev/lamp-vm
mkdir -p $VMDIR
cd $VMDIR

Next, create $VMDIR/Vagrantfile, a config file that tells Vagrant how to create and provision the VM, with the following contents:

Vagrant::Config.run do |config| 
  config.vm.box = "precise64"
  config.vm.forward_port 80, 8080

  config.vm.customize [
    "modifyvm", :id,
    "--name", "LAMP VM",
    "--memory", "2048"
  ]

  config.vm.network :hostonly, "10.0.0.23"
  config.vm.host_name = "lamp-vm"
  config.vm.share_folder("v-root", "/home/vagrant/apps", ".", :nfs => true) 

  # Your organization name for hosted Chef 
  orgname = "CHANGE_THIS_TO_YOUR_HOSTED_CHEF_ORGNAME"

  # Set the Chef node ID based on environment variable NODE, if set. Otherwise default to vagrant-$USER
  node = ENV['NODE']
  node ||= "vagrant-#{ENV['USER']}"

  config.vm.provision :chef_client do |chef|
    chef.chef_server_url = "https://api.opscode.com/organizations/#{orgname}"
    chef.validation_key_path = "#{ENV['HOME']}/.chef/#{orgname}-validator.pem"
    chef.validation_client_name = "#{orgname}-validator"
    chef.encrypted_data_bag_secret_key_path = "#{ENV['HOME']}/.chef/encrypted_data_bag_secret"
    chef.node_name = "#{node}"
    chef.provisioning_path = "/etc/chef"
    chef.log_level = :debug
    #chef.log_level = :info

    chef.environment = "dev" 
    chef.add_role("base")
    chef.add_role("db_master")
    chef.add_role("webserver")

    #chef.json.merge!({ :mysql_password => "foo" }) # You can do this to override any default attributes for this node.
  end 
end

See the Vagrantfile documentation for details about the file format. Things to note about this Vagrantfile:

You should set orgname to the orgname you use in Hosted Chef.
The node must be unique among all nodes that use your Chef server. You can override it by exporting a $NODE environment variable, or you can accept the default “vagrant-$USER”.
This Vagrantfile uses NFS for shared folders–this is a useful configuration on a Mac or Linux host. Omit the , :nfs => true argument on a Windows host.
Don’t try to mount a shared directory on /home/vagrant itself. Doing so will cause important configuration to be overwritten, such as the .ssh directory (preventing key-based ssh authentication).
You can change the amount of memory allocated to the VM with the "--memory" value in the config.vm.customize setting (currently "2048", i.e. 2GB).
You must specify the db_master role before the webserver role if you want to use Chef to deploy the application as described in part 3 of this tutorial.

Next, provision the Vagrant VM:

cd $VMDIR
vagrant up

Or, to specify a custom NODE name:

NODE=my-lamp-vm vagrant up

If you need to tweak the Chef scripts and then re-provision over the top of the existing configuration:

cd $VMDIR
vagrant provision

To wipe it out and start over:

NODE=vagrant-$USER

cd $VMDIR
vagrant destroy
knife node delete $NODE
knife client delete $NODE

To test the provisioned LAMP stack in Vagrant:

SSH into the VM by running vagrant ssh from within the $VMDIR. (It should log you in with no username or password.)
Connect to MySQL as root using the password you configured in the dev environment.
Connect to http://localhost:8080 in your web browser and check for the Apache “It works!” message.

Maintain, upgrade, and scale

Running commands on groups of nodes with knife ssh

The knife ssh command lets you run a command over SSH on all nodes matching a particular search query. For example:

# Restart Apache on all webservers
knife ssh role:webserver 'sudo service apache2 restart'

# Check the free disk space on all nodes
knife ssh 'name:*' 'df -h'

Unfortunately, this doesn’t work properly on EC2 instances, because the hostname Chef knows about is only accessible from the internal EC2 network. So when working with EC2 instances, you need to specify that the public hostname should be used instead:

# Check uptime of all EC2 instances
knife ssh 'ec2:*' -a ec2.public_hostname uptime

# Check uptime of all non-EC2 instances
knife ssh 'NOT ec2:*' uptime

It’s possible to patch the knife ssh script so this workaround isn’t necessary. See this knife SSH issue for details.

If you get errors connecting to your dev VM, make sure the hostname and IP address configured in the Vagrantfile are in your /etc/hosts file. For the Vagrantfile used in this tutorial, the line in the hosts file would be:

10.0.0.23 lamp-vm

Running chef-client periodically

You should periodically run chef-client on each node to keep it up to date. There are a few ways to do this:

Manually run chef-client on one node at a time, by SSHing into the node and running sudo chef-client
Manually run chef-client in batches, using knife ssh to run sudo chef-client, as described above.
Run chef-client as a daemon. See chef client documentation for details. You can set this up by adding the chef client cookbook to your “base” role.
Run chef-client as a cron job on each node.
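
For the cron option, one way to do it is with Chef’s own cron resource, so the schedule is managed like everything else. Here’s a minimal sketch of a recipe fragment you could put in a small cookbook of your own; the 30-minute schedule and the /usr/bin/chef-client path are assumptions, so check where chef-client actually lives on your nodes.

```ruby
# Recipe fragment: run chef-client from root's crontab every 30 minutes.
# The schedule and the binary path are assumptions; adjust for your nodes.
cron "chef-client" do
  minute "*/30"
  user "root"
  command "/usr/bin/chef-client > /dev/null 2>&1"
end
```

Add the recipe containing this fragment to the run list of your “base” role so every node picks it up.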

Adding and upgrading community cookbooks

To add or upgrade a cookbook from the Chef community, just do the following:

knife cookbook site install $COOKBOOK
knife cookbook upload $COOKBOOK

If you get an error when uploading the cookbook, it may have installed additional dependencies that need to be uploaded first. The easiest way to deal with it is just to upload all the cookbooks in your local repository:

knife cookbook upload --all

Managing third-party cookbooks with knife-github-cookbooks

There are many more cookbooks available on Github than are available in the official Chef community repository. To manage these third-party Github cookbooks with knife, use the knife-github-cookbooks plugin.

First you need to install the plugin as a Ruby gem:

gem install knife-github-cookbooks

Installing third-party cookbooks from Github is very similar to installing them from the Chef community. For example, to install the msonnabaum/chef-phpunit cookbook from Github:

knife cookbook github install msonnabaum/chef-phpunit

This installs the cookbook in your local repo under cookbooks/phpunit. (A prefix or suffix of “chef” or “cookbook” is automatically stripped from the repo name.)
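
The naming rule is easier to see in code. Here’s a rough plain-Ruby approximation of it; this is my own illustration and not the plugin’s actual implementation, which may handle more cases. The cookbook_name helper and the second and third repo names are hypothetical.

```ruby
# Hypothetical approximation of the naming rule, for illustration only.
# Takes "user/repo" and strips a leading or trailing "chef"/"cookbook" part.
def cookbook_name(github_repo)
  repo = github_repo.split("/").last
  repo.sub(/\A(?:chef|cookbook)[-_]/, "").sub(/[-_](?:chef|cookbook)\z/, "")
end

puts cookbook_name("msonnabaum/chef-phpunit")
puts cookbook_name("someuser/nginx-cookbook") # hypothetical repo name
puts cookbook_name("someuser/redis")          # hypothetical repo name
```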

The same command will pull in any upstream changes if run again.

(Note: as of 16 Jun 2012 there’s a bug in the knife-github-cookbooks plugin due to use of a deactivated Github API, which results in “ERROR: The object you are looking for could not be found” when running this command. See this pull request for details. I had to manually patch it because the pull request hadn’t been accepted yet.)

To check what changes have been made to the third-party cookbook since you installed it:

knife cookbook github compare phpunit

See this blog post by Brian Racer on managing third-party Chef cookbooks with knife-github-cookbooks for details.

Changing passwords

To rotate passwords, just edit them in the encrypted data bag and then re-run chef-client on the nodes.

See encrypted data bags above for an example.

Major upgrades

The safest way to do major upgrades is to try them out in a new EC2 instance and VM first, to make sure nothing breaks. You can pin cookbooks to a particular version in the environment files, to make sure updated cookbooks aren’t deployed until you’re ready for them. See the Chef environments documentation for details.
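
As a sketch of what pinning looks like, here’s the prod environment file from earlier with a couple of version constraints added. The version numbers are placeholders, not recommendations; run knife cookbook list to see which versions you’ve actually uploaded.

```ruby
name "prod"
description "The production environment"

# Pin cookbooks to exact versions. The version numbers below are placeholders.
cookbook "apache2", "= 1.1.8"
cookbook "mysql",   "= 1.2.4"
```

Upload the changed file with knife environment from file prod.rb, as before. Nodes in prod will then keep using the pinned versions even after newer ones are uploaded, until you change the constraint.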

Scaling up

Though this tutorial installs the entire LAMP stack on a single server instance, the configuration can be adapted to multiple instances fairly easily. When the time comes to scale up your infrastructure, try this:

Use separate nodes for webserver and database. Put the webserver role on one node, and the db_master role on the other.
Add a load balancer role, and run the webserver on multiple nodes. The Chef article on LAMP stacks gives a decent example of how this can be done.
Add slave databases with MySQL replication. Create a new db_slave role, and a recipe for setting up slave databases. See the database cookbook, mysql master/slave recipes, and this article on Chef as systems integration tool for examples.

Next!

Continue with Part 3: Configuring and deploying a web application