Drupal Site-Optimization Recipe from Lullabot

nirvan

We're finally nearing launch of our new Drupal 5 site! The last time we launched a new site, we landed on the home page of Digg, thousands of folks visited, and our server crashed. This time, we're trying to be better prepared by optimizing our site performance before launch.

Optimizing Drupal is an art in itself. To get advice, we reached out to some experts in the field, and were fortunate to receive a consultation from Matt Westgate of Lullabot. Lullabot is among the world's top Drupal companies, and recently deployed the website for the Grammys (grammys.com), which successfully handled over 280 million visitors on a single day! The Grammys site was hosted on 50,000 optimized servers, which is no small challenge logistically. Needless to say, the Lullabots know their stuff, and I took detailed notes.

Matt took a look under the hood of our new site, and shared a slick site-optimization recipe that will make 80% of the possible performance improvements. With Matt's permission, I'm going to share my notes here for any folks who may be looking to optimize a Drupal 5, or even a D6 site.

Here is a breakdown of my notes:

Module Accounting/Inventory

First, Matt took a look at the modules we're using.

We're using a Pressflow version of Drupal, which Matt said was a good choice; using Pressflow makes a lot of the server-side tricks in the site-optimization recipe possible.

Matt then called out a few of the modules we are using (Buddylist, Workflow NG, and Organic Groups) as potentially performance-hungry when it comes to scaling. It's not that there's necessarily anything wrong with the code for these modules. Organic Groups is a great module, for example, but the node-level access permission checks it does add another layer of queries on top of Drupal's own. Matt suggested the Flag module (http://drupal.org/project/flag) as a more efficient replacement for Buddylist (if you don't need all the whizbang features Buddylist provides).

For us, it's too late in the game to change these modules. This may cause us issues in the future, and we will just have to see how it goes. But if you are still at the initial planning stage of your site, you may want to consider using the Flag module in lieu of Buddylist, and really analyze whether your site is complex enough to need Workflow NG as opposed to the standard Workflow module.

Matt also suggested replacing Drupal's native search with either Apache Solr or Acquia Search. The search function that comes with out-of-the-box Drupal is resource intensive, and it's better to offload search entirely to an enterprise search solution like the ones mentioned above. We opted to use Solr. This meant installing Tomcat and a JRE on our server. Our host, pair Networks, hooked that up for us.

Matt also did a query run with the Devel module, and we noted the items that listed above 50_action_get_hook_aids. Sure enough, Buddylist was making a lot of calls. Matt suggested we check profile pages for Buddylist queries and turn on the slow query log in MySQL.

With an eye for detail, Matt also noticed our cron setup had been turned off. For the record, cron should be set to run once every 15 minutes.

Now on to the juicy stuff. 

THE 80% WIN LULLABOT SITE-OPTIMIZATION RECIPE 

What follows is a checklist to optimize a Drupal 5 or D6 site. By following this list, you will have made about 80% of the possible performance improvements for your site. Matt's mantra for this recipe is to "Protect the Database" at all costs.

Checklist:

1. Optimize each layer of the stack
2. Start with the basic settings at each layer:
  - In the Drupal admin settings, enable caching
  - In the Drupal admin settings, enable CSS compression and aggregation
  - In MySQL, enable the query cache
  - In PHP, add an opcode cache (APC)

3. Caching ANONYMOUS users

  - Set up a reverse proxy cache. This keeps pages in memory and serves anonymous visitors cached pages from RAM instead of building them dynamically from the database. For Pressflow 6 and Drupal 6 sites, set up the reverse proxy cache using Varnish (http://varnish-cache.org). For a Drupal 5 site like ours, a reverse proxy cache is harder to set up, so Matt recommended using the Boost module (http://drupal.org/project/boost) instead.

A good cache lifetime is 5 minutes, which means each cached page gets rebuilt at most 12 times an hour.
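To make the cache lifetime idea concrete, here is a minimal Python sketch of my own (it is not from Matt's notes, and it is not how Varnish or Boost are actually implemented). It assumes a hypothetical `PageCache` class that holds rendered pages in memory and only calls the expensive `render` function when a page is older than the lifetime. The `simulate_hour` helper and its fake clock are also made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_LIFETIME_SECONDS = 5 * 60  # the 5-minute lifetime from the checklist


class PageCache:
    """Hypothetical in-memory page cache: rebuilds a page only after it expires."""

    def __init__(
        self,
        render: Callable[[str], str],
        lifetime: float = CACHE_LIFETIME_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render      # stands in for a full Drupal page build
        self._lifetime = lifetime
        self._clock = clock        # injectable so the simulation can fake time
        self._pages: Dict[str, Tuple[float, str]] = {}
        self.backend_hits = 0      # how often we had to bother the backend

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._pages.get(path)
        if cached is not None and now - cached[0] < self._lifetime:
            return cached[1]       # fresh copy: served straight from memory
        self.backend_hits += 1     # missing or expired: rebuild the page once
        html = self._render(path)
        self._pages[path] = (now, html)
        return html


def simulate_hour(requests_per_second: int = 10) -> int:
    """Simulate one hour of anonymous traffic to one page; return backend hits."""
    fake_now = [0.0]
    cache = PageCache(
        render=lambda path: "<html>" + path + "</html>",
        clock=lambda: fake_now[0],
    )
    for second in range(3600):
        fake_now[0] = float(second)
        for _ in range(requests_per_second):
            cache.get("/")
    return cache.backend_hits


if __name__ == "__main__":
    print(simulate_hour())
```

In this simulation the home page receives 36,000 requests in the hour, but the backend only builds it once per 5-minute window. That is the whole point of "Protect the Database": the number of page builds depends on the cache lifetime, not on how many visitors show up.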

4. Caching AUTHENTICATED users
  - Use Memcache (http://drupal.org/project/memcache and http://memcached.org) along with Pathcache (http://drupal.org/project/pathcache) for Drupal's path lookup queries. This puts most authenticated user queries into memory (RAM).
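Here is a rough sketch of the general pattern behind this step, again in Python and again my own illustration rather than the actual Memcache or Pathcache module code. It assumes a hypothetical `PathLookupCache` class in which a plain dictionary stands in for memcached, and a made-up `ALIASES` table stands in for Drupal's path alias data in the database.

```python
from typing import Callable, Dict, Optional


class PathLookupCache:
    """Hypothetical cache-aside lookup: check memory first, then the database."""

    def __init__(self, query_database: Callable[[str], Optional[str]]) -> None:
        self._query_database = query_database
        self._memory: Dict[str, Optional[str]] = {}  # stand-in for memcached
        self.database_queries = 0

    def lookup(self, alias: str) -> Optional[str]:
        if alias in self._memory:
            return self._memory[alias]       # answered from RAM, no query
        self.database_queries += 1           # only a miss reaches the database
        internal_path = self._query_database(alias)
        self._memory[alias] = internal_path  # remember the answer for next time
        return internal_path

    def invalidate(self, alias: str) -> None:
        """Drop a cached entry, for example after an alias is edited."""
        self._memory.pop(alias, None)


# Made-up sample data standing in for the path alias table.
ALIASES = {"about": "node/1", "blog/launch": "node/42"}


def demo() -> int:
    """Look up two aliases 1,000 times each; return the database query count."""
    cache = PathLookupCache(ALIASES.get)
    for _ in range(1000):
        cache.lookup("about")
        cache.lookup("blog/launch")
    return cache.database_queries


if __name__ == "__main__":
    print(demo())
```

Unlike the anonymous page cache, nothing here is shared between users as a finished page. Each logged-in visitor still gets a page built for them, but the repetitive lookups underneath (such as resolving path aliases) stop hitting the database after the first request.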

--

5. Have at least 2 Servers (if possible)

Best practice for scaling is to have at least 2 servers: one dedicated server for content, and a separate server just for the database. Then optimize each server for its specific task. Ideally, both are powerful servers, but if you have to choose, give the database more resources.

-----

* Getting Fancier -> set up CDNs

CDNs (Content Delivery Networks) are networks of servers distributed across multiple locations that help deliver content more efficiently. We didn't get into CDNs as we don't currently have the resources, but apparently that is the next step.

---

Throttle Module Question
We asked Matt about using Drupal's Throttle module. Matt pointed out that the Throttle module kind of works, but it makes for a bad user experience, since you disable functionality of your site in order to scale it. The other downside is that modules essentially need to be written to be Throttle-aware. The above recipe should handle the majority of performance issues.

---

So there you have it. A delicious recipe for cooking up a fast Drupal site.

 

Big thanks to Matt and the Lullabots for the advice, pair Networks for donating our hosting, our developers at Ciplex for doing all the heavy lifting, and, of course, all of our Producers for chipping in to make this possible.

As for our site launch, it's getting very close! Our developers estimate 3-5 days to implement the site-optimization recipe. As soon as that is done, we will do our final data import and can then launch. This means we could launch within a week...

I should mention that even with these site optimizations, enough traffic will still take our site down. Unlike the Grammys' 50,000 servers, we only have 1 (and we're stoked to have that!). We're working within a non-profit budget micro-funded by public donations. Our goal here is simply to make the most of what we have.

Can't wait to share it with all of you soon!