Archive for June, 2010

PUE lives on with Revised Metric

Monday, June 21st, 2010

When folks from the data center industry got together about 3-5 years ago to create a data center efficiency metric, we knew we should tie it to the actual work being done within the data center (e.g. transactions per watt, IOPS per watt, FLOPS per watt…). However, every data center, and even more so every computer, does different work, so no single work-based metric applies to all. For example, a science research computer might complete one transaction per month, with a lot of network and storage traffic for that one big “transaction”, while an eBay data center might push thousands of transactions per second through one computer system.

So, knowing that all data centers and their workloads were different, yet needing something to push us as an industry toward higher efficiency, we came up with a compromise. The holy grail of data center metrics was released: PUE, Power Usage Effectiveness. It was only a start and the best compromise available, and we knew we would need to improve upon it or come up with something better, yet it has had perhaps more influence on the energy efficiency of our data centers than any other metric or industry movement.

Improving PUE affects only the infrastructure side of the data center, not the hardware or software: the IT load always counts as 1.0, everything else pushes the total above one, and the more infrastructure power used, the higher the PUE. Our data centers have been averaging above 2.0 (at 2.0, the infrastructure power load equals the server load; above 2.0, the infrastructure uses more power than the servers). A recent EPA report covering about 200 US data centers last year shows that we are averaging north of 2.0. Other studies show we had been averaging around 3.0 worldwide, so we have improved greatly, but we can still improve much more. While 2.0 is much better than 3.0, using 50% less power for the infrastructure, we know we should be able to achieve PUEs of at most 1.5 anywhere in the world, at any Tier level; at 2.0 we are using more than double the power we need to support the non-IT loads.
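
To put numbers on that, here is the standard PUE arithmetic as a worked example (the 1 MW IT load is hypothetical, not from any of the studies above):

```latex
\mathrm{PUE} = \frac{E_{\text{total facility}}}{E_{\text{IT}}}
\quad\Longrightarrow\quad
E_{\text{infrastructure}} = (\mathrm{PUE} - 1)\,E_{\text{IT}}
```

With 1 MW of IT load, PUE 3.0 implies 2 MW of infrastructure overhead, PUE 2.0 implies 1 MW (the 50% reduction noted above), and PUE 1.5 would mean just 0.5 MW.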

One problem with the PUE metric is that it is instantaneous: it measures power, not energy, and all data center power usage fluctuates with weather and load. What we really care about, to reduce costs, is total energy use over a period of time; energy is power used over time, while power is instantaneous. Otherwise, PUE could be measured on the coldest day of the year, when all systems are running at their most efficient, which is not a good gauge of annual energy use and thus costs. In all of the projects I get involved with, and all of the PUEs I quote, I use total annual energy instead of a one-time power measurement, with the hardware load measured at the rack so that UPS, PDU and other electrical distribution losses are counted within the PUE. So, PUE should be an annual average, and that is exactly what member representatives of Green Grid, SVLG, 7×24, EPA, DOE, USGBC, ASHRAE and UpTime recommended this past December at a meeting in DC. I provided recommendations from the SVLG along with Chris Page, Scott Noteboom and Tim Crawford, also representing the SVLG at this meeting, with input from Olivier Sanche of Apple and many others.
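
As a minimal sketch of the difference, assuming twelve monthly meter readings in kWh (all numbers below are hypothetical):

```python
# Annualized PUE: total facility energy over total IT energy (kWh),
# rather than an instantaneous power ratio that swings with weather and load.
# All meter readings below are hypothetical.

monthly_facility_kwh = [760_000, 720_000, 700_000, 680_000, 700_000, 740_000,
                        780_000, 790_000, 750_000, 700_000, 690_000, 730_000]
monthly_it_kwh = [400_000] * 12  # IT load metered at the rack, for illustration

annual_pue = sum(monthly_facility_kwh) / sum(monthly_it_kwh)

# A one-time snapshot taken in the mildest month flatters the facility:
best_month_pue = min(f / i for f, i in zip(monthly_facility_kwh, monthly_it_kwh))

print(f"annual PUE:     {annual_pue:.2f}")   # ~1.82
print(f"best-month PUE: {best_month_pue:.2f}")  # ~1.70
```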

Essentially, the outcome was a revised PUE metric that now measures annual energy usage of infrastructure and IT load, which is fantastic! There is also a little more clarity and definition on how it should be measured and what should and should not be included (for example, on-site power generation should never reduce one’s PUE, as energy in is energy in, regardless of source). We’ll soon see PUE1, PUE2 and PUE3 (subscripts), which clarify where the server load was measured: UPS output, PDU output or rack input. Ideally, we’d all be measuring at the rack input, but many folks do not have that metering and monitoring capability, so the compromise was to accept any of these points of measurement.
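
A quick illustration of why the measurement point matters; the three IT-load figures below are invented for the example:

```python
# Same facility, three different PUEs depending on where the IT load is
# metered. Hypothetical annual energy figures, in MWh.

total_facility = 10_000
it_load_by_point = {
    "PUE1 (IT metered at UPS output)": 5_600,  # includes downstream PDU losses
    "PUE2 (IT metered at PDU output)": 5_400,
    "PUE3 (IT metered at rack input)": 5_000,  # the actual hardware load
}

for point, it_load in it_load_by_point.items():
    print(f"{point}: {total_facility / it_load:.2f}")  # 1.79, 1.85, 2.00
```

Metering upstream of the distribution losses makes the same facility look better; metering at the rack yields the honest, and highest, number.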

Even though the location of measurement will affect the measured PUE (different measuring locations will result in different PUEs for the same data center), at least it’s an improvement, and it will hopefully drive folks to measure at the rack, the most accurate location. It will also drive us to think about annual usage and costs, not one-time or instantaneous readings, another big improvement in how we approach buying and operational decisions. These are the keys to improving efficiency and reducing costs: long-term measurement, constant long-term improvement, and buying decisions based on long-term economic analysis.

Our new PUE metric should reinvigorate PUE discussions, comparisons, and improvements, perhaps driving us all to lower PUEs, whatever the starting point or type of data center. After all, we all gain when we each improve.

Goat Power, Forward prices of electricity, actual needs and the Green Data Center Conference

Thursday, June 17th, 2010

For years I’ve spoken about the forward cost of energy, driven by rising energy prices and climate legislation that will affect the cost of some forms of energy generation. One of the key things I look for when I complete site selections for clients is the “forward” cost of electricity: a site can be much more expensive in net-present-value terms than a comparable site even though its price is lower today. This is because predictions of electricity price increases vary by market, depending upon legislation, regulation, emissions requirements, and fuel prices. Since every utility has a different mix of fuel sources, and each state has a different utility regulator, as well as different debt obligations, cost recovery and other factors, future utility prices will vary quite a bit. I believe that utilities with high carbon intensity or other emissions from their power generation will see larger price increases than utilities with lower carbon intensity per kWh. We’ve seen this in Northern Virginia, with electricity prices increasing significantly over the last several years. I think we’ll see the same for other high-carbon states, specifically those with heavy coal-fired generation, such as North Carolina, Colorado, Texas, etc. Factor the forward price of electricity, not just the current price, into your site selection analysis.
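
A minimal sketch of that comparison, with hypothetical tariffs, escalation rates, discount rate, and load:

```python
# Compare two candidate sites on the net-present value of electricity spend,
# not just today's tariff. All rates, escalations, and loads are hypothetical.

def npv_electricity(rate_per_kwh, annual_escalation, annual_kwh,
                    years=10, discount_rate=0.08):
    """NPV of electricity cost with a constant annual price escalation."""
    return sum(
        annual_kwh * rate_per_kwh * (1 + annual_escalation) ** year
        / (1 + discount_rate) ** year
        for year in range(years)
    )

annual_kwh = 50_000_000  # ~5.7 MW average draw, for illustration

# Site A: cheap today, carbon-heavy grid, steep expected escalation.
# Site B: pricier today, low-carbon grid, modest expected escalation.
site_a = npv_electricity(0.055, 0.07, annual_kwh)
site_b = npv_electricity(0.065, 0.02, annual_kwh)

print(f"Site A 10-year NPV: ${site_a:,.0f}")  # ~$26.4M
print(f"Site B 10-year NPV: ${site_b:,.0f}")  # ~$25.5M
```

Under these assumed escalations, the site that is pricier today comes out cheaper over ten years; where the crossover lands depends entirely on the escalation and discount assumptions you plug in.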

And speaking of low carbon intensity: Yahoo’s Quincy data center, where I led completion of the first phase of construction before starting MegaWatt Consulting, recently unveiled a new low-carbon option for managing our data centers: Goat Power. Enjoy this short video with my Yahoo friends Chris, Lisa and Ty, and some new goat friends as well. It looks like one of the goats was particularly fond of Chris, or at least of her shirt.

I spoke at the Green Data Center Conference in San Diego over the last three days. I taught a three-hour energy-efficient data center workshop, gave a one-hour session on energy options and efficiency ideas for data centers, and joined three panels: energy sources, moderated by John Diamond; organizations and associations, moderated by Bruce Myatt, where I talked about the SVLG Data Center Efficiency program I co-chair; and a case study of the McGill University high-performance co-location data center project, for which I helped with site selection and design ideas alongside Rumsey Engineers. Eric Soladay of Rumsey Engineers did a great job presenting the efficient data center design, with a designed annual PUE of 1.06 and a very interesting snow-field concept for cooling this high-density data center, without chillers or any other compressor-based cooling, through 90% humidity and 90°F summertime weather.

After 10+ years of talking about the importance of data center energy efficiency and concepts to improve it, as well as sharing my own experiences, I am so glad to hear that these ideas are sticking. I was even prouder that many of the ideas I have been pushing for the last several years, along with terms I believe I coined nearly 10 years ago, have entered the regular vernacular of the industry: “holistic” (designing and operating data centers as an entire system of hardware, software and infrastructure to achieve the lowest total cost and highest availability for the intended purpose) and “server hugger” (the “need”, really a want, to have one’s data center and/or servers located nearby; often an emotional response rather than a technical or rational need).

Remember to look at your specific needs and also be creative with carbon reductions, like how you cut your data center grass!

The Data Center Vibration Penalty to Storage Performance

Thursday, June 10th, 2010

Every now and then a really great way to reduce energy use comes along that is so simple we all whack our heads wondering, “why didn’t I think of that!” My principles for achieving ultra-efficient data centers (PUEs between 1.03 and 1.08; I call anything below 1.10 ultra-efficient) are based upon simplicity and a holistic approach while meeting the need, not the want or convention. Generally, the simpler the better: simple is lower cost up front and ongoing, as well as easier to maintain, more reliable and more efficient.

So, here is one that will not catch you by surprise: a rack that saves energy. We’ve all heard of passive and active cooling racks: those with fans, heat exchangers or direct cooling systems. I explored some of the front & rear door heat exchanger racks back in 2003, which work really well for high-density applications but can be very expensive compared to better-designed data center cooling systems. But how about a rack that not only reduces energy costs but also improves hardware performance?

I’ve had the pleasure of exploring, with Green Platform’s CEO, Gus Malek-Madani, their anti-vibration rack (“AVR”), a carbon-fiber composite rack designed specifically to remove vibration. Why remove vibration? Green Platform claims that a typical datacenter experiences vibration levels of around 0.2 G root mean square (GRMS), which it claims can degrade a disk drive’s performance (both I/O and throughput) by up to 66%, a figure borne out during a ‘rigorous’ testing exercise it did in conjunction with Sun Microsystems.

As hard drives get “larger” in capacity, bits get crammed into a smaller space. This, along with physically smaller drives, forces tighter tolerances between the rotating platters and the movement of the mechanical actuator arms within the drives; vibration therefore causes drives to slow down or suffer more misreads and miswrites, hurting I/O performance. “As a result of this ‘vibration penalty,’ the company believes that up to a third of all US datacenter spending – on both hardware and power – is wasted on vibration, amounting to some $32bn of wastage. The company also says there’s evidence that reducing the impact of vibration will serve to improve the reliability of drives (improving mean time between failures).”

To back up these figures, early tests with Sun Microsystems (pre-Oracle) and Q Associates (“Effects of Data Center Vibration on Compute System Performance” by Julian Turner) showed random I/O IOPS improvements of up to 247%. The following chart shows this storage performance degradation:

[Chart: storage performance degradation under data center vibration]

You can also watch the following video that clearly shows that just yelling into the face of storage hardware causes a very visible degradation of storage performance: http://www.youtube.com/watch?v=tDacjrSCeq4

If the vibration from yelling into a rack causes performance degradation, think about the vibration effects from HVAC systems, thousands of server fans, and even walking through your data center.

The company says its carbon-composite design massively reduces the vibrations that can cripple hard disk drive performance, boosting performance, efficiency and even reliability. Based on the test results above, it expects most folks to see a 100% improvement in storage throughput, 50% shorter job times and, consequently, 50% less power consumed per job. In testing with Sun Microsystems, the AVR dissipated vibration by a factor of 10x to 1,000x. In further testing with systems integrator Q Associates, which pitted the AVR against a regular steel rack, random read IOPS increased by between 56% and 246%, and random write IOPS improved by 34% to 88%.
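
To make the energy claim concrete, here is a back-of-the-envelope sketch under the assumption that rack power stays roughly constant while a job runs (all figures hypothetical):

```python
# Back-of-the-envelope: if throughput doubles, an I/O-bound job finishes in
# half the time; at roughly constant rack power, energy per job halves too.
# All figures below are hypothetical.

rack_power_kw = 8.0        # storage rack draw while the job runs
baseline_job_hours = 4.0   # I/O-bound job on a conventional steel rack
throughput_gain = 2.0      # the claimed 100% throughput improvement

improved_job_hours = baseline_job_hours / throughput_gain
baseline_kwh = rack_power_kw * baseline_job_hours  # 32 kWh per job
improved_kwh = rack_power_kw * improved_job_hours  # 16 kWh per job

savings = 1 - improved_kwh / baseline_kwh
print(f"energy per job: {baseline_kwh:.0f} kWh -> {improved_kwh:.0f} kWh"
      f" ({savings:.0%} less)")
```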

“The throughput and I/O rate of storage remains a significant performance bottleneck within the datacenter; though hard disk drive (HDD) capacities have increased by several orders of magnitude in the last 30 years, drive performance has improved by a much smaller factor. This issue is exacerbated by the fact that server performance, driven by Moore’s law, has increased massively, to the extent that there’s now a server-storage performance gap. The way most datacenters engineer around this problem is inefficient; typically workloads are striped across many disks in parallel, and disks are ‘short stroked’ — i.e. data is only written to the edge of platters – in order to minimize latency. Although this does address performance, the trade-off is that disk capacity is massively underutilized, wasting datacenter space and energy, not to mention the cost of reliably maintaining an unnecessarily large disk estate.”
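
A rough illustration of the capacity cost of the short-stroking described in that quote (the usable fraction and capacities are hypothetical):

```python
# The capacity cost of short-stroking: writing only the fast outer tracks
# trades stranded capacity for lower latency. Hypothetical figures.

drive_capacity_tb = 1.0
usable_fraction = 0.25        # only the outer quarter of each platter is used
required_capacity_tb = 100.0

drives_short_stroked = required_capacity_tb / (drive_capacity_tb * usable_fraction)
drives_full = required_capacity_tb / drive_capacity_tb

print(f"drives needed, short-stroked: {drives_short_stroked:.0f}")  # 400
print(f"drives needed, full capacity: {drives_full:.0f}")           # 100
```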

In the many data centers I have had the pleasure of working in lately, storage is growing faster than server capacity, and the greatest performance limitation is storage throughput. This product already works for the high-end video/audio and scientific markets, a niche space where another of Malek-Madani’s companies, Composite Products, LLC, is focused. The test results clearly show storage throughput dramatically improved by reducing vibration at the rack of storage hardware. With some 3 million storage racks currently in use inside datacenters worldwide, a number growing by the second and probably destined to exceed server racks, this is a very large opportunity to improve performance while reducing energy use, always one of my main mantras. Green Platform expects its racks to be offered as an option by storage vendors such as NetApp, EMC and others, so that as you purchase and provision new storage systems, you pay a small incremental increase in price for a very large improvement in performance and energy reduction. Think of all those servers spending far less time waiting on data, and how much that can improve the utilization of those systems. Think about it.

I’m looking to conduct an end-user test with their rack; contact me if you’re interested so we can determine results for your organization.