Posts Tagged ‘Data Center’

The Data Center Vibration Penalty to Storage Performance

Thursday, June 10th, 2010

Every now and then a really great way to reduce energy use comes along that is so simple we all whack our heads wondering, “why didn’t I think of that!” My principles for achieving ultra-efficient data centers (PUEs between 1.03 and 1.08; I call anything less than 1.10 ultra-efficient) are based upon simplicity and a holistic approach while meeting the need, not the want or convention. Generally the simpler the better, as simple is always lower cost up front and ongoing, as well as easier to maintain, more reliable and more efficient.

So, here is one that will not catch you by surprise: a rack that saves energy. We’ve all heard of passive and active cooling racks: those with fans, heat exchangers or direct cooling systems. I explored some of the front & rear door heat exchanger racks back in 2003, which work really well for high-density applications but can be very expensive compared to better-designed data center cooling systems. But how about a rack that not only reduces energy costs but also improves hardware performance?

I’ve had the pleasure of exploring with Green Platform’s CEO, Gus Malek-Madani, their anti-vibration rack (“AVR”), a carbon-fiber composite rack designed specifically to remove vibration. Why remove vibration? Green Platform claims that a typical datacenter experiences vibration levels of around 0.2 G root mean square (GRMS); this, it claims, can degrade a disk drive’s performance (both I/O and throughput) by up to 66%, a figure that was borne out during a ‘rigorous’ testing exercise it did in conjunction with Sun Microsystems.

As hard drives get “larger” in capacity, bits get crammed into a smaller space. This, along with physically smaller drives, forces the tolerances between rotating platters and the mechanical actuator arms within the drives to get tighter; thus vibration causes drives to slow down or suffer more misreads and miswrites, slowing down I/O performance. “As a result of this ‘vibration penalty,’ the company believes that up to a third of all US datacenter spending – on both hardware and power – is wasted on vibration, amounting to some $32bn of wastage. The company also says there’s evidence that reducing the impact of vibration will serve to improve the reliability of drives (and improving mean time between failure.)”

In order to back up this figure, early tests with Sun Microsystems (pre-Oracle) and Q Associates (“Effects of Data Center Vibration on Compute System Performance” by Julian Turner) showed IOPS improvement of up to 247% in random I/O. The following chart shows this storage performance degradation:



You can also watch the following video that clearly shows that just yelling into the face of storage hardware causes a very visible degradation of storage performance: http://www.youtube.com/watch?v=tDacjrSCeq4

If the vibration from yelling into a rack causes performance degradation, think about the vibration effects from HVAC systems, thousands of server fans, and even walking thru your data center.

The company says its carbon-composite design massively reduces the vibrations that can cripple hard disk drive performance, boosting performance, efficiency and even reliability. From the test results stated above, the company assumes that most folks should see a 100% improvement in storage throughput, 50% shorter job times and, consequently, 50% less power consumed per job. In testing with Sun Microsystems the AVR dissipated vibration by a factor of 10x to 1000x. Further testing with systems integrator Q Associates, which pitted the AVR against a regular steel rack, found that random read IOPS increased by between 56% and 246%, with random write IOPS showing a 34% to 88% improvement with the AVR.
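To make the arithmetic behind that claim explicit, here is a minimal Python sketch. It assumes the job is entirely storage-bound and that the rack draws the same power with or without vibration; both are my simplifying assumptions, and the function name and the 5 kW / 10 hour figures are made up purely for illustration.

```python
def job_metrics(throughput_gain: float, rack_power_kw: float, baseline_job_hours: float):
    """Return (new_job_hours, baseline_energy_kwh, new_energy_kwh).

    throughput_gain is fractional: 1.0 means a 100% improvement.
    Assumes run time scales inversely with storage throughput and
    that power draw is unchanged (hypothetical simplifications).
    """
    new_job_hours = baseline_job_hours / (1.0 + throughput_gain)
    baseline_energy_kwh = rack_power_kw * baseline_job_hours
    new_energy_kwh = rack_power_kw * new_job_hours
    return new_job_hours, baseline_energy_kwh, new_energy_kwh


if __name__ == "__main__":
    # Hypothetical job: 10 hours on a 5 kW storage rack, throughput doubled.
    hours, before_kwh, after_kwh = job_metrics(1.0, 5.0, 10.0)
    print(f"job time: {hours} h, energy: {before_kwh} kWh -> {after_kwh} kWh")
```

Under these assumptions a 100% throughput gain halves both the job time and the energy per job; a workload that is only partly storage-bound would see less.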

“The throughput and I/O rate of storage remains a significant performance bottleneck within the datacenter; though hard disk drive (HDD) capacities have increased by several orders of magnitude in the last 30 years, drive performance has improved by a much smaller factor. This issue is exacerbated by the fact that server performance, driven by Moore’s law, has increased massively, to the extent that there’s now a server-storage performance gap. The way most datacenters engineer around this problem is inefficient; typically workloads are striped across many disks in parallel, and disks are ‘short stroked’ — i.e. data is only written to the edge of platters – in order to minimize latency. Although this does address performance, the trade-off is that disk capacity is massively underutilized, wasting datacenter space and energy, not to mention the cost of reliably maintaining an unnecessarily large disk estate.”

In the many data centers I have had the pleasure of working in lately, storage is growing faster than server capacity and the greatest performance limitation is storage throughput. This product works for the high-end video/audio and scientific markets, a niche space where another of Malek-Madani’s companies, Composite Products, LLC, is focused. The test results clearly show storage throughput dramatically improved by reducing vibration at the rack of storage hardware. With some 3 million storage racks currently in use inside datacenters worldwide, a number growing by the second and likely to eventually exceed server racks, this is a very large opportunity to improve performance while reducing energy use, always one of my main mantras. Green Platform expects to have their racks offered as an option by storage vendors such as NetApp, EMC, and others, so that as you purchase and provision new storage systems, you pay a small incremental increase in the price of the storage system for a very large improvement in performance and energy reduction. Think of all of those servers waiting so much less for data, and how much that can improve the utilization of those systems. Think about it.

I’m looking to conduct an end-user test with their rack; contact me if you’re interested so we can determine results for your organization.

Can we replace UPSs in our data centers?

Tuesday, April 27th, 2010

It has been common since I entered the data center realm 15 years ago that a data center had Uninterruptible Power Supplies (UPSs) feeding all computer equipment or other critical loads. The UPS did two things: 1) kept the power flowing from batteries in the UPSs for a short duration until generators came on, utility power was restored, or computer equipment could be shut down; and 2) kept voltage and frequency stable for the computer load while the utility (or generator) power fluctuated, known as sags or surges. However, UPSs consume about 5-15% of the power entering them as losses in the units (a.k.a. inefficiency). So if IT load equals 1 MW, UPS power will be about 1.1 MW with the additional 100 kW lost as heat, which then requires additional cooling to keep the room at the roughly 75F temperature at which batteries and UPSs run best. Here is a photo of some UPS systems:
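As a rough sketch of that loss arithmetic in Python: if the loss is taken as a fraction of the power entering the UPS, the input power is the IT load divided by (1 - loss). The function name is hypothetical, and the 10% figure is simply the midpoint of the 5-15% range above.

```python
def ups_input_and_loss_kw(it_load_kw: float, loss_fraction: float):
    """Return (ups_input_kw, loss_kw) for a given IT load.

    loss_fraction is the share of UPS *input* power lost as heat,
    e.g. 0.10 for 10% (an illustrative value, not a measurement).
    """
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    ups_input_kw = it_load_kw / (1.0 - loss_fraction)
    return ups_input_kw, ups_input_kw - it_load_kw


if __name__ == "__main__":
    for loss in (0.05, 0.10, 0.15):
        input_kw, loss_kw = ups_input_and_loss_kw(1000.0, loss)
        print(f"loss {loss:.0%}: input {input_kw:.0f} kW, heat {loss_kw:.0f} kW")
```

For a 1 MW IT load at 10% loss this gives roughly 1.11 MW in and 111 kW of heat, which rounds to the "about 1.1 MW" figure; the cooling needed to remove that heat comes on top.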

Now, enter 2010. UPSs are still assumed by nearly every data center engineer and operator to be needed or required, yet power electronics within the computer equipment can ride thru just about any voltage sag or surge a utility would pass on thru their protective equipment. Computer equipment power supplies have been rated for 100-240VAC and 50-60 Hertz for about 10 years now, a far greater range than a utility will likely ever pass on. Furthermore, due to capacitors in the power supplies, these devices can ride thru complete outages of about 15+ cycles, which is roughly 1/4 second. So the UPS’s job is really now only to provide ride thru for outages longer than 1/4 second, until a generator comes on or for as long as the operation needs.
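The cycles-to-seconds conversion is simple enough to check in a couple of lines of Python. This is only an illustration; the function name is mine, and 15 cycles is the approximate figure quoted above, not a spec for any particular power supply.

```python
def ride_through_seconds(cycles: float, line_frequency_hz: float) -> float:
    """Convert a ride-through rating in AC cycles to seconds."""
    if line_frequency_hz <= 0:
        raise ValueError("line_frequency_hz must be positive")
    return cycles / line_frequency_hz


if __name__ == "__main__":
    for hz in (60.0, 50.0):
        print(f"15 cycles at {hz:.0f} Hz = {ride_through_seconds(15, hz):.2f} s")
```

At 60 Hz, 15 cycles is 0.25 seconds; on a 50 Hz grid the same cycle count lasts slightly longer.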

In many of the data center design charrettes that I have been part of over the last few years, we ask the users what really needs to be on UPS, avoiding the assumption that all computer load must be on UPS. Once we dive into the operations, we always come back with an answer from the data center operators that only a portion of the computer load needs to be on UPS and the rest can go down during a usually irregular utility outage. The reason is that these computers can stop operating for a few hours and not affect the business. Examples might be HR functions, crawlers, backup/long-term data storage, research computers, etc. Computers that might need to be on UPS include sales tools, accounting applications, short-term storage, email, etc., but not every application and function. Think about what in your own data center operations can go down every now and then from a utility outage (usually about once per year for a few hours) and see if you can reduce the total amount of UPS power you require and repurpose that expensive UPS capacity and energy loss to the critical functions.

Some data centers avoid UPSs entirely by putting a small battery on the computer itself; in Google’s widely publicized case, an inexpensive and readily available 12V battery. While this is an excellent idea for those that have custom computer hardware, it is not as easy to implement for most folks buying commodity servers today. Perhaps another idea, better for the masses, is to locate a capacitor on the computer board or within the server that can ride thru ~20+ seconds until generator(s) can supply load during a utility outage. Capacitor technology of today should make this fairly easy to implement, and it could be a standard feature on all computer equipment at minimal added cost, much as international power supplies became for us 10+ years ago and higher-efficiency power supplies (90%+) are becoming today. A great new technology that could make this easy to build on the computer board can be seen here:
http://newscenter.lbl.gov/feature-stories/2010/04/23/micro-supercapacitor/
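For a feel of how much capacitance a ~20 second ride thru implies, here is a back-of-the-envelope Python sketch. It uses the standard capacitor energy relation E = 1/2 * C * V^2 and assumes a lossless converter that can keep running as the capacitor voltage droops from a starting value to a minimum. The 300 W server, the 12 V starting voltage and the 6 V cutoff are all hypothetical numbers chosen for illustration.

```python
def required_capacitance_farads(load_watts: float, ride_through_s: float,
                                v_start: float, v_min: float) -> float:
    """Capacitance needed to carry a constant load for ride_through_s.

    Usable energy between v_start and v_min is 0.5 * C * (v_start^2 - v_min^2).
    Conversion losses are ignored (an optimistic assumption).
    """
    if v_start <= v_min or v_min < 0:
        raise ValueError("need v_start > v_min >= 0")
    energy_joules = load_watts * ride_through_s
    return 2.0 * energy_joules / (v_start ** 2 - v_min ** 2)


if __name__ == "__main__":
    # Hypothetical 300 W server riding thru 20 s, drooping from 12 V to 6 V.
    farads = required_capacitance_farads(300.0, 20.0, 12.0, 6.0)
    print(f"required capacitance: about {farads:.0f} F")
```

With these made-up inputs the answer is on the order of 100 farads, which is supercapacitor territory rather than an ordinary electrolytic capacitor.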

Using a technology like this we could avoid UPSs entirely in our data centers by having enough ride thru built onto the computer boards, into the hardware, allowing us to save very expensive UPS power capacity, operating and maintenance expenses and space within our data centers for more important functions, compute and storage capacity. My thought for the day. Think about it and you might save some money and energy.

SVLG Data Center Summit a GREAT Success

Friday, October 16th, 2009

Yesterday, October 15th, after a culmination of a year of work from over 60 people, the SVLG Data Center Energy Efficiency Summit went off smoothly. We had 44 presenters, 24 case studies presented and about 450 people at the summit. The event was hosted by NetApp in Sunnyvale. Representatives from numerous Silicon Valley elites, startups, VCs, organizations, and solution companies were present. All case studies were presented by data center end-users, showing what they are doing to reduce energy use in their data centers. We had brief sessions about cloud and carbon reductions, plus a notable session called the Chill Off 2, in which various cooling technologies were tested with real load in a real data center at various temperature ranges. Andrew Fanara with EPA gave a quick update of EnergyStar for servers, storage and networking gear. Paul Scheihing with DOE provided an update of the energy efficiency and data center programs. I had a candid interview with California Energy Commission Commissioner and friend Jeff Byron about California’s energy policy, net-zero energy buildings requirement, renewable portfolio standards, energy efficiency standards for TVs and other consumable devices, etc. It was fun!

The day went off without a hitch, thanks to Ralph Renne’s leadership with host NetApp, which provided a wonderful venue and support, with a team of probably 20-30 people directly supporting the event all day, along with a nice keynote from their CEO, Tom Georgens.

Numerous cooling case studies were presented, along with the Stack Framework, which is a method of measuring complete data center load and efficiency. We had a CIO roundtable during lunch. Mark Bramfitt with PG&E talked about advancing PG&E’s complement of data center energy efficiency incentives as well as his planned departure from PG&E, which caught many in the audience by surprise and left tears not only in Mark’s eyes but also in the eyes of many in the audience, with kind comments from Deborah Grove and others during the Q&A session about his excellent work over many years advancing these programs.

We had a great case study from Intel and NetApp about using on-board hardware temperature sensors to control HVAC equipment, which had some very compelling results within their data centers. Also Cytak presented about turning off computers automatically while not in use using Power Assure’s product.

Overall, a wonderful event. It was great to see so many industry friends and to make new friends. As the co-chair of the program and summit, it was great to see so many people interacting with each other, beginning collaborations stimulated by the excellent case studies presented, which is what the program is all about: Innovation through collaboration. Together we all benefit when we share with each other, and consequently, we as an industry then improve. It was wonderful to see every presenter do a fantastic job showing off their wonderful case studies. No vendors showing off their products; instead, everyone sharing information.

The cocktail reception at the end of the day drew about 200 people who wanted to stay and chat, make friends, and just have fun. So many thanks go out to my committee, which brought the case studies and presentations to us, and which includes but is not limited to: Bill Tschudi, Bob Hines, Bruce Myatt, Dale Sartor, David Mastrandrea, Deborah Grove, James Bickford, Kelly Aaron, Mukesh Khattar, Patricia Nealon, Ralph Renne, Rosemary Scher, Tersa Tung, and Zen Kishimoto; to Ray Pfeifer, my program co-chair, who brought this program to us last year and so many of the case studies again this year, for his leadership in keeping this program about the end-user; to LBNL, CEC, CIEE and PG&E for helping to fund case studies and support the program; and to the many sponsors of the summit. And certainly to SVLG for their staff, who helped make this summit a reality, and most certainly also their lead person, Bob Hines, for his drive and energy. Overall, an excellent day, full of wonderful people, making new and great little discoveries with each other to advance the energy efficiency and financial success of our businesses, and helping to lead the data center industry to greater success.

See me about joining this wonderful program. We are always looking for more case studies and for a host for next year’s summit.

Is it possible, a data center PUE of 1.04, today?

Saturday, August 22nd, 2009

I’ve been involved in the design and development of over $6 billion of data centers, maybe about $10 billion now (I lost count after $5 billion a few years ago), so I’ve seen a few things. One thing I do see in the data center industry is, more or less, the same design over and over again. Yes, we push the envelope as an industry, and yes, we do design some pretty cool stuff, but rarely do we sit down with our client, the end-user, and ask them what they really need. They often tell us a certain Tier level or availability they want, and the MWs of IT load to support, but what do they really need? Often everyone in the design charrette assumes what a data center should look like without really diving deep into what is important.

When we do that, we can get some very interesting results. For example, I’ve been fortunate to have been involved with the design of three data centers this year, and on all three we were able to push the envelope of design and ask some of these difficult questions. Rarely did I get the answers from the end-users I wanted to hear, where they really questioned the traditional thinking about what a data center should be and why, but we did get to some unconventional conclusions about what they needed instead of automatically assuming what they needed or wanted. As a consequence, we designed three data centers with low PUEs, or even what I like to call “ultra-low PUEs”, those below 1.10. The first was at 1.08, the next at 1.06 and now we have a 1.046; OK, let’s call it 1.05 since the other two are rounded up as well. (We know we can get that one down to about 1.04 with a few more tweaks to that “what is really needed” question.)

Now, I figured that a PUE of 1.05 was going to take a few years to get to because the hardware needed to improve, i.e. chillers, UPS, transformers, etc. But what I didn’t take into account was that when we really look at what the client needs, not wants, and what we can do to design for efficiency without jumping to the same old way of designing a data center, we can reach some great results. I assume that this principle can apply to almost anything in life.

Now, you ask, how did we get to a PUE of 1.05? Let me hopefully answer a few of your questions: 1) yes, based on annual hourly site weather data; 2) all three have densities of 400-500 watts/sf; 3) all three are roughly Tier III to Tier III+, so all have roughly N+1 (I explain a little more below); 4) all three are in climates that exceed 90F in summer; 5) none use a body of water to transfer heat (i.e. lake, river, etc.); 6) all are roughly 10 MW of IT load, so pretty normal size; 7) all operate within TC9.9 recommended ranges except for a few hours a year within the allowable range; and most importantly, 8) all have construction budgets equal to or LESS than standard data center construction. Oh, and one more thing: even though each of these sites has some renewable energy generation, this is not counted in the PUE to reduce it; I don’t believe that is in the spirit of the metric.
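For readers who want to see what an annual, hourly-based PUE looks like in practice, here is a minimal Python sketch. PUE is total facility energy divided by IT equipment energy, so the annual figure is the sum of hourly facility energy over the sum of hourly IT energy. The hourly values below are invented for illustration (a flat 10 MW IT load, a little more overhead during 100 hot hours) and are not data from the three designs.

```python
from typing import Sequence

HOURS_PER_YEAR = 8760


def annual_pue(facility_kwh: Sequence[float], it_kwh: Sequence[float]) -> float:
    """Energy-weighted PUE over matching hourly facility and IT readings."""
    if len(facility_kwh) != len(it_kwh):
        raise ValueError("hourly series must have the same length")
    total_it_kwh = sum(it_kwh)
    if total_it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return sum(facility_kwh) / total_it_kwh


# Hypothetical year: 10,000 kWh of IT energy every hour, with facility
# overhead of 10% during 100 peak hours and 4% the rest of the year.
PEAK_HOURS = 100
it = [10_000.0] * HOURS_PER_YEAR
facility = [11_000.0] * PEAK_HOURS + [10_400.0] * (HOURS_PER_YEAR - PEAK_HOURS)

if __name__ == "__main__":
    print(f"annual PUE = {annual_pue(facility, it):.3f}")
```

The point of weighting by energy rather than averaging hourly ratios is that a handful of hot hours barely moves the annual number, which is why those few peak hours are worth questioning in the design.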

Now, for some of the juicy details (email or call me for more or read future blog posts). We questioned what they thought a data center should be: How much redundancy did they really need? Could we exceed ASHRAE TC9.9 recommended or even allowable ranges? Did all the IT load really NEED to be on UPS? Was N+1 really needed during the few peak hours a year, or could we get by with just N during those few peak hours each year and N+1 the rest of the year? And so on. The main point of this blog post is to say that low PUEs, like that of 1.05, can be achieved (yes, been there and done that now) for the same cost or LESS than a standard design, and done TODAY, saving millions of dollars per year in energy, millions of tons of CO2, millions of dollars of capital cost up front, less maintenance, etc. We just need to really dive deep as to what we need, not what we want or think we need, and we’ll be better at achieving great things. Now, I need to apply this concept to other parts of my life; how about you?
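To put a rough number on "millions of dollars per year in energy", here is a small Python sketch. The 10 MW IT load and the 1.05 PUE come from the post; the baseline PUE of 1.8 and the $0.10/kWh electricity price are purely my assumptions for illustration, as is the constant year-round IT load.

```python
HOURS_PER_YEAR = 8760


def annual_savings(it_load_kw: float, baseline_pue: float,
                   improved_pue: float, price_per_kwh: float):
    """Return (saved_kwh, saved_dollars) per year for a constant IT load."""
    if improved_pue > baseline_pue:
        raise ValueError("improved_pue should not exceed baseline_pue")
    saved_kwh = it_load_kw * (baseline_pue - improved_pue) * HOURS_PER_YEAR
    return saved_kwh, saved_kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical comparison: PUE 1.8 baseline vs 1.05, 10 MW IT, $0.10/kWh.
    kwh, dollars = annual_savings(10_000.0, 1.8, 1.05, 0.10)
    print(f"saved about {kwh / 1e6:.1f} GWh and ${dollars / 1e6:.1f} million per year")
```

With these assumed inputs the savings come to several million dollars a year for a single 10 MW site; a different baseline or tariff scales the result linearly.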