Archive for April, 2010

Can we replace UPSs in our data centers?

Tuesday, April 27th, 2010

It has been common since I entered the data center realm 15 years ago that a data center had Uninterruptible Power Supplies (UPS) feeding all computer equipment or other critical loads. The UPS did two things: 1) kept power flowing from the batteries in the UPSs for a short duration until generators came on, utility power was restored, or computer equipment could be shut down; and 2) kept voltage and frequency stable for the computer load while the utility (or generator) power fluctuated, known as sags or surges. However, UPSs consume about 5-15% of the power entering them as losses in the units (a.k.a. inefficiency). So if the IT load equals 1 MW, the UPS input power will be about 1.1 MW, with the additional 100 kW lost as heat, which then requires additional cooling to hold the roughly 75F temperature at which batteries and UPSs run best.

[Photo: some UPS systems]
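To make the arithmetic concrete, here is a minimal sketch of the loss calculation above, assuming a 10% loss figure from the 5-15% range; the exact numbers will vary by UPS model and load level:

    # Rough UPS loss arithmetic (illustrative values only).
    it_load_kw = 1000.0          # 1 MW of IT (critical) load
    ups_loss_fraction = 0.10     # assumed 10%, within the 5-15% range cited above

    ups_input_kw = it_load_kw / (1.0 - ups_loss_fraction)   # power the UPS must draw
    heat_loss_kw = ups_input_kw - it_load_kw                # dissipated as heat, must be cooled

    print(f"UPS input power: {ups_input_kw:.0f} kW")           # ~1111 kW, roughly the 1.1 MW cited
    print(f"Heat rejected by the UPS: {heat_loss_kw:.0f} kW")  # ~111 kW, roughly the 100 kW cited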

Now, enter 2010. UPSs are still assumed by nearly every data center engineer and operator to be needed or required, yet the power electronics within the computer equipment can ride through just about any voltage sag or surge a utility would pass on through its protective equipment. Computer equipment power supplies have been rated for 100-240VAC and 50-60 Hertz for about 10 years now, a far greater range than a utility will likely ever pass on. Furthermore, due to capacitors in the power supplies, these devices can ride through complete outages of about 15+ cycles, which is roughly 1/4 second. So the UPS's job is now really only to ride through outages longer than 1/4 second, until a generator comes on or as otherwise needed by the operation.
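As a back-of-envelope check on that figure, here is a simple sketch converting the ~15-cycle hold-up into seconds at the two common line frequencies:

    # Ride-through time for a power supply that can coast through ~15 AC cycles.
    cycles = 15
    for line_frequency_hz in (60, 50):
        ride_through_s = cycles / line_frequency_hz
        print(f"{line_frequency_hz} Hz line: ~{ride_through_s:.2f} s of ride-through")
    # ~0.25 s at 60 Hz and ~0.30 s at 50 Hz -- roughly the 1/4 second cited above.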

In many of the data center design charrettes I have been part of over the last few years, we ask the users what really needs to be on UPS, avoiding the assumption that all computer load must be on UPS. Once we dive into the operations, we always come back with an answer from the data center operators that only a portion of the computer load needs to be on UPS and the rest can go down during a (usually infrequent) utility outage. The reason is that these computers can stop operating for a few hours without affecting the business. Examples might be HR functions, crawlers, backup/long-term data storage, research computers, etc. Computers that might need to be on UPS include sales tools, accounting applications, short-term storage, email, etc., but not every application and function. Think about your own data center operations and what can go down every now and then during a utility outage (usually about once per year for a few hours), and see if you can reduce the total amount of UPS power you require and repurpose that expensive UPS capacity and energy loss for the critical functions.
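As an illustration of that sizing exercise, here is a hypothetical load inventory; the categories and kW figures are made up for the example, not taken from any real facility:

    # Hypothetical inventory for the "what really needs UPS?" question above.
    loads_kw = {
        "email / sales tools / accounting (critical)": 300,
        "short-term storage (critical)": 150,
        "HR apps, crawlers, research (deferrable)": 250,
        "backup / long-term storage (deferrable)": 300,
    }

    critical_kw = sum(kw for name, kw in loads_kw.items() if "(critical)" in name)
    total_kw = sum(loads_kw.values())

    print(f"Total IT load:      {total_kw} kW")
    print(f"Needs UPS:          {critical_kw} kW ({critical_kw / total_kw:.0%})")
    print(f"UPS capacity freed: {total_kw - critical_kw} kW")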

Some data centers avoid UPSs entirely by putting a small battery on the computer itself; in Google's widely publicized case, an inexpensive and readily available 9V battery. While this is an excellent idea for those that have custom computer hardware, it is not as easy to implement for most folks buying commodity servers today. Perhaps another idea, better for the masses, is to locate a capacitor on the computer board or within the server that can ride through ~20+ seconds until generator(s) can supply the load during a utility outage. Capacitor technology of today should make this fairly easy to implement, and it could be a standard feature on all computer equipment at minimal added cost, much as international power supplies did for us 10+ years ago and higher-efficiency power supplies (90%+) are doing today. A great new technology that could make this easy to build onto the computer board can be seen here:
http://newscenter.lbl.gov/feature-stories/2010/04/23/micro-supercapacitor/
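For a sense of scale, here is a rough sizing sketch for an on-board capacitor bank providing ~20 seconds of ride-through; the server power draw and voltage window are assumed values for illustration, not manufacturer figures:

    # Rough capacitor sizing for ~20 s of ride-through (assumed values).
    server_power_w = 300.0     # assumed average server draw
    ride_through_s = 20.0      # target from the paragraph above
    v_full = 12.0              # assumed DC bus voltage when fully charged
    v_cutoff = 9.0             # assumed minimum usable voltage

    energy_needed_j = server_power_w * ride_through_s                  # E = P * t
    capacitance_f = 2.0 * energy_needed_j / (v_full**2 - v_cutoff**2)  # from E = 1/2 * C * (V1^2 - V2^2)

    print(f"Energy needed: {energy_needed_j:.0f} J")        # 6000 J
    print(f"Capacitance required: {capacitance_f:.0f} F")   # roughly 190 F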

Using a technology like this, we could avoid UPSs entirely in our data centers by having enough ride-through built onto the computer boards, into the hardware, allowing us to save very expensive UPS power capacity, operating and maintenance expenses, and space within our data centers for more important functions: compute and storage capacity. My thought for the day. Think about it and you might save some money and energy.

Economization and methods to data center efficiency: ASHRAE 90.1 limits our options and efficiency

Thursday, April 15th, 2010

For years, the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) has had a standard for “recommended” and “allowable” humidity and temperature ranges within a data center. These ranges were far too narrow for the past many generations of computer hardware operating in our data centers. It is rumored by many folks that these standards were set by IBM back in the 1950s to accommodate computer punch cards and keep them from getting “soft” with humidity, to prevent arc flashing between exposed copper lines on computer boards, and to keep vacuum tubes cool enough that they did not burn out, literally. However, computer hardware has advanced much over the last 50 years, especially the last decade or two, and many called the old ASHRAE standards very out of date.

Finally, ASHRAE advanced the standard, which so many data center designers, operators and engineers relied on for the operating specs of their data centers. Even though the new standards expanded the humidity and temperature ranges considered “allowable” within our data centers, the hardware specifications allow for much broader ranges still. While ASHRAE TC9.9 allows for up to about 80 degrees F (depending upon the humidity), hardware manufacturers spec their equipment to run at up to 95F inlet air and humidity generally between 5-95%. The computer hardware of today lacks vacuum tubes and paper punch cards; circuit boards are coated to prevent arc flashing; everything is solid state except two moving parts: the hard drives, which are hermetically sealed, and the cooling fans that draw air through the server. (And each of these has a limited life, with solid-state options already available for drives and complete removal an option for fans.)
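As a quick way to see the room between the two envelopes, here is a small sketch that classifies a server inlet reading against the ~80F ASHRAE allowable limit and the 95F / 5-95% manufacturer figures cited above; the humidity check is applied only to the manufacturer spec, since the ASHRAE limit varies with humidity:

    # Classify an inlet reading against the two envelopes discussed above.
    def classify_inlet(temp_f: float, rh_percent: float) -> str:
        within_mfr = temp_f <= 95 and 5 <= rh_percent <= 95   # manufacturer spec cited above
        within_ashrae = temp_f <= 80                          # ~80F allowable upper bound cited above
        if not within_mfr:
            return "outside manufacturer spec"
        return "within ASHRAE allowable" if within_ashrae else "beyond ASHRAE allowable, within manufacturer spec"

    print(classify_inlet(75, 45))   # within ASHRAE allowable
    print(classify_inlet(90, 15))   # beyond ASHRAE allowable, within manufacturer spec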

Many studies have been completed to prove that the hardware no longer needs tight temperature and humidity ranges, nor even filtration to prevent dust accumulation. Microsoft proved this point with what I call their data-center-in-a-tent experiment, in which they located a couple of racks under a tent canopy in Redmond, WA for many months. The servers sucked in leaves; ran while it rained, snowed, and blew, through hot weather and cold; and even had water leak onto them, all without any hardware failures, proving the point that servers really don’t need any environmental controls. A link to this study is here: http://blogs.msdn.com/the_power_of_software/archive/2008/09/19/intense-computing-or-in-tents-computing.aspx

I myself completed very low humidity (10-20%) and high inlet temperature (75-85F) testing, as well as unintentional water leaks onto servers, back in 2002-2003, all without any failures whatsoever.

Another bold move was by chip-maker Intel, which completed what I call their data-center-in-a-box test, though different from most container deployments: they put two containers, each loaded identically with 900 servers, in Santa Fe for a year. One had air conditioning, with its inherent humidity control as well as filtration; the other had only a fan to draw outside air in, with minimal filtration and no air conditioning. What a great test! Santa Fe can get hot (92F) and cold (24F), being in the high desert, and is normally very dry, but sudden thunderstorms on hot summer afternoons made the humidity range from 4-90%. Intel found that the air-economized container had a very visible layer of dust on the servers, and that even though temperature and humidity ranged dramatically during this year-long study, the server failure rate was 4.46% versus 3.83% in their main air-conditioned data center. A failure rate this low is insignificant when it is economic today to refresh (replace) hardware on 18-24 month cycles, and the failures seemed to coincide with dramatic humidity changes (10 to 90% in one hour). This study proves that the hardware is very robust and needs very little, if any, humidity, temperature and filtration control. Intel’s own engineers say their chips are good up to 135 degrees Centigrade (275F)! This study is available from Intel, titled “Reducing Data Center Cost with an Air Economizer”, August 2008.

I very much commend the good folks at Microsoft and Intel for completing and publishing these studies. So, why does ASHRAE still limit temperature and humidity so much more tightly than necessary? To add to this, ASHRAE recently released 90.1, a data center energy efficiency standard. Well, I’ve been touting and pushing for data center energy efficiency since the late 90s, and I have made it a keystone of my career for about a decade, so of all people, I fully support further improving data center energy efficiency. This is why I chair the SVLG data center demonstration efficiency program (http://dce.svlg.org). But what ASHRAE has done via this standard is require air economization, one technology for improving the mechanical efficiency of our data centers, essentially picking a single technology. It would be akin to requiring that all homes have fiberglass insulation instead of requiring walls and ceilings that meet a specific R-value, which is really what we want. Let technology and the commercial market compete for the best solutions, rather than writing standards that mandate one technology.

My concern with this standard is not that I don’t want data centers to use air economization to reduce energy use; my qualm is that air economization is only one method of improving energy efficiency, and often not the best. For example, I have been involved with the design of several data centers over the last two years, many with the fine folks at Rumsey Engineers (www.rumseyengineers.com), and all have achieved annual PUEs of 1.04-1.08, significantly better than most data centers. In all of these low-PUE data centers, we studied air economization, heat wheels, direct and indirect evaporative cooling, chilled-water plants, geothermal exchange, and many other methods of cooling the data center. In all cases, hot or cold climates, dry or humid, high elevation or low, we found that air economization was about twice as energy intensive as other cooling methods, more expensive to build, and often less reliable. So ASHRAE 90.1 could actually INCREASE energy use in our data centers (and first-time cost as well) instead of having the intended effect of reducing it. Why would a standards group specify a technology to achieve efficiency? Why are the temperature and humidity standards a decade or more out of date with technology? We need an impetus to be more efficient, but also the ability to innovate and challenge each other to be as efficient as we can using a variety of technologies, old and new. Limitations to one technology are out of date the minute they are published in the data center world. With amazing new technologies being released daily, and a generation of servers lasting less than two years, a standards body that takes years to update its standards should not be prescribing any specific technology; it should allow greater flexibility in operations and design, enabling us operators, designers and manufacturers to adapt to the most efficient resource.
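For readers less familiar with the metric: PUE is total facility power divided by IT power, so a PUE of 1.05 means only 5% overhead for cooling, power distribution and everything else. Here is a minimal sketch of what the 1.04-1.08 figures imply per megawatt of IT load; the comparison PUEs of 1.5 and 2.0 are illustrative, not from any specific facility:

    # Overhead power per MW of IT load at a few PUE values.
    it_load_kw = 1000.0
    for pue in (1.05, 1.5, 2.0):   # 1.05 falls in the 1.04-1.08 range above; others are illustrative
        total_kw = it_load_kw * pue
        overhead_kw = total_kw - it_load_kw
        print(f"PUE {pue:.2f}: total {total_kw:.0f} kW, overhead {overhead_kw:.0f} kW")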

I congratulate those who have come out before me about this new standard and ask, with them, that ASHRAE revise its thinking once again and provide more flexible standards, not limiting ones, which only hurt our energy efficiency achievements. Thank you for reading.