Posts Tagged ‘data center efficiency’

Economization and other methods for data center efficiency: ASHRAE 90.1 limits our options and our efficiency

Thursday, April 15th, 2010

For years, the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) has published “recommended” and “allowable” humidity and temperature ranges for data centers. These ranges were far too narrow for the past several generations of computer hardware operating in our data centers. It is rumored that the original limits were set by IBM back in the 1950s: to keep computer punch cards from going “soft” with humidity, to prevent arcing between exposed copper traces on circuit boards, and to keep vacuum tubes cool enough that they didn't literally burn out. But computer hardware has advanced enormously over the last 50 years, especially in the last decade or two, and many considered the old ASHRAE standards badly out of date.

Finally, ASHRAE expanded the standard that so many data center designers, operators and engineers rely on for the operating specs of their data centers. Yet even though the new standard widens the humidity and temperature ranges considered “allowable,” the hardware specifications allow far broader ranges still. While ASHRAE TC9.9 allows inlet temperatures up to about 80F (depending upon the humidity), hardware manufacturers spec their equipment to run at up to 95F inlet air and humidity generally between 5-95%. Today's computer hardware has no vacuum tubes or paper punch cards; circuit boards are coated to prevent arcing; and everything is solid state except two moving parts: the hard drives, which are sealed units, and the cooling fans that draw air through the server. (And each of these has a limited life, with solid-state drives already available and fans removable from the server entirely.)
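To make that gap concrete, here is a rough sketch (in Python) that checks a measured inlet condition against the two envelopes described above. The rectangular envelope and the 20-80% humidity band I use for the ASHRAE allowable range are simplifying assumptions for illustration only; the real TC9.9 envelope is psychrometric and more nuanced.

```python
# Rough sketch: compare a measured server inlet condition against the
# two envelopes discussed above. The rectangular shape and the 20-80%
# RH band for the ASHRAE allowable range are simplifying assumptions,
# not the published TC9.9 psychrometric envelope.

def within_envelope(temp_f, rh_pct, max_temp_f, rh_range):
    """True if the inlet condition sits inside a simple rectangular
    temperature/humidity envelope."""
    rh_min, rh_max = rh_range
    return temp_f <= max_temp_f and rh_min <= rh_pct <= rh_max

ASHRAE_ALLOWABLE = {"max_temp_f": 80.0, "rh_range": (20.0, 80.0)}   # assumed band
MANUFACTURER_SPEC = {"max_temp_f": 95.0, "rh_range": (5.0, 95.0)}   # per the post

inlet_temp_f, inlet_rh = 85.0, 15.0  # e.g. hot, dry economizer air
print("Within ASHRAE allowable:", within_envelope(inlet_temp_f, inlet_rh, **ASHRAE_ALLOWABLE))    # False
print("Within manufacturer spec:", within_envelope(inlet_temp_f, inlet_rh, **MANUFACTURER_SPEC))  # True
```

An inlet like the one above falls outside the standard's comfort zone yet is well within what the hardware itself is rated to handle, which is the point of the paragraph above.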

Many studies have been completed to show that the hardware no longer needs tight temperature and humidity ranges, nor even filtration to prevent dust accumulation. Microsoft proved this point with what I call their data-center-in-a-tent experiment, in which they ran a couple of racks under a tent canopy in Redmond, WA for many months. The servers sucked in leaves, ran through rain, snow, wind, heat and cold, and even had water leak onto them, all without any hardware failures, proving the point that servers really don't need any environmental controls. A link to this study is here: http://blogs.msdn.com/the_power_of_software/archive/2008/09/19/intense-computing-or-in-tents-computing.aspx

I myself ran very low humidity (10-20%) and high inlet temperature (75-85F) testing back in 2002-2003, along with some unintentional water leaks onto servers, all without any failures whatsoever.

Another bold move came from chip-maker Intel, with what I call their data-center-in-a-box test. Unlike most container deployments, they put two containers, each loaded identically with 900 servers, in Santa Fe for a year. One had air conditioning, with its inherent humidity control as well as filtration; the other had only a fan drawing in outside air, with minimal filtration and no air conditioning. What a great test! Santa Fe, sitting in the high desert, can get hot (92F) and cold (24F) and is normally very dry, but sudden thunderstorms on hot summer afternoons swung the humidity between 4-90%. Intel found that the air-economized container had a very visible layer of dust on the servers, and that even though temperature and humidity ranged dramatically during the year-long study, the server failure rate was 4.46% versus 3.83% in their main air-conditioned data center. A failure rate that low is trivial when it is economic today to refresh (replace) hardware on 18-24 month cycles, and the failures seemed to coincide with dramatic humidity swings (10 to 90% in one hour). This study shows that the hardware is very robust and needs little to no humidity, temperature or filtration control. Intel's own engineers say their chips are good up to 135 degrees Centigrade (275F)! The study is available from Intel, titled "Reducing Data Center Cost with an Air Economizer," August 2008.

I very much commend the good folks at Microsoft and Intel for completing and publishing these studies. So why does ASHRAE still limit temperature and humidity so much more tightly than necessary? On top of this, ASHRAE recently added data center requirements to 90.1, its building energy efficiency standard. Well, I've been touting and pushing data center energy efficiency since the late 90s, and I have made it a keystone of my career for about a decade, so of all people, I fully support further improving data center energy efficiency. This is why I chair the SVLG data center efficiency demonstration program (http://dce.svlg.org). But what ASHRAE has done via this standard is require air economization, one technology for improving the mechanical efficiency of our data centers; it has essentially picked one technology. That is akin to saying all homes must have fiberglass insulation, instead of requiring walls and ceilings to meet a specific R-value, which is really what we want. Let technology and the commercial market compete for the best solutions; don't write standards that require one technology.

My concern with this standard is not that I don't want data centers to use air economizers to reduce energy use; my qualm is that air economization is only one method of improving energy efficiency, and often not the best one. For example, I have been involved with the design of several data centers over the last two years, many with the fine folks at Rumsey Engineers (www.rumseyengineers.com), and all have achieved annual PUEs of 1.04-1.08, significantly better than most data centers. In all of these low-PUE data centers, we studied air economization, heat wheels, direct and indirect evaporative cooling, chilled-water plants, geothermal exchange, and many other methods of cooling the data center. In every case, hot or cold climate, dry or humid, high elevation or low, we found that air economization was about twice as energy intensive as the other cooling methods, more expensive to build, and often less reliable. So ASHRAE 90.1 could actually INCREASE energy use in our data centers (and first cost as well) instead of having the intended effect of reducing it. Why would a standards group specify a technology to achieve efficiency? Why are the temperature and humidity standards a decade or more out of date with the technology? We need an impetus to be more efficient, but also the ability to innovate and challenge each other to be as efficient as we can using a variety of technologies, old and new. Limiting ourselves to one technology is out of date the minute it is published in the data center world. With amazing new technologies being released daily, and a generation of servers lasting less than two years, a standards body that takes years to update should not prescribe any specific technology; it should allow greater flexibility in operation and design, enabling us operators, designers and manufacturers to adapt to the most efficient solutions.
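To show how cooling overhead translates directly into PUE, here is a minimal sketch. The overhead figures are placeholders I made up for this post, not the measured values from the designs above; the point is simply that doubling the cooling energy shows up directly in the ratio.

```python
# Minimal sketch of how cooling overhead drives PUE.
# PUE = total facility power / IT power, so every kW of cooling or
# electrical loss per kW of IT load shows up directly in the ratio.
# The overhead figures below are illustrative placeholders, not the
# measured values from the designs discussed above.

def pue(it_kw, cooling_kw, electrical_loss_kw, misc_kw=0.0):
    """Power Usage Effectiveness: total facility power over IT power."""
    return (it_kw + cooling_kw + electrical_loss_kw + misc_kw) / it_kw

it_load_kw = 10_000.0  # roughly 10 MW of IT load, as in the designs above

# Hypothetical annual-average overheads for a 10 MW IT load
evaporative = pue(it_load_kw, cooling_kw=250.0, electrical_loss_kw=300.0)
air_econ    = pue(it_load_kw, cooling_kw=500.0, electrical_loss_kw=300.0)  # ~2x the cooling energy

print(f"Evaporative/other cooling: PUE ~ {evaporative:.2f}")  # ~1.05
print(f"Air-economized design:     PUE ~ {air_econ:.2f}")     # ~1.08
```

With numbers in this ballpark, mandating the air-economized option would cost a few points of PUE, which is exactly the kind of outcome a prescriptive standard can't anticipate.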

I congratulate those who have spoken out before me about this new standard and join them in asking that ASHRAE revise its thinking once again and provide more flexible standards, not limiting ones, which only hurt our energy efficiency achievements. Thank you for reading.

Is it possible, a data center PUE of 1.04, today?

Saturday, August 22nd, 2009

I've been involved in the design and development of over $6 billion of data centers, maybe about $10 billion now (I lost count after $5 billion a few years ago), so I've seen a few things. One thing I do see in the data center industry is, more or less, the same design over and over again. Yes, we push the envelope as an industry, and yes, we design some pretty cool stuff, but rarely do we sit down with our client, the end user, and ask them what they really need. They often tell us the Tier level or availability they want and the MWs of IT load to support, but what do they really need? Too often everyone in the design charrette assumes what a data center should look like without really diving deep into what is important.

When we do that, we can get some very interesting results. For example, I've been fortunate to be involved with the design of three data centers this year, and on all three we were able to push the envelope and ask some of these difficult questions. I rarely got the answers from the end users I wanted to hear, where they really questioned the traditional thinking about what a data center should be and why, but we did reach some unconventional conclusions about what they needed instead of automatically assuming what they needed or wanted. As a consequence, we designed three data centers with low PUEs, or even what I like to call “ultra-low PUEs”, those below 1.10. The first came in at 1.08, the next at 1.06, and now we have a 1.046; OK, let's call it 1.05 since the other two are rounded up as well. (We know we can get that one down to about 1.04 with a few more tweaks to that “what is really needed” question.)

Now, I figured that a PUE of 1.05 was going to take a few years to reach because the hardware (chillers, UPSs, transformers, etc.) needed to improve. But what I didn't take into account was that when we really look at what the client needs, not wants, and at what we can do to design for efficiency without jumping to the same old way of designing a data center, we can reach some great results. I assume this principle can apply to almost anything in life.

Now, you ask, how did we get to a PUE of 1.05? Let me answer a few of your likely questions: 1) yes, the figure is based on annual hourly site weather data; 2) all three have densities of 400-500 watts/sf; 3) all three are roughly Tier III to Tier III+, so all have roughly N+1 (I explain a little more below); 4) all three are in climates that exceed 90F in summer; 5) none uses a body of water to transfer heat (i.e. lake, river, etc.); 6) all are roughly 10 MWs of IT load, so pretty normal size; 7) all operate within the TC9.9 recommended ranges except for a few hours a year within the allowable range; and most importantly, 8) all have construction budgets equal to or LESS than standard data center construction. Oh, and one more thing: even though each of these sites has some renewable energy generation, it is not counted in the PUE to reduce it; I don't believe that is in the spirit of the metric.
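For anyone curious what "based on annual hourly site weather data" means in practice, here is a toy sketch of an energy-weighted annual PUE. The cooling model and the weather profile are made-up stand-ins, not the engineering models or the site weather files used on the actual projects.

```python
# Toy sketch of an energy-weighted annual PUE computed from hourly
# weather data, as point 1 above implies. The cooling model and the
# weather profile are made-up stand-ins, not the engineering models or
# site weather files used on the actual projects.

def hourly_cooling_kw(it_kw, wet_bulb_f):
    """Toy cooling model: near-free cooling below a 55F wet-bulb,
    with overhead rising as the wet-bulb climbs."""
    base_kw = 0.02 * it_kw  # fans and pumps
    if wet_bulb_f <= 55.0:
        return base_kw
    return base_kw + 0.004 * it_kw * (wet_bulb_f - 55.0)

def annual_pue(it_kw, hourly_wet_bulb_f, electrical_loss_kw):
    """Energy-weighted PUE over a year of hourly wet-bulb temperatures."""
    total_kwh = sum(it_kw + electrical_loss_kw + hourly_cooling_kw(it_kw, wb)
                    for wb in hourly_wet_bulb_f)
    it_kwh = it_kw * len(hourly_wet_bulb_f)
    return total_kwh / it_kwh

# Hypothetical 8,760-hour year: mostly mild, with some warm afternoons
weather = [45.0] * 6000 + [60.0] * 2000 + [70.0] * 760
print(f"Annualized PUE ~ {annual_pue(10_000.0, weather, 300.0):.3f}")  # ~1.06
```

The few hot hours barely move the annual number, which is why a design that runs lean most of the year can carry a handful of peak hours without blowing the PUE.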

Now, for some of the juicy details (email or call me for more, or read future blog posts). We questioned what they thought a data center should be: How much redundancy did they really need? Could we exceed the ASHRAE TC9.9 recommended or even allowable ranges? Did all the IT load really NEED to be on UPS? Was N+1 really needed during the few peak hours a year, or could we get by with just N during those few peak hours and N+1 the rest of the year? And so on. The main point of this post is that low PUEs, like 1.05, can be achieved (yes, been there and done that now) for the same cost or LESS than a standard design, and done TODAY, saving millions of dollars per year in energy, millions of tons of CO2, millions of dollars of capital cost up front, and maintenance on top of that. We just need to really dive deep into what we need, not what we want or think we need, and we'll be better at achieving great things. Now I need to apply this concept to other parts of my life; how about you?