Posts Tagged ‘Data Center’

The Design of NCAR’s “Chillerless” data center with over 600 Watts/SF

Sunday, May 22nd, 2011

“Chiller-less”, “refrigeration-less”, and “compressor-less” designs are something I have been striving toward for several years, with my testing and use of air-economized systems in data centers starting in 2002. In 2008-2009, I was lucky to join Rumsey Engineers (now the Integral Group) as a consultant on data center projects. A fantastic experience, as Rumsey Engineers designs the most efficient mechanical systems of any team I know. In 2009, they believed they had more LEED Platinum buildings than any other engineering firm, and their numbers bear it out.

Together in early 2009 we led a design charrette for a new data center for the National Center for Atmospheric Research (NCAR), the folks who study climate data. As part of our design scope, we researched future generations of High-Performance Computing (HPC, aka supercomputer) equipment: its expected energy use, load density, cooling system connections and inlet temperature requirements (some were air based, others water based). We looked at future generations of equipment because, by the time the data center was built and the systems ordered and delivered, densities and cooling system connections would differ from today's. This is a key point that we make on all of our projects: look at what the hardware will need several years from now, as it usually takes 1-2 years to build a data center and several years to fully load it, and we expect it to meet our operational needs for 10, 20 or more years. So, if the median of the data center's life will be 7-15+ years away, then why would we design it to meet today's computers? This is a mistake we see often in many people's designs and site selections. Life changes; we must think ahead.

And this is why I research and pay attention to many cutting- and leading-edge technologies, and why I sit on the boards of new and innovative technology companies. This helps me see the future. And even though I was shocked to find future HPC systems had densities of over 2,500 Watts per square foot, I know that many computing systems of the future will use much lower densities than today's average, and there are always many technologies that we employ, not just one. Hence, we took a pragmatic approach to this analysis of future HPC systems and the needs of the leading researchers in climate change. (Incidentally, we also did an operating cost analysis of HPC systems expected to come out between 2012 and 2014, and it yielded fairly broad cost differences: enough that a first pass based upon compute performance would seem to favor one system, while simply purchasing more of another system to get the same performance would still cost less. This stresses the important point to always choose equipment that affords the lowest true total cost of ownership.)

Being that the site chosen for this data center was Cheyenne, Wyoming, a state with one of the highest percentages of coal-generated electricity, energy efficiency in this design was essential. Although we were pretty certain we knew which type of mechanical system would be most energy efficient (and likely also lowest cost to build; they almost always go hand in hand when working pragmatically and holistically), we reviewed a rough design of several systems, including a calculated annual PUE and a rough estimated build cost for each. We explored airside economization with 68F and 90F supply air temps, the Kyoto cooling system (heat wheel), a modified heat wheel approach with economization, and waterside economization with 46F and 64F chilled supply water. Our modified heat wheel and our high-supply-temp air- and water-economized solutions did not require chillers; hence the temperatures as they were, as we pushed them up until chillers were no longer required. We chose the water-economized system, which was our guess at the best system before we started any design analysis. It provided 64F supply water, which was important because many HPC systems of the future will run only on chilled water and this temp is acceptable for the majority of those systems, and it also delivered both the lowest PUE, about 1.11, AND the lowest cost to build. This once again proves my motto that we build the most efficient data centers at the lowest cost: the two seemingly disparate goals of capital cost and operating expense are once again aligned. Hence why we take a very pragmatic and holistic approach with an open mind to achieve the most.
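The trade-off we weighed can be sketched with a bit of arithmetic. Below is a minimal, hypothetical PUE comparison; the IT load and overhead figures are illustrative placeholders, not NCAR's actual design numbers, though the waterside-economizer case is tuned to land near the ~1.11 annual PUE mentioned above.

```python
# Hypothetical annual-average loads (kW); not NCAR's actual design data.
def pue(it_kw, cooling_kw, electrical_loss_kw, misc_kw=0.0):
    """PUE = total facility power / IT power (lower is better; 1.0 is ideal)."""
    return (it_kw + cooling_kw + electrical_loss_kw + misc_kw) / it_kw

options = {
    "waterside economizer, 64F supply": pue(10_000, 700, 400),
    "airside economizer, 90F supply": pue(10_000, 900, 400),
    "conventional chiller plant": pue(10_000, 4_000, 1_000),
}
# Rank the options from most to least efficient.
for name, value in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name}: PUE {value:.2f}")
```

The same sketch makes the capital-cost alignment easy to see: the economized options drop both the chiller plant (capital) and most of the cooling energy (operating).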

This new 153,000 SF building was designed to accommodate and secure the Scientific Computing Division's (SCD) future in sustaining the computing initiatives and needs of UCAR's scientific research constituents. The final design was based upon NCAR's actual computing and data storage needs and a thorough review of future High Performance Computing (HPC) and storage technologies, leading to a 625 Watts/SF HPC space and a 250 Watts/SF medium-density area. The data center is divided into two raised-floor modules of 12,000 SF each, with a separate data tape system area to reduce costs, increase efficiency and provide different temperature and humidity requirements than the HPC area. Also provided are a 16,000 SF office and visitor area heated by waste heat from the data center, and a total facility capacity of 30 MVA.

Unique requirements of this high-density HPC data center were to also achieve ultra-high energy efficiency and LEED Silver certification on a modest construction budget. Various cooling options were analyzed, including Kyoto and other heat wheels, air economization, a creative solution of direct heat exchange with the city water supply pipe, and variations of water-economized systems. Ultimately, LEED Gold certification and an annual operating PUE of about 1.14 are expected. Such a low PUE was thought to be impossible at the time of design (early 2009), especially at such high density at Tier III. Through creative problem solving, the low PUE is obtained by designing a 9’ interstitial space above the raised floor combined with a 10’ waffle-grid raised floor to provide a low-pressure-drop air recirculation system built as part of the building. Ten day-one chillers of 100 tons each provide supplemental cooling and optimum efficiency as load varies during hot summer months, while an indirect evaporative system with 96 fans in a fan wall provides ultra-low-energy cooling. An on-site water supply tank, a total of nine standby generators of 2.5 MVA each at full build-out, six 750 kVA UPS modules and other systems support the low PUE and low construction budget of this high-density HPC data center.

Here is a drawing of this data center now under construction:

Considering all of the vulnerabilities of data center sites

Thursday, May 5th, 2011

Where to hide your data center and protect it from damaging natural disasters?

I have built two data centers in Raleigh, North Carolina. I traveled to Raleigh about once per month over a couple of years for these projects, many times driving in ice storms. It's really quite fun to drive around when everything is coated in a sheet of ice; it's like driving a Zamboni without an ice rink. Quite frankly, only people like me who have too much confidence in their driving abilities drive; everyone else stays home, and for good reason, as many cars get stuck on the roads or crash in these conditions. Recently, storms in the Raleigh area caused a wide path of “death and damage,” as reported here in the NY Times, with emergencies declared throughout North Carolina, Mississippi and Alabama. More extreme weather is predicted for the eastern seaboard with ever-increasing climate change. Hurricane frequency and strength have increased several times over the last few years. Remember when one good hurricane a year was normal? Now it's dozens, so much so that the naming convention has changed completely: from alphabetical female names, to including male names, and now to numbered names similar to star systems.

Remember when California was the only place we expected to receive large earthquakes? Well, except for Japan, which reminded us once again of the devastation that can occur along the Pacific Rim. I was in middle Baja following the recent Japan earthquake and had to change plans due to a tsunami warning from that earthquake nearly 10,000 miles away, proving the point that being near the ocean following an earthquake can be risky.

The largest earthquake in 35 years hits Arkansas… what, you ask?! Arkansas? Yes, the largest in that state, yet one amongst more than 800 earthquakes in Arkansas since September 2010. Wow!! You can read more about it in this AP/Yahoo news article.

But even more spectacular (I bring up earthquakes in Arkansas merely as an example) is that the largest risk of large-scale earthquake damage in the US is located right under the middle of the country: the New Madrid Fault. Lying directly under Kentucky, Indiana, Illinois, Tennessee, Mississippi and Arkansas, this baby is HUGE! It is capable of creating horizontal acceleration of 1.89g, almost 5 times greater than the ground acceleration at The Reno Technology Park near Reno, NV, which is located on stable ground absent of any earthquake faults. See this thesis on the effects of earthquakes on bridge design, which is the pinnacle of civil engineering for earthquakes, as they look at 75-year effects, not 20 years as for most building construction. Even Texas is not immune to earthquakes, having had damaging earthquakes in 1882, 1891, 1917, 1925, 1931, 1932, 1936, 1948, 1951, 1957, 1964, 1966, 1969, and 1974, many of them felt as much as two states away from Texas, which covers a very large area. I list all of these just to prove the point that even areas thought to be immune from damaging earthquakes have them, and more frequently than we care to remember. You can read more in this USGS article about Texas earthquakes here.

And thus the punch line is to consider data center site selection very carefully. Just because an earthquake has not happened for a long time does not mean that an area is immune to a damaging earthquake. Check out this map of large earthquake potential and look at the two large circles of converging lines in the middle of the US and under South Carolina; these are the areas of greatest earthquake threat to the public and buildings in the US:

How about volcanoes? Sure, why worry unless you're in the South Pacific, Hawaii or Costa Rica, right? Wrong. Over half of the world's active volcanoes are in … did you guess … the good ol' US of A. That's right. Most of those are in Alaska, as the Aleutian island chain is a pretty exciting place to be, and most of those in the continental US are located in Washington and Oregon. But guess what is the most exciting place in the US for a very damaging eruption, of proportions thousands of times greater than the atomic bombs exploded on Japan to end World War II? Wyoming. Yellowstone has long been famous for Old Faithful, heated by a geological hot spot, the same type that has created and is still creating the Hawaiian Islands. But new research calls it a supervolcano. Two of the larger eruptions from this supervolcano produced 2,500 times more ash than Mt. St. Helens' eruption in 1980, and that one deposited about 10” of ash through eastern Washington and elsewhere. And this hot spot is getting hotter. It is expected to impact Idaho, Wyoming and Montana with a greater frequency of earthquakes and a possible very large eruption that could wipe out a very large area. Read more about it here.

Why is this important to point out? Because we design and build data centers to withstand the impacts we EXPECT in a certain area, yet so many areas have more impacts than we imagined. Which leads me to site selection. Site selection isn't as easy as looking at what has recently occurred or what we think might occur in an area; it should involve thorough research into and understanding of what the risks really are over time, and choosing a site that best meets our risk tolerance/"comfort" during the life of the data center. And all risks should be reviewed, even those that seem unlikely, as we can see from many of these examples that unlikely events can turn out to be devastating to any data center. Hence, location research is paramount to good site selection, and these issues must not be overlooked. A good example is the over 20 active volcanoes in the Portland and Seattle area. Be aware of the risks in your decision, or it could lead to a really bad day.

The Data Center Vibration Penalty to Storage Performance

Thursday, June 10th, 2010

Every now and then a really great way to reduce energy use comes along that is so simple we all whack our heads wondering, “why didn’t I think of that!” My principles for achieving ultra-efficient data centers (PUEs between 1.03 and 1.08; I call anything less than 1.10 ultra-efficient) are based upon simplicity and a holistic approach, while meeting the need, not the want or convention. Generally, the simpler the better, as simple is always lower cost up front and ongoing, as well as easier to maintain, more reliable and more efficient.

So, here is one that will not catch you by surprise: a rack that saves energy. We’ve all heard of passive and active cooling racks: those with fans, heat exchangers or direct cooling systems. I explored some of the front & rear door heat exchanger racks back in 2003, which work really well for high-density applications but can be very expensive compared to better-designed data center cooling systems. But how about a rack that not only reduces energy costs but also improves hardware performance?

I’ve had the pleasure of exploring, with Green Platform’s CEO, Gus Malek-Madani, their anti-vibration rack (“AVR”), a carbon-fiber composite rack actually designed to remove vibration. Why remove vibration? Green Platform claims that a typical data center experiences vibration levels of around 0.2 G root mean square (GRMS); this, it claims, can degrade a disk drive’s performance (both I/O and throughput) by up to 66%, a claim that was borne out during a ‘rigorous’ testing exercise it did in conjunction with Sun Microsystems.

As hard drives get “larger” in capacity, bits get crammed into a smaller space. This, along with physically smaller drives, forces tolerances between the rotating platters and the movement of the mechanical actuator arms within the drives to get tighter; thus, vibration causes drives to slow down or suffer higher misread and miswrite rates, slowing down I/O performance. “As a result of this ‘vibration penalty,’ the company believes that up to a third of all US datacenter spending – on both hardware and power – is wasted on vibration, amounting to some $32bn of wastage. The company also says there’s evidence that reducing the impact of vibration will serve to improve the reliability of drives (and improving mean time between failure.)”

To back up this figure, early tests with Sun Microsystems (pre-Oracle) and Q Associates (“Effects of Data Center Vibration on Compute System Performance” by Julian Turner) showed IOPS improvements of up to 247% in random I/O. The following chart shows this storage performance degradation:

You can also watch the following video that clearly shows that just yelling into the face of storage hardware causes a very visible degradation of storage performance:

If the vibration from yelling into a rack causes performance degradation, think about the vibration effects from HVAC systems, thousands of server fans, and even walking thru your data center.

The company says its carbon-composite design massively reduces the vibrations that can cripple hard disk drive performance, boosting performance, efficiency and even reliability. From the results of the tests stated above, they project that most folks should see a 100% improvement in storage throughput, 50% shorter job times and, consequently, 50% less power consumed per job. In testing with Sun Microsystems, the AVR dissipated vibration by a factor of 10x to 1000x. In further testing with systems integrator Q Associates, which pitted the AVR against a regular steel rack, random read IOPS increased by between 56% and 246%, with random write IOPS showing a 34% to 88% improvement with the AVR.
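For readers who want to sanity-check the percentages quoted above, here is a small sketch of the arithmetic. The baseline and treated IOPS numbers are hypothetical, chosen only to reproduce the upper end of the reported improvement range; they are not taken from the actual Sun/Q Associates report.

```python
def pct_improvement(baseline, improved):
    """Percent improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100.0

# Hypothetical IOPS measurements (steel rack vs. anti-vibration rack).
random_read = {"steel": 1_000, "avr": 3_460}
random_write = {"steel": 1_000, "avr": 1_880}

print(f"random read:  +{pct_improvement(random_read['steel'], random_read['avr']):.0f}%")
print(f"random write: +{pct_improvement(random_write['steel'], random_write['avr']):.0f}%")
```

Note the asymmetry the formula exposes: a drive "degraded by 66%" runs at 34% of its potential, so removing that penalty reads as a roughly 190% improvement, which is why the degradation and improvement percentages quoted in vendor material never match numerically.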

“The throughput and I/O rate of storage remains a significant performance bottleneck within the datacenter; though hard disk drive (HDD) capacities have increased by several orders of magnitude in the last 30 years, drive performance has improved by a much smaller factor. This issue is exacerbated by the fact that server performance, driven by Moore’s law, has increased massively, to the extent that there’s now a server-storage performance gap. The way most datacenters engineer around this problem is inefficient; typically workloads are striped across many disks in parallel, and disks are ‘short stroked’ — i.e. data is only written to the edge of platters – in order to minimize latency. Although this does address performance, the trade-off is that disk capacity is massively underutilized, wasting datacenter space and energy, not to mention the cost of reliably maintaining an unnecessarily large disk estate.”

In the many data centers I have had the pleasure of working in lately, storage is growing faster than server capacity, and the greatest performance limitation is storage throughput. This product already works for the high-end video/audio and scientific markets, a niche space where another of Malek-Madani’s companies, Composite Products, LLC, is focused. The test results clearly show storage throughput dramatically improved by reducing vibration at the rack of storage hardware. With some 3 million storage racks currently in use inside data centers worldwide, growing by the second and probably eventually exceeding server racks, this is a very large opportunity to improve performance while reducing energy use, always one of my main mantras. Green Platform expects to have their racks offered as an option by storage vendors (NetApp, EMC and others), so that as you purchase and provision new storage systems, you pay a small incremental increase in the price of the storage system for a very large improvement in performance and energy reduction. Think of all of those servers waiting so much less for data throughput, and how much that can improve the utilization of those systems. Think about it.

I’m looking to conduct an end-user test with their rack; contact me if you’re interested so we can determine results for your organization.

Can we replace UPSs in our data centers?

Tuesday, April 27th, 2010

It has been common since I entered the data center realm 15 years ago for a data center to have Uninterruptible Power Supplies (UPSs) feeding all computer equipment and other critical loads. The UPS did two things: 1) kept power flowing from the batteries in the UPSs for a short duration until generators came on, utility power was restored, or computer equipment could be shut down; and 2) kept voltage and frequency stable for the computer load while the utility (or generator) power fluctuated, known as sags or surges. However, UPSs consume about 5-15% of the power entering them as losses in the units (a.k.a. inefficiency). So if the IT load equals 1 MW, UPS input power will be about 1.1 MW, with the additional 100 kW lost as heat, which then requires additional cooling to keep the batteries and UPS at the roughly 75F at which they run best. Here is a photo of some UPS systems:
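The arithmetic behind that 100 kW of loss is worth making explicit. A minimal sketch, assuming losses are a fixed fraction of the power entering the UPS (the 9% figure here is illustrative, sitting in the 5-15% range above):

```python
def ups_input_and_loss(it_load_kw, loss_fraction=0.09):
    """Power entering the UPS and the portion lost as heat,
    modeling losses as a fraction of UPS *input* power."""
    input_kw = it_load_kw / (1.0 - loss_fraction)
    return input_kw, input_kw - it_load_kw

input_kw, loss_kw = ups_input_and_loss(1_000)  # 1 MW of IT load
print(f"UPS input: {input_kw:.0f} kW, lost as heat: {loss_kw:.0f} kW")
```

And remember that the loss is paid twice: once as electricity into the UPS, and again as cooling energy to reject that same heat.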

Now, enter 2010. UPSs are still assumed by nearly every data center engineer and operator to be needed or required, yet power electronics within the computer equipment can ride thru just about any voltage sag or surge a utility would pass on thru its protective equipment. Computer equipment power supplies have been rated for 100-240 VAC and 50-60 Hertz for about 10 years now, a far greater range than a utility will likely ever pass on. Furthermore, due to capacitors in the power supplies, these devices can ride thru complete outages of about 15+ cycles, which is roughly 1/4 second. So the UPS's job now is really only to provide ride-thru of outages over 1/4 second, until a generator comes on or as otherwise needed by the operation.
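The cycles-to-seconds conversion is simple enough to write down; a quick sketch, assuming a 60 Hz grid as in North America:

```python
def ride_through_seconds(cycles, frequency_hz=60.0):
    """Hold-up time provided by power-supply capacitors, in seconds."""
    return cycles / frequency_hz

# 15 cycles of hold-up at 60 Hz is a quarter second.
print(ride_through_seconds(15))
```

On a 50 Hz grid the same 15 cycles buys slightly more time (0.3 s), since each cycle is longer.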

In many of the data center design charrettes that I have been part of over the last few years, we ask the users what really needs to be on UPS, avoiding the assumption that all computer load must be on UPS. Once we dive into the operations, we always come back with an answer from the data center operators that only a portion of the computer load needs to be on UPS and the rest can go down during an occasional utility outage. The reason is that these computers can stop operating for a few hours without affecting the business. Examples might be HR functions, crawlers, backup/long-term data storage, research computers, etc. Computers that might need to be on UPS include sales tools, accounting applications, short-term storage, email, etc., but not every application and function. Think about your own data center operations, about what can go down every now and then from a utility outage (usually about once per year for a few hours), and see if you can reduce the total amount of UPS power you require and repurpose that expensive UPS capacity and energy loss for the critical functions.
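That partitioning exercise is easy to mock up. A sketch with a hypothetical load inventory (the workload names echo the examples above, but the kW values are invented for illustration):

```python
# Hypothetical inventory: (workload, load in kW, must stay up during an outage?)
loads = [
    ("email", 80, True),
    ("accounting apps", 120, True),
    ("short-term storage", 150, True),
    ("HR functions", 60, False),
    ("web crawlers", 200, False),
    ("backup/long-term storage", 180, False),
    ("research computers", 210, False),
]

ups_kw = sum(kw for _, kw, critical in loads if critical)
total_kw = sum(kw for _, kw, _ in loads)
print(f"UPS-protected: {ups_kw} kW of {total_kw} kW "
      f"({ups_kw / total_kw:.0%}); the rest rides on utility and generator only")
```

In this made-up inventory only about a third of the load needs UPS capacity; the point of the exercise is that the fraction is almost never 100%.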

Some data centers avoid UPSs entirely by putting a small battery on the computer itself; in Google's widely publicized case, an inexpensive and readily available 9V battery. While this is an excellent idea for those that have custom computer hardware, it is not as easy to implement for most folks buying commodity servers today. Perhaps another idea, better for the masses, is to locate a capacitor on the computer board or within the server that can ride thru ~20+ seconds until generator(s) can supply the load during a utility outage. Capacitor technology of today should make this fairly easy to implement, and it could be a standard feature on all computer equipment with minimal added cost, much as international power supplies did for us 10+ years ago and higher-efficiency (90+) power supplies are doing today. A great new technology that could make this easy to build onto the computer board can be seen here:

Using a technology like this, we could avoid UPSs entirely in our data centers by building enough ride-thru onto the computer boards, into the hardware, allowing us to save very expensive UPS power capacity, operating and maintenance expenses, and space within our data centers for more important functions: compute and storage capacity. My thought for the day. Think about it and you might save some money and energy.

SVLG Data Center Summit a GREAT Success

Friday, October 16th, 2009

Yesterday, October 15th, as the culmination of a year of work from over 60 people, the SVLG Data Center Energy Efficiency Summit went off smoothly. We had 44 presenters, 24 case studies presented and about 450 people at the summit. The event was hosted by NetApp in Sunnyvale. Representatives from numerous Silicon Valley elites, start-ups, VCs, organizations, and solution companies were present. All case studies were presented by data center end-users, showing what they are doing to reduce energy use in their data centers. We had brief sessions about cloud and carbon reductions, and a notable session called the Chill Off 2, in which various cooling technologies were tested with real load in a real data center, including testing the systems at various temperature ranges. Andrew Fanara with the EPA gave a quick update on EnergyStar for servers, storage and networking gear. Paul Scheihing with the DOE provided an update on its energy efficiency and data center programs. I had a candid interview with California Energy Commission Commissioner and friend Jeff Byron about California's energy policy, net-zero-energy building requirements, renewable portfolio standards, energy efficiency standards for TVs and other consumer devices, etc. It was fun!

The day went off without a hitch, thanks to Ralph Renne’s leadership with host NetApp, which provided a wonderful venue and support, with a team of probably 20-30 people directly supporting the event all day, along with a nice keynote from their CEO, Tom Georgens.

Numerous cooling case studies were presented, along with the Stack Framework, a method of measuring complete data center load and efficiency, and a CIO roundtable during lunch. Mark Bramfitt with PG&E talked about advancing PG&E's complement of data center energy efficiency incentives, as well as his planned departure from PG&E, which caught many in the audience by surprise and left tears not only in Mark's eyes but also in many of the audience's, with kind comments from Deborah Grove and others during the Q&A session about his excellent work over many years advancing these programs.

We had a great case study from Intel and NetApp about using on-board hardware temperature sensors to control HVAC equipment, which had some very compelling results within their data centers. Also, Cytak presented about turning off computers automatically while not in use, using Power Assure's product.

Overall, a wonderful event. It was great to see so many industry friends and to make new friends. As the co-chair of the program and summit, it was great to see so many people interacting with each other, beginning collaborations stimulated by the excellent case studies presented, which is what the program is all about: innovation through collaboration. Together we all benefit when we share with each other, and consequently, we as an industry improve. It was wonderful to see every presenter do a fantastic job showing off their wonderful case studies. No vendors showing off their products; instead, everyone sharing information.

The cocktail reception at the end of the day drew about 200 people who wanted to stay and chat, make friends, and just have fun. So many thanks go out to my committee, which brought the case studies and presentations to us, including but not limited to: Bill Tschudi, Bob Hines, Bruce Myatt, Dale Sartor, David Mastrandrea, Deborah Grove, James Bickford, Kelly Aaron, Mukesh Khattar, Patricia Nealon, Ralph Renne, Rosemary Scher, Tersa Tung, and Zen Kishimoto; to Ray Pfeifer, my program co-chair, who brought this program to us last year and so many of the case studies again this year, and whose leadership keeps this program about the end-user; to LBNL, CEC, CIEE and PG&E for helping to fund case studies and support the program; and to the many sponsors of the summit. And certainly to SVLG and their staff for helping make this summit a reality, most certainly their lead person, Bob Hines, for his drive and energy. Overall, an excellent day, full of wonderful people making new and great little discoveries with each other to advance the energy efficiency and financial success of our businesses, and helping to lead the data center industry to greater success.

See me about joining this wonderful program. We are always looking for more case studies, and for a host for next year's summit.