CMAS Hardware Blog

Discussions on the best hardware options for air quality modeling.

February 2, 2016

 Is it possible to run CMAQ/CAMx on a single desktop CPU?
 What are the minimum and optimal hardware requirements for
 running CAMx? For example, if I want to run it for a
 continental North American scale domain with 36km, 12km, and
 4km grid resolutions, what would be the minimum and optimal
 hardware requirements for a one-day or one-week simulation
 (completing in a reasonable time)?
 Any information would be highly appreciated. Thanks!


Answer: (Carlie Coats, UNC-IE)

Yes, for both. The main thing is that you'll need an x86_64 processor with adequate memory (which fortunately is cheap, these days) and disk (likewise). Both do quite a bit of I/O, so fast disk (maybe SSD) would be helpful.

CMAQ is MPI-parallel, with at least "decent" scaling (out to 16-24 cores, or so), so it will benefit from a "bigger" machine. However, since current Xeons can have a dozen or more cores per socket (with two sockets per motherboard), you can still run it on a desktop/deskside machine.
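
As a concrete sketch (the executable name and the 4x4 decomposition here are placeholders for your own build and domain), a 16-core single-machine run might be launched like this:

 # Minimal sketch: launch CMAQ's CCTM across 16 MPI ranks on one box.
 # NPCOL_NPROW is CMAQ's horizontal domain decomposition; columns x rows
 # must match the MPI rank count (4 x 4 = 16 here).
 export NPCOL_NPROW="4 4"
 mpirun -np 16 ./CCTM_v52.exe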

If I were going to be doing CAMx, my dream machine would be built around a 4-core Intel Core i7-6700K "Skylake" processor and 32GB DDR4 RAM, with a 500GB Samsung 950-Pro PCIe/NVMe SSD and at least three times as much long-term "spinning disk" storage as I thought I needed (in my experience, you *always* need more disk-storage for modeling) -- 4x4TB in RAID5 would maybe be overkill :-) It should cost on the order of $3K.

If I were doing large-scale CMAQ, then my dream machine would be a two-socket one with a pair of Xeon E5-2687Wv3 10-core or E5-2690v3 12-core processors, 64GB DDR4 RAM, and disk as above. It should run around $10-12K. (I could back off to cheaper processors, or even a single-socket 10-core system, and do quite well with CMAQ, for quite a bit of savings.)

And I'm a monitor-bigot: get a 30-inch 2560x1600 display, because you're going to want really good visualization for your big grids. (In my opinion, the "4K" displays aren't yet ready for prime time -- the software hasn't quite caught up.)

I'd probably go with CentOS 7, activate the EPEL ("Extra Packages for Enterprise Linux") repository, Intel compilers, KDE4 (yum groupinstall "KDE Plasma Workspaces"), and also most of the "development" stuff...
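
In concrete terms, that setup looks something like the following (a sketch; the group names are the CentOS 7 ones, and the Intel compilers are a separate commercial install):

 # Enable the EPEL repository, then pull in the desktop and
 # development package groups.
 sudo yum install epel-release
 sudo yum groupinstall "KDE Plasma Workspaces"
 sudo yum groupinstall "Development Tools"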

Others had probably best tell you what kind of performance to expect...

October 1, 2015

 We are getting new computers and would like more information on optimal hardware
 configurations given our network and resources.  We would like to have
 the ability to run the CMAS suite of models locally and efficiently, but
 do not necessarily want/need state-of-the-art (expensive) equipment.
 Do you have any recommendations? 

Answer: (Carlie Coats, UNC-IE)

With Intel's "Haswell-E", "Haswell-EP", and "Haswell-EX" Xeons, you can put together an enormously powerful computer in an incredibly small package, by historical standards. And very cheaply, by those same standards.

See <http://ark.intel.com/products/family/78583/Intel-Xeon-Processor-E5-v3-Family#@All>

The "E" is the 4-8-core 1-socket version; the "EP" is the 2-socket version; and the "EX" is the 4-8 socket version. The last two offer 8-18 cores per socket (!), which (if run with hyperthreading, which I don't recommend for HPCC applications like CMAQ) supports two virtual cores per physical core.

Note that CMAQ is a memory- and MPI-bound application, so a single-computer (no-Infiniband) solution will be much faster than the same number of processor-cores split across a multi-computer cluster. The same is true of whatever meteorology model (WRF? MM5?) you use.
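
To see why, compare the two launch modes (a sketch in Open MPI syntax; the executable and hostfile are placeholders): on one machine the ranks exchange messages through shared memory, while a hostfile spreads them across the network.

 # All 16 ranks on the local machine: MPI messages stay in RAM.
 mpirun -np 16 ./CCTM_v52.exe

 # The same 16 ranks spread across the machines listed in hosts.txt:
 # every off-node message has to cross the interconnect.
 mpirun -np 16 -hostfile hosts.txt ./CCTM_v52.exe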

Note also that the operating system can use extra RAM for buffering, making the runs much faster. Several years ago I saw 20% speedups for met models that used only 1 GB for the model itself, just from going from 8GB to 32GB of RAM. So choosing a relatively large-RAM configuration will help with I/O bottlenecks.
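
You can watch this effect during a run: the "buff/cache" column of free shows how much RAM the kernel has put to work as file cache.

 # Snapshot of memory use; on a well-provisioned modeling box, most RAM
 # beyond the model's own footprint ends up under "buff/cache".
 free -h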

Note that CMAQ tends not to scale well past 8-32 cores, depending upon problem-size.
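
One way to find the knee of the scaling curve for your own domain is to time a short benchmark run at several decompositions (a sketch; the executable is a placeholder, and each decomposition must multiply out to the rank count):

 # Time one model day at several core counts; stop adding cores when
 # the wall-clock improvement flattens out.
 for nd in "2 2" "2 4" "4 4" "4 8"; do
     set -- $nd
     export NPCOL_NPROW="$1 $2"
     echo "=== $(( $1 * $2 )) cores ==="
     /usr/bin/time -p mpirun -np $(( $1 * $2 )) ./CCTM_v52.exe
 done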

Note also that if a single computer can do the job, not only do you not have to pay for Infiniband cluster interconnect, you also do not suffer from networking overheads for MPI (which can function in-RAM for a single system, and is much faster).

Depending upon just how big your domain is, how fast your turnaround needs to be, and what your budget is, there are several possibilities, from least- to most-expensive:

  • Haswell-E 1-socket server/workstation: fits on/beside a desk. Probably under $10K. Use E5-1680v3 (8 cores at 3.2 GHz). Fit it with plenty of RAM (64GB or so) and disk (16TB or so) and you can use that machine for modeling tasks as well as visualization, evaluation, and all the rest.
  • Haswell-EP 2-socket server or workstation (you can fit this out as a desk-side machine if you want, or put it in your server rack). In the $15K-$25K range, depending on configuration. Give it 128-256 GB RAM. For a desk-side unit, or for a 4U or 5U rack unit, working disk storage can be on-board. Because CMAQ is a relatively memory-bound model, I'd recommend the 12-core processors like the E5-2690v3 rather than the 14-18-core E5-2697v3 or 2699v3, for cost-effectiveness reasons. With 24-36 cores in one single computer, this is a very powerful system, and almost certainly can meet your modeling needs.
  • Cluster of Haswell-EP systems. If two systems provide enough compute power, you can connect them with 10G Ethernet much more cheaply than with Infiniband. However, for larger clusters, 10G-E does not scale properly (the overhead grows quadratically with the size of the cluster), so you do need (expensive) Infiniband interconnect.
  • Haswell-EX 4-socket server. Processors are substantially more expensive per core. In the $50-60K range. Give it 256-768GB RAM. *VERY* powerful system; still does not have MPI slowdowns due to networking overhead.
  • Haswell-EX 8-socket server. Processors again are substantially more expensive per core. In the $75-90K range. Give it 256-768GB RAM. *VERY*VERY* powerful system; also does not have MPI slowdowns from networking overhead.

As for disk space, in my experience, I've always eventually wanted at least twice what was initially specified for me :-)

Vendors will want to sell you SCSI, which is grossly over-priced in terms of your needs: they're thinking of huge databases with, in our terms, lots of tiny transactions, while ours are much larger 2-D or 3-D grid reads and writes.

You will want both "working-disk" and "archive" areas. Working-disk needs to be fast; archive can be much slower. Two ways to get fast working-disk are:

  • Use "striped-RAID" disk arrays. That way, you get parallelism in large-volume reads and writes. 4x4TB will cost you less than $2K. 8x4TB+high-performance controller will be $6K.
  • Use (especially PCIe) SSD or striped-SSD RAID. Large PCIe SSDs are 2-4TB and quite expensive; striped-SATA SSD arrays are a bit cheaper (but still more expensive than the same amount of spinning disk).
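
For the striped-RAID option above, assembly is straightforward; here is a sketch (device names and mount point are placeholders, and the RAID0 shown trades all redundancy for speed -- use RAID5/6, or keep an archive copy, for anything you cannot recreate):

 # Stripe four 4TB drives into one fast working-disk volume.
 mdadm --create /dev/md0 --level=0 --raid-devices=4 \
       /dev/sdb /dev/sdc /dev/sdd /dev/sde
 mkfs.xfs /dev/md0    # XFS handles large sequential reads/writes well
 mount /dev/md0 /work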