Here at Lamont I have installed and run both CCSM2.0.1 and CCSM3.0 on a Linux cluster
using the PGI compilers.
However, our platforms are different.
Here we have a 32-node cluster.
The nodes have dual Athlon MP (x86, 32-bit) 1.2GHz CPUs, from the days before hyperthreading,
with 1GB of memory.
Our interconnect is Myrinet 2000, and we have MPICH-GM 2.1.5 (not P4).
FYI, independently of the Bern Univ. folks, we successfully compiled and ran CCSM2.0.1,
with a few code fixes and a lot of tweaking of scripts, the MPI environment, etc.
The compilations succeeded with PGI 4.0-2 (the same as Bern), and also with 5.1-3, 5.1-6, and 5.2-4.
All versions produced correct runs, except 5.1-6.
I also tried version 5.0-1, but it threw too many compilation errors on correct code, so I gave up on it.
It took me quite some time to discover that PGI 5.1-6 compiles the CCSM2.0.1 code
but produces a faulty executable (POP), because the 5-day runs do not reveal the problem.
Note that our validation included not only the 5-day "test.1a" run, but also a several-year run, with restarts.
When compiled with PGI 5.1-6, the ocean freezes in those longer runs, and you get an ice age
(actually a snowball Earth) in a few months!
Here my experience may differ from George Carr's.
The same subroutines that were incorrectly compiled by PGI 5.1-6
are present in both CCSM2.0.1 and CCSM3.0 versions of POP.
(The code is right, the compilation was wrong.)
I wonder whether the same spurious ice age would also be produced by CCSM3.0 when compiled with PGI 5.1-6.
However, I never compiled CCSM3.0 with 5.1-6, only with 5.2-4.
The good news is that the problem (freezing ocean, fake ice age) doesn't happen with PGI 5.2-4.
I have discussed these and other compilation issues with PGI.
The dust finally settled with version 5.2-4,
which compiles both CCSM2.0.1 and CCSM3.0 and produces correct executables
(to the extent that our several-year validations can tell).
To my knowledge, 5.2-4 is the best and most complete version PGI has produced since 4.0-2.
It is also the version we used to successfully compile and run CCSM3.0.
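Whichever release you end up with, it is worth double-checking which compiler your build actually
picks up, since more than one PGI install can end up on the path. A quick sanity check, nothing
CCSM-specific:

    which pgf90
    pgf90 -V

The -V flag prints the exact PGI release, so you can confirm you are really building with, say,
5.2-4 and not some other version lying around on the system.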
If you are contemplating a move from CCSM2.0.1 to CCSM3.0, please note that
the latter has an atmosphere component which is significantly more demanding than the former
(a more complex radiative scheme, aerosols, etc.), as George Carr pointed out.
This may be important, since you have a small cluster like we do (with faster processors, but a slower interconnect).
We ran both models here at the same resolution (T42, gx1v3) on 16 nodes (32 processors).
To simulate one year, CCSM2.0.1 takes about 21 hours of wallclock time,
whereas CCSM3.0 lingers for 26.5 hours.
On the other hand, there are far fewer compilation/configuration problems with CCSM3.0 than with CCSM2.0.1.
Regarding the particular errors you listed, it is hard to tell what is causing them from
the terse messages in the log alone.
Here are a few wild guesses.
1) Your error log suggests you have the following distribution of MPI processes:
cpl=1, atm=1, lnd=1, ice=1, ocn=4 (total 8)
That sounds like too few, and may trigger memory paging.
When memory swapping kicks in, MPI may hang or crash, issuing elusive error messages
(a quick way to check for swapping is sketched after this item).
You may need to use more CPUs.
Here we ran CCSM2.0.1 on 32 processors at T42 gx1v3 resolution:
cpl=2 (actually cpl in OpenMP mode, but you may skip OpenMP and use cpl=1), atm=8, lnd=2, ice=4, ocn=16.
At T31 gx3v5 resolution we managed to trim the processor count down to a minimum of 16,
but it was very slow.
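A minimal way to watch for swapping, assuming you can ssh into a compute node while the model is
running (the node name here is made up):

    ssh node01 vmstat 5
    ssh node01 free -m

If the si/so columns of vmstat stay at zero and the swap line of free stays flat, paging is probably
not your problem; if they climb while the model is integrating, you need more nodes or fewer processes per node.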
2) Hyperthreading.
Have you checked on the nodes which processes are running and where?
Are you launching one MPI process per thread or one per physical CPU?
Your error log suggests it is one per CPU, but I may be wrong.
One per thread may be tricky.
Hopefully you have already run simpler MPI programs with hyperthreading turned on,
and hyperthreading is not the issue (a couple of commands to check process placement are sketched after this item).
If it is the issue, then you may consider turning hyperthreading off.
(What do the MPICH maintainers at ANL have to say about P4 and hyperthreading?
It may be worth asking them.)
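A rough way to see how many logical CPUs the kernel exposes and where your MPI processes land
(the component names below are just what I would grep for; adjust them to your executable names):

    grep -c ^processor /proc/cpuinfo
    ps -eo pid,psr,pcpu,comm | egrep 'cpl|atm|lnd|ice|ocn'

On your dual Xeons with hyperthreading on, the first command should report 4; the psr column
shows which logical CPU each process last ran on, so you can tell whether two heavy processes
are sharing the two threads of a single physical CPU.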
3) Have you changed the way POP does I/O, to funnel it through a single processor?
We learned this the hard way, after a long debugging campaign on our restart runs.
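If I remember correctly, this is controlled through POP's io_nml namelist, with something like the
fragment below; take the exact variable name as from memory rather than gospel, and check it against
your POP namelist input file:

    &io_nml
      num_iotasks = 1
    /

With a single I/O task all reads and writes go through one processor, which is what finally made
our restart runs behave.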
4) Make sure your mpirun command is launching all the processes on the correct nodes,
and that all of them are getting the environment variables they need.
The correct implementation of MPMD using the "pgfile" is a subtle point.
In particular, I had to send to the nodes (i.e. to each executable), as environment variables via the "pgfile",
the names of the stdin and stdout (i.e. log) files, and the directory path where those files were located.
I had to rewrite our mpirun to force it to pass the environment variables correctly to the nodes,
since our original mpirun wouldn't do so.
However, with MPICH-P4 this may not be a problem.
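For what it is worth, here is roughly what an MPMD P4 procgroup file could look like for the
8-process layout your log suggests. Each line is <host> <number of processes to start there> <full path to executable>,
and the first "local 0" line is the process that mpirun itself starts. The host and executable names
below are made up (only the run directories are taken from your log), and the environment-variable
plumbing I mentioned sits on top of this; it is not part of the standard format:

    local      0  /home/zan/ccsm/run/linux_test/cpl/cpl
    compute-0  1  /home/zan/ccsm/run/linux_test/atm/atm
    compute-0  1  /home/zan/ccsm/run/linux_test/lnd/lnd
    compute-1  1  /home/zan/ccsm/run/linux_test/ice/ice
    compute-1  1  /home/zan/ccsm/run/linux_test/ocn/ocn
    compute-2  1  /home/zan/ccsm/run/linux_test/ocn/ocn
    compute-2  1  /home/zan/ccsm/run/linux_test/ocn/ocn
    compute-3  1  /home/zan/ccsm/run/linux_test/ocn/ocn

It would then be launched with something like "mpirun -p4pg pgfile /home/zan/ccsm/run/linux_test/cpl/cpl".
Whatever generates your pgfile, print it out and make sure every line points at the node and executable you expect.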
Good luck,
Gus Correa
--
---------------------------------------------------------------------
Gustavo J. Ponce Correa - Email: gus@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
Oceanography Bldg., Rm. 103-D, ph. (845) 365-8911, fax (845) 365-8736
---------------------------------------------------------------------
Alexander Stine wrote:
George,
Thanks for the breakdown.
Perhaps I should step back and ask a more
general question.
Given the hardware that we have access to, what
would be the most appropriate software combination
for us to use? We would be happy with just about
any version of CAM or CCSM.
I chose CCSM 2.0.1 to begin with because folks
at the University of Bern had ported it to linux
and I was under the impression that later versions
had not been ported. It sounds like I may have
been mistaken.
It sounds like I will also have to contact the
Portland Group and find out if it is possible to
buy old versions of the compiler (only the current
version appears to be on their web site).
thanks for the help
Zan
On Thu, 24 Feb 2005, George R Carr Jr wrote:
Hmmm. Not sure where to start.
First, I do not support CCSM2. I do support CCSM3.
PGI: I have validated CCSM3 using pgi5.1-3 and pgi5.1-6 on several Xeon
Linux clusters. There are bugs in the pgi5.2-4 compiler that require CCSM3 code
workarounds when I attempt to run on our Opteron cluster, but this is not a validated
configuration. I cannot give you a date when I might be able to complete a 5.2x
validation. And PGI is already planning on releasing a 6.x compiler before long.
Ethernet: Any Linux cluster running over Ethernet that we have used or tested
is running MPICH 1.2.5 or 1.2.6 using the P4 driver. CCSM3 as released will
not run with this setup until we can get some additional code changes into the
CCSM3 baseline. My tests show that this works and generates correct climate.
This problem does not appear if you have a Myrinet network with mpich-gm
configured properly. Without these changes you will see CCSM3 starting up
and then blowing out or hanging with Ethernet.
Thanks.
At 3:49 PM -0800 2/24/05, Alexander Stine wrote:
Hello,
I am trying to install CCSM on a 16-node Rocks
Linux cluster.
Each node has:
2 Intel(R) Xeon(TM) CPU 2.40GHz 32-bit chips (with
hyperthreading turned on, so /proc reports 4 chips).
2 GB memory
The nodes are connected with 100 Megabit Ethernet
(which we plan to upgrade to gigabit if we can
get the model running).
I am trying to install the version of CCSM 2.0.1
ported to linux by the University of Bern.
The model compiles, but crashes early with what look like
some MPI message-passing errors? ...
-------------------------------------------------------------------------
j. Run the model, execute models simultaneously allocating CPUs
-------------------------------------------------------------------------
Wed Feb 23 17:48:49 PST 2005 -- CSM EXECUTION BEGINS HERE
start mpirun
(shr_msg_chdir) chdir for model = cpl
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/cpl
t_setoption: option disabled: Usr Sys
(shr_msg_chdir) chdir for model = atm
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/atm
t_setoption: option disabled: Usr Sys
(shr_msg_chdir) chdir for model = lnd
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/lnd
(shr_msg_chdir) chdir for model = ocn
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/ocn
(shr_msg_chdir) chdir for model = ocn
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/ocn
(shr_msg_chdir) chdir for model = ocn
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/ocn
(shr_msg_chdir) chdir for model = ocn
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/ocn
(shr_msg_chdir) chdir for model = ice
(shr_msg_chdir) changed cwd to /home/zan/ccsm/run/linux_test/ice
p1_6591: p4_error: net_recv read: probable EOF on socket: 1
p3_5984: p4_error: net_recv read: probable EOF on socket: 1
rm_l_2_6414: (10.902192) net_send: could not write to fd=8, errno = 9
p4_error: latest msg from perror: Bad file descriptor
rm_l_2_6414: p4_error: net_send write: -1
Received disconnect from 10.255.255.252: Command terminated on signal 11.
Received disconnect from 10.255.255.251: Command terminated on signal 13.
Received disconnect from 10.255.255.247: Command terminated on signal 13.
Received disconnect from 10.255.255.249: Command terminated on signal 13.
P4 procgroup file is pgfile.
Wed Feb 23 17:49:05 PST 2005 -- CSM EXECUTION HAS FINISHED