
TOPIC: Running HIRLAM on c2a

Running HIRLAM on c2a 7 years 6 months ago #1003

  • Ulf Andrae
  • Administrator
  • Posts: 286
  • Thank you received: 31
ECMWF's new IBM POWER7 machine, c2a, has been available for a while now. ECMWF is about to turn off c1a on the 28th of January, so it's time to move your HARMONIE experiments to c2a. This is the current situation for HIRLAM:

- GLAMEPS runs successfully on c2a

- Per Dahlgren has migrated his EURO4M 4DVAR runs to c2a, with the limitation that they fail when running on too many nodes.

- The trunk is being tested but currently fails in the HDF part of the climate generation:

[16:29:51] Climate_month[337]> rm -f hl.hdf tsoil.hdf
#[16:29:51] Climate_month[346]> step=landmask
#[16:29:51] Climate_month[347]> Boot rgn_file lc_data landmask hl.hdf
#[16:29:51] Climate_month[347]> PGM=ctopo
ERROR, cannot open HDF file /scratch/ms/se/snh/hl_home/test_c2a/lib/data/gtopo_9000_0.0125/9000_900W_150S_0250_glccsp.hdf
/scratch/ms/se/snh/hl_home/test_c2a/lib/scripts/Boot failed
#[16:29:51] Climate_month[347]> exit
#[16:29:51] Climate_month[347]> echo /scratch/ms/se/snh/hl_home/test_c2a/lib/scripts/Climate_month failed in
#[16:29:51] Climate_month[347]> step=landmask
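
One way to narrow down a "cannot open HDF file" error is to check first whether the file itself looks sane, before suspecting the library build. The following is only a minimal sketch in Python, not part of the HIRLAM scripts: the function name `sniff_format` is made up for illustration, and it only looks at the first bytes of the file. HDF5 and netCDF signatures are included only so that a mislabelled file is reported as such.

```python
import os

# File signatures (magic bytes) at offset 0.
# Note: an HDF5 signature may also sit at a later offset (512, 1024, ...);
# this sketch deliberately checks offset 0 only.
SIGNATURES = [
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"\x89HDF\r\n\x1a\n", "HDF5"),
    (b"CDF\x01", "netCDF classic"),
    (b"CDF\x02", "netCDF 64-bit offset"),
]


def sniff_format(path):
    """Return a short description of what is found at `path` (hypothetical helper)."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "not readable"
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown format"
```

Running `sniff_format` on the file named in the error message would be the obvious first use. If it reports "HDF4", the file is present and readable, and the HDF library used by the executable becomes the more likely suspect.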

An example of the experiment setup is available as a tar file on
ecgate:/scratch/ms/se/snh/test_c2a.tar

Any suggestions on how to solve this are welcome.

Ulf

Re:Running HIRLAM on c2a 7 years 6 months ago #1005

  • Ulf Andrae
After some suggestions and fixes to the HDF/netCDF versions, as in

hirlam.org/trac/changeset/11470

the climate generation almost works. However, comparing what happens on c1a and c2a, we find that the lake part still fails silently on c2a:

READ: UNIT,PAR,TYP,LEV,MAX,MIN 95 193 105 955 0.0000E+00 0.0000E+00
READ: UNIT,PAR,TYP,LEV,MAX,MIN 96 193 105 955 0.0000E+00 0.0000E+00
WRITING: UNIT,PAR,TYP,ALEV,MAX,MIN 97 193 105 955.00 0.0000E+00 0.0000E+00
StartLake: Wrong or no lake cold start data file LAKE_LTA.nc 22
#[03:59:28] Climate[133]> cp -f fort.97 /scratch/ms/se/snh/hl_home/test_c2a/cl00010100

whereas on c1a we have

WRITING: UNIT,PAR,TYP,ALEV,MAX,MIN 97 193 105 955.00 0.0000E+00 0.0000E+00
StartLake: Opened lake cold start data file LAKE_LTA.nc
StartLake: Preparing initial values of prognostic variables, please wait...
StartLake: finished successfully
ITYPE JPAR JLEV 160 80 702
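
Since the c2a run simply carries on after the StartLake message, a small scan of the log for known failure messages could make this kind of silent failure visible. This is a minimal sketch in Python under stated assumptions: the helper name `find_failures` is invented here, and the pattern list contains only message fragments copied from the logs quoted in this thread, so it would need extending for other failures.

```python
import re

# Message fragments copied verbatim from the logs quoted in this thread.
FAILURE_PATTERNS = [
    r"ERROR, cannot open HDF file",
    r"StartLake: Wrong or no lake cold start data file",
]


def find_failures(log_text):
    """Return (line_number, line) pairs for lines matching a known failure message."""
    regex = re.compile("|".join(FAILURE_PATTERNS))
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if regex.search(line)
    ]


# Two lines taken from the c2a log above.
sample = """WRITING: UNIT,PAR,TYP,ALEV,MAX,MIN 97 193 105 955.00 0.0000E+00 0.0000E+00
StartLake: Wrong or no lake cold start data file LAKE_LTA.nc 22
"""

for number, line in find_failures(sample):
    print(number, line)
```

A script step could call something like this on the log and exit with a non-zero status when the returned list is not empty, instead of copying fort.97 as if nothing had happened.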

Ulf

Re:Running HIRLAM on c2a 7 years 6 months ago #1014

  • Laura Rontu
  • Administrator
  • Finnish Meteorological Institute
  • Posts: 154
  • Thank you received: 8
Without climate generation, continuing from a first guess from the same, quite small, Nordic-area experiment on c1a, I get stuck forever in 3DVAR, with a memory problem reported. Continuing with NOUA works fine and is sufficient for this particular experiment, so I did not try to fix it.

Re:Running HIRLAM on c2a 7 years 5 months ago #1021

I tried to run normal 7.4 on c2a using previously created climatologies. In addition, I made some modifications to submission.db, as suggested in
www.ecmwf.int/services/computing/hpcf/migration.html

The run failed in 4DVAR minimization in screening. It computed a totally crazy value for kstarmax,
"kstarmax in x-direction= 9388441", and could then not allocate that much memory. At the moment I don't have any clue where this comes from.

My experiment name is test_c2a_rc
and everything can be seen on
/scratch/ms/fi/fne/hl_home/test_c2a_rc
if someone wants to see the details.

Below are some lines from just before the abort.

Kalle

Number of redundant AIREPS : 375
&NLLOWRES
NXL_GLOBAL_LOW=180, NYL_GLOBAL_LOW=120, NX_GLOBAL_LOW=130, NY_GLOBAL_LOW=103, KMAX_GLOBAL_LOW=89, LMAX_GLOBAL_LOW=59, DTDYN_LOW=1800, DTPHYS_LOW=1800, DTVDIF_LOW=1800, NSTEP_DA_LOW=10, NLHIST_FR_LOW=2, NXL_GLOBAL_EZ=144, NYL_GLOBAL_EZ=120, KMAX_GLOBAL_EZ=71, LMAX_GLOBAL_EZ=59
/
===== kstep_ob_list =============
1 -1
2 0
3 2
4 4
5 6
6 8
7 10
===== kstep_nl_list =============
1 -1
2 0
3 2
4 4
5 6
6 8
7 10
=================================
hxinvm,hyinvm= 1.82082741265383574 0.165316462691989548E-04
Jb file first record=statbal oldmoist
D_strfun= 10428148.5536535345
kstarmax in x-direction= 9388441
kstarmax in ydirection= 85
kstarmax_max reset from 480 to 9388442
"/s2ms/s2ms_lb/scratch/ms/fi/fne/hl_home/test_c2a_rc/lib/src/modules/set_baloptormod.F", line 6: 1525-108 Error encountered while attempting to allocate a data object. The program will stop.
"/s2ms/s2ms_lb/scratch/ms/fi/fne/hl_home/test_c2a_rc/lib/src/

Re:Running HIRLAM on c2a 7 years 5 months ago #1023

  • Ulf Andrae
Kalle,

The problem occurs in the non-buffered MPI in the transpose routines. It's still not clear exactly what happens. If you recompile without -DMPINONBUFFERED it should work.

Ulf

Re:Running HIRLAM on c2a 7 years 5 months ago #1024

Thanks Ulf

It worked. I tried it in both 3DVAR and 4DVAR.

In addition, I had to make small changes in slswap.F:
In the routines slswap_ad and slswap_an_ondemand, some declarations were inside #ifdef MPINONBUFFERED, although non-buffered MPI is not used in those routines. Commenting out the #ifdef/#endif pair, as below, in two places fixed that:


c
#ifdef MPILIB
#include "mpif.h"
      integer errormpi,statusmpi(MPI_STATUS_SIZE),realtype
      real slbuf(nx_local)
cccc#ifdef MPINONBUFFERED
      integer bufsize,realsize,packsize,pos
      integer sendreq1,sendreq2
      real sendbuf1(nx_local*nslhalo*(nlev+1))
      real sendbuf2(nx_local*nslhalo*(nlev+1))
      real recvbuf( nx_local*nslhalo*(nlev+1))
cccccc#endif
#endif

Kalle

Re:Running HIRLAM on c2a 7 years 5 months ago #1038

  • Laura Rontu
In this post, I of course meant c2a, where 3DVAR did not work in my Nordic domain. Even after all the updates by Kalle (see his posts in this thread), it did not work until I modified submission.db slightly to better fit the properties of this small domain. Such a modification was suggested by Xiaohua some years ago, but was not necessary afterwards on c1a. Now it seems to be necessary again on c2a. So this problem is solved: instead of NOUA, I can again use 3DVAR.

Re:Running HIRLAM on c2a 7 years 5 months ago #1048

  • Laura Rontu
In the system working week wiki, success in solving the remaining HIRLAM-c2a problems was reported by Toon:

MPI problems (Toon: I ran 7.2.2.bf1 on its default domain without problems)

Climate generation (Toon: I was successful building my own HDF4 from scratch - version 4.2.9)

The first piece of news seems to concern a very old HIRLAM version. What would the second piece of good news mean for people running reference HIRLAM 7.4 experiments over a new domain, which requires climate generation? Does it mean that HDF4 version 4.2.9 solves the climate generation problem? How can this version be made available on the c2a platform?