TOPIC: cy40 crash in Canari (canaco.F90) at CCA

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2423

  • Erik
  • Fresh Boarder
  • Posts: 11
The Canari crash is probably related to Obs usage. The date is 2020050600 (it also crashes at 03Z).
Help would be appreciated, we are a bit stuck.

We tried to use only Synop Obs but it still crashes.
With cy43 it works.

The Exp has been running several cycles, without problem, before the crash.

Exp setup: /home/ms/fi/fie/hm_home/Exp_NWC_40h12_TEST

The Exp log is in: /hpc/perm/ms/fi/fie/HARMONIE/Exp_NWC_40h12_TEST/Date/Hour/Cycle/Analysis/AnSFC/Canari.1

Traceback:



ADDVIEWDB("castor" : db="ECMA") : total#, dbhandle, viewhandle, thread-id = 30 1 1437999105 1
ADDVIEWDB("cancer_robhdr" : db="ECMA") : total#, dbhandle, viewhandle, thread-id = 31 1 1400999937 1
ADDVIEWDB("cancer_robody" : db="ECMA") : total#, dbhandle, viewhandle, thread-id = 32 1 1408348161 1
18:09:46 Fin 1er calcul residus
ADDVIEWDB("cantik_robhdr" : db="ECMA") : total#, dbhandle, viewhandle, thread-id = 33 1 1417809921 1
ADDVIEWDB("cantik_robody" : db="ECMA") : total#, dbhandle, viewhandle, thread-id = 34 1 1420988417 1
18:09:46 Debut analyses
ADDVIEWDB("canaco_robhdr" : db="ECMA") : total#, dbhandle, viewhandle, thread-id = 35 1 1384734721 1
ADDVIEWDB("canaco_robody" : db="ECMA") : total#, dbhandle, viewhandle, thread-id = 36 1 1391771649 1
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0x202e1560; old preserved at (nil)
***Received signal = 11 and ActivatED SIGALRM=14 and calling alarm(10), time =1591207803.84
[myproc#4,tid#1,pid#69507,signal#11(SIGSEGV)]: Received signal :: 7345MB (heap), 4304MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 1591207803.84

.
.
.

ABORT! 2 Dr.Hook calls ABOR1 ...
(pid=69507.46912658881536) [3]: libpthread.so.0(+0xf850) [0x2aaaaaf6d850] : __do_global_ctors_aux() at crtstuff.c:0
(pid=69506.46912658881536) [3]: libpthread.so.0(+0xf850) [0x2aaaaaf6d850] : __do_global_ctors_aux() at crtstuff.c:0
(pid=69507.46912658881536) [4]: MASTERODB(canaco_+0x1c0) [0x20aae350] : canaco_() at canaco.F90:149
(pid=69507.46912658881536) [5]: MASTERODB(canari_+0x1d54) [0x20671904] : canari_() at canari.F90:378
(pid=69506.46912658881536) [4]: MASTERODB(canaco_+0x1c0) [0x20aae350] : canaco_() at canaco.F90:149
(pid=69507.46912658881536) [6]: MASTERODB(can1_+0x21d) [0x2066bc4d] : can1_() at can1.F90:151
(pid=69506.46912658881536) [5]: MASTERODB(canari_+0x1d54) [0x20671904] : canari_() at canari.F90:378
(pid=69507.46912658881536) [7]: MASTERODB(cnt0_+0xbd6) [0x202f8496] : cnt0_() at cnt0.F90:195
(pid=69506.46912658881536) [6]: MASTERODB(can1_+0x21d) [0x2066bc4d] : can1_() at can1.F90:151
(pid=69507.46912658881536) [8]: MASTERODB(main+0x6f) [0x202c42bf] : master() at master.F90:83
(pid=69506.46912658881536) [7]: MASTERODB(cnt0_+0xbd6) [0x202f8496] : cnt0_() at cnt0.F90:195
(pid=69506.46912658881536) [8]: MASTERODB(main+0x6f) [0x202c42bf] : master() at master.F90:83
(pid=69507.46912658881536) [9]: libc.so.6(__libc_start_main+0xe6) [0x2aaab1beac36] : __do_global_ctors_aux() at crtstuff.c:0
(pid=69506.46912658881536) [9]: libc.so.6(__libc_start_main+0xe6) [0x2aaab1beac36] : __do_global_ctors_aux() at crtstuff.c:0
(pid=69507.46912658881536) [10]: MASTERODB() [0x202c436d] : _start() at start.S:116
[LinuxTraceBack] : End of backtrace(s)
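
The backtrace above interleaves frames from two MPI tasks (pid 69506 and 69507), which makes it hard to read. A minimal sketch of how the frames could be regrouped per pid is given below. The function name `frames_by_pid` and the regular expression are illustrative assumptions based only on the line layout shown in this post, not part of Dr.Hook or HARMONIE.

```python
import re
from collections import defaultdict

# Matches frame lines of the form seen in the post, e.g.
# (pid=69507.46912658881536) [4]: MASTERODB(canaco_+0x1c0) [0x20aae350] : canaco_() at canaco.F90:149
FRAME_RE = re.compile(
    r"\(pid=(?P<pid>\d+)\.\d+\)\s+\[(?P<depth>\d+)\]:"
    r".*?:\s+(?P<func>\S+)\s+at\s+(?P<loc>\S+)\s*$"
)


def frames_by_pid(log_text):
    """Group backtrace frames by pid and sort each group by frame depth."""
    stacks = defaultdict(list)
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            stacks[m["pid"]].append((int(m["depth"]), m["func"], m["loc"]))
    return {pid: sorted(frames) for pid, frames in stacks.items()}


# A few interleaved lines copied from the traceback above.
SAMPLE = """\
(pid=69507.46912658881536) [4]: MASTERODB(canaco_+0x1c0) [0x20aae350] : canaco_() at canaco.F90:149
(pid=69507.46912658881536) [5]: MASTERODB(canari_+0x1d54) [0x20671904] : canari_() at canari.F90:378
(pid=69506.46912658881536) [4]: MASTERODB(canaco_+0x1c0) [0x20aae350] : canaco_() at canaco.F90:149
(pid=69507.46912658881536) [6]: MASTERODB(can1_+0x21d) [0x2066bc4d] : can1_() at can1.F90:151
"""

if __name__ == "__main__":
    for pid, frames in frames_by_pid(SAMPLE).items():
        print(f"pid {pid}")
        for depth, func, loc in frames:
            print(f"  [{depth}] {func} at {loc}")
```

Regrouped this way, both tasks show the same call chain ending in canaco_() at canaco.F90:149.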
Last Edit: 1 month 3 days ago by Erik.

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2424

  • Ole Vignes
  • Administrator
  • Posts: 39
  • Thank you received: 10
I took a look in the log, which also includes the NODE.001_01 file.
As far as I can see you don't have any observations at all, no dribu and no synop.

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2425

  • Eoin Whelan
  • Gold Boarder
  • Posts: 209
  • Thank you received: 41
Hi Erik,

my fault ...

I think you need to change your CONV_SOURCE to mcp in scr/include.ass.

This isn't as much of an issue in CY43.

Lots of "inconnu" messages in your Bator log.

Eoin

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2426

  • Erik
  • Fresh Boarder
  • Posts: 11
Thank you for such a fast response, both Ole and Eoin!

Eoin: I don't think your change worked.
I included your change and also changed scr/Bator to use all Obs again. Then reran InitRun, Prepare_ob, Bator and then Canari -> same error.
And, as Ole said, no observations seem to enter (as seen in NODE.001_01).
Did I execute this correctly?
Last Edit: 1 month 3 days ago by Erik.

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2427

  • Erik
  • Fresh Boarder
  • Posts: 11
Ole: Yes, I noticed the same! But why is that... I have tried to include all Obs as normal, but still the Obs list is empty.
Last Edit: 1 month 3 days ago by Erik.

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2428

  • Ulf Andrae
  • Administrator
  • Posts: 286
  • Thank you received: 31
What's the output from Bator? This is a MetCoOp setup running with MARS observations, right?

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2429

  • Erik
  • Fresh Boarder
  • Posts: 11
Correct, we run a MetCoOp type of setup, using MARS observations (but FG from MEPS, i.e. not cycling)

Note: We did run several cycles of the Exp, without any problems, until it crashed.

The Bator log-file does not look too healthy, a lot of warnings!


BUFR TABLES TO BE LOADED B0000000000098013001.TXT,D0000000000098013001.TXT
* WARNING - BATOR : template inconnu pour fichier N. 1
307005 13023 13013 222000 101049 31031 1031 1032 101049 33007
* WARNING - BATOR : template inconnu pour fichier N. 2
307005 13023 13013 222000 101049 31031 1031 1032 101049 33007
* WARNING - BATOR : template inconnu pour fichier N. 3
...
...
...
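
The "template inconnu" (unknown template) warnings above each carry the BUFR descriptor sequence that Bator did not recognise. A minimal sketch for collecting the distinct sequences from a Bator log follows. It assumes, based only on the excerpt in this post, that the descriptor list is printed on the line directly after each warning; the function name `unknown_templates` is hypothetical.

```python
import re
from collections import Counter

# Warning line format as it appears in the Bator log excerpt above.
WARN_RE = re.compile(r"WARNING - BATOR : template inconnu pour fichier N\.\s*(\d+)")


def unknown_templates(log_text):
    """Count each distinct descriptor sequence that follows an 'inconnu' warning."""
    counts = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if WARN_RE.search(line):
            tokens = lines[i + 1].split()
            # Only accept a line made up entirely of integer descriptors.
            if tokens and all(t.isdigit() for t in tokens):
                counts[tuple(int(t) for t in tokens)] += 1
    return counts


# Lines copied from the Bator log excerpt above.
SAMPLE = """\
BUFR TABLES TO BE LOADED B0000000000098013001.TXT,D0000000000098013001.TXT
* WARNING - BATOR : template inconnu pour fichier N. 1
307005 13023 13013 222000 101049 31031 1031 1032 101049 33007
* WARNING - BATOR : template inconnu pour fichier N. 2
307005 13023 13013 222000 101049 31031 1031 1032 101049 33007
* WARNING - BATOR : template inconnu pour fichier N. 3
...
"""

if __name__ == "__main__":
    for sequence, n in unknown_templates(SAMPLE).items():
        print(n, "file(s):", " ".join(str(d) for d in sequence))
```

If every warning reports the same sequence, as in the excerpt, a single missing template definition would explain all of them.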
Last Edit: 1 month 3 days ago by Erik.

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2430

  • Ole Vignes
  • Administrator
  • Posts: 39
  • Thank you received: 10
Eoin knows this better than me, but with mars observations I would
have expected CONV_SOURCE=mars instead of mcp?

cy40 crash in Canari (canaco.F90) at CCA 1 month 3 days ago #2431

  • Eoin Whelan
  • Gold Boarder
  • Posts: 209
  • Thank you received: 41
You are correct, Ole.

I will try to take a proper look at this issue on Monday.

In the meantime you should compare the contents of the param file you are using and the list in those "inconnu" messages. It is possible my list of ECMWF SYNOP param definitions/templates is incomplete.

Eoin
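
The comparison suggested above, between the param file and the descriptor list in the "inconnu" messages, could be sketched roughly as follows. This is only an illustration: the real param file layout is not shown in this thread, so the code assumes it is plain text in which descriptors appear as integer tokens (leading zeros are ignored). `missing_descriptors` and `PARAM_SAMPLE` are hypothetical names and contents.

```python
import re


def missing_descriptors(sequence, param_text):
    """Return descriptors from `sequence` that never occur as an integer token in `param_text`."""
    # Normalise to int so that 013023 in the file matches 13023 in the warning.
    present = {int(tok) for tok in re.findall(r"\b\d+\b", param_text)}
    # dict.fromkeys removes repeated descriptors while keeping their order.
    return [d for d in dict.fromkeys(sequence) if d not in present]


# Descriptor sequence copied from the Bator warnings earlier in the thread.
WARNED = [307005, 13023, 13013, 222000, 101049, 31031, 1031, 1032, 101049, 33007]

# Hypothetical param file content, for illustration only.
PARAM_SAMPLE = """\
synop template
307005 013023 013013
"""

if __name__ == "__main__":
    print(missing_descriptors(WARNED, PARAM_SAMPLE))
```

A token-level check like this only shows which descriptors are absent altogether; whether Bator needs the full sequence to match in order is not established in this thread.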

cy40 crash in Canari (canaco.F90) at CCA 1 month 13 hours ago #2432

  • Erik
  • Fresh Boarder
  • Posts: 11
Eoin, just to let you know.
There are no sfc Obs generated by the "Prepare_ob" task, so no wonder that Bator and Canari crash...
Is Prepare_ob also depending on the "list of ECMWF SYNOP param definitions/templates" you mention above?

Thanks for the support!
Last Edit: 1 month 13 hours ago by Erik.

cy40 crash in Canari (canaco.F90) at CCA 4 weeks 1 day ago #2433

  • Eoin Whelan
  • Gold Boarder
  • Posts: 209
  • Thank you received: 41
Hi Erik,

Are you sure you are using MARS observations?

Taking a look at your logs - there are no MARS requests required as the ob files are already in place and the files only contain TAC SYNOP data.

Have you tried CONV_SOURCE=mcp?

Eoin

cy40 crash in Canari (canaco.F90) at CCA 4 weeks 1 day ago #2434

  • Erik
  • Fresh Boarder
  • Posts: 11
Eoin,

We use MARS obs (at least we do not attempt to do anything else).
The Exp run had a longer DTGEND, so I assume MakeCycleInput (and Prepare_ob) just progressed to generate/fetch those ob-files, and that is why they are in place.

I did try "mcp" but that did not help.

cy40 crash in Canari (canaco.F90) at CCA 4 weeks 1 day ago #2435

  • Eoin Whelan
  • Gold Boarder
  • Posts: 209
  • Thank you received: 41
Hi again,

OK - I think the MARS request in Prepare_ob needs to be updated.

I would suggest the following:
* take the MARS specific logic/entries in the script from develop and replace what you have
* remove the ob files from your experiment on cca
* re-run the Prepare_ob tasks

The mars param may also need to be updated. We can cross that bridge when we get to it!

Eoin

cy40 crash in Canari (canaco.F90) at CCA 4 weeks 1 day ago #2436

  • Erik
  • Fresh Boarder
  • Posts: 11
Hi,

Now the "splitObs/" dir has Synop and other obs prepared in: /scratch/ms/fi/fie/hm_home/Exp_NWC_40h12_TEST/20200506_00/

The strange thing is that I only removed the ob-file and reran "Prepare_ob". The new ob-file is the same as the old one, so I don't understand why "splitObs/" is different now!?

I guess "Prepare_ob" is doing what it should now? No need to do what you suggested above?

The Bator output now looks different; log-file: /hpc/perm/ms/fi/fie/HARMONIE/Exp_NWC_40h12_TEST/MakeCycleInput/Hour/Cycle/Observations/Bator.4
Can't say for sure if all is fine in there... it does something with the Synop at least.

Canari still does not get any Obs (according to the "NODE.001_01" file) and fails with the same error as before.
Last Edit: 4 weeks 1 day ago by Erik.

cy40 crash in Canari (canaco.F90) at CCA 4 weeks 1 day ago #2437

  • Eoin Whelan
  • Gold Boarder
  • Posts: 209
  • Thank you received: 41
OK. If Bator is reporting a healthy output of SYNOP data you need to look in the ODB to see what is being presented to Canari. Are data rejected or blacklisted for some reason?