Forum on HARMONIE Surface development

TOPIC: Crashes in CANARI

Crashes in CANARI 2 years 3 weeks ago #1982

  • Ulf Andrae
  • Administrator
  • Posts: 283
  • Thank you received: 30
In the context of HarmonEPS over the METCOOP25B domain we've experienced two troublesome dates with crashes in CANARI, 2017-06-13 06Z and 2017-09-06 00Z. The crashes are reproducible with harmonie-40h1.1.bf1@cca on the first assimilation cycle. The traceback is shown below. Any suggestions from anyone?

Ulf

signal_harakiri(SIGALRM=14): New handler installed at 0x202d0140; old preserved at (nil)
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time =1510918066.32
[myproc#4,tid#1,pid#18048,signal#8(SIGFPE)]: Received signal :: 4425MB (heap), 2158MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 1510918066.32
tid#1 starting drhook traceback, time =1510918066.32
[myproc#4,tid#1,pid#18048]: 4425 MB (maxheap), 2158 MB (maxrss), 0 MB (maxstack), walltime = 1510918066.32s
[myproc#4,tid#1,pid#18048]:  MASTER 
[myproc#4,tid#1,pid#18048]:   CNT0<1> 
[myproc#4,tid#1,pid#18048]:    CAN1 
[myproc#4,tid#1,pid#18048]:     CANARI 
[myproc#4,tid#1,pid#18048]:      CADAVR 
[myproc#4,tid#1,pid#18048]:       STEPO 
[myproc#4,tid#1,pid#18048]:        OBSV 
[myproc#4,tid#1,pid#18048]:         TASKOB 
[myproc#4,tid#1,pid#18048]:          TASKOB>KSET_LOOP 
[myproc#4,tid#1,pid#18048]:           TASKOB>OBSGRP=01                         
[myproc#4,tid#1,pid#18048]:            HOP 
[myproc#4,tid#1,pid#18048]:             PPOBSAC 
[myproc#4,tid#1,pid#18048]:              ACHMT 
tid#1 starting sigdump traceback, time =1510918066.32
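
For anyone comparing several of these crash logs, the call chain can be pulled out of the drhook traceback with a few lines of Python. This is only a rough sketch: the regular expression is inferred from the lines above rather than from any drhook specification, and `parse_drhook` and `SAMPLE` are names made up for this example.

```python
import re

# Pattern inferred from the traceback lines in this post (an assumption,
# not an official drhook format description).
FRAME = re.compile(r"^\[myproc#\d+,tid#\d+,pid#\d+\]:\s+(\S.*?)\s*$")

def parse_drhook(lines):
    """Return the routine names of a drhook traceback, outermost first."""
    frames = []
    for line in lines:
        match = FRAME.match(line)
        if match is None:
            continue  # signal lines, "starting ... traceback" lines, etc.
        name = match.group(1)
        if " MB (" in name:
            continue  # memory summary line shares the same prefix
        frames.append(name)
    return frames

# Excerpt of the log above, shortened for the example.
SAMPLE = """\
[myproc#4,tid#1,pid#18048,signal#8(SIGFPE)]: Received signal :: 4425MB (heap), 2158MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 1510918066.32
tid#1 starting drhook traceback, time =1510918066.32
[myproc#4,tid#1,pid#18048]: 4425 MB (maxheap), 2158 MB (maxrss), 0 MB (maxstack), walltime = 1510918066.32s
[myproc#4,tid#1,pid#18048]:  MASTER
[myproc#4,tid#1,pid#18048]:     CANARI
[myproc#4,tid#1,pid#18048]:           TASKOB>OBSGRP=01
[myproc#4,tid#1,pid#18048]:             PPOBSAC
[myproc#4,tid#1,pid#18048]:              ACHMT
"""

frames = parse_drhook(SAMPLE.splitlines())
print(" -> ".join(frames))
print("innermost routine:", frames[-1])
```

The last entry is the routine that was active when the SIGFPE arrived, which in the traceback above is ACHMT.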
Last Edit: 2 years 3 weeks ago by Ulf Andrae.

Crashes in CANARI 2 years 3 weeks ago #1983

  • Eoin Whelan
  • Gold Boarder
  • Posts: 195
  • Thank you received: 33
Hi Ulf,

could you point me to the experiment on cca? I am guessing all the best people have already looked at this but I am intrigued!

Eoin

Crashes in CANARI 2 years 3 weeks ago #1984

  • Ulf Andrae
Eoin,

The experiment with the troublesome dates can be found on cca under /scratch/ms/se/snh/hm_home/canari_crash. Thanks for taking a look at it!

Ulf

Crashes in CANARI 2 years 3 weeks ago #1994

  • Ulf Andrae
I managed to get 2017-06-13 06Z through by setting all observation types except SYNOP to false in Oulan. I haven't checked the result, but I wonder whether this is a reasonable hack. Do we really need any observations other than SYNOP for CANARI? As a more proper solution, I wonder whether the new Bator-only strategy handles this situation better.

Ulf

Crashes in CANARI 2 years 3 weeks ago #1995

  • Eoin Whelan
Hi Ulf,

this approach should be fine. I did something similar when we encountered some troublesome aircraft data in good old 36h1.3 a few years ago.

I didn't make this suggestion as I thought the "TASKOB>OBSGRP=01" in the traceback was a reference to SYNOP data.

The new Bator approach (USEOBSOUL=0) only processes synop and ship data for surface DA.

However, we should still understand the cause of this problem. I will continue to have a look at this when I can.

Eoin
Last Edit: 2 years 3 weeks ago by Eoin Whelan.

Crashes in CANARI 2 years 2 weeks ago #1996

  • Ulf Andrae
It crashes again if I switch on SHIP.

Ulf

Crashes in CANARI 2 years 2 weeks ago #1997

  • Eoin Whelan
Thanks Ulf.

This narrows the search for the problematic BUFR / ODB entry.
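
The narrowing done so far (SYNOP alone runs, SYNOP plus SHIP crashes) can be written down as a small loop that switches on one extra observation type at a time. The sketch below is only an illustration: `run_crashes` is a hypothetical stand-in for "run the assimilation cycle with these switches and report whether CANARI crashed", and the list of type names is illustrative rather than the actual Oulan switch names.

```python
from typing import Callable, Dict, Iterable, List

def find_crashing_types(
    obs_types: Iterable[str],
    baseline: str,
    run_crashes: Callable[[Dict[str, bool]], bool],
) -> List[str]:
    """Enable the baseline type plus one candidate at a time and
    collect every candidate whose presence makes the run crash."""
    types = list(obs_types)
    culprits = []
    for candidate in types:
        if candidate == baseline:
            continue
        # Everything is switched off except the baseline and the candidate.
        switches = {t: t in (baseline, candidate) for t in types}
        if run_crashes(switches):
            culprits.append(candidate)
    return culprits

# Stand-in for a real run: mimics what was reported in this thread,
# i.e. the cycle crashes as soon as SHIP is switched on.
def fake_run(switches: Dict[str, bool]) -> bool:
    return switches["SHIP"]

# Illustrative type names (assumption, not the real namelist keys).
culprits = find_crashing_types(["SYNOP", "SHIP", "AIRCRAFT", "TEMP"], "SYNOP", fake_run)
print(culprits)
```

Testing one type at a time costs one run per type, but it also reveals whether more than one type is affected, which a simple "all off except SYNOP" test cannot show.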

Eoin

Crashes in CANARI 2 years 1 week ago #2005

  • Eoin Whelan
Hi Ulf,

I have been going around in circles a little bit looking for our problematic SHIP observation. Just a short note to say I haven't forgotten about this issue.
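
One systematic way to hunt for a single offending report is to bisect the set of SHIP reports: run with half of them, see whether the crash persists, and repeat on the half that still crashes. The sketch below assumes that exactly one report is responsible and that it triggers the crash on its own, which has not been established here. `crashes` is a hypothetical callback standing in for "run CANARI with only these reports", and the report identifiers are invented.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def bisect_bad_record(records: Sequence[T],
                      crashes: Callable[[Sequence[T]], bool]) -> T:
    """Locate the single record that makes `crashes` return True.

    Assumes exactly one bad record that reproduces the crash alone.
    """
    if not crashes(records):
        raise ValueError("the full set does not reproduce the crash")
    lo, hi = 0, len(records)
    # Invariant: the bad record lies in records[lo:hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(records[lo:mid]):
            hi = mid  # bad record is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return records[lo]

# Invented report identifiers and an invented culprit, for illustration only.
reports = ["SHIP-%03d" % i for i in range(40)]
bad = "SHIP-027"
found = bisect_bad_record(reports, lambda subset: bad in subset)
print(found)
```

With 40 reports this needs about seven runs instead of forty. If several reports interact to cause the crash, the single-culprit assumption fails and the result should not be trusted.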

Eoin

Crashes in CANARI 2 years 1 week ago #2006

  • Ulf Andrae
Eoin,

Thanks for your (future) efforts. I have a new date (2017-09-16 18Z) where SHIP causes problems, if you are interested. Does it really make sense to include SHIP observations in a surface analysis whose main aim is to update TG{1,2} and WG{1,2} to reduce forecast errors?

Ulf

Crashes in CANARI 2 years 1 week ago #2007

  • Eoin Whelan
Hi Ulf,

there is no need to include SHIP observations. However, my concern is that the problem with these SHIP observations may appear with a SYNOP observation at some stage in the future.

We need to identify and understand the problem.

Eoin

Crashes in CANARI 2 years 1 week ago #2008

  • Ulf Andrae
Fair enough. It would be interesting to know whether the problem also occurs in the Bator-only setup or whether it's an Oulan-dependent feature.

Ulf

Crashes in CANARI 1 year 7 months ago #2076

  • Ulf Andrae
The problem is still present in 40h111rc1, which is perhaps not a surprise since there have not been any real changes in this area as far as I can see.

Any ideas?

Ulf