[Home]

Summary:ASTERISK-18025: CPU spikes when using timerfd timing
Reporter:Jacob LIpstein (jacobli)Labels:
Date Opened:2011-06-15 14:15:36Date Closed:2011-06-17 10:49:13
Priority:MajorRegression?
Status:Closed/CompleteComponents:Resources/res_timing_timerfd
Versions:1.8.4 Frequency of
Occurrence
Related
Issues:
Environment:SuSE 11.2Attachments:
Description:
Several days ago I opened an Improvement issue ASTERISK-17996 "timinng dahdi priority":
-------------------
At all of asterisks we are running we were using dahdi timing. But at some point after linux kernel upgrade the timerfd became available and asterisk started using timerfd timing instead dahdi. After timerfd took priority over dahdi we started noticing CPU usage spikes and even asterisk froze completely 1-2 times in a week. It took more than two months for us to figure out what is a root of problem. So I'd suggest to set dahdi .priority (is 100 now) to something higher than 200 (current priority of timerfd timing) in order to force asterisk to use dahdi timing by default (if it is available). I'm pretty sure it affects any asterisk 1.8 version.
-------------------

The issue was closed with the following comment:
-------------------
After speaking with the developers, no change is required here. If there are bugs with res_timing_timerfd, that is not a reason to make it less of a priority. The priority is setup like so because res_timing_dahdi is less efficient than res_timing_timerfd.
-------------------

I'm pretty sure that developers are right claiming that timing_timerfd is more efficient then timing_dahdi. But in our particular configuration timing_dahdi works several years without any problem and switching to timing_timerfd caused the problem mentioned above.
More details on what sometimes happens when using timing_timerfd:
We are running several asterisks connected via IAX trunking. Let say we have a conference call (meetme application) at asterisk A. Some call to the conference comes from asterisk B. When the respondent calling to the conference from asterisk B hangs up it could happen that the meetme application is not finished for him what follows with "Unable to write frame ..." records in log file and causes CPU spikes. More over if try to make an additional call to this particular conference then CPU usage jumps to 100% and the computer practically freezes.      


Jun  8 08:15:08 pbx01 asterisk[15909]: DEBUG[15941]: chan_iax2.c:10631 in socket_process: Immediately destroying 3789, having received hangup
Jun  8 08:15:08 pbx01 asterisk[15909]: WARNING[3565]: app_meetme.c:3650 in conf_run: Unable to write frame to channel IAX2/exadellink-3789
Jun  8 08:15:08 pbx01 asterisk[15909]: WARNING[3565]: app_meetme.c:3650 in conf_run: Unable to write frame to channel IAX2/exadellink-3789
Jun  8 08:15:08 pbx01 asterisk[15909]: WARNING[3565]: app_meetme.c:3650 in conf_run: Unable to write frame to channel IAX2/exadellink-3789
...
...


In a normal situation a hangup from asterisk B side follows with completion of meetme (which is the 7th priority in our conference macro) at the asterisk A.

Conference macro:

exten => s,1,Ringing
exten => s,n,Wait(2)
exten => s,n,Playback(exadel_conference)
exten => s,n,Wait(1)
exten => s,n,Set(TIMEOUT(absolute)=${CONF_MAX_DU})
exten => s,n,Set(CHANNEL(musicclass)=standard)
exten => s,n,MeetMe(${ARG1},M)

Log example in situation without any problem:

Jun  8 08:40:24 pbx01 asterisk[15909]: DEBUG[15941]: chan_iax2.c:10631 in socket_process: Immediately destroying 5707, having received hangup
Jun  8 08:40:24 pbx01 asterisk[15909]: VERBOSE[4025]: app_macro.c:425 in _macro_exec: [Jun  8 08:40:24]   == Spawn extension (macro-conference, s, 7) exited non-zero on 'IAX2/exadel\
link-5707' in macro 'conference'
Jun  8 08:40:24 pbx01 asterisk[15909]: DEBUG[4025]: pbx.c:4755 in __ast_pbx_run: Spawn extension (eltegra-ext,1587,1) exited non-zero on 'IAX2/exadellink-5707'
Jun  8 08:40:24 pbx01 asterisk[15909]: VERBOSE[4025]: pbx.c:4756 in __ast_pbx_run: [Jun  8 08:40:24]   == Spawn extension (eltegra-ext, 1587, 1) exited non-zero on 'IAX2/exadellink-\
5707'
Jun  8 08:40:24 pbx01 asterisk[15909]: DEBUG[4025]: channel.c:2630 in ast_softhangup_nolock: Soft-Hanging up channel 'IAX2/exadellink-5707'                                          
Jun  8 08:40:24 pbx01 asterisk[15909]: DEBUG[4025]: channel.c:2758 in ast_hangup: Hanging up channel 'IAX2/exadellink-5707'                                                          
Jun  8 08:40:24 pbx01 asterisk[15909]: DEBUG[4025]: chan_iax2.c:5200 in iax2_hangup: We're hanging up IAX2/exadellink-5707 now...
Jun  8 08:40:24 pbx01 asterisk[15909]: DEBUG[4025]: chan_iax2.c:5217 in iax2_hangup: Really destroying IAX2/exadellink-5707 now...
Jun  8 08:40:24 pbx01 asterisk[15909]: DEBUG[4025]: chan_iax2.c:2714 in sched_delay_remove: schedule decrement of callno used for 93.191.103.5 in 60 seconds
Jun  8 08:40:24 pbx01 asterisk[15909]: VERBOSE[4025]: chan_iax2.c:5233 in iax2_hangup: [Jun  8 08:40:24]     -- Hungup 'IAX2/exadellink-5707'


We have a lot of conference calls at our asterisk but never run into the problem described above when using timing_dahdi but with timerfd timing run into it 3-5 times in a week. When we were lucky and noticed the problem before the computer was finally frozen we were able to restore asterisk functionality by destroying the conference using asterisk command "meetme kick <meetme number> all".
Comments:By: Leif Madsen (lmadsen) 2011-06-17 10:49:13.830-0500

I understand there was/is an issue with res_timing_timerfd on your system (although if you use the 1.8 branch directly I know quite a bit of work has been done recently for timing issues with that module).

However, your request to have the priorities changed has already been rejected.

By: Jacob LIpstein (jacobli) 2011-06-17 11:36:17.644-0500

1. I did not require to change priority: it was my suggestion only. And reason was very clear: my experience shows that using timerfd in versions up to 1.8.4.1 can badly affect asterisk.
2. We are not experience any problem with our asterisks now because we have switched back to dahdi timing.
3. I hope you have closed the issue knowing that the timerfd timing is fixed or is supposed to be fixed soon.
4. We have no plans to switch to timerfd timing (last switch happened accidentally) and ensure it works without issues.