ASTERISK-13945: chan_sip deadlock


Summary: ASTERISK-13945: chan_sip deadlock

Reporter: Alan Graham (zerohalo)
Date Opened: 2009-04-13 10:26:23
Date Closed: 2011-06-07 14:02:58
Priority: Major
Regression? No
Status: Closed/Complete
Components: Channels/chan_sip/General
Attachments: (0) bt_scrubbed.txt, (1) csl.txt

Description: Output of 'core show locks' and a backtrace (bt) are attached.

Comments:

By: Joshua C. Colp (jcolp) 2009-04-22 08:57:29

After looking at this 'core show locks' output and backtrace, I see two things:

First, the devicestate code is doing a DNS lookup on the name of a peer. This should only happen if the peer doesn't exist, and even then it should not take that long. Is that peer properly configured, and is your server set up with working, reliable nameservers?

Second, the scheduler appears to be rescheduling things, which is perfectly normal and should not cause a deadlock.

What exactly did you experience that made you consider this a deadlock? What couldn't you do? What were you doing? What call volume does the machine handle? How many entries are configured in sip.conf?
By: Alan Graham (zerohalo) 2009-04-27 13:45:17

file: My apologies. I seem to be juggling too many things and not paying enough attention to what I'm posting.

This is a production server using multiple reliable nameservers, which are monitored for failures and to ensure response times stay under 1 second.

There are about a thousand active peers, all of which are in the realtime (cached) database. There are many more inactive peers (at least ~10k) that are not active on this server. Average call load for this server is around 30-50 concurrent calls.

When this occurred, peers were unable to register and calls could not complete.

I'm pretty sure I somehow posted this as a duplicate of the information I posted in http://bugs.digium.com/view.php?id=14918.
By: Alan Graham (zerohalo) 2009-04-27 14:14:42

file:

I think I may have found out why this was happening, per the notes in 14918: MySQL queries may be failing because of load caused by reloading SIP, even though we're using realtime for peers.

This might explain the apparent DNS failure.
By: Joshua C. Colp (jcolp) 2009-04-27 14:20:31

zerohalo: I bet that is indeed your problem. The queries are failing, which causes things to fall back to DNS lookups, and since the peers are names rather than addresses, those lookups take some time to fail. All of this together causes chan_sip to lock up.
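The failure chain described in this comment can be modeled in a few lines. The sketch below is a hypothetical Python simulation, not Asterisk code: `peers_lock`, `realtime_lookup`, `slow_dns_lookup`, `device_state`, and `registration_blocked` are invented names, and `time.sleep` stands in for a DNS query that only fails after a timeout. It assumes, for illustration, that the fallback lookup runs while a shared lock is held, and shows how that leaves other work (here a simulated registration) unable to take the lock until the lookup gives up.

```python
import threading
import time

# Hypothetical model, not Asterisk code: one shared lock standing in for
# a chan_sip lock that is held while a peer's device state is computed.
peers_lock = threading.Lock()


def realtime_lookup(name, db_ok):
    """Simulated realtime (MySQL) peer query; returns None if the query fails."""
    if not db_ok:
        return None
    return {"name": name, "host": "192.0.2.10"}


def slow_dns_lookup(name, delay):
    """Simulated DNS query for a peer name that never resolves:
    it blocks for `delay` seconds and then fails."""
    time.sleep(delay)
    return None


def device_state(name, db_ok, dns_delay):
    """Look up a peer with the shared lock held, falling back to DNS on a miss."""
    with peers_lock:
        peer = realtime_lookup(name, db_ok)
        if peer is None:
            # Fallback path: treat the peer name as a hostname. The lock
            # stays held for the whole time the lookup takes to fail.
            slow_dns_lookup(name, dns_delay)
            return "UNKNOWN"
        return "NOT_INUSE"


def registration_blocked(timeout):
    """True if a simulated registration cannot take the lock within `timeout` seconds."""
    acquired = peers_lock.acquire(timeout=timeout)
    if acquired:
        peers_lock.release()
    return not acquired


if __name__ == "__main__":
    # The database query fails, so the device-state check falls back to a
    # lookup that takes 1 s to fail while holding the lock.
    worker = threading.Thread(target=device_state, args=("peer1", False, 1.0))
    worker.start()
    time.sleep(0.2)
    print("registration blocked:", registration_blocked(0.2))  # True
    worker.join()
    print("registration blocked:", registration_blocked(0.2))  # False
```

In this model neither the failed query nor the slow lookup is a deadlock on its own; the lock-up comes from the slow lookup running while the lock is held, which matches the reported symptoms of peers failing to register and calls failing to complete.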
By: Alan Graham (zerohalo) 2009-05-04 07:05:02

file: This appears to be resolved, though I think the reload issue probably warrants a new bug. You can close this out, thanks. I'll open a new bug report as soon as I can provide details on the MySQL/chan_sip reload bug.
By: Joshua C. Colp (jcolp) 2009-05-04 10:52:26

Closed per reporter.