[Home]

Summary:ASTERISK-02578: qualify != 'no' in realtime (sipfriends extconfig) make the peer unreachable and sip show peer PEER_NAME endup hanging Asterisk
Reporter:Ousmane Wilane (wilane)Labels:
Date Opened:2004-10-10 09:03:02Date Closed:2004-10-12 22:26:39
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/DatabaseSupport
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) sipfriends.sql
( 1) sipfriends.sql.txt
Description:Having a qualify field in Realtime holding an integer have 2 effects:
- few sip show peer PEER_NAME endup hanging asterisk
- The peer is not reachable
- The status field in sip show peer listing shows UNKNOWN



****** STEPS TO REPRODUCE ******

Add a qualify field in realtime and put an integer then sip show peer few time (less than 10 times from my experience) and you should see asterisk hangs. Moreover, you can't call the peer, puting qualify='no' seems to fix the problem
Comments:By: Olle Johansson (oej) 2004-10-10 10:09:43

Hmmm. Don't know the details of the realtime config stuff, but if we have a similar setup as the old mysqlfriends, mailbox= and qualif= is not supported. It really should be ignored in the config.

By: Brian West (bkw918) 2004-10-10 11:22:54

Realtime peers never hang around.  They will never show up in sip show peer.

By: Brian West (bkw918) 2004-10-10 11:27:27

accually they will show up in sip show peer blah ... but qualify shouldn't be used on a realtime peer and I think should be totally ignored.  You can't qualify sipfriends because they aren't hanging around.   Also MWI doesn't work with sipfriends.

By: twisted (twisted) 2004-10-10 11:30:58

I somewhat agree, and somewhat disagree bkw.  qualify should ALWAYS be able to be used, as the peer may register with a 3600 timeout, and if they become unreachable during that timeout, we need to be able to recognize it.

By: Mark Spencer (markster) 2004-10-10 12:25:04

If you don't add a "qualify" column to the table then you won't have this problem, right?  Further, if you're using an integer for qualify, then whatever integer you put is the integer amount that the qualify will be.  Am I missing something?  maybe you need a dump of the table you're using.

By: Olle Johansson (oej) 2004-10-10 12:39:11

sip show peer <peername> doesn't support realtime peers, so we need to find out what hangs asterisk. You should get an error message. Do you have a gdb backtrace for us from a coredump?

We also need to add an error message if someone uses qualify and mailbox in realtime SIP config until we have fixed some kind of support for mailboxes.

If you need nat-keep-alives, set maximum registration time to 30 secs or so, depending on the NAT timeout.

By: Brian West (bkw918) 2004-10-10 12:58:06

twisted,

realtime peers aren't in the internal peer/user struct so they can't be checked because they only live when needed.  Bottom line.  Realtime you give up qualify and MWI.

bkw

By: Mark Spencer (markster) 2004-10-10 16:35:41

Actually sip show peer does work with relatime peers just not "sip show peers"

By: Brian West (bkw918) 2004-10-10 16:37:57

yaya thats what I was thinking :P

By: Mark Spencer (markster) 2004-10-10 16:43:06

i'm unable to duplicate this problem.  Are you sure you're using latest CVS?

By: Brian West (bkw918) 2004-10-10 17:07:54

WAIT WAIT WAIT... I see the problem.  If you have qualify to anything other than NO you can't call the realtime peer... Totally see it now.

By: Brian West (bkw918) 2004-10-10 17:55:07

This doesn't block. It just dosen't allow you to call realtime friends that have qualify on them.

bkw

By: Mark Spencer (markster) 2004-10-10 20:16:38

What are you putting in the "qualify" column?

By: Ousmane Wilane (wilane) 2004-10-10 20:31:42

Yeah I'm using current cvs maybe my build is kinda broken.
The only message I'm seeing (whatsoever anyway) is:
`SIP Seeding PEER_NAME at PEER_NAME@PEER_IP:PER_PORT for 120'

The dump of the table is atached.

And yeah it actualy block and there's something I didn't mention to reproduce the crash (sorry). To crash it you'll need a non realtime peer then:
sip show peer NON_REALTIME_PEER (few times)
will crash it.
If anyone needs an ssh account I'll let him in.

By: Brian West (bkw918) 2004-10-10 21:14:33

Please find kram or bkw_ on irc I can't recreate the this crash you describe with sip show peer XX for non-realtime vs realtime.  You can't call a realtime peer if you have qualify in your table and its set to anything other than no.  I can't recreate it.  

bkw

By: Brian West (bkw918) 2004-10-10 21:46:46

#0  0x4207d6f3 in strcasecmp () from /lib/i686/libc.so.6
#1  0x413bc4ec in find_peer (peer=0x415e37dc "    -- SIP Seeding '501' at 501@192.168.1.104", sin=0x0) at chan_sip.c:1113
#2  0x413bc64f in create_addr (r=0x8161a88, opeer=0xa <Address 0xa out of bounds>) at chan_sip.c:1217
#3  0x413befbd in sip_send_mwi_to_peer (peer=0x811fbd8) at chan_sip.c:7607
#4  0x413b95e2 in do_monitor (data=0x0) at chan_sip.c:7646
ASTERISK-1  0x4003d881 in pthread_detach () from /lib/i686/libpthread.so.0

In addition to this you forgot to mention you have tx/rx_fax and chanspy installed.  These are all unsupported applications that "could" becausing issues.  Please do a CLEAN fresh install.  Remove all .so's in /usr/lib/asterisk/modules

Thanks,
Brian

By: Ousmane Wilane (wilane) 2004-10-10 22:18:16

Cleaned everything up and co fresh vanilla * but the pesky oops still showing.

By: Brian West (bkw918) 2004-10-11 07:26:23

Problem is I can't get mine to do what yours did when I was on your box.  Its very strange.  I can do sip show peer NONREALTIME and sip show peer REALTIME till i'm blue in the face and it won't crash.

bkw

By: Brian West (bkw918) 2004-10-11 07:36:27

what database are you using?

By: Brian West (bkw918) 2004-10-11 08:03:44

installed unixODBC from src or binary?  Database driver from src or binary?

bkw

By: Ousmane Wilane (wilane) 2004-10-11 09:10:56

Yeah you're right, this have to do with my setup so don't waste more time on this. Thanks for your time all.

By: Brian West (bkw918) 2004-10-11 11:10:45

No if it has something to do with your setup we need to find it and make sure its not a bug in some of our code. Db Drivers or such as well.  WE need to know! :)

By: Mark Spencer (markster) 2004-10-11 11:25:37

I'd like to see it run under valgrind.  I have a feeling that there's a real bug lurking in there somewhere.

By: Ousmane Wilane (wilane) 2004-10-11 13:53:11

#0  0x4207d6f3 in strcasecmp () from /lib/i686/libc.so.6
#1  0x411abbdf in build_peer (name=0x80ff21e "501", v=0x8104708)
   at chan_sip.c:8162
#2  0x411bf2a2 in realtime_peer (peername=0x80ff21e "501", sin=0x43)
   at chan_sip.c:1063
#3  0x411bb4e9 in find_peer (peer=0x80ff21e "501", sin=0x0) at chan_sip.c:1132
#4  0x411b6b4d in sip_show_peer (fd=1, argc=4, argv=0xbffff370)
   at chan_sip.c:5660
ASTERISK-1  0x0807f6ca in ast_cli_command (fd=1,
   s=0x786f62 <Address 0x786f62 out of bounds>) at cli.c:1150
ASTERISK-2  0x0809da64 in main (argc=1092445104, argv=0xbffff370) at asterisk.c:705
ASTERISK-3  0x42015967 in __libc_start_main () from /lib/i686/libc.so.6

By: Ousmane Wilane (wilane) 2004-10-11 14:01:12

a break on chan_sip.c:8162 show this just before the crash:
Breakpoint 1, build_peer (name=0x80fb1be "501", v=0x811ee00) at chan_sip.c:8162
8162                    if (!strcasecmp(peer->name, name)) {
(gdb) print name
$3 = 0x80fb1be "501"
(gdb) print peer->name
$4 = "À±\017\b¨\025\020\b\0\0\0\0!\0\0\0Hë\021\bÜ\234\023B\0\0\0\0\021\0\0\0Hë\0
21\bÜ\234\023B0\0\0\0\030\0\0\0\200\025\023\b168.1.104\0\0\0\0\0\0\0±\002\0\0À±\
017\bÜ\234\023B"
(gdb)
$5 = "À±\017\b¨\025\020\b\0\0\0\0!\0\0\0Hë\021\bÜ\234\023B\0\0\0\0\021\0\0\0Hë\0
21\bÜ\234\023B0\0\0\0\030\0\0\0\200\025\023\b168.1.104\0\0\0\0\0\0\0±\002\0\0À±\
017\bÜ\234\023B"
(gdb) cont
Continuing.

Breakpoint 1, build_peer (name=0x80fb1be "501", v=0x811ee00) at chan_sip.c:8162
8162                    if (!strcasecmp(peer->name, name)) {
(gdb) print peer->name
Cannot access memory at address 0x36303500
(gdb)
Cannot access memory at address 0x36303500
(gdb)
Cannot access memory at address 0x36303500
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x4207d6f3 in strcasecmp () from /lib/i686/libc.so.6

By: Ousmane Wilane (wilane) 2004-10-11 14:02:42

If any marshal needs access to the boxen just post it here or send me an email to wilane at arc dot sn

By: Mark Spencer (markster) 2004-10-11 14:09:18

Can you find me on IRC so I can login and take a look?  Thanks!

By: Ousmane Wilane (wilane) 2004-10-11 14:24:20

Sure I'm hanging around. Thanks.

By: Ousmane Wilane (wilane) 2004-10-11 15:42:43

I didn't notice that I have a peer (501) configured REALTIME and NON_REALTIME and choosing it either REALTIME or NON_REALTIME seems to fix it. Sorry.

By: Mark Spencer (markster) 2004-10-11 22:15:29

So are you able to duplicate the segfault?  Even if it was a configuration problem it shouldn't lead to a segfault, so we need to be sure that whatever causes that is gone.  I did fix something yesterday that could have caused it but still I want to be sure it's really gone.

By: Ousmane Wilane (wilane) 2004-10-12 06:39:47

Yeah it still segfaulting with a fresh cvs (current) but things seems to move out. Here is a bt:

(gdb) bt
#0  0x420749db in calloc () from /lib/i686/libc.so.6
#1  0x42073ef1 in malloc () from /lib/i686/libc.so.6
#2  0x410f3cd5 in QR_read_tuple () from /usr/lib/libodbcpsql.so
#3  0x410f3859 in QR_next_tuple () from /usr/lib/libodbcpsql.so
#4  0x410f3387 in QR_fetch_tuples () from /usr/lib/libodbcpsql.so
ASTERISK-1  0x410df911 in CC_send_query () from /usr/lib/libodbcpsql.so
ASTERISK-2  0x410f7422 in SC_execute () from /usr/lib/libodbcpsql.so
ASTERISK-3  0x410e72a7 in PG_SQLExecute () from /usr/lib/libodbcpsql.so
ASTERISK-4  0x410e745f in SQLExecute () from /usr/lib/libodbcpsql.so
ASTERISK-5  0x4108b419 in SQLExecute () from /usr/lib/libodbc.so.1
ASTERISK-6 0x410649bb in realtime_odbc (database=0x8103d18 "sip showì\234\023Br 501",
   table=0xbfffeff0 "sipfriends", ap=0xbffff220 "") at res_config_odbc.c:101
ASTERISK-7 0x08057cf4 in ast_load_realtime (
   family=0x8103d18 "sip showì\234\023Br 501") at config.c:429
ASTERISK-8 0x411bf2a5 in realtime_peer (peername=0x0, sin=0x30352072)
   at chan_sip.c:1058
ASTERISK-9 0x411bb509 in find_peer (peer=0x80ff30e "501", sin=0x0) at chan_sip.c:1132
ASTERISK-10 0x411b6b6d in sip_show_peer (fd=1, argc=4, argv=0xbffff3e0)
   at chan_sip.c:5660
ASTERISK-11 0x0807f6ca in ast_cli_command (fd=1, s=0x8103d18 "sip showì\234\023Br 501")
   at cli.c:1150
ASTERISK-12 0x0809da64 in main (argc=1092445104, argv=0xbffff3e0) at asterisk.c:705
ASTERISK-13 0x42015967 in __libc_start_main () from /lib/i686/libc.so.6

By: Ousmane Wilane (wilane) 2004-10-12 07:34:38

Moved my unixODBC to 2.2.10 it was 2.2.2 and no more segfault. I'm not having the peers registered now but it may be a config error I'm trying to track it down:

Oct 12 12:32:24 WARNING[229389]: res_config_odbc.c:68 realtime_odbc: SQL Alloc Handle failed!

By: Ousmane Wilane (wilane) 2004-10-12 07:49:38

If a peer is configured at the same time as REALTIME and NON_REALTIME (say 501) I see a strange behavior:

- As long as `sip show peer 501' is not issued, the peer is reachable.
- ASA `sip show peer 501' is issued, the peer becomes unreachable and the above error shows up with that peer.

On the PostgreSQL side, the only error I'm seing is:
Oct 12 12:47:21 MyHostName postgres[28216]: [10] LOG:  pq_recvbuf: unexpected EOF on client connection
Oct 12 12:47:23 MyHostName postgres[28263]: [1] LOG:  query: begin; select getdatabaseencoding(); commit

By: Ousmane Wilane (wilane) 2004-10-12 09:01:00

Takes longer to reproduce but still there sorry:

#0  0x4207d6f3 in strcasecmp () from /lib/i686/libc.so.6
#1  0x4119ebdf in build_peer (name=0x80ff7b6 "501", v=0x814e4f8)
   at chan_sip.c:8162
#2  0x411b22c2 in realtime_peer (peername=0x80ff7b6 "501", sin=0xffffffeb)
   at chan_sip.c:1063
#3  0x411ae509 in find_peer (peer=0x80ff7b6 "501", sin=0x0) at chan_sip.c:1132
#4  0x411a9b6d in sip_show_peer (fd=1, argc=4, argv=0xbffff410)
   at chan_sip.c:5660
ASTERISK-1  0x0807f6ca in ast_cli_command (fd=1,
   s=0x786f62 <Address 0x786f62 out of bounds>) at cli.c:1150
ASTERISK-2  0x0809da64 in main (argc=1092391856, argv=0xbffff410) at asterisk.c:705
ASTERISK-3  0x42015967 in __libc_start_main () from /lib/i686/libc.so.6

The segfault usualy happen just before this mesg:

Oct 12 13:59:35 NOTICE[147465]: sched.c:290 ast_sched_del: Attempted to delete non-existant schedule entry 21!

By: Ousmane Wilane (wilane) 2004-10-12 10:45:41

Switched sipfriends mapping to an SQLITE DSN and now the SEGFAULT is easier to reproduce, moreover, the realtime_peers registration failed while they're ok with the same data in a PG DSN.

By: Brian West (bkw918) 2004-10-12 10:54:56

did you compile from src or binary install?  Did you install the PGSql driver from the postgres people?  Or are you using the somewhat broken one thats included with unixODBC?

By: Ousmane Wilane (wilane) 2004-10-12 11:19:15

unixODBC is installed from source.
Using the PG driver in unixODBC. Going to try POstgreSQL peoples one. Didn't know it was broken sorry.

By: Ousmane Wilane (wilane) 2004-10-12 11:39:52

Installed PG drivers from pg guys (psqlodbc) --with-unixodbc:

COMMAND    PID USER  FD   TYPE DEVICE   SIZE   NODE NAME
asterisk 27058 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27059 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27061 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27062 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27063 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27064 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27067 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27070 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27073 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27075 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27076 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27077 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so
asterisk 27078 root mem    REG    3,4 800236 362704 /usr/lib/psqlodbc.so

but the pesky oops still shows up while the MsgErr is beter I think:

Oct 12 16:35:59 WARNING[16384]: res_config_odbc.c:143 realtime_odbc: SQL Get Data error!
[SELECT * FROM sipfriends WHERE name=?]

Peer 501 not found.

By: Ousmane Wilane (wilane) 2004-10-12 12:55:47

There's something that also poluate peerl somehow, here is what happen just after * loads:

*CLI> sip show peers
Name/username    Host            Dyn Nat ACL Mask             Port     Status    
ousmane/ousmane  (Unspecified)    D          255.255.255.255  0        UNKNOWN  
501/501          (Unspecified)    D   N      255.255.255.255  0        UNKNOWN  
fwd.pulver.com/  69.90.155.70         N      255.255.255.255  5060     OK (251 ms)
*CLI>

Then I issue `sip show peer 501` and then:
*CLI> sip show peers
Name/username    Host            Dyn Nat ACL Mask             Port     Status    
ousmane/ousmane  (Unspecified)    D          255.255.255.255  0        UNKNOWN  
<

Notice the prompt.

and then:
*CLI> sip show peers
Name/username    Host            Dyn Nat ACL Mask             Port     Status    
ousmane/ousmane  (Unspecified)    D          255.255.255.255  0        UNKNOWN  
faddr->IP    192.168.1.16     D       A  255.255.255.255  5060     Unmonitored
*CLI>

By: Brian West (bkw918) 2004-10-12 18:07:44

I think mark might have fixed this.  Try the latest CVS and we will see.

bkw

By: Mark Spencer (markster) 2004-10-12 18:31:50

Fixed in CVS

By: Ousmane Wilane (wilane) 2004-10-12 18:44:24

Thanks you for your valuable time you've all spent on this, special thanks for  Mark.

By: Russell Bryant (russell) 2004-10-12 22:26:39

doesn't apply to stable