Summary:ASTERISK-15929: chan_sip sends to peer mwi notify for wrong mailbox
Reporter:Jamuel Starkey (jamuel)Labels:
Date Opened:2010-04-08 15:29:31Date Closed:2010-05-07 17:19:40
Versions:Frequency of
Description:A SIP peer can be sent the wrong message count for an MWI subscription.

In 1.6.1 we're seeing an issue where a peer is receiving the message count for another peer's mailbox:

Peer A has messages 10/8 (as verfied by counting messages in filesystem and confirmed by voicemail show users)
Peer B as messages 0/0 (as verfied by counting no messages in filesystem and confirmed by voicemail show users)

on an MWI SUBSCRIBE by Peer B to Peer B's mailbox, * returns Peer A's message count in the NOTIFY to Peer B.

Mailbox polling is turned on in app_voicemail so you were to then go and add a message to Peer B's INBOX via the linux filesystem (e.g. touch /var/spool/asterisk/voicemail/default/PEERB/INBOX/msg0000.txt) then after the mailbox poll interval, Peer B's message count is correctly reflected (1/0) in the subsequent NOTIFY (as well as voicemail show users in the cli).

Next if you delete this message, * correctly sends a NOTIFY reflecting (0/0) to Peer B.

But then eventually after some time has elapsed (not sure if its time-based or some other event triggering yet) Peer B starts getting Peer A's message count again.

In all cases CLI "voicemail show users" shows Peer B's message count correctly (0/0).  This leads me to guess that there is an issue with the MWI caching that chan_sip implement (get_cached_mwi).
Comments:By: Paul Belanger (pabelanger) 2010-04-08 15:53:39

We are going to need debug logs and traces for your issue.

Thank you for taking the time to report this bug and helping to make Asterisk better. Unfortunately, we cannot work on this bug because your description didn't include enough information. You may find it helpful to read "Asterisk Issue Guidelines" http://www.asterisk.org/developers/bug-guidelines. [^] We'd be grateful if you would then provide a more complete description of the problem.

At a minimum, we need:
1. the specific steps or actions you took that caused you to encounter the problem,
2. the behavior you expected, and
3. the behavior you actually encountered (in as much detail as possible).

This likely includes output from the console with debug level logging, a SIP trace (if this is SIP related), and configuration information such as dialplan (e.g. extensions.conf) and channel configuration (e.g. sip.conf).


By: Paul Belanger (pabelanger) 2010-04-08 15:54:19

Have you also tried the patches from ASTERISK-15479?

By: Jamuel Starkey (jamuel) 2010-04-13 00:45:45

Patch from 0016660 had no effect.  Still see chan_sip intermittently sending MWI NOTIFY's with message counts from wrong mailbox.

I've created a script that can parse a pcap of the related SIP NOTIFY messages. The script then logs when NOTIFY's are sent with the wrong message count as compared to actual messages counted on the file system by the script for the same mailbox.

This is the regex and bpf ngrep filter that will capture all NOTIFY's for message-summary:

ngrep -l -q -t -W byline "Voice-Message: [0-9]+" port 5060 > mwi.txt 2>&1 &

I've then gone and patched chan_sip's sip_send_mwi_to_peer method to no longer invoke get_cached_mwi, ignore returning immediately on the cache_only = 1 parameter, and instead to always use ast_app_inboxcount:

Basically comment out the compound if/else if block starting with:
} else { /* Fall back to manually checking the mailbox */

and just leaving:

struct ast_str *mailbox_str = ast_str_alloca(512);
peer_mailboxes_to_str(&mailbox_str, peer);
ast_app_inboxcount(mailbox_str->str, &newmsgs, &oldmsgs);

So that chan_sip always invokes ast_app_inboxcount and never relies upon caching or the EVENT system.

I'll will run in this mode for a day or so and see if my issue goes away and report back.  On a busy machine this issue presents itself at least every hour.  My impression if this does put the issue into bypass then that ought to at least implicate the get_cached_mwi method.

By: Ben Winslow (rain) 2010-04-13 11:13:16

Jamuel: I was seeing this issue before I applied the patch in ASTERISK-1639607. I suspect that if you apply the diff attached to that bug (or upgrade to, which includes that bug fix), your problem will go away.

I do see that, on this bug report, you have listed Asterisk, so maybe this is a different problem.

By: Jamuel Starkey (jamuel) 2010-05-07 16:55:10

Patch in 0016607 works for me.  Please close this ticket.