Summary: | ASTERISK-25103: Roundup - investigate Asterisk DTLS crashes | ||||||||||||||
Reporter: | Rusty Newton (rnewton) | Labels: | |||||||||||||
Date Opened: | 2015-05-19 15:25:11 | Date Closed: | 2015-07-07 14:56:59 | ||||||||||||
Priority: | Major | Regression? | |||||||||||||
Status: | Closed/Complete | Components: | Resources/res_rtp_asterisk | ||||||||||||
Versions: | Frequency of Occurrence | ||||||||||||||
Related Issues: |
| ||||||||||||||
Environment: | Asterisk 11, 13, Master | Attachments: | |||||||||||||
Description: | A issue for an investigation into the various DTLS crashes currently hanging about.
I'll link the issues currently on the tracker to this issue rather than linking them all to each other. | ||||||||||||||
Comments: | By: Thomas Guebels (tguescaux) 2015-05-20 03:10:26.263-0500 Hi, Shouldn't you also have a look at ASTERISK-24651 ? The patch proposed there, while not pretty, fixed most of the crashes I had during DTLS handshakes (see my comment 224900). By: Rusty Newton (rnewton) 2015-05-28 19:23:21.574-0500 Linked! By: Stefan Engström (StefanEng86) 2015-06-28 07:00:46.241-0500 I still think the unsynchronized call to SSL_do_handshake(dtls->ssl); on a ssl object on which SSL_set_accept_state(ssl) has been called is the main problem. When we have initialized the dtls->ssl to act as server, all the code needed to progress and finish the handshake is essentially already in __rtp_recvfrom. That function will at some point receive a client handshake message such as 'client hello', then write it to the input buffer (BIO_write(dtls->read_bio, buf, len)) and call SSL_read to make openssl progress the handshake (write server hello to output buffer). When we are dtls-clients it is appropriate to call SSL_do_handshake after ice completion because that kicks off the handshake by producing a client hello so this call cannot race with the ssl-processing within __rtp_recvfrom. I do not understand all the possible call-flows within asterisk so i created a test program at http://pastebin.com/8TGqN10j based off the example at http://www.roxlu.com/2014/042/using-openssl-with-memory-bios Indeed my program crashes half the time when SSL_do_handshake and other SSL related calls race. I noted that in roxlu's example they call SSL_do_handshake within the krx_ssl_handle_traffic function instead of our SSL_read; this seems to be two alternative ways of progressing the handshake (i.e. both those calls makes ssl write server hello to output buffer given that we have written client hello to input buffer), but this is different from our call to SSL_do_handshake, i believe one should only call SSL_do_handshake (or SSL_read) as server whenever there's a possible change in the BIO buffers, not from a asynchronously callback. renegotiation is a different story... I might have missed something as I base my conclusions on empirical tests on a very narrow use case as well as looking at unofficial documentation/guides on google for openssl :) By: Joshua C. Colp (jcolp) 2015-07-06 06:00:00.787-0500 A change is now up for review at the following addresses for a fix to this problem. While our code review process is pretty fast these days if anyone would like to test the change and provide feedback on this issue it would be welcome: 11: https://gerrit.asterisk.org/#/c/786/ 13: https://gerrit.asterisk.org/#/c/787/ master: https://gerrit.asterisk.org/#/c/788/ The patch can be downloaded by clicking the "Download" dropdown and selecting the method you wish. By: Stefan Engström (StefanEng86) 2015-07-07 09:11:00.866-0500 I tested applying patchsets 1-3 from 788 to a 13.1.0 installation and then ran some webrtc-calling tests using calling robots. It seemed to work fine for 1000+ calls (within 4 hours). By: Joshua C. Colp (jcolp) 2015-07-07 10:59:34.648-0500 It's probably the ICE negotiation that's slowing things down, nothing really to be done there except to implement trickle ICE which is a SUBSTANTIAL amount of work. The DTLS negotiation also takes some time. Just the way it is. By: Joshua C. Colp (jcolp) 2015-07-07 11:59:19.459-0500 DTLS is all OpenSSL territory, we start the negotiation as soon as we can. By: JoshE (n8ideas) 2015-07-07 12:36:34.653-0500 @Dade- would you be willing to share the hacks for ICE that you've done? We have almost the exact list on our end as well, especially around reducing candidate generation for Internet-facing multi-NIC setups, and it might be nice to try to get some of this standardized and into trunk, where appropriate. On the core issue Mr. Colp's patch addresses... it also looks good in our test suite. We were able to repro the crashes easily before and we're clean now. Thanks a bunch for your work on this! |