|ASTERISK-12058: Incorrect usage of errno can cause segfaults in ast_expr2.y
|Ardjan Zwartjes (ardjan)
|( 0) backtrace.txt
|Several of our live asterisk servers experienced crashes after printing the following log line:
Conversion of 1 to number under/overflowed!
Further research led us to the file main/ast_expr2.y where this line is printed on line 578 in the function "to_number". If this error is printed this function frees vp->u.s and returns 0. This function is used quite often without checking for errors, so subsequent usage of the passed struct can (and does) cause segfaults.
Although each unchecked use of to_number could be called a bug this isn't the real problem here (and in fact it is quite understandable). Looking into the code of ast_expr2.y and considering the fact that the value 1 clearly doesn't cause an over- or underflow we came to the conclusion that errno must have been changed by another thread, so in fact it is used in a thread-unsafe manner.
During compilation of asterisk the compiler option -D_REENTRANT is passed, but looking at errno.h we came to the conclusion that it is also necessary to pass the compile option -D_LIBC_REENTRANT to make errno thread safe.
We compiled asterisk with this option and have had no crashes for several weeks, before this change we had a crash every couple of days.
Our live servers are currently running Asterisk 1.2.24 but I checked the current asterisk SVN (revision 117400) trunk and the same problem is possible here. I assume that this problem is present in all versions.
So assuming that this extra compile option indeed fixed this problem I would advise to add -D_LIBC_REENTRANT on every place where -D_REENTRANT is set.
|By: Tilghman Lesher (tilghman) 2008-05-21 11:23:54
On what platform are you using this? On Linux, errno is ALWAYS a thread-specific variable.
By: Ardjan Zwartjes (ardjan) 2008-05-22 02:42:42
We are using a Linux 2.6.18 system, so if what you're saying is true our change shouldn't have made a difference, the funny thing is however that since we've recompiled asterisk with this option our servers are running stable. Before the recompile we experienced crashes every couple of days.
A small test program I just made seems to verify that errno indeed is thread local, but in that case a couple of questions remain:
- Why does adding -D_LIBC_REENTRANT seem to have solved our problems? (Ok, this might be a coincidence, but if so it's a very big one).
- Why does to_number fail on the relatively small value of 1? Since to_number is used without checking for errors this causes serious problems (segfaults every couple of days).
I will run some further tests to see if I can come up with the answers, but any help you can provide is greatly appreciated.
By: Ardjan Zwartjes (ardjan) 2008-05-22 06:43:42
Ok, further testing shows that our initial explanation was wrong. errno is indeed thread local (gdb's "macro expand" shows that errno is (*__errno_location ())). The question why errno is set however still remains. I can't really come up with a plausible scenario that explains our crashes. Any ideas?
By: Tilghman Lesher (tilghman) 2008-05-22 07:50:08
Perhaps you should post a stack backtrace, as detailed in doc/backtrace.txt, and let us examine it instead.
By: Ardjan Zwartjes (ardjan) 2008-05-22 09:54:55
I have attached a backtrace (With some accounts/phone numbers replaced by X's). This is a backtrace from an asterisk based on 1.2.24. Unfortunately we don't have backtraces from a more recent version.
By: Tilghman Lesher (tilghman) 2008-05-22 10:21:44
I'm afraid you're going to have to reproduce this in 1.4, 1.6, or trunk. 1.2 has already been end-of-lifed, and we do not support crash issues here.
I'll keep this open until Monday, if you'd care to reproduce on a supported release; otherwise, I'll have to close this.
By: Ardjan Zwartjes (ardjan) 2008-05-23 01:27:36
I was afraid of that, unfortunately we only run 1.2 on our systems. We have tried 1.4 but it proved to be rather unstable.
I probably can't convince you to look into this case by pointing out that if errno can be set after a call to strtoll in 1.2 it is probably also possible in 1.4 and up?
Anyway, thanks for your time. The crashes seem to have been solved by this compiler flag, so lets just hope that this isn't a coincidence.
By: Joshua C. Colp (jcolp) 2008-07-02 21:41:56
Suspended per Corydon's comment.