[srslte-users] srsUE Segmentation fault after long runtimes?
Ismael Gomez
ismael.gomez at softwareradiosystems.com
Wed Dec 14 17:21:28 UTC 2016
I guess it also happens at full rate. It's probably caused by the same
buffer overflow. I think it might be possible to disable reset_ul()
completely. I created a new branch called test_lock. Can you please
pull that and test whether it solves the issue? Thanks
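
For anyone who wants to check the "unlocking something that's not locked"
theory from the quoted message below independently of the test_lock branch,
here is a minimal sketch in plain C. It is not taken from srsUE; the helper
names init_errorcheck_mutex and second_unlock_result are hypothetical. It
assumes only the POSIX pthread API: a mutex created with
PTHREAD_MUTEX_ERRORCHECK reports an unlock by a thread that does not hold it
as EPERM, whereas the same mistake on a default mutex is undefined behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Initialise *m as an error-checking mutex.
   Returns 0 on success or an errno-style error code. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int ret = pthread_mutexattr_init(&attr);
  if (ret != 0) {
    return ret;
  }
  ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (ret == 0) {
    ret = pthread_mutex_init(m, &attr);
  }
  pthread_mutexattr_destroy(&attr);
  return ret;
}

/* Hypothetical demo: lock once, unlock twice, and return the result
   of the second (stray) unlock. Returns -1 if setup fails. */
int second_unlock_result(void)
{
  pthread_mutex_t m;
  if (init_errorcheck_mutex(&m) != 0) {
    return -1;
  }
  pthread_mutex_lock(&m);
  pthread_mutex_unlock(&m);           /* matching unlock */
  int ret = pthread_mutex_unlock(&m); /* stray unlock, reported as an error */
  pthread_mutex_destroy(&m);
  return ret;
}
```

Temporarily creating the suspect mutex this way and logging any non-zero
return from pthread_mutex_unlock() would show whether a stray unlock is
really happening.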
On Wed, 14 Dec 2016 at 18:13 Patrick Cutno <PCutno at girdsystems.com> wrote:
> Speaking of... it just happened again.
>
> I ran a 'bt' and a 'bt full', it looks like it might be in a different
> place.
>
>
> (gdb) bt
> #0 __lll_unlock_elision (lock=0x10a8ae0, private=0)
> at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> #1 0x00000000004aae1a in srsue::phch_common::worker_end(unsigned int,
> bool, floatcomplex *, unsigned int, srslte_timestamp_t) ()
> #2 0x00000000004a5157 in srsue::phch_worker::work_imp() ()
> #3 0x00000000004d3d51 in srslte::thread_pool::worker::run_thread() [clone
> .localalias.78] ()
> #4 0x00000000004738e9 in thread::thread_function_entry(void*) ()
> #5 0x00007ffff7bc16fa in start_thread (arg=0x7fffd51ab700)
> at pthread_create.c:333
> #6 0x00007ffff39f0b5d in clone ()
> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
>
> (gdb) bt full
> #0 __lll_unlock_elision (lock=0x10a8ae0, private=0)
>
> at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> No locals.
> #1 0x00000000004aae1a in srsue::phch_common::worker_end(unsigned int,
> bool, floatcomplex *, unsigned int, srslte_timestamp_t) ()
>
> No symbol table info available.
> #2 0x00000000004a5157 in srsue::phch_worker::work_imp() ()
>
> No symbol table info available.
> #3 0x00000000004d3d51 in srslte::thread_pool::worker::run_thread() [clone
> .localalias.78] ()
>
> No symbol table info available.
> #4 0x00000000004738e9 in thread::thread_function_entry(void*) ()
>
> No symbol table info available.
> #5 0x00007ffff7bc16fa in start_thread (arg=0x7fffd51ab700)
>
> at pthread_create.c:333
> __res = <optimized out>
> pd = 0x7fffd51ab700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140736768685824,
> -7000932799496599927, 1, 140737488346303, 140736768686528,
> 140737093259776, 7000874771778881161,
> 7000950959935702665},
> mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> ---Type <return> to continue, or q <return> to quit---
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> __PRETTY_FUNCTION__ = "start_thread"
> #6 0x00007ffff39f0b5d in clone ()
>
> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> No locals.
> (gdb)
>
> ------------------------------
> *From:* Patrick Cutno
> *Sent:* Wednesday, December 14, 2016 12:08 PM
>
> *To:* Ismael Gomez; srslte-users at lists.softwareradiosystems.com
> *Subject:* RE: [srslte-users] srsUE Segmentation fault after long
> runtimes?
>
> Apologies, I forgot to keep a copy of the backtrace. If I run into it
> again, I will reply with the full backtrace.
>
>
>
> *From:* Ismael Gomez [mailto:ismael.gomez at softwareradiosystems.com]
> *Sent:* Wednesday, December 14, 2016 12:03 PM
> *To:* Patrick Cutno <PCutno at girdsystems.com>;
> srslte-users at lists.softwareradiosystems.com
> *Subject:* Re: [srslte-users] srsUE Segmentation fault after long
> runtimes?
>
>
>
> Do you know if it is in the same place?
>
>
>
> On Wed, 14 Dec 2016 at 17:47 Patrick Cutno <PCutno at girdsystems.com> wrote:
>
> I may have spoken too soon. While not as frequent, I occasionally get the
> elision lock segfault.
>
>
>
> *From:* Patrick Cutno
> *Sent:* Tuesday, December 13, 2016 9:41 AM
> *To:* Ismael Gomez <ismael.gomez at softwareradiosystems.com>;
> srslte-users at lists.softwareradiosystems.com
>
>
> *Subject:* RE: [srslte-users] srsUE Segmentation fault after long
> runtimes?
>
>
>
> I have also run a few tests with no issues from elision-unlock.c or
> ue_dl.c.
>
> Thanks a lot!
> Patrick
> ------------------------------
>
> *From:* Patrick Cutno
> *Sent:* Friday, December 09, 2016 2:07 PM
> *To:* Ismael Gomez; srslte-users at lists.softwareradiosystems.com
> *Subject:* RE: [srslte-users] srsUE Segmentation fault after long
> runtimes?
>
> Ok, I managed to run a few 5-minute tests and a 1-hour test without the
> segfault caused by 'ue_dl.c'.
>
> However, I am running into a new segfault caused by elision-unlock.c on
> the Linux system, where it seems like srsUE is attempting to unlock
> something that's not locked? A quick search on Google showed that some
> people could resolve the issue by using different versions of libc6 or
> glibc; some say it's a processor compatibility issue (I am using a Core
> i7-6770HQ if that matters). I occasionally got this segfault before
> switching to the 'next' branch as well, but it was overshadowed by the
> number of times the ue_dl.c segfault happened.
>
> Is this an issue with my computer, srsUE, or both? Any thoughts on the
> matter? I pasted my backtrace with the segfault below.
>
> Thanks again
> Patrick
>
> .
> .
> .
> RRC Connection released.
> Random Access Transmission: seq=1, ra-rnti=0xa
> Random Access Complete. c-rnti=0x4d, ta=10
> RRC Connected
> Sync error.
>
> Thread 22 "ue" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd67fc700 (LWP 3716)]
> __lll_unlock_elision (lock=0x801ca8, private=0)
> at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> 1481306531:297860 29 ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:
> No such file or directory.
> (gdb) bt full
> #0 __lll_unlock_elision (lock=0x801ca8, private=0)
> at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> No locals.
> #1 0x00000000004abbdf in srsue::phch_common::reset_ul() ()
> No symbol table info available.
> #2 0x00000000004a9634 in srsue::phch_recv::run_thread() ()
> No symbol table info available.
> #3 0x000000000045e159 in thread::thread_function_entry(void*) ()
> No symbol table info available.
> #4 0x00007ffff7bc16fa in start_thread (arg=0x7fffd67fc700)
> at pthread_create.c:333
> __res = <optimized out>
> pd = 0x7fffd67fc700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140736792086272,
> 1035420685597574661, 1, 140737488346367, 140736792086976,
> 140737201365504, -1035509743916775931,
> -1035437757262428667},
> mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> ---Type <return> to continue, or q <return> to quit---
> __PRETTY_FUNCTION__ = "start_thread"
> #5 0x00007ffff3c36b5d in clone ()
> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> No locals.
> (gdb)
> ------------------------------
>
> *From:* Patrick Cutno
> *Sent:* Friday, December 09, 2016 8:51 AM
> *To:* Ismael Gomez; srslte-users at lists.softwareradiosystems.com
> *Subject:* RE: [srslte-users] srsUE Segmentation fault after long
> runtimes?
>
> Wow, thanks for the fast and thorough response! I will do my best to try
> it out today or early next week.
>
>
>
> Patrick
>
>
>
> *From:* Ismael Gomez [mailto:ismael.gomez at softwareradiosystems.com
> <ismael.gomez at softwareradiosystems.com>]
> *Sent:* Friday, December 09, 2016 5:13 AM
> *To:* Patrick Cutno <PCutno at girdsystems.com>;
> srslte-users at lists.softwareradiosystems.com
> *Subject:* Re: [srslte-users] srsUE Segmentation fault after long
> runtimes?
>
>
>
> Hi Patrick,
>
>
>
> Thanks for testing srsUE. The "Invalid Frequency Hopping parameters"
> messages are just warnings and are not really important.
> We'll probably eliminate them in future releases. During such a long run,
> it is likely that the UE finds a DCI message in the PDCCH for which the CRC
> matches the C-RNTI by coincidence. Very likely the grant will be invalid
> and that's the message the UE is printing.
>
>
>
> The segfault could be caused by an incorrectly decoded CFI or by a bug
> in some other part of the code. I've added a check in the function to skip
> decoding if the CFI is not valid. I just committed it to the next branch in
> srsLTE. Can you check again when you get a chance?
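
A minimal sketch of the kind of guard described above, not the actual
commit. It assumes only that a valid LTE CFI is 1, 2 or 3 (the backtrace in
this thread shows cfi=0). The names cfi_is_valid, guarded_dci_search and
count_calls are hypothetical, and the real DCI search routine is replaced by
a callback so the example stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* A valid LTE control format indicator (CFI) is 1, 2 or 3. */
#define CFI_MIN 1u
#define CFI_MAX 3u

/* Hypothetical helper: true only for a CFI in the valid range. */
bool cfi_is_valid(uint32_t cfi)
{
  return cfi >= CFI_MIN && cfi <= CFI_MAX;
}

/* Hypothetical callback type standing in for the real DCI search;
   it returns the number of DCI messages found. */
typedef int (*dci_search_fn)(uint32_t cfi, void *ctx);

/* Skip the search entirely when the CFI is invalid, so that no search
   space is ever derived from a bogus value. Returns 0 (nothing found)
   in that case, otherwise whatever the search returns. */
int guarded_dci_search(uint32_t cfi, dci_search_fn search, void *ctx)
{
  if (!cfi_is_valid(cfi)) {
    return 0;
  }
  return search(cfi, ctx);
}

/* Stand-in search used for illustration: counts its invocations in
   *ctx and pretends that one DCI message was found. */
int count_calls(uint32_t cfi, void *ctx)
{
  (void)cfi;
  *(int *)ctx += 1;
  return 1;
}
```

With cfi=0, guarded_dci_search() returns 0 and the callback is never
invoked; with cfi=2 the callback runs once.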
>
>
>
> Thanks again for using and testing srsUE.
>
>
>
> Best regards,
>
> Ismael
>
>
>
> On Wed, 7 Dec 2016 at 16:02 Patrick Cutno <PCutno at girdsystems.com> wrote:
>
> Hello world,
>
>
>
> I’m new to srsLTE and this mailing list type forum (please bear with me if
> I do or say something silly).
>
>
>
> I am currently trying to perform long iperf3 tests to measure bandwidth
> between a b210 with srsLTE UE and a PicoLTE with Amarisoft. Every now and
> again, the UE side of my system will segfault, and it seems to occur
> randomly. Sometimes I see the segfault after 30 minutes, and other times
> I can run the system for 10 hours without a problem. (Nothing else is
> running on the computers aside from srsLTE/Amarisoft and iperf3.)
>
>
>
> According to gdb, the fault happens in ../srsLTE/srslte/lib/ue/ue_dl.c:399
> ‘current_ss->format = SRSLTE_DCI_FORMAT0;’. In gdb, when I try to print
> current_ss->format, it reports that the memory is not accessible. I have
> pasted my backtrace below in case anyone can give me some insight into why
> this randomly happens and how to fix it. If you need any other info, just
> let me know.
>
>
>
> Thanks
>
> Patrick
>
>
>
> Starting program: /home/gird/srsUE/build/ue/src/ue ue_custom_1_4.conf
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> linux; GNU C++ version 5.4.0 20160609; Boost_105800; UHD_003.010.001.000-release
>
> [New Thread 0x7fffe03ec700 (LWP 5395)]
> [New Thread 0x7fffdfbeb700 (LWP 5396)]
> --- Software Radio Systems LTE UE ---
>
> Reading configuration file ue_custom_1_4.conf...
> Using srsLTE version 001.004.000
> [New Thread 0x7fffddaed700 (LWP 5397)]
> [New Thread 0x7fffdd2ec700 (LWP 5398)]
> [New Thread 0x7fffdcaeb700 (LWP 5399)]
> [Thread 0x7fffdcaeb700 (LWP 5399) exited]
> [Thread 0x7fffdd2ec700 (LWP 5398) exited]
> [New Thread 0x7fffdd2ec700 (LWP 5400)]
> [New Thread 0x7fffdcaeb700 (LWP 5401)]
> [Thread 0x7fffdcaeb700 (LWP 5401) exited]
> [Thread 0x7fffdd2ec700 (LWP 5400) exited]
> [New Thread 0x7fffdd2ec700 (LWP 5402)]
> [New Thread 0x7fffdcaeb700 (LWP 5403)]
> [Thread 0x7fffdcaeb700 (LWP 5403) exited]
> [Thread 0x7fffdd2ec700 (LWP 5402) exited]
> [New Thread 0x7fffdd2ec700 (LWP 5404)]
> [New Thread 0x7fffdcaeb700 (LWP 5405)]
> [Thread 0x7fffdcaeb700 (LWP 5405) exited]
> [Thread 0x7fffdd2ec700 (LWP 5404) exited]
> Opening USRP with args: type=b200
> [New Thread 0x7fffdd2ec700 (LWP 5406)]
> [New Thread 0x7fffdcaeb700 (LWP 5407)]
> [Thread 0x7fffdcaeb700 (LWP 5407) exited]
> [Thread 0x7fffdd2ec700 (LWP 5406) exited]
> [New Thread 0x7fffdd2ec700 (LWP 5408)]
> [New Thread 0x7fffdcaeb700 (LWP 5409)]
> [Thread 0x7fffdcaeb700 (LWP 5409) exited]
> [Thread 0x7fffdd2ec700 (LWP 5408) exited]
> [New Thread 0x7fffdd2ec700 (LWP 5410)]
> [New Thread 0x7fffdcaeb700 (LWP 5411)]
> -- Detected Device: B210
> -- Operating over USB 3.
> [New Thread 0x7fffd7fff700 (LWP 5412)]
> -- Initialize CODEC control...
> -- Initialize Radio control...
> -- Performing register loopback test... pass
> -- Performing register loopback test... pass
> -- Performing CODEC loopback test... pass
> -- Performing CODEC loopback test... pass
> -- Setting master clock rate selection to 'automatic'.
> -- Asking for clock rate 16.000000 MHz...
> -- Actually got clock rate 16.000000 MHz.
> -- Performing timer loopback test... pass
> -- Performing timer loopback test... pass
> -- Asking for clock rate 32.000000 MHz...
> -- Actually got clock rate 32.000000 MHz.
> -- Performing timer loopback test... pass
> -- Performing timer loopback test... pass
> [New Thread 0x7fffd77fe700 (LWP 5413)]
> [New Thread 0x7fffd6ffd700 (LWP 5414)]
> [New Thread 0x7fffd67fc700 (LWP 5415)]
> Setting frequency: DL=375.0 Mhz, UL=325.0 MHz
> [New Thread 0x7fffd5693700 (LWP 5416)]
> [New Thread 0x7fffd4e92700 (LWP 5417)]
> [New Thread 0x7fffcbfff700 (LWP 5418)]
> Searching for cell...
> Found CELL ID: 1 CP: Normal , CFO: 0.3 KHz.
> Trying to decode MIB...
> - Cell ID: 1
> - Nof ports: 1
> - CP: Normal
> - PRB: 6
> - PHICH Length: Normal
> - PHICH Resources: 1
> - SFN: 0
> MIB received BW=1.4 MHz
> [New Thread 0x7fffcb7fe700 (LWP 5419)]
> Initializating cell configuration...
> Setting Sampling frequency 1.92 MHz
> Setting TX/RX offset 54 samples, 28.12 us
> SIB1 received, CellID=257, PLMN Id: MCC 1 MNC 1
> SIB2 received
> [Thread 0x7fffcb7fe700 (LWP 5419) exited]
> Random Access Transmission: seq=1, ra-rnti=10
> Random Access Complete. c-rnti=257, ta=10
> RRC Connected
> Network attach successful. IP: 192.168.2.2
> [New Thread 0x7fffcaffd700 (LWP 5421)]
> RRC Connection released.
> Random Access Transmission: seq=1, ra-rnti=10
> Random Access Complete. c-rnti=258, ta=9
> RRC Connected
> RRC Connection released.
> Random Access Transmission: seq=1, ra-rnti=10
> Random Access Complete. c-rnti=259, ta=9
> RRC Connected
> Invalid Frequency Hopping parameters. Offset: 2, n_prb_1: 0
> Invalid Frequency Hopping parameters. Offset: 2, n_prb_1: 0
> Invalid Frequency Hopping parameters. Offset: 2, n_prb_1: 0
> RRC Connection released.
>
>
>
> Thread 21 "ue" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd6ffd700 (LWP 5414)]
> srslte_ue_dl_find_ul_dci (q=0x7ffff7f47240, cfi=0, sf_idx=<optimized out>,
> rnti=<optimized out>, dci_msg=0x7fffd6ffc8a0)
> at /home/gird/srsLTE/srslte/lib/ue/ue_dl.c:399
> 399 current_ss->format = SRSLTE_DCI_FORMAT0;
> (gdb) bt full
> #0 srslte_ue_dl_find_ul_dci (q=0x7ffff7f47240, cfi=0, sf_idx=<optimized out>,
> rnti=<optimized out>, dci_msg=0x7fffd6ffc8a0)
> at /home/gird/srsLTE/srslte/lib/ue/ue_dl.c:399
> search_space = {format = 3607086992, loc = {{L = 32767,
> ncce = 3607086832}, {L = 32767, ncce = 4764800}, {L = 0,
> ncce = 4007973720}, {L = 32767, ncce = 3607086992}, {L = 32767,
> ncce = 3607086848}, {L = 32767, ncce = 4764800}, {L = 0,
> ncce = 4979944}, {L = 0, ncce = 4294967295}, {L = 4294967295,
> ncce = 3099525120}, {L = 32767, ncce = 3607086824}, {L = 32767,
> ncce = 3607086864}, {L = 32767, ncce = 42}, {L = 0, ncce = 53}, {
> L = 0, ncce = 2952822816}, {L = 32767, ncce = 42}, {L = 0,
> ncce = 42}, {L = 0, ncce = 3220809265}, {L = 1041867344,
> ncce = 2952818992}, {L = 32767, ncce = 53}, {L = 0, ncce = 53}, {
> L = 0, ncce = 4058310079}, {L = 32767, ncce = 1686670400}},
> nof_locations = 3214023806}
> current_ss = 0x872ff7f54a54
> #1 0x00000000004b003c in
> srsue::phch_worker::decode_pdcch_ul(srsue::mac_interface_phy::mac_grant_t*) ()
> No symbol table info available.
> #2 0x00000000004b591a in srsue::phch_worker::work_imp() ()
> No symbol table info available.
> #3 0x00000000004e5031 in srslte::thread_pool::worker::run_thread() [clone
> .localalias.78] ()
> No symbol table info available.
> #4 0x000000000046bdc9 in thread::thread_function_entry(void*) ()
> No symbol table info available.
> #5 0x00007ffff7bc16fa in start_thread (arg=0x7fffd6ffd700)
> at pthread_create.c:333
> __res = <optimized out>
> pd = 0x7fffd6ffd700
> now = <optimized out>
> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140736800478976,
> 7185695482018004286, 1, 140737488346351, 140736800479680,
> 140737201361408, -7185746060066685634,
> -7185677373655352002},
> mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
> data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
> not_first_call = <optimized out>
> pagesize_m1 = <optimized out>
> sp = <optimized out>
> freesize = <optimized out>
> __PRETTY_FUNCTION__ = "start_thread"
> #6 0x00007ffff35fab5d in clone ()
> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> No locals.
> (gdb)
>