OK.... so here's a big fat post on what I've been doing so far on this WSJS issue, and a big fat tarball too, with logfiles and other goodies. So grab yerself a cup o' java and prepare to be bored.
First some comments going back to the 07-July posts, just to backfill some gaps in the thread.
On TCP/UDP: Yeah, I'm not sure how I got the wrong idea that WS was UDP-based. It was a few years back that I was originally dorfing around with it and by now have totally forgotten what led me to that wrong conclusion. Maybe I was looking at the link traffic and noticed the mdns UDP stuff and thought it was webservice traffic. I dunno.
In any case, sorry for the confusion on that, but glad that it is in fact TCP. This makes it less likely that the problem is an app-level comm deadlock as I was conjecturing under the assumption of UDP.
About the outsourced webservice code: Yeaaaahhhhhhhh... hooo baybay. It's a shame you guys got sold a bill o' goods on that. I've had my head in it now for a week or so and it is truly gawd-awful. Essentially uncommented, undocumented, syntactically ugly and disorganized, poorly structured... violates just about every style and best-practices coding recommendation ever made. I used to teach internal "C for DSP" classes at Bell Labs many years ago, and sometimes I would try to find real-world code like this to show as "don't do" examples, make people's eyes water. Honestly, it's irritating even just to look through it.
Anyway, for your sake, I hope it was like a Midas Muffler and you didn't pay a lot for it.
A little about how the experiments were performed that led to the attached logs and other detritus:
The "code base" for the experiments is phidgetwebservice-126.96.36.19930618 and libphidget-188.8.131.5230320. I built each on the SBC1 from the tarballs, then made sure they worked (and failed) just like the installed stock versions. Yep.
Then made some minor mods to improve the logging output format, force timestamps under all circumstances, and some other convenience stuff like that. (The stock log output is some silly-ass csv format with no whitespace that gives me a headache after about 10 minutes.)
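For anyone wondering what "force timestamps, add whitespace" amounts to, here is a minimal sketch of that kind of formatter. To be clear, format_log_line() is a made-up name and layout for illustration, not the actual mod and not anything from the webservice source. The timestamp is passed in as an argument so the formatting can be checked without depending on the clock.

```c
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not from the webservice source): writes one log line
 * as "YYYY-MM-DD HH:MM:SS  LEVEL  message" into buf.
 * Returns the number of characters written, or -1 on error/truncation.
 * Note: gmtime() uses a static buffer, so a threaded server would want
 * gmtime_r() or a lock around this call. */
int format_log_line(char *buf, size_t len, time_t when,
                    const char *level, const char *fmt, ...)
{
    char stamp[32];
    struct tm *tmv;
    va_list ap;
    int n, m;

    tmv = gmtime(&when);
    if (tmv == NULL)
        return -1;
    if (strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tmv) == 0)
        return -1;

    /* Timestamp and level are always emitted, padded for readability. */
    n = snprintf(buf, len, "%s  %-5s  ", stamp, level);
    if (n < 0 || (size_t)n >= len)
        return -1;

    /* Append the caller's printf-style message. */
    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    if (m < 0 || (size_t)m >= len - (size_t)n)
        return -1;

    return n + m;
}
```

A call such as format_log_line(buf, sizeof buf, time(NULL), "INFO", "accepted fd=%d", fd) then gives a line that can be read by eye instead of a run of commas.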
Then I began instrumenting it by adding pu_log() and DPRINT()s to various points of interest in the code as I slogged along learning (with great difficulty) what the heck the code was doing or trying to do. Honestly, even after more than a week, I still have only a very vague understanding of the thread structure and functionality. It's that hard to follow.
I have made a few benign "actual code" mods too, but just stuff like improving or rewriting the way that some routines handle errors so that more details are available in the log output if such errors occur. (For example, stream_server_accept(): It dutifully checks for about half a dozen error conditions, and then immediately throws what it found on the floor by always returning 0 to the caller. Very useful.)
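To illustrate the kind of error-handling change I mean, here is a rough sketch. It is NOT the real stream_server_accept() and not my actual patch; accept_report() is a hypothetical name, and the only assumption is an ordinary listening TCP socket. The point is simply that the errno value gets handed back to the caller instead of being dropped.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper (not the real stream_server_accept()):
 * on success returns 0 and stores the new descriptor in *conn_fd;
 * on failure returns the errno value so the caller can log the cause. */
int accept_report(int listen_fd, int *conn_fd)
{
    int fd;

    if (conn_fd == NULL)
        return EINVAL;

    do {
        fd = accept(listen_fd, NULL, NULL);
    } while (fd < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (fd < 0)
        return errno;                     /* e.g. EBADF, EMFILE, ECONNABORTED */

    *conn_fd = fd;
    return 0;
}
```

The caller can then log strerror(rc) whenever rc is nonzero, which is exactly the detail that always returning 0 throws away.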
A few words about execution environment: I noticed that the threads implementation on the SBC1 appears to be based on the older so-called "LinuxThreads" lib (which implements each thread as a distinct process) rather than the newer NPTL, the Native POSIX Thread Library (which implements threads within a single process). I'm by no means an expert in either one (in fact barely familiar with both) but just to mention a few items: On the plus side, since each SBC1 thread is a process, the threads can be easily observed using the BusyBox ps(1) that's installed on the stock SBC. So that's a major convenience. (This would not have been true for NPTL because that BusyBox ps is wuss and doesn't have thread inspection.)
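One quick way to tell the two threading models apart from inside a program, sketched below. Under LinuxThreads, getpid() returns a different value in each thread (a known nonconformance of that library), whereas under the newer NPTL implementation all threads report the same pid. threads_share_pid() is a made-up name for illustration, and I have not run this particular snippet on the SBC1.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: record the pid as seen from inside the new thread. */
static void *report_pid(void *arg)
{
    *(pid_t *)arg = getpid();
    return NULL;
}

/* Hypothetical probe: returns 1 if a new thread sees the same pid as the
 * caller (NPTL-style, threads within one process), 0 if it sees a different
 * pid (LinuxThreads-style, one process per thread), -1 on error. */
int threads_share_pid(void)
{
    pthread_t tid;
    pid_t child_pid = 0;

    if (pthread_create(&tid, NULL, report_pid, &child_pid) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    return child_pid == getpid() ? 1 : 0;
}
```

Build with -pthread. A result of 0 would be consistent with each thread showing up as its own entry in ps.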
On the minus side -- and this is something I've only read about, not something I have any direct experience with -- it seems that LinuxThreads have some peculiarities with regard to thread creation and teardown. I thought possibly this might be implicated in the WSJS issue. Do you know anything about this? I don't, but have been reading up on it a little.
OK, the WS logs themselves: In the tarball are ten logfiles, named as 201307dd.hhmm, which is the starting date/time. Each is briefly documented within. (Btw, as you go thru the logs in date order, you'll notice a progression of formats as I tweaked around with the logging printfs, but I think all the mods are pretty self-explanatory.)
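If you want to sort or correlate the logs by start time programmatically, the naming scheme is easy to pick apart. A small sketch, assuming names of exactly the form yyyymmdd.hhmm (13 characters, e.g. 20130721.0956); parse_log_name() and struct log_start are names invented here for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Start date/time encoded in a logfile name such as "20130721.0956". */
struct log_start { int year, month, day, hour, minute; };

/* Hypothetical helper: parses "yyyymmdd.hhmm" into *out.
 * Returns 0 on success, -1 if the name does not match that form. */
int parse_log_name(const char *name, struct log_start *out)
{
    struct log_start t;
    int consumed = 0;

    if (name == NULL || out == NULL || strlen(name) != 13)
        return -1;

    /* %n records how many characters were consumed, so trailing junk
     * or a short field is detected. */
    if (sscanf(name, "%4d%2d%2d.%2d%2d%n",
               &t.year, &t.month, &t.day, &t.hour, &t.minute, &consumed) != 5)
        return -1;
    if (consumed != 13)
        return -1;

    /* Basic range checks only; days per month are not validated. */
    if (t.month < 1 || t.month > 12 || t.day < 1 || t.day > 31 ||
        t.hour < 0 || t.hour > 23 || t.minute < 0 || t.minute > 59)
        return -1;

    *out = t;
    return 0;
}
```

Names that don't follow the scheme, such as the syslog copy in the tarball, are simply rejected with -1.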
These are by no means all of the WSJS events that have occurred since I began instrumenting the setup, but most others were qualitatively similar enough to the first two (20130721.0956 and 20130721.2157) that they didn't seem worth keeping and documenting.
There are (at least) two post-mortem modes:
PM0: No WS processes are running after WSJS observed.
PM1: One WS process is still running after the WSJS.
In PM1, I'm not yet sure which thread function is the one left standing, or even whether it is consistent from WSJS to WSJS. It might be a different one each time. Unfortunately I've not kept track of which of the logged WSJS events resulted in which PM. PM1 is definitely more frequent though.
About the system logfile that you asked about: Fortunately, since I've not rebooted the SBC1 since I began the experiments, the entire syslog (/var/log/messages) is intact back to the start of the experiments. So it covers all of the logged WSJS events seen so far. It's in the tarball as systemlog_0721-0729. I looked thru it briefly and there are some USB-related warnings, but I didn't see anything nasty, and did not try to correlate those warnings with events from the WS logs. (And just to be clear, I have *zero* under-the-hood knowledge of USB, so I really don't know what to make of these warnings.) Hopefully they will be useful to you and maybe you can deduce something from them.
OK, that's it! Have fun, and certainly shoot any questions back.