Just to add one more voice on this issue, and also perhaps some useful diagnostic input for tracking it down: I also have experienced this "webservice just stops" issue since I began working with the 1070 SBC1 several years ago. Like some of the other reporters in this thread, I consider it a fairly serious problem because it really gets in the way of reliable long-term data collection. In my environment, the typical connection lifetime is on the order of tens of hours, and the only way to restore connectivity is a reboot. Not pretty.
Here's some additional diagnostic info and observations that may be useful:
1. My setup is as follows:
Code: Select all
sensors <---> SBC1 <-- WiFi link --> laptop
The sensors come into SBC1 via the on-board IFKit 8/8/8.
The laptop app looks more or less like this (pseudocode):
Code: Select all
read PH1070 IK8/8/8 sensors;
conditionally update some of the PH1014 relays;
2. While the webservice link is operating properly, and with the laptop running only my SBC interface app (i.e. neither sending nor receiving any other traffic over the WiFi interface), the steady-state data rate over the WiFi link averages around 1-2 kB/s (laptop->SBC1) and around 5-6 kB/s in the other direction, even when no sensor or relay data is being transferred. I assume this traffic is just the webservice protocol itself exchanging some sort of idle-loop background keepalive messages. (Although, as an aside, I'm a little surprised that the idle message rate is that high. But let's ignore that.)
3. When the webservice dies, the traffic rate over the wireless link drops to essentially zero (something like a few dozen B/s in both directions) and it stays that way forever. My app of course dies off too, upon attempting a remote sensor read or relay write, which times out with error 13.
4. Restarting the app on the laptop (thus forcing a fresh init sequence of create/open/attach) *never* restores webservice comm once it dies. In every case I've observed, it's a hard fault from that point onward, and the only cure seems to be a reboot.
5. Even while the SBC1 is in the webservice-dead state, it is nevertheless pingable and slogin-able from the laptop. This makes it clear -- if there was ever any doubt -- that the source of the problem is the webservice process itself, and furthermore that, regardless of what causes the webservice to enter the dead state, it does not return to life once reliable link-level transport is restored.
6. The webservice lock-up mode can often be induced by introducing some random transient link-layer packet loss. For example, I've found that simply turning on a mobile phone nearby and enabling its WiFi connection to the same access point being used by the laptop-SBC link often causes the webservice lockup to occur.
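In case it helps anyone else catch the failure as it happens, the traffic signature in points 2 and 3 is easy to watch for automatically. Below is a minimal Python sketch that scans a series of (timestamp, cumulative byte count) samples and reports when the rate fell below a floor and stayed there. The function name, the 500 B/s floor and the 30 s hold time are my own placeholder choices (picked to sit between the "few dozen B/s" dead rate and the 5-6 kB/s idle rate), not anything from the Phidgets code. Where the byte counters come from (interface statistics, a capture tool, etc.) is left open.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, cumulative bytes received)


def find_stall(samples: Iterable[Sample],
               min_rate_bps: float = 500.0,
               min_duration_s: float = 30.0) -> Optional[float]:
    """Return the timestamp at which a sustained traffic stall began, or None.

    A stall is a run of consecutive sample intervals whose byte rate stays
    below min_rate_bps for at least min_duration_s. Both thresholds are
    illustrative assumptions and should be tuned to the observed link.
    """
    stall_start = None
    prev = None
    for t, total in samples:
        if prev is not None:
            dt = t - prev[0]
            if dt > 0:
                rate = (total - prev[1]) / dt
                if rate < min_rate_bps:
                    # Remember where the quiet period started.
                    if stall_start is None:
                        stall_start = prev[0]
                    if t - stall_start >= min_duration_s:
                        return stall_start
                else:
                    # Traffic recovered; forget the quiet period.
                    stall_start = None
        prev = (t, total)
    return None
```

Fed with one sample every 10 s, this flags the dead state within about half a minute, which is early enough to log the exact time of death and correlate it with whatever else was happening on the access point.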
Given all the above, my guess as to what is going on is simply that the webservice protocol is probably not overly robust to transient loss of link-level connectivity, and winds up getting easily wedged in a deadlock condition with the remote app. Both ends are probably waiting for either a timeout or a response from the other end which never occurs. Obviously this is not a rocket-science observation -- you've probably already concluded much the same thing yourself -- just my suspicion based on a (very) quick look at your webservice code.
From what I was able to tell, it appears that webservice is built upon a home-grown app-layer protocol running over an unreliable transport layer (UDP). Having designed such arrangements myself on several occasions during my career, I can certainly sympathize with the trickiness involved in designing in a high degree of app-layer robustness against a wide variety of link-layer fault scenarios, many of which are difficult to simulate during testing or reproduce during actual operation. It is not easy, and is made even more difficult by strict latency and rate constraints.
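To make the "neither end should wait forever" point concrete, here is a rough Python sketch of the usual defence: tag each request with a sequence number, wait a bounded time for the matching reply, retransmit with a growing timeout, and give up with an explicit error after a fixed number of tries. Everything in it is hypothetical (the `request` function, the `send`/`recv` callables, the message layout, the default timings); it is not the webservice's actual protocol, just an illustration of the pattern.

```python
import itertools
import time
from typing import Callable, Optional, Tuple

Message = Tuple[int, bytes]  # (sequence number, payload); layout is an assumption


class RequestTimeout(Exception):
    """Raised when no matching reply arrives after all retries."""


_next_seq = itertools.count(1)


def request(send: Callable[[Message], None],
            recv: Callable[[float], Optional[Message]],
            payload: bytes,
            timeout_s: float = 0.5,
            max_tries: int = 4) -> bytes:
    """Send payload and return the matching reply, retrying on loss.

    send(msg) transmits one message over the unreliable link.
    recv(timeout) returns the next received message, or None on timeout.
    Both are supplied by the caller (hypothetical transport hooks).
    """
    seq = next(_next_seq)
    for attempt in range(max_tries):
        send((seq, payload))
        # Exponential backoff: wait longer on each retransmission.
        deadline = time.monotonic() + timeout_s * (2 ** attempt)
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            reply = recv(remaining)
            if reply is None:
                break  # timed out; retransmit
            if reply[0] == seq:
                return reply[1]
            # Stale reply to an earlier request: ignore and keep waiting.
    raise RequestTimeout(f"no reply to request {seq} after {max_tries} tries")
```

The important property is that every wait has a deadline and every failure surfaces as an error the caller can act on (for example by tearing the session down and re-establishing it), so a burst of packet loss costs a few retries rather than a permanent wedge.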
One obvious suggestion -- though I'm sure you've considered it already -- is to try to migrate your webservice to TCP, and then go in and aggressively tune the TCP parameters to meet your realtime/latency needs.
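For what it's worth, the kind of TCP tuning I have in mind looks roughly like the Python sketch below: disable Nagle so small sensor/relay messages go out immediately, and turn on aggressive keepalives so a dead peer is detected in seconds rather than hours. The helper name and all the numeric values are placeholders of mine, not recommendations for the SBC1. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT` and `TCP_USER_TIMEOUT` options are platform-dependent (they are available on Linux), so the sketch applies them only where the platform exposes them.

```python
import socket


def tune_for_low_latency(sock: socket.socket,
                         idle_s: int = 5,
                         interval_s: int = 2,
                         probes: int = 3,
                         user_timeout_ms: int = 10000) -> socket.socket:
    """Apply latency-oriented options to a TCP socket (values are assumptions)."""
    # Send small writes immediately instead of coalescing them (Nagle off).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Probe an idle connection so a vanished peer is noticed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific knobs: set each one only if this platform defines it.
    optional = (
        ("TCP_KEEPIDLE", idle_s),        # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),   # seconds between probes
        ("TCP_KEEPCNT", probes),         # failed probes before giving up
        ("TCP_USER_TIMEOUT", user_timeout_ms),  # max ms data may stay unacked
    )
    for name, value in optional:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With the placeholder values above, an idle connection to a dead peer would be declared broken after roughly 5 + 3 x 2 = 11 seconds on a platform that honours all of the keepalive options, at which point the app gets a socket error and can reconnect.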
Anyway, hope the above is useful to you. If obtaining more diagnostic info -- link dumps, whatever -- would be helpful to you, just ask; I'll be glad to help if I can.