SBC webservice stops

General PhidgetSBC Discussion.
Patrick
Lead Developer
Posts: 3078
Joined: Mon Jun 20, 2005 8:46 am
Location: Canada

Re: SBC webservice stops

Postby Patrick » Thu Feb 07, 2013 3:54 pm

SBC2 web interface uses session cookies for authentication, so you can't just use basic HTTP auth. The best workaround would be something running on the SBC that makes sure the webservice is running, such as a cron job that checks every minute whether it is running and starts it if needed.
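A minimal sketch of such a watchdog pass in C, assuming a Linux /proc filesystem and the process name phidgetwebservice21 (the binary name used in the run script later in this thread). The helpers find_process and ensure_running are hypothetical names. The check uses argv[0] from /proc/<pid>/cmdline because /proc/<pid>/comm is truncated to 15 characters. It only detects a process that has exited, not one that is running but wedged.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the PID of the first process whose argv[0] basename equals
 * `name`, or -1 if none is found. Linux only (scans /proc). */
static long find_process(const char *name)
{
    DIR *proc = opendir("/proc");
    struct dirent *ent;
    long found = -1;

    if (proc == NULL)
        return -1;
    while (found < 0 && (ent = readdir(proc)) != NULL) {
        char path[64], cmd[256];
        FILE *f;
        size_t n;
        const char *base;

        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                   /* not a PID directory */
        snprintf(path, sizeof path, "/proc/%s/cmdline", ent->d_name);
        if ((f = fopen(path, "r")) == NULL)
            continue;                   /* process exited meanwhile */
        n = fread(cmd, 1, sizeof cmd - 1, f);
        fclose(f);
        cmd[n] = '\0';                  /* argv[0] ends at the first NUL */
        base = strrchr(cmd, '/');
        base = base ? base + 1 : cmd;
        if (strcmp(base, name) == 0)
            found = strtol(ent->d_name, NULL, 10);
    }
    closedir(proc);
    return found;
}

/* One watchdog pass: if `name` is not running, run `start_cmd`.
 * Returns 0 if it was already running, 1 if the start command
 * succeeded, -1 if the start command failed. */
static int ensure_running(const char *name, const char *start_cmd)
{
    if (find_process(name) >= 0)
        return 0;
    return system(start_cmd) == 0 ? 1 : -1;
}
```

A small main() calling ensure_running("phidgetwebservice21", "service phidgetwebservice start") could then be scheduled from cron with a `* * * * *` entry so it runs once a minute.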

The best solution would be a webservice that doesn't crash. We have been tracking these issues for a while. Most configurations using the webservice are stable long-term, but some setups cause the webservice to crash at regular intervals. We are working on this.

SBC3 uses the same software - no help there.

-Patrick

Laertes
Phidgetly
Posts: 46
Joined: Mon May 05, 2008 10:02 am

Re: SBC webservice stops

Postby Laertes » Thu Feb 07, 2013 4:35 pm

I agree that it would be nice if it didn't crash at all, but I know how hard it is to find these kinds of bugs.

At the moment it crashes about once an hour, so I need to come up with some kind of workaround. If you want crash data, I can give you access to the board ;)

What is the command to start it from the SBC2 itself?
Can I use the standard 'service XXX restart'?

If so, what is the name of the service?
regards,
/Lars

Patrick
Lead Developer
Posts: 3078
Joined: Mon Jun 20, 2005 8:46 am
Location: Canada

Re: SBC webservice stops

Postby Patrick » Thu Feb 07, 2013 5:06 pm

service phidgetwebservice start

You could help by providing code which makes it crash every hour - that is a lot of crashing, definitely not expected.

patrick@phidgets.com

-Patrick

AndreGermain
Phidgetly
Posts: 41
Joined: Fri Sep 30, 2011 7:09 am

Re: SBC webservice stops

Postby AndreGermain » Thu Jun 27, 2013 5:33 am

Patrick,

this problem is very vexing - I have already put a lot of effort into figuring out Phidgets issues over the past few years, and can't afford to spend yet more time figuring out how to get code running on an SBC2. Can Phidgets.com not provide us a package to install on an SBC to routinely force the webservice to start?

I've attached the logs from this SBC1. Its configuration is as follows:

Board Revision: 100
Firmware Version: 1.0.4.20130320 (minimal)
Kernel Version: Linux version 2.6.32.14 (root@sl) (gcc version 4.3.4 (Buildroot 2010.05) ) #1 PREEMPT Wed Mar 20 14:56:43 MDT 2013
Phidget Library: Phidget21 - Version 2.1.8 - Built Mar 20 2013 14:54:35

The date/time has been wrong for a few months now - haven't checked NTP. Does it matter? It's way back in 1969, so perhaps the NVM or battery is faulty?

Cheers
Last edited by AndreGermain on Thu Jun 27, 2013 11:23 am, edited 1 time in total.

Patrick
Lead Developer
Posts: 3078
Joined: Mon Jun 20, 2005 8:46 am
Location: Canada

Re: SBC webservice stops

Postby Patrick » Thu Jun 27, 2013 11:15 am

This can be done with 'supervise' - part of the daemontools package. I have been meaning to add automatic webservice restarts to the SBC for a while, but don't know when this will be ready.

You can set this up yourself:

Code:

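service phidgetwebservice stop                  # stop the instance started by the init script
insserv -r phidgetwebservice                    # remove it from the boot sequence so it isn't started twice
apt-get install daemontools daemontools-run     # daemontools-run starts svscan at boot, watching /etc/service
mkdir /etc/service/phidgetwebservice            # one directory per supervised service
touch /etc/service/phidgetwebservice/run
chmod 755 /etc/service/phidgetwebservice/run    # the run script must be executable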

/etc/service/phidgetwebservice/run should contain:

Code:

#!/bin/sh

NAME=phidgetwebservice
BIN=phidgetwebservice21
DAEMON=/usr/bin/$BIN
CFG=/etc/default/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# load config
pws_enabled="true"
pws_port="5001"
pws_serverid=""
pws_password=""
[ -f $CFG ] && . $CFG

[ -z "$pws_port" ] || OPTIONS="-p $pws_port "
[ -z "$pws_password" ] || OPTIONS="$OPTIONS-P $pws_password "

if [ -z "$pws_serverid" ]; then
   OPTIONS="$OPTIONS -n $( hostname )"
else
   OPTIONS="$OPTIONS -n $pws_serverid"
fi

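# exec replaces this shell with the webservice, so signals from supervise
# (e.g. svc -d or svc -t) reach the webservice itself rather than the shell
exec $DAEMON $OPTIONS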

Then, reboot and make sure phidgetwebservice is started. It is now being started/managed by supervise, and will restart when it dies.
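For a board where daemontools cannot be installed, the core of what supervise does (run the service in the foreground and start it again whenever it exits) can be sketched in C as below. This is an illustration, not daemontools' own code; supervise_loop is a hypothetical name, and like the run script above it assumes phidgetwebservice21 stays in the foreground rather than daemonizing.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] and start it again each time it exits, pausing
 * delay_sec between runs. max_runs <= 0 means loop forever.
 * Returns the number of runs started. */
static int supervise_loop(char *const argv[], int max_runs, unsigned delay_sec)
{
    int runs = 0;

    while (max_runs <= 0 || runs < max_runs) {
        int status;
        pid_t r, pid = fork();

        if (pid < 0) {              /* fork failed: wait and try again */
            perror("fork");
            sleep(delay_sec);
            continue;
        }
        if (pid == 0) {             /* child: become the service */
            execvp(argv[0], argv);
            perror("execvp");       /* only reached if exec failed */
            _exit(127);
        }
        runs++;
        do {                        /* parent: block until the child exits */
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);
        if (r == pid && WIFEXITED(status))
            fprintf(stderr, "%s exited with status %d\n", argv[0], WEXITSTATUS(status));
        else if (r == pid && WIFSIGNALED(status))
            fprintf(stderr, "%s killed by signal %d\n", argv[0], WTERMSIG(status));
        if (max_runs <= 0 || runs < max_runs)
            sleep(delay_sec);       /* back off to avoid a tight restart loop */
    }
    return runs;
}
```

For the webservice, argv could be {"/usr/bin/phidgetwebservice21", "-p", "5001", "-n", "mysbc", NULL}, using the -p and -n options from the run script; "mysbc" is a placeholder server ID. As with supervise, this restarts a process that dies, but not one that hangs while still running.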

-Patrick

AndreGermain
Phidgetly
Posts: 41
Joined: Fri Sep 30, 2011 7:09 am

Re: SBC webservice stops

Postby AndreGermain » Thu Jun 27, 2013 11:32 am

Patrick,

thank you, will try this out when I return from vacation in a bit. I edited my previous message at the same time as you wrote yours - I added the logs of the SBC and such - perhaps that'll help figure out what is wrong. Note this SBC is on wireless, with a good strong signal all the time. I have an SBC2 that has no issues, but is wired on the same router.

Note that I used to have similar issues with the SBC2, but I solved them by removing the Phidgets encoder used to sense my observatory's dome position (replaced with a Spatial on the other SBC). Apparently EMI was getting into the SBC through this cable, as large relays and PWM signals drive the dome motors. However, the present SBC issue is not related, as it happens regardless of whether the motors are powered or not. It seems much worse recently, perhaps due to humidity? Very wet and hot weather. The 1045 IR Phidget is dropping off often.

Half the time the webservice is stopped, or is started but needs to be cycled to make it work properly again. If it's a memory leak, it is an odd one: it can happen after as little as 5 minutes, and other times after weeks, even though the configuration of this SBC never changes (how it is accessed [10 Hz continuously], and the h/w connected to it [static]). When the problem occurs, the client of course complains of a Network error (asynchronous).
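One way to test the memory-leak theory is to log the webservice's resident memory every few minutes and see whether it grows steadily before a failure. A minimal sketch, assuming Linux /proc; rss_kb is a hypothetical helper, and passing 0 reads the calling process itself.

```c
#include <stdio.h>
#include <string.h>

/* Return the resident set size (VmRSS, in kB) of process `pid`, read
 * from /proc/<pid>/status, or -1 on error. pid <= 0 means "this process". */
static long rss_kb(long pid)
{
    char path[64], line[256];
    long kb = -1;
    FILE *f;

    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%ld/status", pid);
    else
        snprintf(path, sizeof path, "/proc/self/status");
    if ((f = fopen(path, "r")) == NULL)
        return -1;                      /* no such process */
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &kb);   /* value is followed by "kB" */
            break;
        }
    }
    fclose(f);
    return kb;
}
```

Calling it periodically with the PID of phidgetwebservice21 and logging the result with a timestamp would show whether memory climbs before each failure or stays flat.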

Cheers

glenn
Phidgeteer!
Posts: 93
Joined: Sun Sep 05, 2010 4:42 pm

Re: SBC webservice stops

Postby glenn » Sun Jul 07, 2013 12:26 pm

Hi Patrick,

Just to add one more voice on this issue, and also perhaps some useful diagnostic input for tracking it down: I also have experienced this "webservice just stops" issue since I began working with the 1070 SBC1 several years ago. Like some of the other reporters in this thread, I consider it a fairly serious problem because it really gets in the way of reliable long-term data collection. In my environment, the typical connection lifetime is on the order of tens of hours, and the only way to restore connectivity is a reboot. Not pretty.

Here's some additional diagnostic info and a few observations that may be useful:


1. My setup is as follows:

Code:


               sensors <---> SBC1 <-- WiFi link --> laptop
                              |
                              u
                              s
                              b
                              |
                            PH1014
                              |
               motors <-------'




The sensors come into SBC1 via the on-board IFKit 8/8/8.

The laptop app looks roughly like this (pseudocode):

Code:


        CPhidgetInterfaceKit_create(PH1070);
        CPhidget_openRemoteIP(PH1070);
        CPhidget_waitForAttachment(PH1070);

        CPhidgetInterfaceKit_create(PH1014);
        CPhidget_openRemoteIP(PH1014);
        CPhidget_waitForAttachment(PH1014);

        while (1)
        {
            sleep(10);
            read PH1070 IK8/8/8 sensors;
            conditionally update some of the PH1014 relays;
        }




2. While the webservice link is operating properly, and with the laptop running only my SBC interface app (i.e. neither sending nor receiving any other traffic over the WiFi interface), the steady-state data rate over the WiFi link averages around 1-2 kB/s (laptop->SBC1) and around 5-6 kB/s in the other direction, even when no sensor or relay data is being transferred. I assume this traffic is just the webservice protocol itself exchanging some sort of background keepalive messages while idle. (As an aside, I'm a little surprised that the idle message rate is that high. But let's ignore that.)

3. When the webservice dies, the traffic rate over the wireless link drops to essentially zero (something like a few dozen B/s in both directions) and it stays that way forever. My app of course dies off too, upon attempting a remote sensor read or relay write, which times out with error 13.

4. Restarting the app on the laptop (thus forcing the above init sequence of create/open/attach) *never* restores webservice comm once it dies. In all cases that I've ever observed, it's always a hard fault from that point onward, and the only cure seems to be a reboot.

5. Even while the SBC1 is in the webservice-dead state, the SBC1 is nevertheless pingable and slogin-able from the laptop. This makes it clear -- if there was even any doubt -- that the source of the problem is the webservice process itself, and furthermore, regardless of what causes the webservice to enter the dead state, it does not return to life upon restoral of reliable link-level transport.

6. The webservice lock-up can often be induced simply by introducing some random transient link-layer packet loss. For example, I've found that turning on a mobile phone nearby and enabling its WiFi connection to the same access point used by the laptop-SBC link often triggers the webservice lockup.


Given all the above, my guess is simply that the webservice protocol is probably not very robust to transient loss of link-level connectivity, and winds up easily wedged in a deadlock with the remote app. Both ends are probably waiting for either a timeout or a response from the other end which never occurs. Obviously this is not a rocket-science observation -- you've probably already concluded much the same thing yourself -- just my suspicion based on a (very) quick look at your webservice code.

From what I was able to tell, it appears that the webservice is built upon a home-grown app-layer protocol running over an unreliable transport layer (UDP). Having designed such arrangements myself on several occasions during my career, I can certainly sympathize with the trickiness of designing in a high degree of app-layer robustness against a wide variety of link-layer fault scenarios, many of which are difficult to simulate during testing or reproduce during actual operation. It is not easy, and is made even more difficult by strict latency and rate constraints.

One obvious suggestion -- though I'm sure you've considered it already -- is to migrate your webservice to TCP, and then aggressively tune the TCP parameters to meet your realtime/latency needs.
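As one example of such tuning (the webservice's actual socket options are not known here), Linux lets a program enable TCP keepalive per socket so that, on an idle connection, a peer that has silently disappeared is detected after roughly idle_s + interval_s * count seconds instead of never. A minimal sketch; enable_keepalive is a hypothetical name, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options.

```c
#define _GNU_SOURCE            /* exposes TCP_KEEPIDLE & co. in strict -std modes */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on keepalive probes for socket fd: start probing after idle_s
 * seconds of silence, probe every interval_s seconds, and declare the
 * connection dead after `count` unanswered probes.
 * Returns 0 on success, -1 if any option could not be set. */
static int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 10, 5, 3) would let the kernel report an idle connection as dead after about 10 + 5 * 3 = 25 seconds without a reply, at which point blocked reads fail and the application can reconnect.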

Anyway, hope the above is useful to you. If obtaining more diagnostic info -- link dumps, whatever -- would be helpful to you, just ask, I'll be glad to help if I can.

Regards,
Glenn

AndreGermain
Phidgetly
Posts: 41
Joined: Fri Sep 30, 2011 7:09 am

Re: SBC webservice stops

Postby AndreGermain » Mon Jul 08, 2013 7:01 am

glenn, I have also determined over time that the weak link is the comm layer; I've greatly reduced the failure rate by setting a lower resolution on the sensors so they don't report as often. Wired LAN has of course been more reliable than WiFi, but I have two SBCs, one on each. Throw a network cam on the LAN, and watch your SBC webservice cough in no time.

I have a love/hate relationship with the Phidgets, and I would NOT have gone the Phidget and SBC route if I had known the hours I'd have to spend over the past 5 or so years. If Phidgets were to pay me for the hours lost [and others], they would have gone out of business, IT IS THAT BAD. But I can't change my system; it would cost even more hours [13 Phidgets and 2 SBCs run an automatic astronomical observatory, 24/7].

I sometimes find the webservice stopped, or started but N/A. The former is corrected by pressing the start button in the web interface, whereas the latter can be fixed by cycling stop/start, or sometimes requires a reboot; it's quite variable.
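Assuming "lower resolution" here means a larger change threshold, the effect can be illustrated with a report-by-exception (dead-band) filter: a reading is sent only when it differs from the last reported value by at least `trigger`. The Phidget21 InterfaceKit exposes a similar per-sensor setting through CPhidgetInterfaceKit_setSensorChangeTrigger, but its exact comparison rules may differ from this sketch; struct deadband and deadband_update are hypothetical.

```c
#include <stdlib.h>

/* State for one sensor's dead-band filter. */
struct deadband {
    int trigger;      /* minimum change worth reporting */
    int last;         /* last value that was reported */
    int have_last;    /* 0 until the first value has been reported */
};

/* Returns 1 if `value` should be reported (and remembers it),
 * 0 if the change since the last report is too small. */
static int deadband_update(struct deadband *db, int value)
{
    if (db->have_last && abs(value - db->last) < db->trigger)
        return 0;                       /* suppress small changes */
    db->last = value;
    db->have_last = 1;
    return 1;
}
```

With trigger = 10, the readings 500, 505, 509, 511 produce reports only for 500 and 511, so a slowly drifting sensor generates far fewer network messages.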

Patrick, I've got the SBC 100 with SSH + full Debian installed, and yet I can't create folders because the file system is read-only (?); it wasn't last time around [the SBC was reset to factory defaults a while ago]. Also, 'service' doesn't exist (?!). I gather I need to install a package? Time to Google.

Cheers

Patrick
Lead Developer
Posts: 3078
Joined: Mon Jun 20, 2005 8:46 am
Location: Canada

Re: SBC webservice stops

Postby Patrick » Mon Jul 08, 2013 8:46 am

Hi,

The webservice does run over TCP. We are aware of the issues, but have been unable - obviously - to fix these satisfactorily over the years. This stems mostly from the core codebase having been outsourced many years ago. We have plans to do a complete re-write of the webservice, with a much simpler model (simpler internally, but this won't be noticed by users) - with much higher reliability constraints, but this won't be released for 1-2 years. In the meantime, I continue to work on the current webservice. I'll admit that I haven't done extensive testing with packet-loss injection, so I can look into that.

For those who work with SBCs, I will always recommend that you write a C or Java program to run directly on the SBC, and use this to communicate directly with your Phidgets. Use the PhidgetDictionary for network communication, rather than opening Phidgets directly over it - or even, don't use the webservice, and implement a simple server/client on your own. Not a good solution, but at least it will be more reliable than opening Phidgets over the webservice at this time.
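If you do implement your own server/client, one lesson from the lock-ups described in this thread is to never block indefinitely on the network. A minimal POSIX sketch of a read that gives up after a timeout, so the caller can drop and re-establish the connection instead of waiting forever; read_timeout and its -2 timeout return value are hypothetical conventions.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Read up to len bytes from fd, waiting at most timeout_ms.
 * Returns bytes read (0 = peer closed), -1 on error, -2 on timeout.
 * If poll() is interrupted by a signal, the full timeout restarts. */
static ssize_t read_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r;

    do {
        r = poll(&p, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);
    if (r < 0)
        return -1;
    if (r == 0)
        return -2;                      /* nothing arrived in time */
    return read(fd, buf, len);
}
```

On a -2 result, a client would typically close the socket and reconnect, rather than assume the other end will eventually answer.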

I'd also like to note that we have found the webservice to be reliable at the office, which is part of why tracking down these problems has been so difficult. I have 2 SBCs at home, and their webservice uptime is measured in many months of trouble-free operation. Perhaps I just have an error-free network...

-Patrick

AndreGermain
Phidgetly
Posts: 41
Joined: Fri Sep 30, 2011 7:09 am

Re: SBC webservice stops

Postby AndreGermain » Tue Jul 09, 2013 7:59 pm

Patrick,

you had provided code for 'supervise' to wake the web interface, but I had originally indicated my issues were with the SBC2, when in fact they are now on the SBC1. The long-standing SBC2 issues were ultimately EMI coming through the Phidget optical encoder cable. But the current situation on the SBC1 (over WiFi) is the random shutdown of the web interface. It is continuously monitored [I can provide the code]. Is it because this is an SBC1 that I can't get any of the steps you provided (supervise) to work? Do I need an admin/root account? They all have read-only access to the path.

Cheers

