All times are UTC - 7 hours [ DST ]




PostPosted: Wed Nov 25, 2009 1:13 pm
Phidget Mastermind

Joined: Tue Feb 07, 2006 5:16 am
Posts: 102
Location: Northwest UK
I am trying to write a program which is able to recover from the case where the SBC shuts down, either through power loss or failure of the wifi link. Python code is running on a Linux PC and is communicating with the webservice via wifi on an absolutely standard SBC.

Currently it waits patiently for the SBC to start, then runs a loop where it reports changes to the various inputs and periodically sets and clears one of the digital outputs, flashing an LED. So far so good. However I've tried several techniques for detecting shutdown, yet the loop continues to attempt to toggle the digital output even after I've pulled out the power plug on the SBC. The techniques I've applied are as follows:

objIFK is the interface kit fitted to the 1070 which has been opened remotely via a wifi link. I've set the following handlers, which if called will cause the loop to exit:

objIFK.setOnServerDisconnectHandler(objConnection.Disconnect)
objIFK.setOnDetachHandler(objConnection.Disconnect)

Within the loop I have code like

objIFK.setOutputState(1, 1)

enclosed within an exception handler. Again if the Phidget exception occurs the loop should terminate. As far as I can see none of these events ever occurs. What should I really be doing?

Regards, Pete


PostPosted: Wed Nov 25, 2009 3:00 pm
Lead Developer

Joined: Mon Jun 20, 2005 8:46 am
Posts: 2345
Location: Canada
Do the serverDisconnect or Detach events ever get fired? It could take quite a while for the program to notice that the SBC has gone away, simply because network libraries use long timeouts (on the order of about 60 seconds). This also applies to setting outputs, but since outputs are set asynchronously, you will not get an exception from the setOutputState call - rather you need to watch the Error event for network errors.

If you need faster reaction to the SBC going down, you would need to implement some sort of 'heartbeat'. This could be implemented using the Phidget Dictionary by setting up a key change listener and then setting the key every second or so - if you don't receive a key change event within 500ms, you can assume that something has gone wrong on the network. You'll have to tune the timing to your network of course, since a 500ms delay would be quite normal over some networks.
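
The timing side of such a heartbeat could be sketched in Python as below. This is only a sketch under assumptions: it leaves out the Phidget Dictionary calls entirely, and `HeartbeatMonitor`, `key_set()`, `key_changed()` and `link_ok()` are hypothetical names, not part of any Phidget API. The idea is that the code that sets the key calls `key_set()`, the key change listener calls `key_changed()`, and the main loop checks `link_ok()`.

```python
import time


class HeartbeatMonitor:
    """Tracks whether a heartbeat key set has gone unanswered for too long.

    Hypothetical helper: it only does the timing, not the Dictionary calls.
    """

    def __init__(self, timeout_s=0.5, clock=time.monotonic):
        self.timeout_s = timeout_s      # tune this to your network
        self._clock = clock             # injectable so the logic can be tested
        self._pending_since = None      # time of the oldest unanswered key set

    def key_set(self):
        """Call right after setting the heartbeat key."""
        if self._pending_since is None:
            self._pending_since = self._clock()

    def key_changed(self):
        """Call from the key change listener."""
        self._pending_since = None

    def link_ok(self):
        """False once a key set has waited longer than timeout_s for its event."""
        if self._pending_since is None:
            return True
        return (self._clock() - self._pending_since) <= self.timeout_s
```

Measuring from the oldest unanswered set, rather than from the last event received, means a one second gap between heartbeats is not itself mistaken for a failure.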

-Patrick


PostPosted: Thu Nov 26, 2009 11:27 am
Phidget Mastermind

Joined: Tue Feb 07, 2006 5:16 am
Posts: 102
Location: Northwest UK
While I'm still trying to get to the bottom of what's going on, I have made the following observations.

0. Incidentally the document Phidget.html indicates that error events are not currently used, so until your suggestion I hadn't tried this approach.

1. The interface kit detach handler and the server disconnect handler are never called - I've waited a good ten minutes and nothing happens if I down the network or power off the SBC.

2. The error event for the interface kit does get called roughly every three seconds while it's waiting for the SBC to appear on the network for the first time after my program starts, but it never gets called when the kit disappears. In fact I never see it again once a connection has initially been established.

3. So far I've failed to find any event that triggers when the link goes down, even after a very long wait!

4. My program has a very simple main loop which simply toggles an LED every two seconds. Because of the problems described above, if I down the network this loop continues to run. Eventually after 80 iterations it crashes so completely that Python can't be interrupted by Ctrl-C - clearly one of the Phidget library calls is failing to return. The following code snippet, the content of my main loop, is where this occurs:

objLCD.setDisplayString(0, "Iteration count: " + str(x))
print("Iteration count: " + str(x))
objIFK.setOutputState(1, 0)
time.sleep(1)
objIFK.setOutputState(1, 1)
time.sleep(1)
x += 1

The behaviour is quite consistent: at the computer console the iteration count simply stops, with a count 80 greater than that last logged on the text LCD. It is of course arguable that this crashing isn't really an issue: clearly I need an orderly way to detect that the link is down and to stop attempting to drive non-existent hardware!
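
One general way to stop a call that never returns from freezing the whole loop is to make the call in a worker thread and give up waiting after a timeout. A minimal Python sketch follows, assuming the blocking call can safely be made from another thread (this has not been verified for the Phidget library). `call_with_timeout` is a hypothetical helper, not part of any Phidget API. It does not cure the underlying hang: the stuck thread is simply abandoned.

```python
import threading


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a daemon thread and wait at most timeout_s seconds.

    Returns (True, result) if the call finished in time, (False, None) if it
    is still blocked. An exception raised by func is re-raised in the caller.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = func(*args)
        except Exception as exc:        # hand the error back to the caller
            outcome["error"] = exc

    # Daemon thread: a call that never returns will not keep Python alive at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return False, None              # still blocked: treat the link as down
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome.get("result")
```

In the loop above this might be used as `finished, _ = call_with_timeout(objIFK.setOutputState, 2.0, 1, 0)`, leaving the loop when `finished` is False.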

So far the only way I've been able to detect shutdown is to electrically connect the output that drives the toggled LED to an input. When the input change event stops firing, I know that the network is down or the SBC is powered off. While this works, prevents the crashing and gives me a system which will recover from an outage, I'm still trying to establish what events get called on reconnection. I'll report further on this when I'm sure of my facts.

While this has the elements of a workable solution it does seem to me that the event system isn't operating as described in the documentation.

I'd be interested to hear your comments.

Regards, Pete


PostPosted: Thu Nov 26, 2009 1:18 pm
Lead Developer

Joined: Mon Jun 20, 2005 8:46 am
Posts: 2345
Location: Canada
You could watch the output change event instead of connecting an output to an input.

I will look into better ways of detecting network failure.

-Patrick


PostPosted: Fri Nov 27, 2009 5:50 am
Phidget Mastermind

Joined: Tue Feb 07, 2006 5:16 am
Posts: 102
Location: Northwest UK
Thanks Patrick, the output change event is a better solution and provides an excellent, and rapid, way of detecting that the connection is down.

However this leaves me with the problem of detecting that it's come back up again! If I start my program and then fire up the 1070, all the expected events seem to fire enabling me to start the program in an orderly fashion. Now, using the above technique, I can detect that it's gone down and revert to waiting for it to restart.

However no events seem to fire either when the network fails or when it restarts, leaving me with no proper way to restart my application - polling the non-existent output and checking for the output change event will work, but is clearly unsatisfactory and would probably eventually lead to a crash.

I look forward to your comments when you've had a chance to investigate. While I may of course be doing something silly, I've had very little difficulty getting the other aspects to work and I am convinced that these events aren't working as designed.

With kind regards, Pete


PostPosted: Fri Nov 27, 2009 6:19 am
Phidgeteer!

Joined: Sat Apr 25, 2009 5:17 am
Posts: 86
Location: France
Why not send a "ping" to the board at regular intervals?

2 or 3 packets every 10 seconds, for example, would not disturb the network.


PostPosted: Fri Nov 27, 2009 11:15 am
Lead Developer

Joined: Mon Jun 20, 2005 8:46 am
Posts: 2345
Location: Canada
It's true - you should definitely be getting events when the network goes down.

-Patrick


PostPosted: Mon Nov 30, 2009 6:04 am
Phidget Mastermind

Joined: Tue Feb 07, 2006 5:16 am
Posts: 102
Location: Northwest UK
I'm making no headway with this whatsoever. I am now entirely convinced that there is something wrong with the detection of attachment to the server, whether effected by event handlers or by polling. Here is a summary of some observations which can very easily be replicated - I'd be extremely interested to hear if others are experiencing the same difficulties.

Observations (Python)
Having failed with events I decided to try my hand with the polled functions.
1. isAttached() only ever returns False prior to the webservice starting for the first time. After that it returns True whether the network is connected or not.
2. isAttachedToServer() always returns False. I've never seen it return True.
3. Once the server is up and running I can find no events which trigger when the network is disconnected or reconnected, even if I wait five minutes or more.
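
For reference, this is roughly how a polled status would be turned into "link up" and "link down" notifications once the polled value can be trusted. It is a minimal sketch in Python: `StatusWatcher` and `check()` are hypothetical names, not Phidget APIs, and `poll` stands for any function returning True or False. Given the observations above, isAttached() would not have been a reliable choice for it at this point.

```python
class StatusWatcher:
    """Turns a polled True/False status into 'up' and 'down' callbacks."""

    def __init__(self, poll, on_up, on_down):
        self._poll = poll          # function returning the current status
        self._on_up = on_up        # called when the status becomes True
        self._on_down = on_down    # called when the status drops from True to False
        self._last = None          # unknown until the first check

    def check(self):
        """Poll once and fire a callback only if the status has changed."""
        current = bool(self._poll())
        if current != self._last:
            if current:
                self._on_up()
            elif self._last is not None:
                # Report 'down' only after the link has been seen up.
                self._on_down()
            self._last = current
        return current
```

Calling `check()` once per loop iteration gives the main loop a single place to stop driving the outputs when the link drops and to start again when it returns.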

Observations (C)
In case the Python library was at fault I repeated my tests in C. I performed a one line adaptation to InterfaceKit-simple.c to open the IFK remotely and verified that it worked. I then added server connect and disconnect handlers.

On disconnecting the network for several minutes, then reconnecting, neither attach nor detach nor server connect nor disconnect events took place. The program continued to run: when the network was reconnected it again started to respond to changes to digital and analog inputs to the 1070. It simply failed to provide any indication at the console of the changes to the network status.

I repeated these tests using a wired rather than wireless network connection with the same results.

Test conditions
Testing was done with a 1070 SBC running the small firmware build. I have made no changes apart from updating the firmware. My application software was run on a PC running Ubuntu 9.04 32 bit. In every respect apart from the detection of server connectivity the system works perfectly.

Final comments
I have spent a considerable amount of time trying to get this aspect of the system to work and have made no progress. This is in stark contrast to the general ease of interfacing the device, whether by polling or by detecting events. Interestingly I am experiencing the same problems whether I use Python or C, or wired or wireless networks. I am reluctantly forced to conclude that, whilst I can easily detect that the connection has initially been established, detecting subsequent failure and reconnection is simply not possible using the recommended functions and events.

I intend to direct my efforts to more rewarding aspects of my project but I would be very interested to hear comments from anyone who has used, or attempted to use, this functionality.

With kind regards, Pete


PostPosted: Mon Nov 30, 2009 9:33 am
Phidgeteer!

Joined: Sat Apr 25, 2009 5:17 am
Posts: 86
Location: France
Hello,

I also confirm that no event is raised when the network goes down/up.

I've done tests in Visual C# with all events connected to a valid method which writes the event raised in a textbox.

The SBC takes roughly 30-35 seconds to come back alive after a network disconnection. After that, every input/output change is captured by the event manager. But no other event.

While the network is down, the program still thinks that the board is attached :(


Btw, I've just checked that a ping tells me the board's connection status, and it works.

Here's the sample code, very simple:
Code:
        // Runs on each timer tick: pings the SBC and logs the result.
        private void timer1_Tick(object sender, EventArgs e)
        {
            // Dispose the Ping object after each use.
            using (var ping = new System.Net.NetworkInformation.Ping())
            {
                // 1000 ms timeout; 192.168.1.192 is the SBC's address on my network.
                System.Net.NetworkInformation.PingReply reply = ping.Send("192.168.1.192", 1000);
                textBox1.AppendText("Ping : " + reply.Status.ToString() + Environment.NewLine);
                if (reply.Status == System.Net.NetworkInformation.IPStatus.TimedOut)
                {
                    // Board unreachable: react here. Other non-Success statuses also mean failure.
                }
            }
        }

It is attached to a 2-second timer. No action is taken on failure here, but it shows that the approach works.
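
For the Python side of this thread, a comparable check might look like the sketch below. Note the swapped-in technique: it is a TCP connection attempt rather than an ICMP ping, because sending ICMP from Python normally needs raw-socket privileges. `sbc_reachable` is a hypothetical helper, and port 5001 is an assumption (believed to be the Phidget webservice default), so check it against your own SBC configuration.

```python
import socket


def sbc_reachable(host, port=5001, timeout_s=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    The default port is an assumption; adjust it to match your SBC.
    """
    try:
        # The connection is opened only as a probe and closed straight away.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections and "no route to host".
        return False
```

Called from a timer or from the main loop, for example `sbc_reachable("192.168.1.192")`, it gives a yes/no answer within about a second.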


PostPosted: Mon Nov 30, 2009 10:46 am
Lead Developer

Joined: Mon Jun 20, 2005 8:46 am
Posts: 2345
Location: Canada
This is definitely not the expected behaviour. We had assumed that passively detecting network failures would be fine for the webservice, but apparently not. I will be implementing active polling to detect network status, and this will improve things quite a bit - at the expense of a small amount of extra network traffic.

This is a priority, so it should be up soon.

-Patrick


PostPosted: Mon Nov 30, 2009 10:50 am
Phidgeteer!

Joined: Sat Apr 25, 2009 5:17 am
Posts: 86
Location: France
Thank you very much for your quick response, Patrick!


PostPosted: Mon Nov 30, 2009 10:52 am
Phidget Mastermind

Joined: Tue Feb 07, 2006 5:16 am
Posts: 102
Location: Northwest UK
becafuel - Thanks for confirming my observations.

Patrick - Excellent news. Given the wireless support offered by the 1070 we do need reliable detection of network outages.

With kind regards, Pete


PostPosted: Mon Nov 30, 2009 6:05 pm
Lead Developer

Joined: Mon Jun 20, 2005 8:46 am
Posts: 2345
Location: Canada
Latest drivers address this issue. It is a client side fix, so there is no reason to update the SBC firmware.

-Patrick


PostPosted: Tue Dec 01, 2009 2:17 am
Phidgeteer!

Joined: Sat Apr 25, 2009 5:17 am
Posts: 86
Location: France
Fix confirmed!

Good work. Should I say "as usual"? ;)


PostPosted: Tue Dec 01, 2009 8:44 am
Phidget Mastermind

Joined: Tue Feb 07, 2006 5:16 am
Posts: 102
Location: Northwest UK
A definite improvement; however, detection of an outage is extremely slow.

Running my slightly modified C program InterfaceKit-simple it looks very encouraging if you only break the network connection for a few seconds. On reconnecting all the events fire in the right order - program output:

<disconnect wifi adaptor>
<reconnect wifi adaptor>
Server disconnected
Phidget InterfaceKit 8/8/8 110297 detached!
Server Connected
Phidget InterfaceKit 8/8/8 110297 attached!

However if I disconnect the network and wait for some program output it takes 15 minutes before the following output occurs:

Server disconnected
Phidget InterfaceKit 8/8/8 110297 detached!

There are then periodic error messages:

Error handled. 8 - No route to host

which is doubtless correct behaviour. On reconnecting the network it detects reconnection very quickly. So, in essence, it works correctly but detecting a network outage is extremely slow unless it is reconnected promptly.

I haven't had a chance to fully test the Python interface. However, a preliminary check demonstrates that, even with the network up and running and the SBC working correctly, isAttachedToServer() still returns False.

If I down the network isAttached() eventually starts to return False but it takes a very long time. I haven't yet timed it, but I'll hazard a guess that it'll be 15 minutes.

On reconnecting the network isAttached() returns True after a few seconds, but isAttachedToServer() still maintains its obstinate devotion to the False state.

Whilst I can appreciate that there is probably a tradeoff between response time and network traffic, 15 minutes does seem very slow.
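
To put rough numbers on that tradeoff, here is a back-of-envelope Python sketch. It is a generic model and says nothing about how the Phidget library actually polls: `polling_tradeoff` is a hypothetical helper, and the probe interval, the number of missed probes tolerated and the packets per probe are all assumptions. The per-probe timeout is ignored.

```python
def polling_tradeoff(interval_s, missed_before_down, packets_per_probe=2):
    """Return (worst-case detection delay in seconds, packets per hour).

    Assumes one probe every interval_s seconds and that the link is declared
    down after missed_before_down consecutive probes go unanswered.
    """
    # Worst case: the link fails just after a probe has succeeded.
    worst_case_delay_s = interval_s * missed_before_down
    # Each probe is assumed to cost a request and a reply.
    packets_per_hour = (3600 / interval_s) * packets_per_probe
    return worst_case_delay_s, packets_per_hour
```

For example, `polling_tradeoff(2, 3)` gives a worst case of 6 seconds for 3600 packets per hour, a long way from 15 minutes.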

I look forward to your comments.

With kind regards, Pete

