SBC long term stability

General PhidgetSBC Discussion.
pythoncoder
Phidget Mastermind
Posts: 102
Joined: Tue Feb 07, 2006 5:16 am
Location: Northwest UK
Contact:

Postby pythoncoder » Thu Jan 21, 2010 4:59 am

I've reported previously that the Webservice occasionally shuts down of its own accord if I leave the SBC running for a period of a couple of days or so. This has occurred on quite a few occasions. I have also now twice experienced more general failures, where I was unable to communicate with the device at all. On the first occasion I discounted it, thinking that the device may have experienced some kind of power spike, but yesterday's experience rather rules this out.

Until recently I've been running the small firmware build; however, for the last week I've been running the full build, with the SBC running a very simple test program that displays the current time and date on an attached Text LCD. The program uses about 1% of the CPU time and I can communicate with the SBC using SSH and the control panel while it is running. The SBC uses wifi, but my network is switched off at night.

Normally this works fine and connectivity is restored when I start everything up in the morning. However yesterday morning, while my program was still running correctly, I was unable to communicate with the SBC by any means. It wouldn't even respond to a ping - so there was no possibility of SSH or the Control Panel discovering the device or of attempting any diagnostics. The fact that my program was still running would seem to rule out any external electrical disruption.

I appreciate that the SBC is a very new product, but it does strike me that there are a couple of long term reliability issues. I'd be interested to hear any comments.

Regards, Pete

Robert

Re: SBC long term stability

Postby Robert » Thu Jan 21, 2010 10:39 am

Hmm, it MIGHT be a wifi issue, but I'm not sure. I had one networked, running all day, every day, for about 2 months straight, and it never had a problem.

Patrick
Lead Developer
Posts: 3067
Joined: Mon Jun 20, 2005 8:46 am
Location: Canada
Contact:

Re: SBC long term stability

Postby Patrick » Thu Jan 21, 2010 10:41 am

It sounds like the wifi adapter did not re-connect to your network. I have a long running SBC at home on my wifi network, and it never has trouble - uptime of several months with reboots only to update the firmware, but I also don't switch my wifi on and off every day.

When it happens again, I would be interested if unplugging/replugging the wifi adapter gets it to reconnect, or if it needs to be power cycled, and then, if there are any interesting wireless-lan related messages in the system logs, which could help track down the issue.

-Patrick

Re: SBC long term stability

Postby pythoncoder » Thu Jan 21, 2010 12:08 pm

I'll do that. I suppose I should also have tried connecting an ethernet cable to it to see if I could talk to it that way - I didn't think of that until after I'd rebooted it.

For what it's worth my wifi has been in place and unchanged for nearly four years and has worked faultlessly with various laptops and devices.

Regards, Pete

Re: SBC long term stability

Postby pythoncoder » Fri Jan 22, 2010 5:39 am

Hi Patrick. You are right - it is a wifi problem. This morning I again was unable to communicate with the SBC so, without switching off or rebooting, I connected an ethernet cable. I was then able to SSH in (and access it from the control panel). Incidentally my program was still running.

I'm not entirely sure what logs you need to see; however, I captured the output of dmesg and also the files pwsout and messages from the /tmp folder. I have put these at
http://www.hinch.me.uk/dmesg.txt
http://www.hinch.me.uk/pwsout.txt
http://www.hinch.me.uk/messages.txt
I'd be interested to hear your comments, and any requests for other logs for the next time it occurs.

Oddly, it happily copes with short outages of the wifi link; the problem only occurs when I shut the network down overnight, and then only on occasion.

Regards, Pete

Re: SBC long term stability

Postby Patrick » Fri Jan 22, 2010 12:01 pm

I'd be interested if the wifi processes are still running (run 'ps -A'):

/usr/sbin/wpa_supplicant -iwlan0 -c/mnt/userspace/.config/wpa_supplicant.conf -P/var/run/wpa_supplicant_wlan0.pid -B
/usr/sbin/wpa_cli -iwlan0 -a/sbin/wpa_action -P/var/run/wpa_cli_wlan0.pid -B


If wpa_supplicant is running, you should run 'wpa_cli -iwlan0'. This gives you a prompt. The command 'status' should respond with 'wpa_state=SCANNING'; you can then look at the scan results with 'scan_results', which should include your access point.

If your access point is not in the list, does switching the access point off and on again bring it back? Are you able to connect to the AP with other wifi devices even when the SBC cannot connect? You could also try killing and restarting wpa_supplicant with the same parameters and see if that works. I'm assuming that unplugging/replugging the wireless adapter gets it to reconnect.
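
The 'status' check described above could also be scripted rather than typed at the prompt. The following is a minimal sketch, assuming that wpa_cli on the SBC accepts a command as a trailing argument (as standard wpa_cli builds do) and that popen() is available; the function names parse_wpa_state and read_wpa_state are hypothetical and not part of any Phidget or wpa_supplicant API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for popen/pclose */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the value of the "wpa_state=" line from wpa_cli status text.
   Returns 0 on success, -1 if no such line is present. */
int parse_wpa_state(const char *status_text, char *state, size_t state_len)
{
    const char *key = "wpa_state=";
    const char *line = status_text;

    assert(status_text != NULL && state != NULL && state_len > 0);
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, strlen(key)) == 0) {
            const char *val = line + strlen(key);
            size_t n = strcspn(val, "\r\n");      /* length up to end of line */
            if (n >= state_len)
                n = state_len - 1;                /* truncate to fit */
            memcpy(state, val, n);
            state[n] = '\0';
            return 0;
        }
        line = strchr(line, '\n');                /* advance to next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical helper: run "wpa_cli -i<iface> status" and report wpa_state.
   iface is pasted into a shell command, so it must come from a trusted source. */
int read_wpa_state(const char *iface, char *state, size_t state_len)
{
    char cmd[128];
    char buf[1024];
    size_t used;
    FILE *p;

    assert(iface != NULL);
    snprintf(cmd, sizeof cmd, "wpa_cli -i%s status 2> /dev/null", iface);
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    used = fread(buf, 1, sizeof buf - 1, p);
    buf[used] = '\0';
    pclose(p);
    return parse_wpa_state(buf, state, state_len);
}
```

Calling read_wpa_state("wlan0", state, sizeof state) from a periodic check would let a program log the reported state at the moment the link drops, which may be easier than catching it by hand.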

-Patrick

Re: SBC long term stability

Postby pythoncoder » Tue Jan 26, 2010 11:41 am

Hi Patrick, just to let you know I will do these checks as soon as the wifi falls over again. At present it must know we're on the track of the problem because it's been working perfectly 24 hours a day!

Regards, Pete

Re: SBC long term stability

Postby pythoncoder » Tue Feb 02, 2010 11:58 am

My present application (a clock which displays astronomical data) uses an autonomous C program on the SBC. It only uses the wifi to correct the time automatically, so it's entirely unaffected by wifi outages. It has been running faultlessly (through various updates) for two or three weeks. So the following comments are to report the outcome of this rather protracted investigation, not to request help.

The webservice seems to have become more reliable since I updated to the latest firmware. I have been able to access the SBC daily from the Phidget Control Panel. This definitely wasn't the case in December/early January when I was experiencing times when I couldn't communicate with the SBC by any means. This hasn't occurred once since the last firmware update.

There is indeed (as you suggested) something not 100% right about my wifi network, although it's been in regular use for four years and nobody has noticed. I have fitted the clock with an LED indicating web connectivity using the method you suggested - it goes out occasionally for 10-50 seconds despite the fact that the clock is in the same room as the access point.

I wrote a similar routine in Python which I've run for some hours on my netbook and that confirms that there are brief outages.

Evidently applications using the Phidget Webservice to control the SBC (as I was until recently) need to be resilient in the face of wifi outages. At least in my house.

I have posted the code in case you or anyone else is interested. Note that the Busybox implementation of ping doesn't support the -w 1 option, so the call to ping can occasionally take ten seconds to return if the network goes down at the wrong time in the cycle. The only other things of note are the redirection of stderr to /dev/null, which avoids endless "network unreachable" messages on the console, and the location of temporary files on the ramdisk to avoid SSD wear.

#include <stdio.h>
#include <stdlib.h>

// Return value 0 = OK, 1 = File open failure, 2 = resource not present
// Resource can be an IP address or a network device name
int net_test(const char *cResource)
   {
   FILE *infile ;
   char cBuf[100] ;                           // Summary line of ping output is 57 chars
   char cCmd[200] ;
   const char *cFile = "/tmp/ping.txt" ;      // Temporary file (located in ramdisk on SBC)
   int nResult = 2 ;                          // Assume network failure
   int nTx, nRx ;
   // snprintf cannot overrun cCmd, however long cResource is
   snprintf(cCmd, sizeof cCmd, "ping -c1 %s > %s 2> /dev/null", cResource, cFile) ; // busybox doesn't support -w1
//   snprintf(cCmd, sizeof cCmd, "ping -c1 -w1 %s > %s 2> /dev/null", cResource, cFile) ;
   system(cCmd) ;                             // stdout goes to cFile, stderr is discarded
   infile = fopen(cFile, "r") ;
   if (infile == NULL)
      nResult = 1 ;                           // Failed to open the file
   else
      {
      while(fgets(cBuf, sizeof cBuf, infile) != NULL)   // Read a line at a time
         {
         if (sscanf(cBuf, "%d packets transmitted, %d", &nTx, &nRx) == 2)
            {
            if (nRx == 1)                     // Received one packet
               {
               nResult = 0 ;
               break ;
               }
            }
         }
      fclose(infile) ;
      }
   return nResult ;
   }
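
For anyone who wants to measure the brief outages mentioned above rather than just light an LED, a small tracker can be fed the result of each net_test() call. This is only a sketch: the outage_log type and the outage_init and outage_update functions are hypothetical names introduced here, and the polling loop in the comment assumes a made-up access point address of 192.168.0.1.

```c
#include <assert.h>
#include <time.h>

/* Running record of connectivity outages. */
typedef struct {
    int    down;          /* 1 while an outage is in progress */
    time_t down_since;    /* time the current outage was first seen */
    int    outages;       /* number of completed outages */
    double longest_s;     /* longest completed outage, in seconds */
} outage_log;

void outage_init(outage_log *log)
{
    assert(log != NULL);
    log->down = 0;
    log->down_since = (time_t)0;
    log->outages = 0;
    log->longest_s = 0.0;
}

/* Feed one check result (0 = reachable, anything else = failure, matching
   net_test()). Returns the length in seconds of an outage that has just
   ended, or 0.0 if none ended on this call. */
double outage_update(outage_log *log, int status, time_t now)
{
    assert(log != NULL);
    if (status != 0) {
        if (!log->down) {             /* outage starts */
            log->down = 1;
            log->down_since = now;
        }
        return 0.0;
    }
    if (log->down) {                  /* outage ends */
        double d = difftime(now, log->down_since);
        log->down = 0;
        log->outages++;
        if (d > log->longest_s)
            log->longest_s = d;
        return d;
    }
    return 0.0;
}

/* Hypothetical polling loop (address and 5 s interval are assumptions):
 *
 *     outage_log log;
 *     outage_init(&log);
 *     for (;;) {
 *         double d = outage_update(&log, net_test("192.168.0.1"), time(NULL));
 *         if (d > 0.0)
 *             printf("outage lasted %.0f s\n", d);
 *         sleep(5);
 *     }
 */
```

Because ping can block for about ten seconds without -w, the measured durations are only accurate to roughly the polling interval plus that delay.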

Thanks for all your help.

Regards, Pete

Re: SBC long term stability

Postby Patrick » Tue Feb 02, 2010 12:59 pm

Glad to hear that things seem to be working better. The last update did address a few webservice issues - the release of the SBC has put the webservice into the spotlight, whereas before, it was much less used.

I would say that any application running over a network should be resilient to outages, though this admittedly requires some extra coding.
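
One common form that extra coding takes is a retry loop with a growing delay between attempts. The following is a generic sketch, not anything taken from the Phidget libraries: try_fn, next_delay, retry_with_backoff and countdown_attempt are all hypothetical names, and the attempt callback stands in for whatever network operation the application performs.

```c
#include <assert.h>
#include <unistd.h>

/* An attempt returns 0 on success, non-zero on failure. */
typedef int (*try_fn)(void *ctx);

/* Double the delay, capped at max_s, without overflowing. */
unsigned next_delay(unsigned current_s, unsigned max_s)
{
    assert(max_s >= current_s);
    return current_s > max_s / 2 ? max_s : current_s * 2;
}

/* Call attempt() up to max_tries times, sleeping between failures.
   Returns the number of attempts used on success, or -1 if all failed.
   A first_delay_s of 0 disables sleeping (handy for tests). */
int retry_with_backoff(try_fn attempt, void *ctx, int max_tries,
                       unsigned first_delay_s, unsigned max_delay_s)
{
    unsigned delay = first_delay_s;
    int i;

    assert(attempt != NULL && max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        if (attempt(ctx) == 0)
            return i + 1;
        if (i + 1 < max_tries) {      /* no point sleeping after the last try */
            sleep(delay);
            delay = next_delay(delay, max_delay_s);
        }
    }
    return -1;
}

/* Stand-in attempt for demonstration: ctx points to an int counter and the
   attempt succeeds once the counter has been decremented to zero. */
int countdown_attempt(void *ctx)
{
    int *left = ctx;
    if (*left > 0)
        (*left)--;
    return *left == 0 ? 0 : 1;
}
```

In a real application the callback would wrap the connection or request itself, for example something like retry_with_backoff(my_connect, &state, 10, 1, 60), where my_connect is whatever function the program already uses to open its link.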

-Patrick

Re: SBC long term stability

Postby pythoncoder » Tue Mar 16, 2010 6:24 am

On a few occasions my Lunar Clock has failed to reconnect to my wireless network after the network was turned off overnight. Today I've managed to extract some debug information by connecting an ethernet cable to it while it was still running. Note the application is a C program running on the SBC, and this was running correctly.

The webservice was still running and I was able to access the SBC via the Control Panel and via the web interface. The latter reported the wireless state as ERROR, and the adapter as wlan0 (up). No wireless networks were detected. I have extracted the Kernel Ring Buffer and System Log from the web interface; these may be seen at
http://www.hinch.me.uk/KernelRingBuffer.txt
http://www.hinch.me.uk/Syslog.txt

This was running the full firmware build of 10th January - I've since upgraded to the latest version. If this is a known problem solved in the current build, please let me know.

Regards, Pete

