SBC long term stability

General PhidgetSBC Discussion.
Patrick
Lead Developer
Posts: 3177
Joined: Mon Jun 20, 2005 8:46 am
Location: Canada

Re: SBC long term stability

Postby Patrick » Tue Mar 16, 2010 11:02 am

I'm still not too sure what could be causing this. Does unplugging and re-plugging the wireless adapter bring things back, or do you need to reboot the SBC? I see from the logs that once it lost the signal it stayed lost for several days, presumably with the wireless network going down and up every day. I'm also still curious whether wpa_supplicant and wpa_cli were still running.

-Patrick

pythoncoder
Phidget Mastermind
Posts: 102
Joined: Tue Feb 07, 2006 5:16 am
Location: Northwest UK

Re: SBC long term stability

Postby pythoncoder » Tue Mar 16, 2010 11:55 am

The wireless network was indeed started every day. It was a few days before I had the chance to plug in an Ethernet cable, which involves partially dismantling it. As far as I can tell, only a reboot of the SBC restores the wireless link.

Next time I'll check the things you mention.

Regards, Pete

Re: SBC long term stability

Postby pythoncoder » Sat Mar 20, 2010 5:30 am

Hi Patrick - it's happened again. This time I unplugged the Wifi adapter and plugged it in again, and the SBC connected successfully to the network. I checked the processes running: wpa_supplicant and wpa_cli were both on the process list:

Code:

4089 root S 712 /usr/sbin/wpa_supplicant -iwlan0 -c/mnt/userspace/.config/wpa_supplicant.conf -P/var/run/wpa_supplicant_wlan0.pid -B 
4091 root S 444 /usr/sbin/wpa_cli -iwlan0 -a/sbin/wpa_action -P/var/run/wpa_cli_wlan0.pid -B 

I then power cycled the SBC. The full process list before and after the power cycle is available at
http://www.hinch.me.uk/Processes.txt

It strikes me that there may be issues with the Wifi adapter, which is perhaps supported by the fact that the one supplied with my second SBC had to be replaced under warranty. I look forward to hearing your comments.

Given that my program periodically checks for network availability, is there any way of rebooting the Wifi adapter under program control?

Regards, Pete

Re: SBC long term stability

Postby Patrick » Mon Mar 22, 2010 10:29 am

Hi,

You could try swapping the Wifi stick from your other SBC, but I don't think that's the issue.

I was wondering if wpa_supplicant and wpa_cli were running when the wifi adapter couldn't connect - these get started up when the adapter is plugged in and stopped when it's unplugged.

You can run '/sbin/wifi restart wlan0' to reset the wifi.
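For calling this from a program that already checks for network availability, something along these lines might do. It is only a rough sketch, not Phidgets code: the probe address 192.168.1.1 and the function names are made up for illustration, it needs Python 3.7 or later for `capture_output`, and it assumes '/sbin/wifi' returns a non-zero exit status on failure, which nothing in this thread confirms.

```python
import socket
import subprocess

# Command quoted in the post above.
WIFI_CMD = ["/sbin/wifi", "restart", "wlan0"]


def network_available(host="192.168.1.1", port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds.

    The default host is a hypothetical router address; use whatever
    your own availability check already probes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_wifi(run=subprocess.run):
    """Run the restart command; return True if it exited with status 0."""
    result = run(WIFI_CMD, capture_output=True, text=True)
    return result.returncode == 0


def ensure_network(probe=network_available, restart=restart_wifi):
    """Restart the wifi only when the probe fails.

    Returns True if a restart was attempted, False if the network was up.
    """
    if probe():
        return False
    restart()
    return True
```

Calling `ensure_network()` from the program's existing periodic check would be the obvious place to hook it in. The `run`, `probe` and `restart` parameters are there only so the functions can be exercised without touching the real adapter.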

-Patrick

Re: SBC long term stability

Postby pythoncoder » Mon Mar 29, 2010 4:56 am

Hi Patrick,

It's failed again. This time I connected an Ethernet cable and copied the list of processes:
http://www.hinch.me.uk/processes.txt
wpa_supplicant appears to be running, but not wpa_cli.

I have also updated the buffer files in case these have any worthwhile debugging information:
http://www.hinch.me.uk/Syslog.txt
http://www.hinch.me.uk/KernelRingBuffer.txt

While the Ethernet cable was connected I tried resetting the wifi according to your suggestion (running '/sbin/wifi restart wlan0'). Unfortunately this was unsuccessful; the terminal session is at
http://www.hinch.me.uk/restart.txt

It seems very odd to me that it will cope with regular wifi outages, only occasionally locking up apparently irretrievably. I look forward to your comments.

Regards, Pete

Re: SBC long term stability

Postby pythoncoder » Thu Apr 08, 2010 4:48 am

I know you guys are busy, but have you had a chance to look at this one? I'd really like to have some way of kicking the wifi back into life!

Regards, Pete

Re: SBC long term stability

Postby Patrick » Thu Apr 08, 2010 8:26 am

Well, it looks like wpa_cli died unexpectedly... I do wonder if things would have come back up if you had deleted the lock file that message mentioned and started wifi back up without replugging the adapter.

Something I've been meaning to implement is a daemon to restart crashed processes (webservice mostly). It should be pretty easy to have it monitor the wifi processes too, and that would probably work so long as wpa_cli crashing is your only problem.

-Patrick

Re: SBC long term stability

Postby pythoncoder » Thu Apr 08, 2010 10:55 am

Hi Patrick, I can't find any reference to a lock file in our discussion so far; perhaps you could clarify? In my last error report, at your suggestion, I didn't unplug the adaptor, instead running '/sbin/wifi restart wlan0', but this failed as detailed in my post of 29th March.

It's currently unable to connect to the wifi. I'll leave it in this state until tomorrow in case there's anything else I can check to provide you with diagnostic information (deleting the lock file?).

Regards, Pete

Re: SBC long term stability

Postby Patrick » Thu Apr 08, 2010 2:18 pm

Sorry, not a lock file but a run file, '/var/run/wpa_supplicant/wlan0'. Running wifi failed because the file existed, and the file existed because wpa_cli crashed rather than shutting down cleanly. So it probably would have restarted properly if the file was deleted (as the error code said).
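In script form, the recovery described here (delete the leftover run file, then restart) could look roughly like the sketch below. The function names are invented for illustration, and whether the restart then succeeds depends on the stale file really being the only obstacle.

```python
import os
import subprocess

# Both taken from this thread: the run file left behind after the crash
# and the restart command suggested earlier.
RUN_FILE = "/var/run/wpa_supplicant/wlan0"
WIFI_CMD = ["/sbin/wifi", "restart", "wlan0"]


def clear_stale_run_file(path=RUN_FILE):
    """Delete the leftover run file; return True if one was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


def recover_wifi(path=RUN_FILE, run=subprocess.run):
    """Remove the stale file, then restart the wifi.

    Returns (file_was_removed, exit_status_of_restart_command).
    """
    removed = clear_stale_run_file(path)
    result = run(WIFI_CMD)
    return removed, result.returncode
```

This should only be run when wpa_cli is known to be dead. Removing the file while the wifi processes are still running normally is not something this thread covers.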

-Patrick

Re: SBC long term stability

Postby pythoncoder » Fri Apr 16, 2010 3:36 am

Hi Patrick. Just to keep you posted, this does work and allows me to restart the wifi. I've written a simple script which periodically checks that wpa_cli is running; if it isn't, it deletes the file and restarts the wifi. I'm confident that this will fix the problem.
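For anyone wanting to do the same, a watchdog of this kind might be sketched as follows. This is not the actual script, just one possible shape for it. It assumes wpa_cli's PID can be read from the '-P' file shown in the process list earlier in the thread, and the 60 second interval and all function names are arbitrary choices.

```python
import os
import subprocess
import time

# Paths and command as they appear earlier in this thread.
PID_FILE = "/var/run/wpa_cli_wlan0.pid"
RUN_FILE = "/var/run/wpa_supplicant/wlan0"
WIFI_CMD = ["/sbin/wifi", "restart", "wlan0"]


def wpa_cli_running(pid_file=PID_FILE):
    """Return True if the PID recorded in pid_file belongs to a live process."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no PID file, or its contents are not a number
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def check_once(pid_file=PID_FILE, run_file=RUN_FILE, run=subprocess.run):
    """One watchdog pass; return True if a restart was attempted."""
    if wpa_cli_running(pid_file):
        return False
    try:
        os.remove(run_file)  # clear the stale run file left by the crash
    except FileNotFoundError:
        pass
    run(WIFI_CMD)
    return True


def watch(interval=60, max_passes=None, sleep=time.sleep):
    """Call check_once every `interval` seconds, forever if max_passes is None."""
    passes = 0
    while max_passes is None or passes < max_passes:
        check_once()
        passes += 1
        sleep(interval)
```

Starting `watch()` from a boot script would keep it running. One caveat with the PID file approach: if the PID is reused by an unrelated process after wpa_cli dies, the check reports a false "running", so matching on the process name would be more robust.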

It's worth pointing out that the problem occurs with both my 1070s, so it isn't an issue with a particular SBC or WiFi adaptor. It's possible that it's related to my wireless access point, a Netgear WG602, although in my experience Netgear products are usually pretty good. Perhaps wpa_cli hasn't been tested adequately in this role: in its normal use on a laptop the WiFi will be running before the PC boots.

If you do decide to implement a similar watchdog in a subsequent firmware build, please let us know so that I can turn off my script.

Thanks for your help in devising this solution.

Best Wishes, Pete

