another weird error


Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1100 - Posted: 8 May 2012 | 20:38:06 UTC

OK, now the one computer that RH was running fine on is starting to play the USB stall game.

Today I came home to find the task 4 hours in. It normally takes 30 minutes, so it was obviously stalled. Unplugging and replugging the detector did not work, though that usually works on my other machine when it hangs.
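
The "is it stalled?" judgement in this post can be written down as a simple rule of thumb. This is only a sketch: the function name `looks_stalled` and the default factor of 3 are assumptions made for illustration, not anything BOINC or the project provides.

```python
from datetime import timedelta


def looks_stalled(elapsed: timedelta, typical: timedelta, factor: float = 3.0) -> bool:
    """Return True when a task has run much longer than it normally does.

    `factor` is an arbitrary safety margin (an assumption; tune to taste).
    """
    return elapsed > typical * factor


# The case from this post: a task that normally takes 30 minutes is 4 hours in.
print(looks_stalled(timedelta(hours=4), timedelta(minutes=30)))
```

A generous factor keeps a merely slow run from being flagged as a hang.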

I stopped and restarted BOINC and got "Waiting to run (waiting for GPU memory)" on the radioactive task. Why would it have to wait for GPU memory? That makes no sense, especially considering that my video card has 2 gig of memory and the crunching task on it is using maybe 750k at most?

Any ideas?
Thanks
Aaron
____________


Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 1101 - Posted: 8 May 2012 | 22:21:39 UTC - in response to Message 1100.

The "Waiting for GPU memory" thing is a bug in the BOINC client. You're on an old version, 6.10.58. IIRC they made more than one attempt to fix that bug, and I don't recall exactly which version finally fixed it. The recommended version is now up around 7.0.25, which I use. It seems to be stable.

There are some changes in the 7.x.x series that won't allow you to go back to 6.x.x easily. IIRC, going from 7 back to 6 requires uninstalling and manually deleting the entire BOINC data directory. You probably won't need to revert to 6 but you should be aware of what it might mean if you do.


____________

Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1102 - Posted: 9 May 2012 | 19:48:17 UTC - in response to Message 1101.
Last modified: 9 May 2012 | 20:12:52 UTC

Thanks Dagorath, I'll probably try the update on BOINC. This time the damned thing just locked up after running for a few hours; I came home to find the stupid project stalled for 16 hours of nothing. I never had any problems with this computer at all, it would run for days and days. Now all of a sudden I can't get the thing to run for a few hours without problems.

They really need to fix this crap.

Edit: I installed the new BOINC and got a "waiting for scheduler" error, WTF ever that means. I had to do a hard reboot on the computer, only to find that the USB port was locked so badly it blue screened the computer during shutdown.

Rebooted and now it's running on BOINC 7, which calls it non-CPU-intensive. I swapped to a different USB port; hope that helps.

WHY is this all of a sudden causing such problems when I have changed nothing on my system? Have you guys changed anything with the program or server at all?

Aaron
____________


Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1107 - Posted: 12 May 2012 | 18:09:47 UTC - in response to Message 1102.

Well, I went and upgraded to BOINC 7.0.25 and so far it seems OK.
Now a problem I am seeing is that my CPU used to run at around 80 percent utilization for all tasks, which would zip them out pretty quickly.

Now, even with the settings the same, I rarely exceed 50 percent CPU utilization.
Anyone have any ideas WTF is happening now? I didn't change anything that I can see, and I have my power scheme set to run for performance when possible.

sigh... I am really beginning to tire of this BOINC bug / Windows garbage.

Aaron
____________


Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 1109 - Posted: 12 May 2012 | 22:46:21 UTC - in response to Message 1107.

This isn't the best forum to get help with problems related to BOINC itself. For that you should go to the BOINC dev forums. Ageless is very good at BOINC related problems but he won't help unless you provide details first. So explain exactly which hosts are giving you problems, what OS they're running, BOINC version, CPU model, etc. It would be very helpful if you would provide a link to your list of hosts and then give the host IDs that are giving you trouble. That way he can read nearly all the details he needs for himself.

It sounds like you're miffed with Windows itself? If so then consider switching to Linux but it's not a panacea. For example a USB port that isn't implemented properly in hardware will still give problems under Linux. I can give you some guidance for moving to Linux if you want.

____________

Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1110 - Posted: 12 May 2012 | 23:02:43 UTC - in response to Message 1109.

I stumbled across something, and I wonder if it has anything to do with some of these problems.

Under power themes, you can adjust the power use of your computer.... sort of. It has an area where you can tell it to turn USB ports on or off (I'm assuming depending on usage on them). I just turned mine to disable that function.

This makes me wonder if part of the Windows problem with the USB port hanging might have something to do with Winblows turning off the port the detector is on, even briefly, and that causing the data hangup / port lockup / lockout.

Granted, in a perfect world Windows would see the USB port in use and not touch it, but all it would take is one app to bog Windows down or eat a processor's attention for a few seconds, and this could cause a USB port to get 'burped', which would be plenty to cause a problem with the USB.
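
The Windows option described here appears to be the "USB selective suspend setting" in the advanced power plan settings. For anyone who prefers the command line, the sketch below only prints the `powercfg` commands that should turn it off for the active plan; it does not run them. The helper name is hypothetical, and the two GUIDs are the ones usually listed for the USB settings subgroup and the selective suspend setting, so treat them as assumptions and check them against `powercfg /query` on your own machine first.

```python
# GUIDs as commonly reported by `powercfg /query`; verify them locally
# before running any of the printed commands (they are assumptions here).
USB_SUBGROUP = "2a737441-1930-4402-8d77-b2bebba308a3"
USB_SELECTIVE_SUSPEND = "48e6b7a6-50f5-4782-a5d4-53bb8f07e226"


def selective_suspend_off_commands() -> list[list[str]]:
    """Build powercfg commands that set USB selective suspend to 0 (disabled)
    for the active power plan, on AC and on battery, then re-apply the plan."""
    base = ["SCHEME_CURRENT", USB_SUBGROUP, USB_SELECTIVE_SUSPEND, "0"]
    return [
        ["powercfg", "/setacvalueindex", *base],
        ["powercfg", "/setdcvalueindex", *base],
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


# Dry run: print the commands so they can be reviewed and pasted by hand.
for cmd in selective_suspend_off_commands():
    print(" ".join(cmd))
```

Printing rather than executing keeps the sketch harmless on any machine; the commands may need an elevated prompt when run for real.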

Just thinking out loud.
Aaron
____________


Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1128 - Posted: 18 May 2012 | 21:10:20 UTC - in response to Message 1110.

OK, this is beginning to really piss me off now. The stupid thing is locking up like every 5 or 6 hours. It's actually locking up the USB port to the point where I can't even do a normal Windows shutdown; it freezes up and eventually blue screens with a USB error. Now I wait 20 minutes for the shit to reset on the reboot. I tried a different detector, same thing: it keeps freezing for no reason.

Dagorath, I tried your script but it's not helping when the port is locked out to the point it kills Windows. Thank you anyway for the effort.

Why would a perfectly working computer start doing this crap? My other computer, which was having lockout issues, is now working pretty well actually. This makes me think more and more it's something with the project.

I tried installing the newest BOINC and well, I don't like it. It has some pretty significant issues with GPU tasks and with not releasing CPU tasks properly to fetch more. I thought an upgraded BOINC might help with the busy error but I am still having these problems. I tried plugging into a different USB port on the computer too, no help.

This gets really annoying.

Aaron
____________


Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 1129 - Posted: 19 May 2012 | 4:51:25 UTC - in response to Message 1128.

Does it blue screen even if my script isn't running? The reason I ask is because my script isn't very sophisticated and there is a possibility that the blue screens are caused by the script trying to reset the detector over and over and over. Maybe repeated unsuccessful reset commands cause the driver to go crazy or something. If it blue screens even if the script is not running then we know it's not the script causing the blue screens.
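
One way to rule out the "reset over and over" scenario is to cap the number of resets and wait longer after each one. The following is a minimal sketch of that idea, not the actual script from this thread: `is_stalled` and `reset_detector` are hypothetical callables standing in for whatever the real script does, and the attempt count and wait times are arbitrary assumptions.

```python
import time
from typing import Callable


def reset_with_backoff(
    is_stalled: Callable[[], bool],
    reset_detector: Callable[[], None],
    max_attempts: int = 3,
    first_wait_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try a limited number of resets, doubling the wait after each one.

    Returns True if the stall cleared, False if we gave up.
    """
    wait = first_wait_s
    for _ in range(max_attempts):
        if not is_stalled():
            return True
        reset_detector()
        sleep(wait)  # give the driver time to settle before checking again
        wait *= 2
    return not is_stalled()
```

Giving up after a few attempts means a port that is truly locked gets left alone instead of being hit with reset commands indefinitely.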

I wonder if changing some setting in your Windows configuration would prevent the lock ups? Is it possible to install a different USB driver to see if it works better? You might get some advice about that and some ideas to try from one of the Windows help forums.

One way to install a different driver would be to install Linux on the machine. No guarantees but it might do the trick.


____________

Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1130 - Posted: 19 May 2012 | 9:38:54 UTC - in response to Message 1129.

Dagorath, it does it with and without your script. The port seems to fault to the point of lockup. The strange thing about this is that Windows generally just ignores bad USB faults: it locks out the port and goes on its way, and even at shutdown it just ignores the port. Not this time, it blue screens.

I have tried different ports and different settings in Windows. It's hit or miss though: it may go a few days with NOTHING going wrong, then all of a sudden it does this for a day or two. NOTHING changed on my side, and NO, I did not let Windows install any updates etc. This is why I am questioning whether there is something going on on the project / server side. My side is not changing, is theirs?

I even swapped the detector with one on one of my other machines to rule out hardware failure. I swapped the USB cable.

I don't think it's a BOINC thing either, since I updated the version I am running (much to my chagrin, as I do not like this new version of BOINC).

It's behaving itself a bit better now. Let's see I guess.

Aaron
____________


Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 1131 - Posted: 19 May 2012 | 10:35:00 UTC - in response to Message 1130.

It's good to know it's not my script.

The fact that you swapped the cable and detector and still get the same problem is a strong indication it's the machine. The admins have said a few different times that some USB port implementations will not work well with the detector (I hope I am stating correctly what they said). Maybe that machine has one of those problem ports?

Not sure I have this right, but at http://msdn.microsoft.com/en-us/library/ff568641%28v=vs.85%29.aspx the links on the left side of the page, lower down, seem to be tests you can run to check whether your USB port meets Microsoft's standards. I didn't really study those links very well, so maybe they are not what I think they are.

____________

Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1132 - Posted: 19 May 2012 | 16:28:38 UTC - in response to Message 1131.

I hear what you are saying, Dagorath, but to have it work for a few months and then all of a sudden decide it wants to throw random tantrums? Well... maybe it's the excuse to buy the 64 core cruncher I've been wanting to get.

Aaron
____________


Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 1133 - Posted: 19 May 2012 | 21:59:44 UTC - in response to Message 1132.

Oh! Sorry, I didn't realize until now that it had been working for months. If it happens to be on your P4 or Celeron then you should consider that the problem might be due to failing hardware. Could be something as simple as a leaking capacitor. For rigs that old it doesn't make much sense to bother determining the cause, let alone fixing it. They use so much power and do so little work compared to new technology. Hope it's not your new Phenom II system though.

Speccing out a new cruncher and setting it up is way more fun than fixing weird crunchers. Put a stinky fried transformer in the case, tell the wife to smell it and then tell her you're worried the machine is going to start shooting sparks out the vents and cause a fire. She'll insist you get a new one. And remember, the money you can save by putting Linux on it instead of Windows can be put toward a new GTX 680.

____________

Ascholten
Joined: 17 Sep 11
Posts: 112
Credit: 525,421
RAC: 0

Message 1134 - Posted: 20 May 2012 | 2:07:55 UTC

Unfortunately it's my 6 core that's having the fits. I wonder if the GPUgrid tasks are any part of the culprit; I've seen them do some odd things from time to time. It's been behaving so far today though.

I think I'll still see what that mega cruncher is going to set me back; I need to do a little research on the 16 core chips still. I see several of them out there but hear some of them share math processors, so that will not work well. I won't get the top-of-the-line chips at 800 dollars each, but I don't want the cheapest ones either. It looks like the Tyan motherboard will handle them easily too, and I might even be able to put a good GTX on it.

Aaron
____________


Dagorath
Joined: 4 Jul 11
Posts: 151
Credit: 42,738
RAC: 0

Message 1135 - Posted: 20 May 2012 | 6:01:34 UTC - in response to Message 1134.

I bet you have the type of USB port they said can give problems with the detector once in a while. The USB on the detector is implemented in software, not hardware. A detector with a hardware USB might work better.

My detector is connected to my machine that has the GTX 570 and crunches GPUgrid 24/7, and I have no detector-USB problems at all. I doubt it's the GPUgrid tasks because they don't use USB in any way.

____________



Copyright © 2024 BOINC@Poland | Open Science for the future