
Quiver crashes (restarts) after repeated probe fails printing via Cura
Open, High, Public

Description

Quiver with Dual Extruder, Marlin 2.0.0.88
Cura 3.6.5, printing over USB

If the Quiver fails probing twice, the firmware restarts instead of displaying a probe fail error as the TAZ 6 does.

Event Timeline

karrad added a subscriber: karrad. Feb 18 2019, 10:16 AM

@youngmrcarlson Can you reflash to .88 and re-test? Be sure to restore factory defaults after updating firmware

youngmrcarlson renamed this task from Quiver crashes (restarts) during bed probe sequence to Quiver crashes (restarts) after repeated probe fails. Feb 18 2019, 11:55 AM
youngmrcarlson updated the task description.

@karrad The issue with the probing hot end being retracted seems to be resolved, and it's just that the printer restarts instead of displaying an error if it fails probing twice. Updated task to reflect.
Tested using Cura 3.6.5 and .88 FW. Reproduced just by disconnecting zero sense from one hot end (to fake a probe fail) and starting a print.

@youngmrcarlson Ahh thank you. One last thing to check, can you force a fail while printing via USB stick? Will help determine if FW or Cura issue, or communication between the two

@karrad That's what I started testing right after my last comment :)
Probe fail message displays as expected when printing from USB drive - only affects tethered printing over USB

karrad renamed this task from Quiver crashes (restarts) after repeated probe fails to Quiver crashes (restarts) after repeated probe fails printing via Cura. Feb 18 2019, 12:08 PM
karrad changed the edit policy from "All Users" to "Cura LulzBot Edition (Project)".
karrad added a project: Cura LulzBot Edition.
alexei triaged this task as High priority. Feb 18 2019, 12:18 PM
alexei changed the edit policy from "Cura LulzBot Edition (Project)" to "Marlin (Project)".
alexei removed a project: Cura LulzBot Edition.
marcio added a subscriber: marcio. Feb 18 2019, 12:35 PM

It looks like the machine crashes during a move towards the middle of the bed right after wiping after the second probe fail. Is this what you are seeing?

So what is interesting is that if I connect to the printer via a serial console, and type "G29", the probe repeats the correct number of times and shows an error. So it seems like Cura is doing something that is causing the printer to reboot.

marcio added a comment. Edited Feb 18 2019, 1:44 PM

Cura expects certain commands to complete in under 100 seconds. That no longer holds for G29, which on Quiver can take several minutes in the worst case. I suspect that when Cura tries to "wake up" Marlin, it causes the crash. One very useful test would be to increase the following value to 1000 in MarlinSerialProtocol.py and see if the problem goes away:

self.slowTimeout            = 100

Here is the line that I am referring to:

https://code.alephobjects.com/source/cura-lulzbot/browse/master/plugins/USBPrinting/MarlinSerialProtocol.py$100
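To illustrate why raising that value could matter, here is a minimal sketch of a host-side timeout check. It is not the code in MarlinSerialProtocol.py: the function name, the list of slow commands, and the 20-second fast timeout are assumptions made up for this example. Only the 100 and 1000 second values come from the discussion above.

```python
# Hypothetical sketch of a host-side "is the firmware unresponsive?" check.
# Names and the fast timeout are assumptions, not taken from CuraLE.
SLOW_COMMANDS = ("G28", "G29")  # assumed set of long-running commands


def should_attempt_wakeup(command, elapsed_s, slow_timeout_s=100, fast_timeout_s=20):
    """Return True if the host would give up waiting and try to wake the firmware."""
    words = command.split()
    is_slow = bool(words) and words[0].upper() in SLOW_COMMANDS
    limit = slow_timeout_s if is_slow else fast_timeout_s
    return elapsed_s > limit


# A G29 that has already been running for four minutes (240 s):
print(should_attempt_wakeup("G29", 240))                       # True
print(should_attempt_wakeup("G29", 240, slow_timeout_s=1000))  # False
```

With a 100 second limit, a probe sequence lasting several minutes looks like a hung printer to the host. With 1000 seconds, the same sequence is allowed to finish and report its own error.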

Another thing I noticed is that before the crash, the output in the serial console seems to freeze up and I am no longer seeing output in the Cura console. So something appears to be failing with regards to serial communications.

@marcio Nice! I changed that value to 1000 and it now displays the error as expected.

marcio claimed this task. Feb 18 2019, 2:15 PM

I was able to reproduce that error outside of Cura by using a script I have that prints from the command line using "MarlinSerialProtocol.py". So now I can test this outside of Cura, which will be a big help in tracking it down efficiently. There are actually two problems rolled up into one here. For one, Marlin should not crash, no matter what gets sent to it from the serial port. Second, from my initial testing it is clear that the MarlinSerialProtocol is not behaving how it should with regards to timeouts on Quiver. This is a separate issue that will need to be addressed.

If worst comes to worst, we know we can avoid the problem by setting that timeout higher, but I would like to see if I can track down what is causing Marlin to crash first. I'll assign this ticket to myself.

If serial data can cause Marlin to crash, this may actually explain T4900, T4916 which Dani recently closed. It's possible we just haven't run into serial errors because the Archim board is more reliable after the fixes Mark implemented in hardware, but I suspect that the serial error recovery is what is causing the Archim to crash. It's good that we found a good test case to reproduce the problem.

I have not been able to determine under what circumstances Marlin crashes, but it seems like the root cause may have been that Cura was not waiting long enough for G29 to complete before deciding Marlin was unresponsive and attempting to revive it. This was happening multiple times in a row and filling the serial output buffer, eventually causing Cura's serial code to block. It's unclear why this ultimately leads Marlin to crash, but I have pushed a patch that will keep Cura from initiating this process due to failed probes.
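The buffer-filling behaviour described above can be pictured with a small simulation. This is only an illustration under stated assumptions, not the CuraLE patch or the real serial code: a bounded `queue.Queue` stands in for the serial output buffer, the firmware is assumed to read nothing while it is busy probing, and the wake-up payload is a placeholder.

```python
import queue


def simulate_wakeups(buffer_slots, wakeup_attempts):
    """Return how many wake-up writes fit before the next write would block.

    Assumes the firmware drains nothing while busy, so every write stays queued.
    """
    out_buffer = queue.Queue(maxsize=buffer_slots)  # stand-in for the output buffer
    for attempt in range(wakeup_attempts):
        try:
            out_buffer.put_nowait(b"<wake-up>\n")  # placeholder payload
        except queue.Full:
            return attempt  # a blocking write would hang at this point
    return wakeup_attempts


print(simulate_wakeups(buffer_slots=4, wakeup_attempts=10))  # 4
print(simulate_wakeups(buffer_slots=4, wakeup_attempts=3))   # 3
```

In this model, once the retries outnumber the free slots the host's write call stalls, which matches the observation that console output froze shortly before the restart.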

alexei added a subscriber: alexei. Mar 7 2019, 10:47 AM

@marcio , Merged your workaround to CuraLE master.

It looks like this problem is known on 32-bit boards. At the moment there is a possible workaround, but not a fix.

https://github.com/MarlinFirmware/Marlin/issues/11715

DaniAO added a subscriber: DaniAO. Wed, May 22, 10:57 AM

We have had a few firmware updates since the last few comments. Are we still seeing this as a problem, or can we close this ticket?

Just tested it with 2.0.0.110 and it shows the "Probing failed" message as expected.

@youngmrcarlson Will you update your firmware to .116 and try it? If there are no issues, please close the ticket.

Thanks!

@DaniAO Works on .116 as well, but I don't have permissions to mark this Resolved.

Sounds good. Per Alexei we will leave it open until it's resolved upstream

jebba added a subscriber: jebba. Wed, May 22, 2:50 PM

@alexei Why not close this ticket though? The problem in this T5686 ticket is that Marlin restarts when taking too long in Quiver, but that is no longer an issue, it correctly says "probing failed". If we want to track whatever is going on in upstream Marlin's #11715, isn't that a separate issue?

@jebba: The fixes rCT182c043e604d and rCT1e83b65d02f6 were made in CuraLE only, so the bug is still present in Marlin. Host programs other than CuraLE, such as OctoPrint, would therefore have the same problem.