
Commander step work is done, but step is still 'running'.

A new procedure has a step which writes a script file, ssh's it to a different machine, and then, when the ssh process ends, prints a message and is supposed to terminate. The remote process runs as expected and the message after the ssh command is displayed, but the step never ends, even after hours. In the Commander log I see this:
cf0ae55e-de14-11e8-9d20-00505694438f Run_FMB cefcb447-de14-11e8-8638-00505694438f F10-NGOS_FMB_integration_7_201811011329 FMB Run_FMB

--- Truncated to 8192 characters ---

We're running Build Version:

The step shell is set to /bin/bash

What should I do?

By dave_tarbox@dell.com, asked Nov 02, 2018 at 12:19 AM

dave_tarbox@dell.com, Nov 02, 2018 at 01:12 AM

When I abort the step, it still doesn't end.


2 answers

I'm not sure what caused this issue; perhaps the remote script did not terminate as expected. In any case, you may want to consider two alternative approaches: install an agent on the target machine, or use a proxy agent, which essentially does what you described but as a standard feature.

By gregm, answered Nov 02, 2018 at 02:00 PM

dave_tarbox@dell.com, Nov 02, 2018 at 02:46 PM

Thanks for the suggestions. We now have an agent on the machine where the command must run, and I changed the job to use it. Unfortunately, that doesn't seem to make any difference. The step continues to run with no work left to do. In the agent log I see that it sent the server a message saying that the build has completed (finishCommand). The thing I find most confusing is that when I 'abort' the step, it doesn't finish. The build page shows that the step took 88 minutes, even though it started 10 hours ago.


This is typically because the agent cannot connect to the server to tell it the work is done. Here are a few potential causes:

  1. Port 7800 (the default port for communication between agent and server) is open in only one direction (server to agent, when the work starts). The port needs to be open in both directions, because the connection is severed and the agent then initiates a new connection to the server to indicate that it is done.

  2. The name of the Flow server does not resolve from the agent. Check the COMMANDER_SERVER_NAME attribute in commander.properties, as well as the server settings property "Server IP address".

  3. If you use gateways and load balancers, there is most likely a bad configuration somewhere in the chain. Contact support in this case for a more detailed answer.
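
For the first two causes, a quick check run from the agent machine can rule them in or out. The following is only a minimal sketch in Python, not a Flow tool: the helper names `resolves` and `can_connect` and the host name `flow-server.example.com` are made up for illustration, and 7800 is simply the port named above, so substitute whatever host and port your installation actually uses.

```python
import socket
import sys


def resolves(hostname):
    """Return the sorted list of addresses hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical defaults; pass the real server name and port on the command line.
    server = sys.argv[1] if len(sys.argv) > 1 else "flow-server.example.com"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 7800

    addresses = resolves(server)
    print(f"{server} resolves to: {addresses or 'nothing (name resolution failed)'}")
    print(f"TCP connect to {server}:{port}: {'ok' if can_connect(server, port) else 'FAILED'}")
```

Run it on the agent with the value of COMMANDER_SERVER_NAME as the first argument. A failed lookup points at cause 2; a successful lookup followed by a failed connect points at a firewall or routing problem such as cause 1.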

By lrochette, answered Nov 02, 2018 at 04:30 PM

dave_tarbox@dell.com, Nov 02, 2018 at 11:31 PM

Thanks, Laurent, but those can't be the causes, either. The step which hangs is the third step in the procedure. The first two also use this resource and finish successfully. A thought occurs to me: this step runs emake on the agent. Could that somehow affect the interaction between the Commander agent and server? I still find it surprising that the step remains running even though the web UI shows that I've aborted it. ... Reverting to use only gmake doesn't help. These jobs still never terminate and can only be ended by aborting the whole build.

lrochette, Nov 05, 2018 at 12:02 AM

In the past I have seen issues with steps that start daemon-like commands: the step appears never to finish (from Flow's point of view) even though the work is done. But emake should finish; I've run it in the past without issues. You may want to open a ticket with support so they can dig into your logs.
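
One general way a daemon-like command can make a finished step look unfinished is by keeping the step's output streams open. Whether this is what happens in this particular job is an assumption, not something the logs here confirm. The sketch below is plain Python (not Flow code, and `timed_run` is a made-up helper) showing the effect on any POSIX machine: the shell exits at once, but whoever is reading its output has to wait until the background child closes its inherited stdout and stderr.

```python
import subprocess
import time


def timed_run(shell_command):
    """Run shell_command via sh, capturing its output, and return the elapsed seconds."""
    start = time.monotonic()
    # capture_output=True makes the caller read stdout/stderr until end-of-file,
    # which only arrives once every process holding those pipes has closed them.
    subprocess.run(["sh", "-c", shell_command], capture_output=True, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # The background sleep inherits the shell's stdout and stderr pipes.
    inherited = timed_run("sleep 3 &")
    # Here the background sleep is detached from the pipes.
    detached = timed_run("sleep 3 >/dev/null 2>&1 &")
    print(f"background child inherits output: {inherited:.1f}s")
    print(f"background child redirected:      {detached:.1f}s")
```

If a step (or the script it sends over ssh) leaves something running in the background, redirecting that command's stdout and stderr, as in the second call, is a cheap thing to try before opening a ticket.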
