
ElectricAccelerator: unable to start agents on Redhat MRG

Installed a 6.0 cluster manager on a machine running Redhat MRG, an "application compatible" real-time variant of Redhat EL 5. I am having trouble installing agents on a similar machine. The installation initially fails while trying to compile the lofs driver (a compile issue related to our trying to compile lofs with the Redhat MRG kernel source).

  • Using lsmod and the eaagent log, I noted that the efs driver, however, had been installed correctly.

  • I then set SKIP_LOFS to 1 in the agent startup script and attempted to start the agent. The agents appeared to initially start (I noticed momentary traffic in top) and then disappeared.

  • I then attempted to manually execute the agent process using the command line in the startup script but excluding the --daemon option. The best that I could get was the agent process dumping me in a bash shell.

  • I ran the same manual step under strace to see whether the agent's behavior had anything to do with the efs driver or an attempt to use the lofs driver, but I saw no reference to either near the end of the strace output.

  • Lastly, note that the installation process failed before the agent would have registered itself with the cluster manager.

So questions:

1) If the installation failed while trying to compile the lofs driver, can I expect to be able to set SKIP_LOFS to 1 and successfully start the service?

2) What is the recommended method for running an agent with verbose logging to find out what it is doing (and why it's exiting in daemon mode)?

By mreid, asked Apr 30, 2012 at 12:25 PM


FINAL ANSWER:

Note that one should not attempt to install and run an EA agent on Redhat MRG. MRG touts itself as being "application compatible" with Red Hat Enterprise Linux 5; however, this compatibility does not extend to driver code. In fact, the MRG installation documentation explicitly states that real-time and non-real-time drivers are not interchangeable between MRG and non-real-time variants of Red Hat.

If you have a Redhat MRG emake machine, consider the following:

  • Whether there are any requirements in your build that truly tie it to RH MRG.

  • Using RHEL5 as the OS for your agents.

(I am leaving all the other information below in place as well, since it is generally useful.)

UPDATE: After installation, you need to update two locations to fully disable the LOFS component:

  • You should set SKIP_LOFS to 1 in /etc/init.d/ecagent. This will prevent the startup script from trying to build the LOFS kernel module.

  • You should set gSkipLofs to 1 in /opt/ecloud/i686_Linux/bin/runagent. This will instruct the agent not to use LOFS.
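
A quick way to confirm that both edits took effect is to scan the two files for the settings. The following is a minimal Python sketch, not something shipped with the product. The file paths and variable names come from the list above, but the assignment syntax it looks for (`NAME=1`, optionally prefixed with `export`, or `set NAME 1`) is an assumption; adjust the pattern if your scripts assign the values differently.

```python
import re
from pathlib import Path

# Paths and variable names are taken from the answer above.
SETTINGS = {
    "/etc/init.d/ecagent": "SKIP_LOFS",
    "/opt/ecloud/i686_Linux/bin/runagent": "gSkipLofs",
}


def setting_is_one(text, name):
    """Return True if the last assignment of `name` found in `text` sets it to 1.

    ASSUMPTION: assignments look like `NAME=1`, `export NAME=1`, or
    `set NAME 1`. The real scripts may use another form.
    """
    pattern = re.compile(
        r"^\s*(?:(?:set|export)\s+)?"
        + re.escape(name)
        + r"(?:\s*=\s*|\s+)[\"']?(\w+)[\"']?\s*(?:[;#].*)?$"
    )
    value = None
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            value = match.group(1)  # later assignments override earlier ones
    return value == "1"


if __name__ == "__main__":
    for path, name in SETTINGS.items():
        try:
            text = Path(path).read_text(errors="replace")
        except OSError as exc:
            print(f"{path}: cannot read ({exc})")
            continue
        state = "is set to 1" if setting_is_one(text, name) else "is NOT set to 1"
        print(f"{path}: {name} {state}")
```

Run it on the agent machine after editing both files. If either line reports that the value is not set to 1, inspect that file by hand, since the assumed pattern may simply not match how the script assigns the variable.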

You mentioned that you found the following error message in the console*.log files:

 WARNING: Unable to start session: error creating sandbox mountpoint "/efsroots/0/": No such device 
 WARNING: Retrying in 10 seconds...
 ERROR: Unable to start session after retry: error creating sandbox mountpoint "/efsroots/0/": No such device
 ERROR: Exiting.

This indicates that the agent is still trying to use LOFS, so you probably missed setting gSkipLofs.
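
To check several agent logs for this symptom at once, something like the following Python sketch may help. It only searches for the message text quoted above; the `/var/log/console*.log` location is the one suggested in the original answer, and the helper name `find_mountpoint_errors` is made up for this example.

```python
import glob
import re

# Matches the message quoted above; the mountpoint path differs per agent.
MOUNT_ERROR = re.compile(
    r'error creating sandbox mountpoint "([^"]+)": No such device'
)


def find_mountpoint_errors(lines):
    """Return the sorted, de-duplicated mountpoints named in matching errors."""
    found = set()
    for line in lines:
        match = MOUNT_ERROR.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    for path in sorted(glob.glob("/var/log/console*.log")):
        with open(path, errors="replace") as handle:
            hits = find_mountpoint_errors(handle)
        if hits:
            print(f"{path}: sandbox mountpoint errors for {', '.join(hits)}")
```

Any file it reports belongs to an agent that hit the error above, so recheck gSkipLofs in runagent before restarting that agent.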

ORIGINAL ANSWER:

A few things to try:

  • Check /var/log/console*.log for agent errors

  • If you have specified the -logfile option to runagent (unlikely, because you would have had to manually edit ecagent, but possible), verify that it refers to a valid path.

  • Check for agent core dumps.

  • Check /var/log/messages for errors generated either when the EFS module was loaded or when the agent tries to communicate with it on startup.

  • With the agent service stopped, run /opt/ecloud/i686_Linux/bin/install_efs manually, and verify that the post-install tests all pass.

Note that the failure to build the LOFS kernel module is a red flag for compatibility on this platform.

By eric melski, answered Apr 30, 2012 at 01:46 PM
mreid Apr 30, 2012 at 03:35 PM

Took a look at the console*.log files, and as you suggested might happen, the error became apparent.

 WARNING: Unable to start session: error creating sandbox mountpoint "/efsroots/0/": No such device
 WARNING: Retrying in 10 seconds...
 ERROR: Unable to start session after retry: error creating sandbox mountpoint "/efsroots/0/": No such device
 ERROR: Exiting.

mreid Apr 30, 2012 at 03:35 PM

Is the driver not working correctly in spite of the fact that it loads seemingly without error? Is this error the result of the installation prematurely aborting due to the lofs driver compilation issue? If the answer to the previous question is yes, is there a way to force the installation to skip the installation of the lofs driver?

eric melski Apr 30, 2012 at 04:09 PM

@mreid: See my update: if you are disabling LOFS post-install, there are actually two places you need to update.
