login Signup

CM build time longer that time in anno file

I’m trying to determine why my build time is longer in the CM vs the anno file. Typically, I’ve seen this issue when there are a lot of Commit by copy files. In my case a 17 minute build, as indicated by the annotation file, shows as 22 minutes in the CM. No files are committed by copy. What else should I be looking for?


Here are the metrics @Eric Melski asked to see:

   Network to agent: 5576.9 MB, 5.9 MB/s active, 4.1 MB/s overall 
   Network from agent: 8498.8 MB, 4.0 MB/s active, 6.2 MB/s overall 
   To disk: 14548.1 MB, 7.2 MB/s active, 10.7 MB/s overall 
   From disk: 7825.3 MB, 6.7 MB/s active, 5.7 MB/s overall 
   Create/Modify usage: 14549.8 MB, 10.7 MB/s overall
   Commit by copy: 0 files 0.0 MB, 0.0 MB/s overall 
   Commit by rename: 135683 files 14551.0 MB, 10.7 MB/s overall 
   Commit by remove (pre): 0 files 0.0 MB, 0.0 MB/s overall 
   Commit by remove (tmp): 3 files 1.2 MB, 0.0 MB/s overall 
   Conflicts: 0 Job Count: 26106
 Counter values: 
   DirCache misses            738,431 
   DirCache probes            939,365 
   DirCache readdirs           27,463 
   DirCache stats             257,803 
   DirCache symlinks              297 
   File rename retries              0 
   NameStringPool hits     19,937,843 
   NameStringPool lookups  20,200,950 
   Total appends                    0 
   Total creates              152,119 
   Total exists                     0 
   Total failed lookups    18,418,962 
   Total links                      0 
   Total lookups            5,600,917
   Total modifies                   0
   Total modify atts              991
   Total new names            124,297
   Total reads              1,349,497
   Total renames                1,021
   Total submakes                   0
   Total unlinks                    5

avatar image By dave 663 asked Apr 30, 2012 at 03:48 PM
more ▼
(comments are locked)
10|750 characters needed characters left

1 answer: sort voted first

Between the end of the last job and the end of the build, emake does a number of things:

  • Conflict checks on any as-yet-unterminated jobs. A j-level debug log will give you some insight into what's happening here, without creating a huge amount of data.

  • Committing files to their final location for as-yet-unterminated jobs. This can be slow if you have a high "Commit by Copy" metric, or if you just have a slow disk in general (check the "Bandwidth" metrics in the emake g-level debug log for the disk performance).

  • Annotating as-yet-unterminated jobs. This can be slow if you have a high-level of annotation detail enabled.

  • Updating the ledger, if --emake-ledger is enabled. This can be slow depending on the options you have enabled.

  • Uploading annotation to the cluster manager, unless annotation upload is disabled. This can be slow if your annotation files are very large.

  • Deleting temporary files. This can be slow if your disk is slow or if you have a large number of intermediate versions of files.

Yes, there's a lot of stuff there relating to "unterminated" jobs -- remember, emake is terminating (committing) jobs in serial order, so there could be an enormous number of jobs which have completed (finished running on the cluster) but which are not yet terminated. All the processing for all of those jobs has to be finished, one-at-a-time.

avatar image By eric melski ♦♦ 6k answered Apr 30, 2012 at 05:29 PM
more ▼
(comments are locked)
avatar image dave May 01, 2012 at 03:07 PM

We ran another build using fewer agents (12 vs. 24). The 12 agent build shows the same time between the CM and anno file. The only points that may affect us from your answer are the first and last. We have no Commit by Copy files, anno is file/history/waiting (is this considered high?), no ledger options, and anno isn't uploaded.

With 12 agents, the build ran in 25 minutes, 44 seconds compared with 22 minutes 42 seconds for the 24 agent build (17:14 in the annotation).

avatar image dave May 01, 2012 at 03:07 PM

Would the 12 extra agents explain this 5 minute lag? The build is from a network filer with slow disk I/O. We will run another build with debug=j. Is there anything specifically we should look for?

avatar image eric melski ♦♦ May 01, 2012 at 03:54 PM

@Dave Goldman: if your disk is slow, even commit-by-rename can be slow. Why don't you post the "Bandwidth", "Terminator" and "Counter values" sections of your emake performance metrics (as an update to your question)?

avatar image eric melski ♦♦ May 01, 2012 at 03:56 PM

In a j-level debug log, I would look for the delta between consecutive "terminating" and "terminated with" messages. I would also see if there is a large gap between the "terminated with" message for the final end job in the build and the actual end of the build.

avatar image eric melski ♦♦ May 01, 2012 at 06:35 PM +

@Dave Goldman: the metrics you've shown so far don't reveal anything unusual. Let's see what the end of your j-level debug log shows us.

10|750 characters needed characters left
Your answer
toggle preview:

Up to 8 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.