[Issues] some CPUs are not used

Ewout M. Helmich helmich at astro.rug.nl
Thu Sep 27 16:19:53 CEST 2007


Hi Johannes,

I'm not sure I completely understand your problem, but I can explain a 
few things. In DBRecipes/mods/Pipeline.py a variable GROUP_SIZE is used 
which in the case of the image pipeline is 8. This results in your 20 
filenames being split up into groups of 8, 8 and 4. The GROUP_SIZE was 
chosen so as to work best for the HPC cluster in Groningen, in 
particular because of the ~30min job limitation in the "short queue" 
here. If the number of processes per node (CPUs/cores) is 2, as in 
Groningen, that means two nodes are reserved in the call to the PBS 
queuing system, where one is handling 16 frames and the other 4. That is 
not very balanced and we could try to optimize by dividing the load 
evenly. On the other hand I doubt that this alone would be a serious 
problem (the main question here being whether the node that handles 4 
files is occupied for the entire time the node that handles 16 is busy). 
You mention losing 50% of the CPUs; how exactly does that happen? Are you 
submitting many jobs where you specify 1 filename?
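
To make the arithmetic above concrete, here is a rough sketch in Python. It is 
not the actual code in DBRecipes/mods/Pipeline.py: only the name GROUP_SIZE and 
its value of 8 come from there, while the function names, PROCESSES_PER_NODE and 
the file names are made up for illustration. The even split at the end is just 
one possible way of "dividing the load evenly", assuming the number of groups is 
kept the same.

```python
import math

GROUP_SIZE = 8          # value used for the image pipeline, as described above
PROCESSES_PER_NODE = 2  # assumption: 2 CPUs/cores per node, as in Groningen


def split_into_groups(filenames, group_size=GROUP_SIZE):
    """Split the list into consecutive chunks of at most group_size files."""
    return [filenames[i:i + group_size]
            for i in range(0, len(filenames), group_size)]


def frames_per_node(groups, processes_per_node=PROCESSES_PER_NODE):
    """Count the frames each node handles if groups fill nodes in order."""
    return [sum(len(g) for g in groups[i:i + processes_per_node])
            for i in range(0, len(groups), processes_per_node)]


def split_evenly(filenames, group_size=GROUP_SIZE):
    """Hypothetical alternative: same number of groups, sizes differ by <= 1."""
    if not filenames:
        return []
    n_groups = math.ceil(len(filenames) / group_size)
    base, extra = divmod(len(filenames), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        groups.append(filenames[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    names = ["frame_%02d.fits" % i for i in range(20)]  # placeholder names
    groups = split_into_groups(names)
    print([len(g) for g in groups])                # group sizes
    print(frames_per_node(groups))                 # frames per node
    print([len(g) for g in split_evenly(names)])   # more even group sizes
```

With 20 filenames this gives groups of 8, 8 and 4, and 16 versus 4 frames on 
the two nodes. The even split gives groups of 7, 7 and 6, which balances the 
processes but still leaves one slot on the second node without a group.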

Regards,
Ewout

John P. McFarland wrote:
> Hi Johannes,
>
> The DPU/CPU behavior might have something to do with the cluster queueing 
> system not controlled by the DPU, but that is only a guess.  For now, you 
> could simply try to optimize the lists so that you use both CPUs on one node 
> and no others if possible.
>
> I'm CCing this to the Issues list so that anybody else with some ideas 
> (especially our DPU experts) can chime in.
>
> Cheers,
>
>
> -=John
>
>
> On Mon, 24 Sep 2007, Johannes Koppenhoefer wrote:
>
>   
>> Hello John,
>>
>> I have realized that if you submit a job on the dpu with the 
>> red_filenames option, and the number of files is e.g. 20, it results in 3 
>> CPU jobs, two on one node and one on the next node. Now, for some reason I 
>> do not understand, the second CPU on the second node is not going to be 
>> used by other processes. This is in particular painful in my situation 
>> where I have to submit jobs that run on a single CPU, because I can use 
>> only half of the CPUs on our cluster and the rest is blocked. Is there any 
>> reason for this dpu-behavior? Do you know of any quick workaround for me?
>>
>> Cheers,
>> Johannes
>>
>>     
> _______________________________________________
> Issues mailing list
> Issues at astro-wise.org
> http://listman.astro-wise.org/mailman/listinfo/issues
>   

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Drs. Ewout Helmich		<><>
Kapteyn Astronomical Institute	<><> Astro-WISE/OmegaCEN
Landleven 12			<><>
P.O.Box 800			<><> email: helmich at astro.rug.nl
9700 AV Groningen		<><> tel  : +31(0)503634548
The Netherlands			<><>
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


