I was trying to use the GW0 method implemented in YAMBO (v 3.4.2) to calculate the corrected electronic band structure of solids. Please check my GW input file below.
The job works well only if I use a small number of cores (e.g. 8 or fewer) on the HPC cluster:

gw0 # [R GW] GoWo Quasiparticle energy levels
ppa # [R Xp] Plasmon Pole Approximation
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
HF_and_locXC # [R XX] Hartree-Fock Self-energy and Vxc
EXXRLvcs= 20 Ry # [XX] Exchange RL components
Chimod= "Hartree" # [X] IP/Hartree/ALDA/LRC/BSfxc
% QpntsRXp
1 | 34 | # [Xp] Transferred momenta
%
% BndsRnXp
1 | 280 | # [Xp] Polarization function bands
%
NGsBlkXp= 1 Ry # [Xp] Response block size
% LongDrXp
1.000000 | 0.000000 | 0.000000 | # [Xp] [cc] Electric Field
%
PPAPntXp= 27.21138 eV # [Xp] PPA imaginary energy
% GbndRnge
1 | 280 | # [GW] G[W] bands range
%
GDamping= 0.100000 eV # [GW] G[W] damping
dScStep= 0.100000 eV # [GW] Energy step to evaluate Z factors
DysSolver= "n" # [GW] Dyson Equation solver (`n`,`s`,`g`)
%QPkrange # [GW] QP generalized Kpoint/Band indices
1| 1| 210| 222|
%
%QPerange # [GW] QP generalized Kpoint/Energy indices
1| 34| 0.0|-1.0|
%
The job is launched with:

srun --ntasks=8 --hint=nomultithread --ntasks-per-node=8 --ntasks-per-socket=4 --ntasks-per-core=1 --mem_bind=v,local ${YAMBO_HOME}/bin/yambo -F INPUTS/06_BSE -J 06_BSE
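For the larger runs I keep the same binding flags and only scale the task counts. A sketch of what I submit for one full node (assuming our nodes have 2 sockets with 16 cores each; adjust the per-socket count to your layout):

```shell
# Hypothetical scaled-up launch for one 32-core node (2 sockets x 16 cores);
# identical binding flags to the 8-core run above, only the counts change.
srun --ntasks=32 --hint=nomultithread \
     --ntasks-per-node=32 --ntasks-per-socket=16 --ntasks-per-core=1 \
     --mem_bind=v,local ${YAMBO_HOME}/bin/yambo -F INPUTS/06_BSE -J 06_BSE
```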
But if I increase the number of cores, e.g. to 32 (one full node) or more, the run always stops at the point shown in the log below. Increasing the number of cores doesn't seem to accelerate the calculation at all. How can I handle a large system with more than one thousand cores and run these jobs in parallel more efficiently?

<---> [01] Files & I/O Directories
<---> [02] CORE Variables Setup
<---> [02.01] Unit cells
<01s> [02.02] Symmetries
<01s> [02.03] RL shells
<01s> [02.04] K-grid lattice
<01s> [02.05] Energies [ev] & Occupations
<01s> [03] Transferred momenta grid
<01s> [04] Bare local and non-local Exchange-Correlation
<01s> [Distribute] Average allocated memory is [o/o]: 7.502401
<01s> [M 0.773 Gb] Alloc WF ( 0.721)
<02s> [FFT-HF/Rho] Mesh size: 30 30 95
<02s> [WF-HF/Rho loader] Wfs (re)loading | | [000%] --(E) --(X)
<02s> [M 0.996 Gb] Alloc wf_disk ( 0.222)
<08s> [WF-HF/Rho loader] Wfs (re)loading |# | [009%] 05s(E) 58s(X)
<14s> [WF-HF/Rho loader] Wfs (re)loading |#### | [020%] 11s(E) 56s(X)
<19s> [WF-HF/Rho loader] Wfs (re)loading |###### | [032%] 17s(E) 53s(X)
<25s> [WF-HF/Rho loader] Wfs (re)loading |######## | [043%] 23s(E) 52s(X)
<31s> [WF-HF/Rho loader] Wfs (re)loading |########### | [055%] 28s(E) 51s(X)
<37s> [WF-HF/Rho loader] Wfs (re)loading |############# | [067%] 34s(E) 50s(X)
<42s> [WF-HF/Rho loader] Wfs (re)loading |############### | [079%] 40s(E) 50s(X)
<48s> [WF-HF/Rho loader] Wfs (re)loading |################## | [091%] 46s(E) 50s(X)
<51s> [WF-HF/Rho loader] Wfs (re)loading |####################| [100%] 49s(E) 49s(X)
<51s> [M 0.775 Gb] Free wf_disk ( 0.222)
<51s> EXS | | [000%] --(E) --(X)
<56s> P001: EXS |### | [016%] 05s(E) 29s(X)
<01m-01s> P001: EXS |###### | [033%] 10s(E) 29s(X)
<01m-06s> P001: EXS |########## | [050%] 15s(E) 29s(X)
<01m-11s> P001: EXS |############# | [067%] 20s(E) 29s(X)
<01m-16s> P001: EXS |################ | [084%] 25s(E) 29s(X)
<01m-20s> P001: EXS |####################| [100%] 28s(E) 28s(X)
<01m-20s> [xc] Functional Perdew, Burke & Ernzerhof(X)+Perdew, Burke & Ernzerhof(C)
<01m-20s> [xc] LIBXC used to calculate xc functional
<01m-20s> [M 0.052 Gb] Free WF ( 0.721)
<01m-21s> [05] Dynamic Dielectric Matrix (PPA)
<01m-21s> [Distribute] Average allocated memory is [o/o]: 77.85714
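In case it matters: my input contains no explicit parallel-distribution variables. I understand that newer YAMBO releases (the 4.x series, with the input generated via `yambo -V par`) expose a parallel-structure block like the sketch below; I am not sure whether v 3.4.2 supports it, so the variable names and the core counts here are assumptions on my part, not something I have tested:

```
# Hypothetical parallel-structure block (YAMBO 4.x-style syntax; may not
# apply to 3.4.2). The product of each _CPU list should equal the number
# of MPI tasks (here 32).
X_all_q_ROLEs= "q k c v"   # [PARALLEL] CPU roles for the response function
X_all_q_CPU= "1 4 4 2"     # [PARALLEL] CPUs per role (1*4*4*2 = 32)
SE_ROLEs= "q qp b"         # [PARALLEL] CPU roles for the self-energy
SE_CPU= "1 8 4"            # [PARALLEL] CPUs per role (1*8*4 = 32)
```

If something equivalent exists for 3.4.2, or if the only option is to upgrade, I would appreciate a pointer.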