Error in MPI in BSE calculations

Deals with issues related to the computation of optical spectra and the solution of the Bethe-Salpeter equation.

Moderators: Davide Sangalli, andrea.ferretti, myrta gruning, andrea marini, Daniele Varsano

DmitrySkachkov
Posts: 11
Joined: Mon Dec 13, 2021 8:52 pm

Error in MPI in BSE calculations

Post by DmitrySkachkov » Thu Sep 26, 2024 9:52 pm

Hello,

I get an error in hybrid OpenMP-MPI BSE calculations at the 3rd step (BSE solver), whereas the 1st step (BSE screening) and the 2nd step (BSE kernel) complete without any errors.
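For reference, the three steps were launched roughly as follows (a sketch only: the input-file names, the -J label, and the task/thread counts are illustrative placeholders, not the exact job script):

> export OMP_NUM_THREADS=2                       # 2 OpenMP threads per MPI task
> mpirun -np 32 yambo -F 01_screening.in -J BSE  # step 1: BSE screening (OK)
> mpirun -np 32 yambo -F 02_kernel.in -J BSE     # step 2: BSE kernel (OK)
> mpirun -np 32 yambo -F 03_solver.in -J BSE     # step 3: BSE solver (crashes)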

The error:
[r2x08:04448] *** An error occurred in MPI_Allreduce
[r2x08:04448] *** reported by process [1228734465,28]
[r2x08:04448] *** on communicator MPI_COMM_WORLD
[r2x08:04448] *** MPI_ERR_COUNT: invalid count argument
[r2x08:04448] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[r2x08:04448] *** and potentially your MPI job)


The hybrid OpenMP-MPI version of Yambo was compiled from the GitHub sources with the following modules:
intel/2020u4
intel-mkl/2020u4
openmpi/4.1.5:intel-2020
using the following configuration:
> ./configure --enable-memory-profile --enable-dp --enable-open-mp --enable-par-linalg \
FC=ifort F77=ifort CC=icc MPICC=mpicc MPIFC=mpifort
The compiled version:
This is yambo - MPI+OpenMP+SLK+HDF5_MPI_IO - Ver. 5.2.0 Revision 23096 Hash f147e08b32
The input file for the 3rd step (BSE solver):

#  [yambo ASCII-art banner]
#
#
# Version 5.2.0 Revision 23096 Hash (prev commit) f147e08b32
# Branch is master
# MPI+OpenMP+SLK+HDF5_MPI_IO Build
# http://www.yambo-code.eu
#
bss # [R] BSE solver
optics # [R] Linear Response optical properties
dipoles # [R] Oscillator strengths (or dipoles)
bse # [R][BSE] Bethe Salpeter Equation.
BSKmod= "SEX" # [BSE] IP/Hartree/HF/ALDA/SEX/BSfxc
BSEmod= "retarded" # [BSE] resonant/retarded/coupling
BSSmod= "d" # [BSS] (h)aydock/(d)iagonalization/(s)lepc/(i)nversion/(t)ddft
BSENGexx= 40 Ry # [BSK] Exchange components
BSENGBlk=-1 RL # [BSK] Screened interaction block size [if -1 uses all the G-vectors of W(q,G,Gp)]
#WehCpl # [BSK] eh interaction included also in coupling
KfnQPdb= "E < SAVE/ndb.QP" # [EXTQP BSK BSS] Database action
KfnQP_INTERP_NN= 1 # [EXTQP BSK BSS] Interpolation neighbours (NN mode)
KfnQP_INTERP_shells= 20.00000 # [EXTQP BSK BSS] Interpolation shells (BOLTZ mode)
KfnQP_DbGd_INTERP_mode= "NN" # [EXTQP BSK BSS] Interpolation DbGd mode
% KfnQP_E
0.000000 | 1.000000 | 1.000000 | # [EXTQP BSK BSS] E parameters (c/v) eV|adim|adim
%
KfnQP_Z= ( 1.000000 , 0.000000 ) # [EXTQP BSK BSS] Z factor (c/v)
KfnQP_Wv_E= 0.000000 eV # [EXTQP BSK BSS] W Energy reference (valence)
% KfnQP_Wv
0.000000 | 0.000000 | 0.000000 | # [EXTQP BSK BSS] W parameters (valence) eV| 1|eV^-1
%
KfnQP_Wv_dos= 0.000000 eV # [EXTQP BSK BSS] W dos pre-factor (valence)
KfnQP_Wc_E= 0.000000 eV # [EXTQP BSK BSS] W Energy reference (conduction)
% KfnQP_Wc
0.000000 | 0.000000 | 0.000000 | # [EXTQP BSK BSS] W parameters (conduction) eV| 1 |eV^-1
%
KfnQP_Wc_dos= 0.000000 eV # [EXTQP BSK BSS] W dos pre-factor (conduction)
% BSEQptR
1 | 1 | # [BSK] Transferred momenta range
%
% BSEBands
111 | 122 | # [BSK] Bands range
%
% BEnRange
0.00000 | 10.00000 | eV # [BSS] Energy range
%
% BDmRange
0.100000 | 0.100000 | eV # [BSS] Damping range
%
BEnSteps= 100 # [BSS] Energy steps
% BLongDir
0.000000 | 1.000000 | 0.000000 | # [BSS] [cc] Electric Field
%
BSEprop= "abs" # [BSS] Can be any among abs/jdos/kerr/magn/dich/photolum/esrt
BSEdips= "none" # [BSS] Can be "trace/none" or "xy/xz/yz" to define off-diagonal rotation plane
WRbsWF # [BSS] Write the excitonic WFs to disk

DIP_Threads= 0 # [OPENMP/X] Number of threads for dipoles
X_Threads= 0 # [OPENMP/X] Number of threads for response functions
K_Threads= 0 # [OPENMP/BSK] Number of threads for response functions
NLogCPUs= 10 # [PARALLEL] Live-timing CPUs (0 for all)
PAR_def_mode= "balanced" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")

Could you please suggest how to resolve this error?

Thank you,
Dmitry
Dmitry Skachkov
University of Central Florida
https://github.com/Dmitry-Skachkov

Daniele Varsano
Posts: 4048
Joined: Tue Mar 17, 2009 2:23 pm

Re: Error in MPI in BSE calculations

Post by Daniele Varsano » Fri Sep 27, 2024 8:42 am

Dear Dmitry,

The report/log files would be useful to understand at which point the error appears, as well as the dimensions of the BS matrix.
In the meantime, since you are opting for a full diagonalization, unless the BS matrix is very large you can run the diagonalization in serial.
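For instance (a sketch; the input-file name and the -J label are placeholders for yours), you can rerun only the solver step with a single MPI task, reusing the kernel databases already on disk:

> mpirun -np 1 yambo -F 03_solver.in -J BSE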

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

DmitrySkachkov
Posts: 11
Joined: Mon Dec 13, 2021 8:52 pm

Re: Error in MPI in BSE calculations

Post by DmitrySkachkov » Fri Sep 27, 2024 2:57 pm

Dear Daniele,

Thank you for the reply.

Here are the report and log files.
The run used 1 node with 64 cores and 2000 GB of memory.
Dmitry Skachkov
University of Central Florida
https://github.com/Dmitry-Skachkov

Daniele Varsano
Posts: 4048
Joined: Tue Mar 17, 2009 2:23 pm

Re: Error in MPI in BSE calculations

Post by Daniele Varsano » Mon Sep 30, 2024 7:49 am

Dear Dmitry,

It seems to be a problem due to ScaLAPACK: note that you are running with 32 MPI tasks, using a 4x4 ScaLAPACK grid.
You can try a different parallel strategy for the diagonalization, e.g. 4, 16, 64 etc. tasks, setting the corresponding resources in the job script, and see if the problem persists. This can be done by setting the BS_nCPU_LinAlg_DIAGO variable (when the input is generated with the -V par option, the variables governing the parallelization strategy appear).
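For example (a sketch; the input-file name is a placeholder, and 16 is just one perfect-square choice):

> yambo -F 03_solver.in -V par    # regenerate the input with the parallel variables exposed
and then, in the regenerated input, set e.g.
BS_nCPU_LinAlg_DIAGO= 16 # [PARALLEL] CPUs for Linear Algebra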

Having said that, in your case you have a quite large BS matrix (dimension = 46656), and I strongly suggest you try iterative algorithms, i.e. Haydock if you are interested in the spectrum only, or SLEPc if you are also interested in the first eigenvectors. Note that to use the SLEPc algorithm the libraries need to be linked; this is done by adding the SLEPc option to the configure command (--enable-slepc-linalg).
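In the input this amounts to changing the solver line, e.g. (a sketch; BSSNEig belongs to the SLEPc block of the input, and 100 is just an example value for the number of low-lying excitons):

BSSmod= "h" # [BSS] Haydock: iterative, spectrum only
or, after reconfiguring with --enable-slepc-linalg:
BSSmod= "s" # [BSS] SLEPc: lowest eigenvalues and eigenvectors
BSSNEig= 100 # [SLEPC] number of eigenvalues to compute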

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
