Out of memory issues while running IP RPA

Post by muhammadhasan » Fri Nov 08, 2024 4:49 pm

Hi Professor,

I am running a dielectric function calculation using IP RPA (for gold, 3D) and I get the following error message:

slurmstepd-node-161: error: Detected 1 oom-kill event(s) in StepId=48909.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: node-161: task 48: Out Of Memory
slurmstepd-node-160: error: Detected 1 oom-kill event(s) in StepId=48909.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd-node-159: error: Detected 1 oom-kill event(s) in StepId=48909.0. Some of your processes may have been killed by the cgroup out-of-memory handler.

I have attached the necessary files in case they help to solve the problem. My input file is given below. For now I am considering only one q-point to check convergence; later I will need to consider nearly 1000 points.

optics                           # [R] Linear Response optical properties
infver                           # [R] Input file variables verbosity
chi                              # [R][CHI] Dyson equation for Chi.
dipoles                          # [R] Oscillator strenghts (or dipoles)
Nelectro=  1216.00               # Electrons number
ElecTemp= 0.0388         eV    # Electronic Temperature
BoseTemp=-1.000000         eV    # Bosonic Temperature
OccTresh= 0.100000E-4            # Occupation treshold (metallic bands)
Chimod= "IP"                     # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
% QpntsRXd
    1 |  1 |                       # [Xd] Transferred momenta
%
% BndsRnXd
    1 |  800 |                       # [Xd] Polarization function bands
%
% EnRngeXd
  0.00000 | 10.00000 |         eV    # [Xd] Energy range
%
% DmRngeXd
 0.100000 | 0.100000 |         eV    # [Xd] Damping range
%
ETStpsXd= 1001                    # [Xd] Total Energy steps
% LongDrXd
 1.000000 | 0.000000 | 0.000000 |        # [Xd] [cc] Electric Field
%
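
For reference, the frequency grid implied by this input can be worked out directly. The short Python sketch below assumes that ETStpsXd counts grid points including both ends of EnRngeXd (an assumption about Yambo's convention, not something stated in this thread); the function name energy_step is made up for illustration.

```python
def energy_step(e_min_ev, e_max_ev, n_steps):
    """Spacing of a uniform grid with n_steps points, both end points included."""
    if n_steps < 2:
        raise ValueError("need at least two grid points")
    return (e_max_ev - e_min_ev) / (n_steps - 1)


step = energy_step(0.0, 10.0, 1001)  # EnRngeXd and ETStpsXd from the input above
damping = 0.1                         # DmRngeXd in eV (same value at both ends)

print(f"energy step  : {step:.4f} eV")
print(f"damping/step : {damping / step:.1f}")
```

Under that assumption the grid spacing is well below the 0.1 eV damping, so the broadened features are sampled by several points each.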

Finally, this is the job file I submit on the cluster:

#!/usr/bin/env bash
#SBATCH --job-name=Au_300K
#SBATCH --nodes=3                      # node count
#SBATCH --ntasks-per-node=24         # number of tasks per node
#SBATCH --cpus-per-task=1            # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=5gb                    # Job memory request
#SBATCH --time=60:00:00               # Time limit hrs:min:sec
#SBATCH --output=sdc.txt              # Standard output and error log
#SBATCH --partition=epyc           # MOAB/Torque called these queues

module load yambo
srun yambo -F yambo.in_IP -J Full

Thank you. Please let me know if you need any more info.

Best
Md J Hasan
PhD Student
Mechanical Engineering
University of Maine

Post by Daniele Varsano » Fri Nov 08, 2024 5:22 pm

Dear Hasan,

If you have a memory issue, you can try setting a parallelization strategy in your input file that distributes the memory among the cores:

DIP_CPU= "1 6 12" # [PARALLEL] CPUs for each role
DIP_ROLEs= "k c v" # [PARALLEL] CPUs roles (k,c,v)
X_CPU= "1 1 1 6 12" # [PARALLEL] CPUs for each role
X_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
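
A minimal sketch of a sanity check for settings like these, assuming that each *_CPU string needs one entry per role in the matching *_ROLEs string and that the entries should multiply to the total number of MPI tasks (here 3 nodes × 24 tasks per node = 72, taken from the job script). The helper check_parallel is hypothetical and not part of Yambo.

```python
from math import prod


def check_parallel(cpu, roles, n_mpi_tasks):
    """Return a list of problems found in a *_CPU / *_ROLEs pair (empty if none)."""
    counts = [int(x) for x in cpu.split()]
    names = roles.split()
    problems = []
    if len(counts) != len(names):
        problems.append(f"{len(counts)} CPU entries for {len(names)} roles")
    if prod(counts) != n_mpi_tasks:
        problems.append(f"product {prod(counts)} != {n_mpi_tasks} MPI tasks")
    return problems


n_tasks = 3 * 24  # --nodes times --ntasks-per-node from the job script

print(check_parallel("1 6 12", "k c v", n_tasks))          # three entries, three roles
print(check_parallel("1 1 1 6 12", "q g k c v", n_tasks))  # five entries, five roles
print(check_parallel("1 1 6 12", "q g k c v", n_tasks))    # one entry short
```

An empty list means the pair passes both checks; the last call shows how an entry count that does not match the role count is reported.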

If the problem persists, you can try using fewer CPUs per node so that more memory is available to each task.
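
To put rough numbers on this: the job script requests 24 tasks per node at 5 GB each, i.e. 120 GB per node. The sketch below assumes that the same 120 GB per node stays available and only the number of tasks changes (the actual node memory is not given in this thread); mem_per_task_gb is an illustrative helper, not a Slurm tool.

```python
def mem_per_task_gb(node_mem_gb, tasks_per_node):
    """Memory available to each task when a node's memory is shared evenly."""
    return node_mem_gb / tasks_per_node


node_mem_gb = 24 * 5  # --ntasks-per-node times --mem-per-cpu from the job script

for tasks in (24, 12, 6):
    per_task = mem_per_task_gb(node_mem_gb, tasks)
    print(f"{tasks:2d} tasks/node -> {per_task:.0f} GB per task")
```

With Slurm, the per-task share only grows if --mem-per-cpu (or --mem) is raised together with the lower --ntasks-per-node, and the CPU strings in the input have to be adjusted to the new total number of tasks.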

Please note that in these (IP) calculations the q-points are independent: you get one IP spectrum for each q-point, so the number of q-points has nothing to do with convergence.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Post by muhammadhasan » Fri Nov 08, 2024 5:53 pm

Hi Professor,

Thank you so much as always.

I have increased the memory on our cluster and the calculation now runs without any error.

Regarding convergence, I am planning to check only the three parameters shared below. Are these sufficient for convergence in an IP RPA calculation, Professor? Could you also suggest how I should proceed for DmRngeXd? Most of the examples I have seen do not check convergence with respect to this parameter. How can I find the best DmRngeXd?
1) FFTGvecs= 99845
2) % BndsRnXd
     1 | 800 |                     # [Xd] Polarization function bands
   %
3) % DmRngeXd
     0.100000 | 0.100000 | eV      # [Xd] Damping range
   %
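
One way to see what the damping controls is a toy spectrum. This is a minimal sketch, assuming each transition enters the spectrum as a Lorentzian whose half width at half maximum equals the damping; the two transition energies are invented for illustration and this is not Yambo's actual implementation.

```python
import math


def lorentzian(w, w0, eta):
    """Normalized Lorentzian centred at w0 with half width at half maximum eta."""
    return (eta / math.pi) / ((w - w0) ** 2 + eta ** 2)


def spectrum(w, transitions, eta):
    """Sum of equally weighted Lorentzians, one per transition energy."""
    return sum(lorentzian(w, w0, eta) for w0 in transitions)


transitions = [2.0, 2.3]  # made-up transition energies (eV)

for eta in (0.2, 0.1, 0.02):
    peak = spectrum(2.0, transitions, eta)   # value on the first peak
    mid = spectrum(2.15, transitions, eta)   # value halfway between the peaks
    print(f"eta = {eta:4.2f} eV   peak/mid = {peak / mid:6.2f}")
```

A larger damping merges the two peaks into one feature, while a smaller one resolves them, so in this toy picture the damping sets the resolution of the spectrum.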

Thank you

Best
Md J Hasan
PhD Student
Mechanical Engineering
University of Maine
