I keep running into the same error in my calculations on larger k-grids. Everything runs smoothly until the BSE kernel calculation finishes; the job then crashes with a "too many communicators" error, with no other explanation, before the Haydock calculation can start. (I believe this or something similar also happens when I try a SLEPc calculation, but I'm not sure whether the two problems are related.)
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
PMPI_Comm_split(1294)...............: MPI_Comm_split(MPI_COMM_WORLD, color=2015, key=1, new_comm=0x1516916bb858) failed
PMPI_Comm_split(1276)...............:
MPIR_Comm_split_allgather(1005).....:
MPIR_Get_contextid_sparse_group(615): Too many communicators (0/2048 free on this process; ignore_id=0)
I'm not really sure how to approach this issue, so any advice is appreciated.
Best,
Miles