MARCONI100 back in production; some jobs may have been killed

Dear Users,
The maintenance operations on MARCONI100 have been completed and the cluster is now back in production.
During the maintenance, the Slurm scheduler was updated to version 21.08.1. Unfortunately, the update caused an issue that we are investigating and that may have affected your jobs. If a job fails with an error log reporting GPU device unavailability, as a workaround please replace the directive "--gres=gpu:xx" with "--gpus-per-node=xx" in your jobscript.
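As an illustration of the workaround, the sketch below rewrites the GPU directive in a sample jobscript with sed. The file names, the GPU count of 4, and the application name my_app are hypothetical placeholders. The sed pattern assumes the directive is written on a single #SBATCH line with a plain numeric count.

```shell
#!/bin/bash
# Minimal sketch: replace "--gres=gpu:N" with "--gpus-per-node=N" in a jobscript.
# All file names and values below are illustrative assumptions.

workdir=$(mktemp -d)

# A sample jobscript using the old directive (hypothetical content).
cat > "$workdir/job_example.sh" <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
srun ./my_app
EOF

# Rewrite only #SBATCH lines that request GPUs via --gres=gpu:N.
sed -E 's/^(#SBATCH[[:space:]]+)--gres=gpu:([0-9]+)/\1--gpus-per-node=\2/' \
    "$workdir/job_example.sh" > "$workdir/job_fixed.sh"

# Show the corrected jobscript.
cat "$workdir/job_fixed.sh"
```

Editing the directive by hand works just as well. Please review the rewritten file before submitting it with sbatch.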
We apologize for the inconvenience and will let you know as soon as the problem is solved.
Regards,
HPC User Support – CINECA