System and Infrastructure Status News
Delta Notice: Delta maintenance 01-23-2024 - 01-25-2024
Published
Infrastructure News Type: Reconfiguration
Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org
Start Date: January 23, 2024, 2:00 p.m.
End Date: January 25, 2024, 11:00 p.m.
The Delta resource will undergo maintenance starting at 8:00AM on Tuesday, January 23rd, 2024. During the maintenance, Delta compute nodes will be upgraded with the HPC Cassini network interface card and will boot with an OS image updated to support the new Slingshot11 communication software stack. Please see the Delta Network Upgrade page at https://wiki.ncsa.illinois.edu/display/DSC/Delta+Network+Upgrade for information on the forthcoming changes.
During the maintenance period:
• Jobs will continue to be scheduled to run.
• Compute nodes will be upgraded in batches on Tuesday and Wednesday.
• The dt-login.delta.ncsa.illinois.edu and login.delta.ncsa.illinois.edu aliases will point to the upgraded dt-login03 and dt-login04 login nodes.
• dt-login01 will be rebooted but will remain available as a Slingshot10 configured login node.
Delta resources will be available during the maintenance period:
Delta login nodes
• On Tuesday, users are encouraged to use the dt-login.delta.ncsa.illinois.edu ssh alias, or dt-login03.delta.ncsa.illinois.edu and dt-login04.delta.ncsa.illinois.edu in particular, to begin using compute nodes moved to the Slingshot11 configuration.
• Jobs submitted from dt-login03 and dt-login04 will automatically be assigned to run on upgraded compute nodes.
• dt-login01 will remain available in the Slingshot10 configuration to be used to address any porting issues discovered during the upgrade.
• Jobs submitted from dt-login01 will automatically be assigned to run on non-upgraded compute nodes.
Delta compute nodes
• On Tuesday and Wednesday, 1/2 of each node type will be upgraded and moved to the Slingshot11 configuration.
• The pool of available upgraded nodes will increase during the day as they are returned to service.
• Two compute nodes of each type, except for the gpuA100x8 nodes and the gpuMI100x8 node, will remain on Slingshot10 to address any porting issues discovered during the upgrade.
Delta services:
• Open OnDemand - will move to supporting Slingshot11 on Tuesday morning after a security software update. Expect a 30 - 60 minute OnDemand outage between 8:00AM and 9:00AM.
• Delta Globus Online endpoint - available.
Reminder:
• Codes that use OpenMPI or similar on the Slingshot10 nodes will need to be rebuilt to run on the upgraded Slingshot11 nodes.
• Jobs submitted from dt-login01 will only run on the remaining non-upgraded Slingshot10 compute nodes.
A follow-up message will be sent once maintenance is complete. Please send questions to help@ncsa.illinois.edu and be sure to mention Delta in the subject.
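As a small illustration of the login-node split described in this notice, the sketch below maps a target network configuration to a login host. The function name delta_login_host and its ss11/ss10 arguments are hypothetical, and the fully qualified dt-login01 hostname is an assumption, since the notice gives only the short name.

```shell
# Hypothetical helper: pick a Delta login host for the fabric you want to target.
delta_login_host() {
    case "$1" in
        # Alias that points to the upgraded dt-login03/dt-login04 nodes.
        ss11) echo "dt-login.delta.ncsa.illinois.edu" ;;
        # Assumed FQDN for the login node kept on Slingshot10.
        ss10) echo "dt-login01.delta.ncsa.illinois.edu" ;;
        *)    echo "usage: delta_login_host ss11|ss10" >&2; return 1 ;;
    esac
}

# Example (not run here): ssh "$(delta_login_host ss11)"
```

Because jobs are routed according to the login node they are submitted from, choosing the host is enough to choose between upgraded and non-upgraded compute nodes.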
Posted: March 20, 2026
Georgia Tech Hive Gateway Scheduled Downtime
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: hive.gatech.access-ci.org
Start Date: January 23, 2024, 12:00 p.m.
End Date: January 26, 2024, 5:59 a.m.
The PACE Quarterly Maintenance Period is scheduled to begin at 6:00AM on Tuesday, 01/23/2024, and is scheduled to conclude by 11:59PM on Friday, 01/26/2024. Please note that, as usual, the scheduler will hold jobs whose resource requests would have them running during the Maintenance Period until after the Maintenance Period ends. During the Maintenance Period, all PACE-managed computational and storage resources will be unavailable. Please see the list of activities to be completed, which is posted at https://blog.pace.gatech.edu/?p=7778
Posted: March 20, 2026
ACCESS XDMoD Scheduled Downtime
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: xdmod.access-ci.org
Start Date: January 16, 2024, 1:00 p.m.
End Date: January 16, 2024, 11:00 p.m.
ACCESS XDMoD (https://xdmod.access-ci.org/) will be unavailable from approximately 7:00AM to 5:00PM EDT on Tuesday January 16th 2024 during a scheduled monthly downtime. The downtime will cause a full outage for both XDMoD and the ACCESS Metrics site. These services should be unavailable for only a couple of minutes despite the full-day downtime. A follow-up message will be sent when the downtime is complete.
Posted: March 20, 2026
ACCESS XDMoD Scheduled Downtime
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: xdmod.access-ci.org
Start Date: December 19, 2023, 6:00 p.m.
End Date: December 20, 2023, 12:00 a.m.
UPDATE 12/19/23 15:56 EDT: The downtime is complete and ACCESS XDMoD is back up. Thank you for your patience.
ACCESS XDMoD (https://xdmod.access-ci.org/) will be unavailable from approximately 12:00PM to 6:00PM EDT on Tuesday December 19th 2023 during a scheduled infrastructure update. This will temporarily be a full outage of the service. A follow-up message will be sent when the update is complete.
Posted: March 20, 2026
Bridges-2 Outage December 19-20
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org
Start Date: December 19, 2023, 12:00 p.m.
End Date: December 21, 2023, 12:00 a.m.
Beginning on Tuesday, December 19 at 6AM Eastern time, the entire PSC machine room (all machines, VMs and filesystems) will be unreachable due to a major networking upgrade. We anticipate that this outage will last until 6PM Eastern time on Wednesday December 20.
Posted: March 20, 2026
SDSC Expanse Maintenance 7AM-Midnight (PT), Monday, Dec 18, 2023 [Completed]
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org
Start Date: December 18, 2023, 3:00 p.m.
End Date: December 19, 2023, 2:30 a.m.
The Slurm scheduler upgrade has been completed on Expanse and the machine is available for use. Slurm was upgraded from version 21.08.8 to 23.02.6. Please note that with the upgrade of Slurm, srun default behaviour has changed. Details of release-specific changes are available at https://github.com/SchedMD/slurm/blob/slurm-23-02-6-1/NEWS
One change in particular might impact some users: srun no longer reads SLURM_CPUS_PER_TASK. This means that the --cpus-per-task value set in the #SBATCH specification will not be automatically picked up by any srun command within the script. Users can either add the --cpus-per-task option explicitly to their srun command OR set the following variable before the srun commands:
export SRUN_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK}
No changes are required if your script was using Intel MPI and mpirun. Please contact us either via the ACCESS ticketing system or via email to consult@sdsc.edu if you have any questions.
>>>>>>>>
Dear Expanse User,
We will have a maintenance period on Expanse 7AM-Midnight (PT), Dec 18, 2023. During this maintenance, we will be updating the Slurm scheduler. We have a reservation in place to prevent jobs from running during this period. The "squeue" output will show "ReqNodeNotAvail, Reserved for maintenance" for jobs that do not fit in the time period before the maintenance begins. These jobs will run after we release the maintenance reservation.
Thanks,
SDSC User Support Staff
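The srun change described in this notice could be handled in a batch script roughly as follows. This is a minimal sketch, not an SDSC-provided template: the helper name propagate_cpus_per_task, the task and CPU counts, and the ./my_app placeholder are all assumptions made for illustration.

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8

# Copy the batch-level --cpus-per-task value to the variable srun reads.
# Does nothing when SLURM_CPUS_PER_TASK is unset (e.g. outside a job).
propagate_cpus_per_task() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        export SRUN_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK}"
    fi
}

propagate_cpus_per_task

# Option 1: rely on the exported variable (./my_app is a placeholder).
#   srun ./my_app
# Option 2: pass the value explicitly on the srun command line.
#   srun --cpus-per-task="${SLURM_CPUS_PER_TASK}" ./my_app
```

Guarding on SLURM_CPUS_PER_TASK avoids exporting an empty value when the script does not request --cpus-per-task at all.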
Posted: March 20, 2026
ACCESS XDMoD Scheduled Downtime
Published
Infrastructure News Type: Outage Partial
Affected Infrastructure: xdmod.access-ci.org
Start Date: December 14, 2023, 2:00 p.m.
End Date: December 14, 2023, 5:00 p.m.
UPDATE 12/14/23 13:39 EDT: ACCESS XDMoD is up now; however, some features might not be available for a couple more hours. Thank you for your patience during this time.
ACCESS XDMoD (https://xdmod.access-ci.org/) will be unavailable from approximately 10:00AM to 1:00PM EDT on Thursday December 14th 2023 during a scheduled infrastructure update. This will temporarily be a full outage of the service. A follow-up message will be sent when the update is complete.
Posted: March 20, 2026
ACCESS Web Login Partial Outage October 31, 2023
Published
Infrastructure News Type: Outage Partial
Affected Infrastructure: identity.access-ci.org
Start Date: October 31, 2023, 2:15 p.m.
End Date: November 2, 2023, 1:00 p.m.
ACCESS Web Login was failing for some users. The issue was due to LDAP corruption, which has been fixed.
Posted: March 20, 2026
Anvil Unplanned Outage
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org
Start Date: October 17, 2023, 1:45 p.m.
End Date: October 17, 2023, 2:55 p.m.
Dear Anvil user, Anvil nodes experienced a brief outage this morning. The problem is resolved and nodes are online now. Please check the status of your job and resubmit if necessary.
Posted: March 20, 2026
Delta /projects file system temporarily unavailable
Published
Infrastructure News Type: Outage Partial
Affected Infrastructure: delta-storage.ncsa.access-ci.org
Start Date: October 9, 2023, 6:30 p.m.
End Date: October 9, 2023, 9:05 p.m.
The Delta /projects file system currently has an issue that has taken part of it down, rendering it unresponsive. NCSA is working with the vendor to determine the problem and resolution but does not yet have an ETA for repair. We have removed the projects and taiga constraints from the scheduling configuration so new jobs requesting those constraints will not start. At this time any attempt to access files on /projects will hang, which may impact logins as well. A follow-up message will be sent once the repair is complete.
Posted: March 20, 2026
Delta maintenance 10-04-2023
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org
Start Date: October 4, 2023, 11:00 a.m.
End Date: October 5, 2023, 3:00 a.m.
The Delta resource will undergo maintenance from 6:00AM to 8:00PM CDT on Wednesday, October 4th, 2023. Additional notices will be sent if changes to the plan occur.
During the maintenance period the following changes will be made:
- Minor updates to the OS.
- pmix package will be updated to 3.2.5 to address a security vulnerability.
- ucx will be updated to a Mellanox release to improve functionality.
- apptainer will be updated to 1.2.2 from 1.1.9.
- No expected changes to the existing software stack.
- Software for the Delta high-speed network (HSN) switches and fabric manager will be upgraded.
- Lustre file system software will be updated to address known issues.
- NVIDIA GPU driver will be upgraded to support CUDA 12.2, while the default CUDA software module will remain at CUDA 11.6.1.
All Delta resources will be unavailable during the maintenance period including:
- Delta login nodes - unavailable
- Delta compute nodes - unavailable
- Delta services
  - Open OnDemand - unavailable
  - Delta Globus Online endpoint - unavailable
All running jobs will have been drained in advance of the maintenance. Queued jobs will persist and be eligible to run once maintenance is complete. A follow-up message will be sent once maintenance is complete. Please send questions to help@ncsa.illinois.edu and be sure to mention Delta in the subject.
--Delta Project Office
Posted: March 20, 2026
SDSC Expanse NFS server issues [resolved]
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org
Start Date: September 29, 2023, 7:05 p.m.
End Date: October 1, 2023, 1:30 a.m.
The Expanse home directory server issues have been resolved and the machine is back in production and available for use.
>>>
We are seeing the NFS server issues again on Expanse and this is causing login issues. We are working with the vendor on identifying the source of the problem and will update once we have a resolution. In the interim we are putting in a system reservation to prevent new jobs from starting on Expanse.
Posted: March 20, 2026
SDSC Expanse: Home directory server and login issues [Resolved]
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org
Start Date: September 28, 2023, 8:00 a.m.
End Date: September 28, 2023, 9:30 p.m.
Update: The home directory issues on Expanse have been resolved and the system is available for logins and use now.
>>>
The SDSC Expanse home directory servers had issues overnight and that is leading to login problems. We are looking into the issues and will update once they are resolved.
Posted: March 20, 2026
Jira Service Management ticketing system is non-responsive
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: tickets.access-ci.org
Start Date: September 27, 2023, 6:25 p.m.
End Date: September 27, 2023, 7:02 p.m.
The Jira Service Management ticketing system is non-responsive again. We are working with support to resolve this and to avoid it in the future. We are not able to give an ETA at this time.
Posted: March 20, 2026
Jira Service Management ticketing system is non-responsive
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: tickets.access-ci.org
Start Date: September 27, 2023, 1:00 p.m.
End Date: September 27, 2023, 3:19 p.m.
Dear ACCESS colleagues, the Jira Service Management ticketing system is non-responsive in certain geographical regions. We have submitted a support ticket but have no ETA for the resolution.
Posted: March 20, 2026
Delta projects file system maintenance 09-14-2023
Published
Infrastructure News Type: Outage Partial
Affected Infrastructure: delta-storage.ncsa.access-ci.org
Start Date: September 14, 2023, 1:00 p.m.
End Date: September 15, 2023, 3:00 a.m.
The Delta projects (/projects) file system will be unavailable from 8:00AM to 10:00PM on Thursday, September 14th, 2023. The host file system, Taiga, will undergo semi-annual maintenance requiring Taiga mounts on Delta (/projects and /taiga) to be unmounted. No data can be read from or written to /projects or /taiga during the maintenance period.
The Slurm constraint for projects and taiga was removed Tuesday, September 12 at 8AM. Jobs with the projects constraint, even with a short wall clock time, will not be eligible for scheduling after that time and until maintenance is complete. Jobs that do not specify the /projects or /taiga file systems by Slurm constraint/feature will be allowed to run.
Please send questions to help@ncsa.illinois.edu and be sure to mention Delta in the subject or message body.
--Delta Project Office
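To see whether a given batch script would be held by the constraint removal described in this notice, one could scan its #SBATCH lines. The sketch below is only an illustration: the function name needs_taiga is hypothetical, and the assumption that jobs request these file systems with --constraint (or -C) naming projects or taiga comes from general Slurm usage, not from this notice.

```shell
# Hypothetical helper: succeed if a batch script requests the projects
# or taiga constraint on an #SBATCH line (via --constraint or -C).
needs_taiga() {
    grep -Eq '^#SBATCH[[:space:]]+(-C|--constraint)[= ].*(projects|taiga)' "$1"
}

# Example (job.sh is a placeholder file name):
#   needs_taiga job.sh && echo "job.sh will wait until maintenance is complete"
```

The check is textual only; constraints passed on the sbatch command line instead of inside the script would not be detected.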
Posted: March 20, 2026
Anvil Unplanned Outage
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: anvil.purdue.access-ci.org
Start Date: September 10, 2023, 5:12 p.m.
End Date: September 11, 2023, 5:17 p.m.
Anvil has been returned to service
Posted: March 20, 2026
DUO authentication slowness or failure August 21, 2023
Published
Infrastructure News Type: Outage Partial
Affected Infrastructure: identity.access-ci.org
Start Date: August 21, 2023, 1:34 p.m.
End Date: August 21, 2023, 6:00 p.m.
Update as of 08-21-2023 1:00 pm Central: The DUO multi-factor service appears to be working normally again.
Original News: The ACCESS Multi-factor Authentication (MFA) service provided by DUO is experiencing slowness or failure, affecting all logins using ACCESS username and password. The vendor is aware of the problem and working to resolve it. The vendor is posting outage updates here (https://status.duo.com/incidents/rw7g0q7ztj8f) and here (https://status.duo.com).
Posted: March 20, 2026
SDSC Expanse Maintenance, 7AM-Midnight (PT), August 14, 2023
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org
Start Date: August 14, 2023, 2:00 p.m.
End Date: August 15, 2023, 6:59 a.m.
We will have a maintenance period on Expanse 7AM-Midnight (PT), Monday, August 14, 2023. There is a reservation in place to prevent jobs from running during this period. The "squeue" output will show "ReqNodeNotAvail, Reserved for maintenance" for jobs that do not fit in the time period before the maintenance begins. These jobs will run after we release the reservation. The SDSC Expanse portal and the SDSC Expanse Globus collections will also be unavailable during this maintenance period.
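To pick out the pending jobs held back by the maintenance reservation, the squeue output can be filtered on the reason text quoted in this notice. This is a rough sketch: the function name held_for_maintenance is hypothetical, and the squeue format string in the comment is one possible choice rather than an SDSC recommendation.

```shell
# Hypothetical filter: keep only squeue lines whose reason mentions
# ReqNodeNotAvail; prints nothing (and still succeeds) when none match.
held_for_maintenance() {
    grep -F "ReqNodeNotAvail" || true
}

# Example (requires Slurm; not run here). Prints job ID, name and reason
# for your pending jobs, then filters them:
#   squeue -u "$USER" -t PENDING -h -o "%i %j %r" | held_for_maintenance
```

Jobs listed by this filter should start on their own once the reservation is released, so they do not need to be resubmitted.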
Posted: March 20, 2026
RESOLVED -- 8/13/2023 Network outage upstream from Jetstream2
Published
Infrastructure News Type: Outage Full
Affected Infrastructure: jetstream2.indiana.access-ci.org, jetstream2-gpu.indiana.access-ci.org, jetstream2-lm.indiana.access-ci.org, jetstream2-storage.indiana.access-ci.org
Start Date: August 13, 2023, 4:00 p.m.
End Date: August 13, 2023, 5:15 p.m.
UPDATE - 1:30pm Eastern: Network engineers have resolved the upstream network issues as of approximately 1:15pm Eastern. VMs on Jetstream2 should not have been affected by the outage. If you are seeing issues, please open a ticket via https://support.access-ci.org/open-a-ticket
------
There is a network outage upstream from Jetstream2 that is preventing access. Running VMs should be unaffected but they will likely not be accessible. Network engineers are aware of the issue and are working on it presently. We will update as soon as we have additional information.
Posted: March 20, 2026
