System and Infrastructure Status News

PSC Datacenter Power Disruption February 20

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org

Start Date: February 20, 2026, 6:50 p.m.

End Date: February 20, 2026, 9:30 p.m.

At around 12:50 PM Eastern on Friday, February 20, the PSC datacenter was hit with a power disruption. PSC staff are returning the systems to service as quickly as possible. Most of the Bridges-2 infrastructure returned to service at 3:30 PM Eastern Time.

Posted: March 20, 2026

Kerberos Master KDC Migration

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: kerberos.access-ci.org

Start Date: February 18, 2026, 4:00 p.m.

End Date: February 18, 2026, 5:00 p.m.

NCSA will be migrating the ACCESS-CI master Kerberos server to Proxmox. During this migration, password resets will not work and new accounts cannot be created. Authentication may be slightly delayed as requests fail over to the replicas.

Posted: March 20, 2026

Anvil Maintenance Completion

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org

Start Date: February 6, 2026, 6:00 a.m.

End Date: Not Specified

Dear Anvil Users,

As of 12:00 AM ET, the maintenance work on the data center where Anvil is housed has been completed and job scheduling has resumed. If you encounter any issues after the maintenance, please contact the ACCESS Help Desk (https://support.access-ci.org/help-ticket).

Sincerely,
Doug Schultz
Sr. Manager Scientific Applications
Rosen Center for Advanced Computing
Purdue University

Posted: March 20, 2026

Anvil Downtime - February 5, 2026

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org

Start Date: February 5, 2026, 11:00 a.m.

End Date: February 6, 2026, 1:00 a.m.

Dear Anvil users,

On Thursday, February 5, the data center where the Anvil cluster is housed will be offline while cooling maintenance is performed. During this maintenance window, Anvil will be powered down. We expect this maintenance to last from 6 AM EST to 8 PM EST.

How does this maintenance impact you?
- Any jobs requesting a walltime which would take them past the start of the maintenance will not start and will remain in the queue until after the maintenance is completed.
- You will not be able to run jobs during the maintenance period.
- The Anvil Composable cluster and all web services hosted on it will be unavailable during the scheduled maintenance window. This includes the AnvilGPT service, which is hosted on Anvil Composable.
- You will still be able to access your data.

If you have questions about how this outage will affect your work or need support, please contact the ACCESS Help Desk (https://support.access-ci.org/help-ticket).

Sincerely,
Doug Schultz
Sr. Manager Scientific Applications
Rosen Center for Advanced Computing
Purdue University

Posted: March 20, 2026

Updated Idp.access-ci.org

Published

Infrastructure News Type: Reconfiguration

Affected Infrastructure: identity.access-ci.org

Start Date: February 4, 2026, 3:15 p.m.

End Date: February 4, 2026, 3:30 p.m.

On February 4, 2026, the ACCESS Identity Provider (https://idp.access-ci.org/idp) (idp.access-ci.org) was updated to Shibboleth v5.2.0 (https://shibboleth.atlassian.net/wiki/spaces/IDP5/pages/3199500367/ReleaseNotes#5.2.0-([date])), which includes updates to Spring (v7) and Spring Web Flow (v4).

Posted: March 20, 2026

TAMU ACES Maintenance, January 28

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: aces.tamu.access-ci.org

Start Date: January 28, 2026, 3:00 p.m.

End Date: January 29, 2026, 2:00 a.m.

UPDATE: The ACES cluster is now available. Some nodes remain unavailable due to issues with one of the PCIe composability fabrics that we hope to fix ASAP this week. Due to the job scheduling backlog, new limits per job and per user have been applied to help improve fairness among ACES users.

- 2000 core limit per user
- 10 GPU limit per user
- 20 node limit per job in cpu queue
- 10 node limit per job in pvc queue

UPDATE: The scheduled maintenance is now starting.

The TAMU ACES cluster will be unavailable from 9a to 8p CST Wednesday January 28, 2026 for regular and storage maintenance.
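The per-user and per-job limits listed in the update above can be checked before submitting a job. The following is a minimal sketch, not the scheduler's own logic: the `JobRequest` class and the `limit_violations` function are hypothetical names introduced for illustration, and it assumes the per-user limits apply to the sum of resources already in use plus those being requested, which the notice does not spell out.

```python
from dataclasses import dataclass

# Limits copied from the ACES notice above.
MAX_CORES_PER_USER = 2000
MAX_GPUS_PER_USER = 10
MAX_NODES_PER_JOB = {"cpu": 20, "pvc": 10}


@dataclass
class JobRequest:
    """Hypothetical description of a job; this is not a Slurm data structure."""
    queue: str
    nodes: int
    cores: int
    gpus: int = 0


def limit_violations(job, cores_in_use=0, gpus_in_use=0):
    """Return a list of human-readable limit violations (empty if none).

    Assumption: per-user limits count resources already in use by the
    user plus the resources requested by this job.
    """
    problems = []

    # Per-job node limit; queues not named in the notice are not checked.
    node_cap = MAX_NODES_PER_JOB.get(job.queue)
    if node_cap is not None and job.nodes > node_cap:
        problems.append(
            f"{job.nodes} nodes exceeds the {node_cap} node limit "
            f"per job in the {job.queue} queue"
        )

    # Per-user core limit.
    if cores_in_use + job.cores > MAX_CORES_PER_USER:
        problems.append(
            f"{cores_in_use + job.cores} cores exceeds the "
            f"{MAX_CORES_PER_USER} core limit per user"
        )

    # Per-user GPU limit.
    if gpus_in_use + job.gpus > MAX_GPUS_PER_USER:
        problems.append(
            f"{gpus_in_use + job.gpus} GPUs exceeds the "
            f"{MAX_GPUS_PER_USER} GPU limit per user"
        )

    return problems


if __name__ == "__main__":
    request = JobRequest(queue="cpu", nodes=21, cores=2016)
    for problem in limit_violations(request):
        print(problem)
```

The sketch only compares numbers; the scheduler on ACES remains the authority on whether a job is accepted.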

Posted: March 20, 2026

CILogon Authentication Outage Wednesday January 28 8:00am CST

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: allocations.access-ci.org, identity.access-ci.org

Start Date: January 28, 2026, 3:00 p.m.

End Date: January 28, 2026, 4:00 p.m.

CILogon data store maintenance - North America

Upcoming scheduled maintenance notice

On Wednesday, January 28th, 2026, we will be performing emergency maintenance on our data storage system to address issues with performance degradation and outages over the last few weeks. No downtime is expected, but users may experience slower than usual performance during the maintenance window. LDAP will not be impacted. Please report any issues you find after the maintenance window ends to help@cilogon.org (mailto:help@cilogon.org).

Start time: Jan 28, 08:00 CST
Estimated duration: 1 hour
Components affected: cilogon.org (http://cilogon.org/), registry.cilogon.org (http://registry.cilogon.org/)

Posted: March 20, 2026

SDSC Expanse: Power outage impacting part of the system [Update]

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org

Start Date: January 25, 2026, 9:00 a.m.

End Date: January 27, 2026, 5:00 a.m.

>>>> Update 2 >>>>

Dear Expanse User,

The majority of the affected Expanse nodes (except Rack 3) were put back in production Monday night and most of the Rack 3 nodes were brought online this morning. The job backlog will clear over the next few days and wait times should return to normal soon.

Thanks
SDSC User Services

>>>>>

Dear Expanse User,

A subset of the Expanse nodes continues to be offline as further work is needed to address the power infrastructure for the nodes. This means that there are fewer nodes available for use on Expanse and there will be longer wait times as we resolve this issue. Please leave your jobs in the queue and they will run as resources become available.

Thanks
SDSC User Services Staff

>>>>>

Dear Expanse User,

We are seeing a significant number of down nodes on SDSC Expanse right now and are looking into the problem. This issue started a little after midnight due to a power outage and impacted running jobs. We will update once we have more information and upon resolution.

Thanks
SDSC User Services Staff

Posted: March 20, 2026

SDSC Expanse Lustre filesystem issues [resolved]

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: January 16, 2026, 3:00 p.m.

End Date: January 17, 2026, 1:15 a.m.

Update:

Dear Expanse User,

The Lustre issues on Expanse have been resolved and the filesystem is available for use. Separately, this morning we had an issue with the cooling distribution units (CDUs) that resulted in the Expanse CPU nodes being shut down. That problem has also been resolved and the system is running jobs now.

Thanks
SDSC User Services Staff

>>>>>

Dear Expanse User,

We are currently seeing problems with one of the object storage servers of the Expanse Lustre filesystem. The issue started around 7AM (PT) and is currently resulting in filesystem access problems on all Expanse nodes. This will impact logins if your default path has Lustre locations in it. We will update once the issue is resolved.

Thanks
SDSC User Services Staff
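The notice above warns that logins are affected when the default path contains Lustre locations. One way to check for this is sketched below, under stated assumptions: the `/expanse/lustre` prefix is an assumption about where Lustre is mounted and should be adjusted to match your site, and `lustre_entries` is a hypothetical helper name. The check compares strings only and never touches the filesystem, so it does not hang while Lustre is unresponsive.

```python
import os

# Assumption: Lustre is mounted under this prefix; adjust as needed.
LUSTRE_PREFIXES = ("/expanse/lustre",)


def lustre_entries(path_value, prefixes=LUSTRE_PREFIXES):
    """Return the PATH-style entries that sit under one of the prefixes.

    Only string comparison is used, so no filesystem access happens.
    """
    hits = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        normalized = os.path.normpath(entry)
        for prefix in prefixes:
            if normalized == prefix or normalized.startswith(prefix + "/"):
                hits.append(entry)
                break
    return hits


if __name__ == "__main__":
    # PATH is the usual suspect; the same check works for other
    # colon-separated variables such as LD_LIBRARY_PATH.
    for entry in lustre_entries(os.environ.get("PATH", "")):
        print("PATH entry on Lustre:", entry)
```

If the script prints any entries, moving them out of the shell startup files is one option for keeping logins responsive during a filesystem incident.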

Posted: March 20, 2026

NCSA Identity portal unavailable for Delta/DeltaAI local identity management, setting or resetting passwords

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: January 9, 2026, 10:00 p.m.

End Date: January 12, 2026, 3:00 p.m.

The NCSA Identity portal, https://identity.ncsa.illinois.edu, will be unavailable from Friday, January 9th at 4PM to Monday, January 12th at 9AM due to planned power maintenance in the NCSA building. This outage will impact setting or resetting passwords for local NCSA user accounts. Processing of NCSA accounts, which require setting a password, may be paused or delayed during the outage. Please send questions or comments to NCSA via the NCSA help portal at https://help.ncsa.illinois.edu or by email to help@ncsa.illinois.edu.

Posted: March 20, 2026

ACCESS Kerberos Degraded

Published

Infrastructure News Type: Degraded

Affected Infrastructure: kerberos.access-ci.org

Start Date: January 9, 2026, 10:00 p.m.

End Date: January 12, 2026, 3:00 p.m.

Due to power work at NCSA the replica Kerberos server will be offline over the weekend. The master server and the replica hosted at PSC will remain online and unaffected by this work. No service impact is expected but you may see slight delays as things move over to the other replica server.

Posted: March 20, 2026

TAMU ACES Globus Connect service down

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: aces.tamu.access-ci.org

Start Date: January 8, 2026, 9:30 p.m.

End Date: January 23, 2026, 1:30 p.m.

UPDATE: The Globus Connect Service on the TAMU ACES cluster has been restored.

Original message: The Globus Connect Service on the TAMU ACES cluster is down. We are working to restore the service. No estimated time for restoration at this time.

Posted: March 20, 2026

Delta /work file system unavailable

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: January 7, 2026, 3:00 p.m.

End Date: January 7, 2026, 5:27 p.m.

This issue has been resolved. Please check your jobs.

---

The /work file system is temporarily unavailable due to an unexpected power issue during data center maintenance. The /work file system is up but not accessible from Delta. A system reservation will be put in place to prevent new jobs from starting. An update will be provided when the issue is resolved. Please submit comments or questions via the NCSA Help portal at https://help.ncsa.illinois.edu or email to help@ncsa.illinois.edu (mailto:help@ncsa.illinois.edu).

-Delta Project

Posted: March 20, 2026

ACCESS Kerberos Degraded

Published

Infrastructure News Type: Degraded

Affected Infrastructure: kerberos.access-ci.org

Start Date: January 7, 2026, 12:50 p.m.

End Date: January 12, 2026, 1:00 p.m.

Due to power work at NCSA, one of the Kerberos replica servers (tg-kdc1) will be offline through the weekend. The master server will remain online and the replica server hosted at PSC should not be affected. Normal Kerberos service is not expected to be impacted by this work. If you notice any issues, please submit a help ticket at https://support.access-ci.org/help-ticket.

Posted: March 20, 2026

DeltaAI Notice: DeltaAI Maintenance 01-07-2026

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: delta.ncsa.access-ci.org

Start Date: January 7, 2026, 12:00 p.m.

End Date: January 8, 2026, 4:00 a.m.

DeltaAI Resource Users,

This is a reminder that the DeltaAI system will undergo planned maintenance on Wednesday, January 7th starting at 6AM (central time) and ending at 10PM (central time), approximately.

Changes will include:
- Upgrade of Slingshot FMS from 2.2.0 to 2.3.1.
- Upgrade of Slurm from 23.02 to 25.11.
- Maintenance on facility power feed for DeltaAI.

Available environment modules will NOT change. Nevertheless, please check your jobs once the scheduler resumes job scheduling.

During the maintenance period:
- All compute and login nodes will be unavailable.
- DeltaAI Open OnDemand will be unavailable.
- Globus endpoints will remain available. The work and projects filesystems will also be accessible from Delta.
- A reservation will be in place to prevent jobs from running into the maintenance period. Please be sure to check job time requirements as January 7th approaches so that jobs can be scheduled as the reservation drains the available nodes. Adjusting the time limit to account for the start time of the reservation will allow jobs to run. The resource scheduler will resume once the maintenance is complete.

The other previously announced upgrades have been deferred:
- OS upgrade from COS 24.8.0 / SLES 15 SP5 to SLES 15 SP6.
- Upgrade of the HPE Cray Programming Environment from 24.07 to 25.09.
- Upgrade of NVIDIA HPC SDK from 24.03 (CUDA 11.8 + 12.3) to 25.5 (CUDA 11.8 + 12.9).
- Upgrade of NVIDIA driver from 550 series to 570 series.
- Upgrade of Slingshot SHS from 11.0.0 to 13.0.0.

We invite users to test their codes on nodes where these deferred upgrades have been applied. Please create a ticket for additional information. Available environment modules in the test environment WILL change.

Thank you!

Posted: March 20, 2026

Network Maintenance for ACES and Launch Clusters, January 5-6

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: aces.tamu.access-ci.org, launch.tamu.access-ci.org

Start Date: January 6, 2026, 2:00 a.m.

End Date: January 6, 2026, 12:00 p.m.

Reminder: The ACES and Launch clusters will be inaccessible from 8p CST Monday January 5 until 6a CST Tuesday January 6 while the university performs network maintenance. During this maintenance, job scheduling on the clusters will continue and will not be impacted.
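The Start Date and End Date fields of this entry (January 6, 2:00 a.m. and 12:00 p.m.) match the CST window in the notice if they are read as UTC. The notice does not state the time zone of those fields, so treat that reading as an inference. The conversion itself can be sketched as follows; `cst_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# CST is a fixed six hours behind UTC (no daylight saving time in January).
CST = timezone(timedelta(hours=-6), "CST")


def cst_to_utc(year, month, day, hour, minute=0):
    """Interpret the given wall-clock time as CST and return it in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=CST)
    return local.astimezone(timezone.utc)


start = cst_to_utc(2026, 1, 5, 20)  # 8p CST Monday January 5
end = cst_to_utc(2026, 1, 6, 6)     # 6a CST Tuesday January 6

print(start.isoformat())  # 2026-01-06T02:00:00+00:00
print(end.isoformat())    # 2026-01-06T12:00:00+00:00
```

A fixed offset is used instead of a named time zone so the sketch has no dependency on the system time zone database; it would not be correct for dates when daylight saving time is in effect.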

Posted: March 20, 2026

Jetstream2 maintenance outage on January 2, 2026

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: jetstream2.indiana.access-ci.org, jetstream2-gpu.indiana.access-ci.org, jetstream2-lm.indiana.access-ci.org, jetstream2-storage.indiana.access-ci.org

Start Date: January 2, 2026, 11:00 a.m.

End Date: January 2, 2026, 11:00 p.m.

On Friday, January 2, from approximately 6AM to 6PM ET, Jetstream2 will be offline during plumbing maintenance of the IU Bloomington Data Center. This maintenance will cause a widespread outage for all Jetstream2’s CPU, GPU, and large memory resources. We strongly advise all Jetstream2 users to save and close their work prior to January 2.

You can preserve your work by:
- Safely shutting down any active processes or jobs
- Backing up essential data outside of Jetstream2

During the outage, please refer to the Jetstream2 status page (https://jetstream.status.io/) for the most up-to-date information. We appreciate your understanding and hope to mitigate any inconvenience this might cause. If you have any questions or concerns, please contact Jetstream2 Support at help@jetstream-cloud.org (mailto:help@jetstream-cloud.org).

Posted: March 20, 2026

Delta Scheduler Maintenance 12-22-2025

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: December 22, 2025, 1:00 p.m.

End Date: December 22, 2025, 5:00 p.m.

The Delta Slurm resource scheduler software and job database will be upgraded on Monday, December 22nd starting at 7AM and ending at 11AM, approximately. The Slurm software will be upgraded from 23.11.9 to 25.11.0.

The most noticeable change will be how Slurm sees NUMA memory domains as sockets by setting the Slurm numa_node_as_socket parameter, which is helpful for CPU-GPU affinitization. If you use the sbatch, srun, or salloc -per-socket options such as --cores-per-socket or --gpus-per-socket then please check your jobs once the scheduler resumes job scheduling. See the Slurm socket affinity page (https://slurm.schedmd.com/gres.html#Socket_Affinity) for more information.

After the maintenance:
- The RH9 Slurm reservation will be removed and no longer needed when submitting jobs from dt-login02, dt-login03 or dt-login04.
- Login node, dt-login01, and 1/4 of the compute nodes will be available for users who have not migrated to the upgraded OS.
- The RH8 reservation will be set for jobs submitted from dt-login01.

During the maintenance period:
- All compute nodes will be unavailable.
- Login nodes, file systems, Globus endpoints and other non-job related services will remain available.
- The Delta OnDemand service will be available but interactive applications that require a compute node such as Jupyter notebook, VS Code server, and X Desktop will not be able to run.

A reservation will be in place to prevent jobs from running into the maintenance period. Please be sure to check job time requirements as December 22nd approaches so that jobs can be scheduled as the reservation drains the available nodes. Adjusting the time limit to account for the start time of the reservation will allow jobs to run. The resource scheduler will resume once the maintenance is complete.

Please submit comments or questions by using the NCSA Help portal (https://help.ncsa.illinois.edu/) or by email to help@ncsa.illinois.edu (mailto:help@ncsa.illinois.edu).
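The advice above about adjusting the time limit amounts to picking a limit that ends before the reservation begins. A minimal sketch of that arithmetic follows. The `max_time_limit` function and its five-minute safety margin are assumptions made for illustration, and both timestamps are assumed to be in the same local time zone as the 7AM start given in the notice.

```python
from datetime import datetime, timedelta


def max_time_limit(now, reservation_start, margin=timedelta(minutes=5)):
    """Return the longest time limit that ends before the reservation.

    The result uses the days-hours:minutes:seconds form accepted by the
    Slurm --time option, or None when no usable time is left. The margin
    is an arbitrary safety buffer, not a value from the notice.
    """
    remaining = reservation_start - now - margin
    if remaining <= timedelta(0):
        return None
    total_seconds = int(remaining.total_seconds())
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Maintenance start from the notice: Monday, December 22 at 7AM.
reservation_start = datetime(2025, 12, 22, 7, 0)

# Hypothetical submission time: Saturday, December 20 at 6:30PM.
limit = max_time_limit(datetime(2025, 12, 20, 18, 30), reservation_start)
print(limit)  # 1-12:25:00
```

The printed value could be passed to a submission as `--time=1-12:25:00`. A job that genuinely needs more time than this will wait in the queue until the maintenance is complete.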

Posted: March 20, 2026

TAMU Launch Maintenance

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: launch.tamu.access-ci.org

Start Date: December 17, 2025, 3:00 p.m.

End Date: December 18, 2025, 6:00 p.m.

The TAMU Launch maintenance has completed.

Previous message: TAMU Launch maintenance is extended till noon, Dec 18, 2025, due to unexpected issues.

The TAMU Launch maintenance for Dec 17, 2025 has started.

Original message: The TAMU Launch cluster will be unavailable from 9a to 11pm CST Wednesday December 17, 2025 for regular and storage maintenance.

Posted: March 20, 2026

SDSC Expanse Lustre filesystem and login issues [resolved]

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: December 17, 2025, 1:00 a.m.

End Date: December 18, 2025, 2:30 a.m.

>>> Update 2 >>>

Dear Expanse User,

The Lustre filesystem issues on Expanse have been resolved and we have mounted it back on the login nodes and on most compute and GPU nodes. The remaining nodes have been drained from the queue and will be rebooted to fix the mounts. The system is available for logins now and jobs can run on the nodes that are already fixed.

Users are reminded that the Lustre filesystem locations (both scratch and projects) are **not** backed up. So please make offsite copies of anything critical. Also, it is important to control Lustre usage levels to help with performance and stability. Users are requested to clean up any data that is not in use for active runs or not needed in the near future. The filesystem **should not** be used as an archival location and users are strongly urged to make offsite copies of anything they want to keep long term.

Thanks
SDSC User Services Staff

>>> Update 1 >>>

Dear Expanse User,

We are continuing to work on the Lustre filesystem issue. We were able to recover the metadata servers last night but there are still continuing issues with one of the metadata servers that we are working on. Additionally, logins to Expanse are being impacted (with errors regarding malformed TOTP) and we are looking into the source of the problem. Updates will be provided as we resolve both of these problems.

Thanks
SDSC User Services Staff

>>>>>

Dear Expanse User,

We are currently seeing problems with the metadata servers of the Expanse Lustre filesystem. The issue started around 5PM (PT), December 16, 2025, and is currently resulting in filesystem access problems on all Expanse nodes. This will impact logins if your default path has Lustre locations in it. We will update once the issue is resolved.

Thanks
SDSC User Services Staff

Posted: March 20, 2026