System and Infrastructure Status News

Problem for New User/Projects on Anvil

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: anvil.purdue.access-ci.org, anvil-gpu.purdue.access-ci.org

Start Date: August 18, 2025, 4:00 p.m.

End Date: August 20, 2025, 10:00 p.m.

Anvil is experiencing a problem with new user and allocation propagation. Our engineers are working on a fix and will keep this notice updated. Update: The problem was fixed at 5 p.m.

Posted: March 20, 2026

Update for registry.access-ci.org Plugin

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: registry.access-ci.org

Start Date: August 14, 2025, 1:00 p.m.

End Date: August 14, 2025, 1:30 p.m.

On August 14, 2025, a plugin used by the ACCESS User Registry (https://registry.access-ci.org/) will be updated. This update will enable the creation of ePPN Identifiers for linked accounts that assert ePPNs. Server instances will be restarted during this update, which may cause in-progress registrations/logins to fail.

Posted: March 20, 2026

TAMU ACES/FASTER/Launch Network Maintenance

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: aces.tamu.access-ci.org, launch.tamu.access-ci.org

Start Date: August 2, 2025, 1:00 a.m.

End Date: August 2, 2025, 11:00 a.m.

The TAMU campus network will be undergoing maintenance from 8 p.m. CDT Aug. 1 to 6 a.m. CDT Aug. 2. The TAMU ACES, FASTER, and Launch clusters will be inaccessible to ACCESS users for at least the first 20 minutes of the maintenance window. During the remainder of the maintenance, there may be intermittent connectivity issues when accessing the TAMU clusters. The network maintenance will not impact running jobs on the TAMU clusters.

Posted: March 20, 2026

Unscheduled Anvil AI nodes outage

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: anvil-gpu.purdue.access-ci.org

Start Date: July 31, 2025, 9:00 p.m.

End Date: August 1, 2025, 5:00 p.m.

Update: The Anvil AI nodes (Nvidia H100 GPUs) resumed service at 11:10 a.m. EST. Thank you for your patience.

Original: The Anvil AI nodes (Nvidia H100 GPUs) are currently powered off due to ongoing cooling issues in the data center. Facilities has confirmed that the cooling system will not be restored until sometime tomorrow, and the H100 GPUs will remain offline until it is safe to bring them back online. Job scheduling remains paused, but file access is still available during this downtime. We now anticipate service restoration sometime tomorrow (Friday, August 1). We will provide additional updates as more information becomes available or by 12:00 p.m. EST. Thank you for your patience and understanding.

Posted: March 20, 2026

ACCESS XDMoD Downtime

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: xdmod.access-ci.org

Start Date: July 30, 2025, 6:30 p.m.

End Date: July 31, 2025, 5:00 p.m.

The ACCESS XDMoD portal will be temporarily unavailable from approximately 13:30 EDT today, 07/30, until 12:00 EDT tomorrow, 07/31. The service will be completely unavailable during this window while routine infrastructure updates are applied.

Posted: March 20, 2026

Upgrade registry.access-ci.org

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: registry.access-ci.org

Start Date: July 22, 2025, 10:00 a.m.

End Date: July 22, 2025, 11:00 a.m.

On July 22, 2025, the ACCESS User Registry (https://registry.access-ci.org/) will be upgraded to COmanage Registry (https://spaces.at.internet2.edu/display/COmanage/COmanage+Registry+User+Guide) v4.4.2. This upgrade requires a database schema update, which will result in a total service outage of approximately 15-30 minutes. During the outage, visitors to https://registry.access-ci.org/ will be redirected to this infrastructure news notice. Users will not be able to register for ACCESS accounts, make changes to their existing ACCESS accounts, or create/modify OIDC client registrations. Logging on to other ACCESS websites should not be affected. For questions or concerns about this update, please contact help@cilogon.org or open an ACCESS Help Ticket (https://access-ci.atlassian.net/servicedesk/customer/portal/2/create/30).

Posted: March 20, 2026

idp.access-ci.org Updated

Published

Infrastructure News Type: Reconfiguration

Affected Infrastructure: identity.access-ci.org

Start Date: July 21, 2025, 5:00 p.m.

End Date: July 21, 2025, 5:30 p.m.

On July 21, 2025, the ACCESS Identity Provider (https://idp.access-ci.org/idp) (idp.access-ci.org) was updated to address several Tomcat vulnerabilities (https://tomcat.apache.org/security-10.html#Fixed_in_Apache_Tomcat_10.1.43).

Posted: March 20, 2026

Delta and DeltaAI Emergency outage on Monday, 7/21

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: July 21, 2025, 12:00 p.m.

End Date: July 21, 2025, 5:00 p.m.

On Monday, July 21st, there will be an emergency system outage on both Delta and DeltaAI to perform a corrective file system check (fsck) on the /work (aka scratch) file system. The fsck is needed to correct file system issues that are causing a growing number of files or directories that cannot be removed, as well as other I/O errors. The issue is believed to be metadata only; there is no indication of any data corruption. The system outage will begin at 7 AM and last until noon CDT. To complete the unmount of /work, processes on the login nodes that have open files on /work will be killed, and the login nodes may need to be rebooted. Adjust job wall time when submitting jobs so that they fit into the time remaining before the maintenance begins. Please send questions through the NCSA help portal at https://help.ncsa.illinois.edu or by email to help@ncsa.illinois.edu. - Delta and DeltaAI Project Teams
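One way to act on the wall-time advice above is to compute the minutes remaining before the outage and use that as the job's time limit. The sketch below is illustrative only: the helper name walltime_minutes and the 10-minute safety margin are assumptions, not part of the notice.

```shell
#!/bin/sh
# Sketch: compute a wall-time limit (in whole minutes) that ends before a
# maintenance window starts. walltime_minutes is a hypothetical helper name.
#
# Arguments: now_epoch start_epoch margin_minutes
#   now_epoch       current time, in seconds since the Unix epoch
#   start_epoch     maintenance start, in seconds since the Unix epoch
#   margin_minutes  safety margin to leave before the maintenance start
# Prints 0 if no usable time remains.
walltime_minutes() {
    remaining=$(( ($2 - $1) / 60 - $3 ))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo "$remaining"
}

# Example: two hours before the start, with a 10-minute margin.
walltime_minutes 0 7200 10   # prints 110
```

The result can be passed to sbatch as a number of minutes, for example sbatch --time=110 followed by the batch script. A result of 0 means no usable time remains; do not pass it to --time, because Slurm treats a time limit of zero as a request for no limit.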

Posted: March 20, 2026

SDSC Expanse: Lustre filesystem issue

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: July 18, 2025, 1:00 p.m.

End Date: July 19, 2025, 1:00 p.m.

Dear Expanse User, We are currently seeing issues with the Lustre metadata server that will cause filesystem write problems. Please pause or hold any Lustre-based jobs; we will post an update once the issue is resolved. Thanks, SDSC User Services

Posted: March 20, 2026

ACCESS XDMoD Partial Downtime

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: xdmod.access-ci.org

Start Date: July 17, 2025, 2:00 p.m.

End Date: July 18, 2025, 2:00 p.m.

ACCESS XDMoD will be upgraded to version 11.0.2 on Thursday, July 17 at approximately 10:00 EDT. Various data in ACCESS XDMoD may be unavailable during the upgrade. Service is expected to be fully restored within 24 hours. Once the upgrade is started, release notes will be available at https://xdmod.access-ci.org/#main_tab_panel:about_xdmod?Release%20Notes

Posted: March 20, 2026

Delta and DeltaAI file system outage: /projects and /taiga

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: July 17, 2025, 12:00 p.m.

End Date: July 18, 2025, 3:00 a.m.

On July 17, 2025, from 7:00 AM to 10:00 PM, the /projects and /taiga file systems will be unavailable due to planned maintenance. Files in /projects or /taiga will not be accessible during the maintenance window.

As the maintenance day approaches, we recommend that jobs which do not need access to files in /projects or /taiga use a special reservation. Jobs that are submitted to the special "no projects or taiga" reservation and do not specify the /projects or /taiga file systems as a Feature or constraint will be allowed to run during the /projects and /taiga file system maintenance. The reservation can run jobs before it becomes active next week, so we recommend using the no_projects_taiga_requirements reservation in advance of the maintenance day.

To submit new jobs to the special reservation, add the following line to the batch script:

    #SBATCH --reservation=no_projects_taiga_requirements

or use the command-line option:

    $ sbatch --reservation=no_projects_taiga_requirements ...

For jobs that are already submitted but not yet running and that might be scheduled to run during the maintenance day, use scontrol as follows to add the job to the reservation:

    $ scontrol update reservation=no_projects_taiga_requirements job=JOBID

where JOBID is the Slurm job ID of the existing job. To verify the change, use scontrol again:

    $ scontrol show job JOBID | grep -i Reservation

and you should see Reservation=no_projects_taiga_requirements.

In general, we recommend using Slurm's constraint and Feature options to tell the job scheduler which jobs depend on a given file system, including the projects or Taiga file system. Jobs that have the projects or taiga file system as a Slurm Feature or constraint will be put on hold 2 days before the maintenance start time. See below for information on how to specify file systems as a job constraint for new jobs and as a Feature for already submitted jobs.
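When several pending jobs need to be moved, the scontrol steps above can be generated by a small dry-run helper. This is a minimal sketch, not an NCSA-provided tool: the function names reservation_cmd and verify_cmd are hypothetical, and the job IDs are placeholders. It only prints the commands quoted in this notice so they can be reviewed before being run on a login node.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the scontrol commands from this
# notice for each pending job ID given on the command line.
# reservation_cmd and verify_cmd are hypothetical helper names.

RESERVATION="no_projects_taiga_requirements"

# Print the command that adds one pending job to the reservation.
reservation_cmd() {
    printf 'scontrol update reservation=%s job=%s\n' "$RESERVATION" "$1"
}

# Print the command that shows which reservation a job is assigned to.
verify_cmd() {
    printf 'scontrol show job %s | grep -i Reservation\n' "$1"
}

# Emit both commands for every job ID passed as an argument.
for jobid in "$@"; do
    reservation_cmd "$jobid"
    verify_cmd "$jobid"
done
```

Saved under a name such as add_to_reservation.sh (a hypothetical file name) and run with one or more job IDs as arguments, the script prints two commands per job. Printing rather than executing keeps the sketch safe to try on any machine, with or without Slurm installed.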

Posted: March 20, 2026

Bridges-2 Maintenance

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: bridges2-gpu.psc.access-ci.org

Start Date: July 17, 2025, 2:00 a.m.

End Date: July 17, 2025, 5:00 p.m.

Due to a lightning storm in the area, Bridges-2 has experienced some issues this evening. Most of the machine has been restored but some partitions will remain unavailable until the morning when we have additional staff on site. Thank you for your patience while we restore all services.

Posted: March 20, 2026

ACES filesystem issues

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: aces.tamu.access-ci.org

Start Date: July 11, 2025, 2:55 p.m.

End Date: July 11, 2025, 6:00 p.m.

UPDATE: The degraded storage server was recovered around 11:50 a.m. We are continuing to monitor the ACES filesystem for any further issues.

Original: We are currently seeing degradation on one of the Lustre storage servers. This is leading to slow filesystem access and impacting the responsiveness of the Slurm job scheduler. We will update once the issue is resolved.

Posted: March 20, 2026

SDSC Expanse Lustre filesystem OSS issue [Resolved]

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: July 8, 2025, 1:00 a.m.

End Date: July 8, 2025, 5:00 p.m.

Update: Dear Expanse User, We resolved the Lustre OSS issues yesterday and have been monitoring the filesystem for any further issues. Separately, this morning (7/9/2025) we had a short disruption of Slurm job submissions due to a down system service; that has also been resolved. Thanks, SDSC User Services

Original: Dear Expanse User, We are currently seeing high load and timeouts on one of the Expanse Lustre filesystem object storage servers (OSSs). This might lead to access issues on files that are striped onto storage targets on this OSS. We are looking into the issue and will update once it is resolved. Thanks, SDSC User Services Staff

Posted: March 20, 2026

ACES Lustre filesystem issues

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: aces.tamu.access-ci.org

Start Date: July 1, 2025, 8:00 p.m.

End Date: July 2, 2025, 12:05 a.m.

We are currently seeing degradation on one of the Lustre storage servers. This is leading to slow filesystem access and impacting the responsiveness of the Slurm job scheduler. We will update once the issue is resolved.

Update: The Lustre filesystem has been recovered. We are monitoring the storage for any further issues.

Posted: March 20, 2026

Premium and Enterprise plans for all Atlassian products will see full rollout of new navigation on July 7.

Published

Infrastructure News Type: Reconfiguration

Affected Infrastructure: tickets.access-ci.org

Start Date: June 25, 2025, 6:00 p.m.

End Date: July 7, 2025, 1:00 p.m.

Premium and Enterprise plans for all Atlassian products will see full rollout of new navigation on July 7 (https://support.atlassian.com/navigation/docs/manage-the-navigation-rollout/).

Posted: March 20, 2026

idp.access-ci.org Updated

Published

Infrastructure News Type: Reconfiguration

Affected Infrastructure: identity.access-ci.org

Start Date: June 23, 2025, 6:00 p.m.

End Date: June 23, 2025, 6:30 p.m.

On June 23, 2025, the ACCESS Identity Provider (https://idp.access-ci.org/idp) (idp.access-ci.org) was updated to address several critical Tomcat vulnerabilities (https://www.cert-in.org.in/s2cMainServlet?pageid=PUBVLNOTES01&VLCODE=CIVN-2025-0129).

Posted: March 20, 2026

Delta/DeltaAI full system outage 6/18

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: delta-cpu.ncsa.access-ci.org, delta-gpu.ncsa.access-ci.org

Start Date: June 18, 2025, 12:00 p.m.

End Date: June 19, 2025, 6:00 p.m.

Underlying network configuration issues encountered during maintenance are resolved and the systems are back in service.

----

The network maintenance on Delta encountered an issue with the network that was discovered at final checkout. The Delta login nodes will be available but could become unavailable at any time. The maintenance reservation will prevent the scheduler from running jobs.

----

Delta and DeltaAI users, ALL Delta and DeltaAI services will be down next Wednesday, 6/18, from 7 AM to 7 PM Central Time. NO Delta or DeltaAI services will be available during the outage, including logins, computes, data transfer nodes, and Open OnDemand. During the outage, the core high-speed network will have a software upgrade and reconfiguration to fully integrate the last compute hardware added to the system into the proper intended configuration. This upgrade will address some underlying issues in the network fabric to improve its performance and reliability, but it does not include software changes on the clients, so it is expected to be transparent to users. If you have questions, please open a ticket at https://help.ncsa.illinois.edu/ or email help@ncsa.illinois.edu. The Delta Project

Posted: March 20, 2026

SDSC Expanse Lustre filesystem issues

Published

Infrastructure News Type: Outage Partial

Affected Infrastructure: expanse.sdsc.access-ci.org, expanse-gpu.sdsc.access-ci.org, expanse-ps.sdsc.access-ci.org

Start Date: June 18, 2025, 8:00 a.m.

End Date: June 18, 2025, 4:00 p.m.

Dear Expanse User, We are currently seeing connectivity issues with one of the Lustre filesystem object storage servers (OSSs). This is leading to timeouts and access issues for files that are striped onto this OSS. We will update once the issue is resolved. Thanks, SDSC User Services Staff

Posted: March 20, 2026

Bridges-2 and Neocortex Maintenance Monday, June 16, 2025

Published

Infrastructure News Type: Outage Full

Affected Infrastructure: bridges2-em.psc.access-ci.org, bridges2-gpu.psc.access-ci.org, bridges2-rm.psc.access-ci.org, bridges2-ocean.psc.access-ci.org, neocortex-sdflex.psc.access-ci.org

Start Date: June 16, 2025, 1:00 p.m.

End Date: June 16, 2025, 11:00 p.m.

Bridges-2, including all VMs and filesystems, as well as Neocortex, will be unavailable due to scheduled maintenance starting on Monday, June 16, 2025, at 8 a.m. Eastern Time. We anticipate that the system will return by 6 p.m. Eastern Time. During this time, you will be unable to access the system. The Slurm queue will be preserved, and queued jobs will begin running once the machine has returned to service. Please direct any questions to help@psc.edu and our team will be happy to assist you. Thank you, PSC

Posted: March 20, 2026