On June 16th, 2024, a significant number of virtual machines in the NY2 region lost network connectivity and went into Read-Only (RO) mode, making them unavailable. Network connectivity to the affected virtual machines was restored by 18:20 UTC the same day, but the machines remained unavailable because they were still in RO mode. They were returned to Read-Write (RW) mode between 05:58 and 12:23 UTC on June 17th. By 12:24 UTC, all affected virtual machines were available and customer access was restored.
The root cause of this outage was a core switch failure in NY2, which caused a large number of machines to go into RO mode.
The core switch failure triggered a spike in traffic that degraded network performance. As a result, the virtual machines lost network access to the Network File System (NFS) on which they resided and went into RO mode. While in RO mode, the virtual machines could not accept write operations, which caused service disruptions for customers starting around 10:30 UTC on June 16th, 2024.
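For customers who want to confirm whether a machine is still in RO mode, the following is a minimal sketch, assuming a Linux guest that exposes its mounted filesystems through /proc/mounts. The function name read_only_mounts is hypothetical and is not part of any Paperspace tooling; it only reports mount points whose options include "ro".

```python
from pathlib import Path


def read_only_mounts(mounts_text: str) -> list[str]:
    """Return mount points whose option list contains 'ro'.

    Hypothetical helper for illustration; expects text in the
    /proc/mounts format (device, mount point, type, options, ...).
    """
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines that lack an options field.
        if len(fields) < 4:
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            read_only.append(mount_point)
    return read_only


if __name__ == "__main__":
    # Assumption: Linux guest, where /proc/mounts lists current mounts.
    print(read_only_mounts(Path("/proc/mounts").read_text()))
```

Some system filesystems are mounted read-only during normal operation, so the entry of interest is the root filesystem or the data volume that is normally writable.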
Several efforts are underway to prevent this type of failure from recurring, including a network redesign and the installation of new equipment.
On behalf of Paperspace, we apologize for the disruption to your services and appreciate your understanding.
If you have any questions or concerns, please open a ticket with our Customer Support team.