AWS Service Degradation Affecting Parsec
Updates
On Monday, October 20th, 2025, Parsec experienced three separate incidents triggered by a series of cascading AWS outages.
Below is a detailed audit of the sequential incidents, the underlying AWS outages, our response, and the mitigations we plan to put in place to ensure our community’s continued trust in Parsec’s reliability and service quality. We offer this detail in the interest of trust, transparency, and accountability.
Summary and analysis
The first incident (a full outage) was detected in US-East-1 early Monday morning at 2:45am EDT and resolved by 7:50am EDT. A second, minor incident (degraded service) was noticed roughly two hours later at 9:49am EDT and resolved by 1:05pm EDT. A final incident (and second full outage) was detected shortly after at 1:35pm EDT, followed by our decision to fail over to US-West-2, starting at 3:23pm EDT and completing by 4:39pm EDT, which resolved this final incident and fully restored service. Total calculated outage time across the day’s incidents was 8 hours and 9 minutes.
Key assessments & root cause analyses:
- First incident: Full outage — This incident was initially detected by Parsec as an outage of AWS’s global IAM (identity and access management) service, blocking the ability to log in, authenticate, or fail over. IAM is a foundational global service with no regional backup offered: if authentication goes down, no failover region is reachable. This rare occurrence was followed by AWS-triggered delays in SQS polling that began affecting Parsec’s signaling services, email APIs, and other endpoints. All eventually recovered, but the episode left low confidence in IAM health, increasing our risk aversion toward initiating any failover prematurely and risking a stall mid-attempt.
- Second incident: Degraded service — Intermittent timeouts of database requests alerted us to the continued risk of degraded performance and the possibility of failed connections. Lingering low confidence in IAM health continued to discourage an immediate failover attempt, and ultimately stable signaling service health and successful database recovery confirmed that decision.
- Third incident: Full outage — The return of AWS-driven SQS polling delays, subsequent signaling service degradation, and an eventual full outage (coupled with improving IAM confidence) led to our decision to initiate failover and restore service, with high confidence, in US-West-2. Cleanup needs arising from the intermittent nature of the outages and the various recovery efforts were documented, scheduled, and communicated to customers.
A detailed walkthrough
First incident: Full outage —
At 2:45am EDT, our internal APIs reported issues with database access, triggering alerts to our on-call engineers, who acknowledged them soon after. The team immediately encountered access issues caused by the detected IAM outage, limiting our investigation and troubleshooting efforts. Within the hour, a first notice of potential connection issues was delivered to customers while we worked to assess the IAM outage’s impact.
By 5:53am EDT, AWS had communicated acknowledgment and signs of recovery, with IAM service returning. Despite these signs, continued outages in our signaling service were detected, with early investigation pointing to delays in SQS polling. An hour later, with the source isolated, we initiated steps to encourage SQS recovery by cleaning up the appropriate queues, resulting in additional signs of recovery and reports of successful customer connections. AWS then officially acknowledged the SQS polling delays we had previously detected. Based on subsequent reports from AWS, we understand SQS polling rates to be a lever AWS uses to encourage recovery of connected services.
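For illustration only, the snippet below is a minimal sketch of the kind of queue cleanup described above: purging signaling queues whose visible backlog has gone stale. It is not our production tooling; it assumes boto3, and the queue-name prefix, region, and backlog threshold are hypothetical placeholders.

```python
# Illustrative sketch: purge SQS queues whose visible backlog exceeds a
# threshold. Prefix, region, and threshold are placeholders, not real values.
import boto3

BACKLOG_THRESHOLD = 10_000  # assumption: backlogs beyond this are considered stale

sqs = boto3.client("sqs", region_name="us-east-1")

def cleanup_backlogged_queues(prefix: str = "signaling-") -> None:
    """Purge queues whose visible message count exceeds BACKLOG_THRESHOLD."""
    queue_urls = sqs.list_queues(QueueNamePrefix=prefix).get("QueueUrls", [])
    for url in queue_urls:
        attrs = sqs.get_queue_attributes(
            QueueUrl=url,
            AttributeNames=["ApproximateNumberOfMessages"],
        )["Attributes"]
        backlog = int(attrs.get("ApproximateNumberOfMessages", "0"))
        if backlog > BACKLOG_THRESHOLD:
            # PurgeQueue drops all messages; acceptable in this sketch because
            # stale signaling messages would be re-sent by healthy clients.
            sqs.purge_queue(QueueUrl=url)
            print(f"purged {url} (backlog={backlog})")

if __name__ == "__main__":
    cleanup_backlogged_queues()
```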
By 7:48am EDT, SQS recovery was acknowledged. Signs of signaling service recovery were also detected, official system recovery was confirmed, and communication of that recovery went out to customers through various channels. Normal operation resumed, but confidence in IAM health remained low.
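To make “confidence in IAM health” concrete: one lightweight way to gauge it is to probe the target region’s STS endpoint before committing to a failover. The sketch below is illustrative only and assumes boto3; the region, timeouts, and probe count are placeholder values rather than our actual failover gate.

```python
# Illustrative sketch: probe regional STS as a go/no-go signal for failover.
import time
import boto3
from botocore.config import Config

def iam_healthy(region: str = "us-west-2", attempts: int = 3) -> bool:
    """Return True if STS in the target region answers consistently."""
    sts = boto3.client(
        "sts",
        region_name=region,
        endpoint_url=f"https://sts.{region}.amazonaws.com",
        config=Config(retries={"max_attempts": 1},
                      connect_timeout=3, read_timeout=5),
    )
    for _ in range(attempts):
        try:
            # Cheap call that exercises the STS endpoint and credential path.
            sts.get_caller_identity()
        except Exception:
            return False
        time.sleep(1)
    return True

if __name__ == "__main__":
    print("failover gate:", "open" if iam_healthy() else "closed")
```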
Second incident: Degraded service —
At 9:49am EDT, Parsec detected the return of intermittent timeouts of database requests. Load alerts and other errors indicated degraded performance that stopped short of a full outage.
At 11:00am EDT, ongoing recovery reported by AWS and continued intermittent customer reports further confirmed the degraded service status. These signs of recovery, combined with continued low confidence in IAM stability, reinforced our reluctance to attempt a premature failover and risk encountering issues mid-attempt. Soon after, in-app banners were updated to communicate the degraded service as broadly as possible and inform users not subscribed to updates.
By 1:05pm EDT, full API recovery was detected with confirmed reports of successful connections, ending this second incident. Out of an abundance of caution and with decreasing trust in US-East-1’s ability to maintain recovery, Parsec kept its degraded service banner live and its related incident open while our team continued to monitor health.
Third incident: Full outage —
At 1:35pm EDT, Parsec once again detected drops in signaling service health and eventual unavailability, with notice delivered soon after to our community.
By 2:30pm EDT, official acknowledgement of additional AWS-driven SQS polling rate delays was received. This removed any remaining confidence in US-East-1’s ability to maintain recovery, turning our limited confidence in IAM health into an acceptable risk for failover. A final decision to fail over was confirmed shortly after.
At 3:23pm EDT, our failover to US-West-2 began, starting within the two-hour business continuity window designated by our security and compliance policies. US-West-2 services began coming online roughly 20 minutes later, with connections acknowledged as fully restored by 4:36pm EDT.
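To illustrate one piece of what a regional failover like this can involve, the sketch below repoints a DNS record from a US-East-1 endpoint to a US-West-2 endpoint via Route 53. This is a simplified, hypothetical example rather than our actual runbook; the hosted zone ID, record name, and endpoint are placeholders.

```python
# Illustrative sketch: UPSERT a CNAME so new lookups resolve to the
# US-West-2 stack. All identifiers below are placeholders.
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "ZEXAMPLE123"                               # placeholder
RECORD_NAME = "signaling.example-parsec.com."                # placeholder
WEST_ENDPOINT = "signaling-usw2.example-elb.amazonaws.com"   # placeholder

def fail_over_dns(ttl: int = 60) -> None:
    """Repoint the service record so clients resolve to the US-West-2 endpoint."""
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Comment": "Failover to US-West-2 during US-East-1 degradation",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "TTL": ttl,  # short TTL so clients pick up the change quickly
                    "ResourceRecords": [{"Value": WEST_ENDPOINT}],
                },
            }],
        },
    )

if __name__ == "__main__":
    fail_over_dns()
```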
Known after-effects
The intermittent, sequential, and complex nature of the outages and related recovery efforts required several cleanup activities that were low-impact and non-critical.
- Signaling service impact left behind some “ghost” or duplicate hosts (computers that appear online but are not) that required removal; see the sketch after this list.
- Signaling service impact also resulted in some broken avatars that needed cleanup.
- Additionally, some macOS users may have experienced delays in recovery due to discrepancies in how different operating systems handle websocket connections.
- Finally, some users may have noticed persistent banner messages in the Parsec application due to predetermined polling windows for new updates. These self-resolved within roughly 1 to 4 hours.
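To make the “ghost” host cleanup concrete, the sketch below flips hosts back to offline when their last signaling heartbeat predates a staleness window. It is purely illustrative: the table, column names, and threshold are hypothetical, and sqlite3 stands in for the real datastore.

```python
# Illustrative sketch: mark hosts offline when their last heartbeat is stale.
from datetime import datetime, timedelta, timezone
import sqlite3  # stand-in for the real datastore

STALE_AFTER = timedelta(minutes=30)  # assumption: heartbeats normally arrive well within 30 min

def cleanup_ghost_hosts(conn: sqlite3.Connection) -> int:
    """Mark hosts offline when their last heartbeat is older than STALE_AFTER."""
    cutoff = datetime.now(timezone.utc) - STALE_AFTER
    # ISO-8601 timestamps with the same UTC offset compare correctly as strings.
    cur = conn.execute(
        "UPDATE hosts SET online = 0 WHERE online = 1 AND last_heartbeat < ?",
        (cutoff.isoformat(),),
    )
    conn.commit()
    return cur.rowcount  # number of ghost hosts cleaned up

if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE hosts (id TEXT, online INTEGER, last_heartbeat TEXT)")
    demo.execute("INSERT INTO hosts VALUES ('ghost-1', 1, '2025-10-20T07:00:00+00:00')")
    print("cleaned:", cleanup_ghost_hosts(demo))
```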
Planned mitigations
Safe Harbor Statement: Plans subject to change. All statements based on assumptions using cloud provider insights, customer input, and company goals. The comments herein are subject to alteration or removal pending future research, new insights, internal discussion, updated goals, or further customer feedback.
Parsec remains committed to making improvements that help us stay your #1 choice for remote desktop. A brief summary of currently planned mitigations includes:
Short-term
- IaaS improvements to harden failover process
- Deeper multi-regional monitoring and observability improvements
- Cleanup automations
Medium-term
- Component-level failover support to improve flexibility in response and recovery
Long-term
- Several solutions under consideration to reduce difficult-to-mitigate single points of failure
The incident is now resolved. Signaling, connection, and other related services are now fully restored. You may resume normal Parsec usage.
Following this resolution, some users may still see duplicate or “ghost” hosts in their applications that appear to remain online even though those computers are not active. Cleanup of these duplicate or “ghost” machines is ongoing and will not affect any other active, available computers. If you encounter duplicate hosts before cleanup is complete and need to connect, you may attempt connections to each duplicate until you find the one that succeeds. If you attempt to connect to a “ghost” host, you may receive error code 6, indicating the host is not online; this can be safely ignored until cleanup is complete. We apologize for any inconvenience this causes.
Additionally, a smaller subset of users may still see a persistent warning banner indicating degraded service; this can be safely ignored. All incident-related banner messages should disappear from affected applications within the next few hours, according to predetermined polling windows for new updates. Once incident messages are removed, they will be replaced by our original downtime message notifying users of our planned quarterly maintenance window.
Please note that our quarterly maintenance is still scheduled to occur on Saturday, November 1, 2025.
A detailed post-mortem outlining incident discovery, triage, and resolution will be made available via this channel in the coming days. Thank you.
Failover mitigations were successful. Connection services are now restored, with other services gradually recovering. We will continue to monitor service health and update accordingly.
While service improves, we’re still noticing significant AWS degradations impacting availability. With AWS IAM global services now restored, we’re implementing planned failover mitigations to resolve connectivity issues. We’ll continue to monitor and update accordingly.
We are continuing to experience degraded service availability, and our engineering team is actively monitoring the situation and working to mitigate customer impact. Further, we’ve confirmed that an AWS internal Lambda system has experienced ongoing errors, negatively impacting other systems. We’ve been informed that recovery is ongoing, and we will report more updates as they’re received.
We are currently experiencing degraded service availability due to the previously reported, ongoing AWS outage. Users may see intermittent connection errors; these should be resolved by retrying the request.