Salesforce outage affects thousands of users worldwide
A massive Salesforce outage on Friday left thousands of customers unable to access the service, in some cases for several days.
The incident appeared to be one of the worst Salesforce downtime events in the company’s history, with as many as 3,200 users temporarily losing access to their SaaS data around the world.
Salesforce officials confirmed that the outage was intentional: the company took the service down after a faulty database script broke permission settings, giving some users access to all of their company’s Salesforce data.
Here’s what we know so far.
How did the Salesforce outage happen?
ZDNet reported that the issue arose from a change that the company made to its production environment within Pardot, Salesforce’s digital marketing tool.
A bad database script deployment inadvertently broke access permission settings, giving users access to data that they wouldn’t normally have access to. Users could read virtually any of their company’s data, and worse yet, they gained write access too, creating a serious security problem.
As a security measure, Salesforce was forced to bring down the service entirely for affected users.
How were users affected?
Prior to the outage, users of Salesforce Pardot – the company’s B2B marketing tool – were suddenly granted full read/write access to their company’s Salesforce data.
As a result, Salesforce shut down large swaths of its infrastructure, intentionally removing access for all Pardot users, both current and former customers. This meant that the service disruption also affected a chunk of Salesforce users who were not actively using Pardot and who weren’t affected by the permission issue.
The outage also affected Sales Cloud and Service Cloud, “the two largest products for Salesforce by revenue,” according to CNBC.
When did it happen? How long did it last?
Salesforce first acknowledged an issue on its status page at 12:56 p.m. Eastern time on Friday, May 17, 2019. Roughly a half-hour later, the company explained that the issue was due to the “deployment of a database script resulting in granting users broader data access than intended.” Out of caution, the company said it was proactively blocking access to all current and former Pardot users.
Users slowly regained access over the weekend. Those who weren’t directly affected by the broken permissions regained access on Saturday. However, among those who had been affected by the faulty script, only users with “System Administrator” profiles were able to access their data at first.
As of Monday morning, Salesforce said it was still working to resolve issues for a “subset of customers.” By Tuesday, Salesforce’s status page reported that all production instances were “out of service disruption and in a performance degradation state as service levels return to normal. During a performance degradation, end users are able to access the service, however, some functionality within the service may not be available or running at optimal performance.”
What did Salesforce say about it?
Salesforce provided updates about the service outage through Twitter and its Trust status page.
One of the first acknowledgements of the outage came from Salesforce co-founder Parker Harris, who posted on Twitter at 12:40 p.m. ET: “To all of our @salesforce customers, please be aware that we are experiencing a major issue with our service and apologize for the impact it is having on you. Please know that we have all hands on this issue and are resolving as quickly as possible.”
Around 1:30 p.m., the company clarified what was happening on its status page: “The Salesforce Technology team is investigating an issue impacting Salesforce customers who use Pardot, or have used Pardot in the past. The deployment of a database script resulted in granting users broader data access than intended. To protect our customers, we have blocked access to all instances that contain affected customers until we can complete the removal of the inadvertent permissions in the affected customer orgs. As a result, customers who were not impacted may experience service disruption. In parallel, we are working to restore the original permissions as quickly as possible. Customers should continue to check Trust for updates.”
By 5:40 a.m. Saturday, the company said it had restored access for all the administrators at affected companies, and added that it was creating instructions for admins on how to manually restore permissions for their other users.
A Monday morning update said that permissions had been restored on most accounts but that some customers may still experience some problems.
In its Tuesday status update, the company said, “We are aware that some customers continue to experience issues, and Salesforce is working urgently to resolve them. Customers should continue to check [the Trust status page] for updates.”
Why was it a ‘forced’ outage?
When it became apparent that the faulty database script had effectively removed all permission settings for some companies, Salesforce’s decision basically boiled down to this: allow those users to have access to everything, or allow no access to anyone.
Taking the service down temporarily was the only viable decision.
If Salesforce had allowed the service to remain up with the faulty script in place, then users at every affected company would be able to access data that they weren’t supposed to. This would create a potentially far more dangerous situation at each company.
For example, consider an employee who has just been terminated and who can now delete large amounts of critical data before leaving the company. Or imagine a user maliciously copying the data to pass it to competitors.
Even an accidental deletion of important company data would be a major problem for any organization. Salesforce had no choice but to take the service down until it could restore those permissions.
A workaround for some users
In the immediate aftermath of the Salesforce outage, customers with backups of their data were able to restore the correct permissions before the company later executed the automated provisioning.
Salesforce said on Saturday that companies with “a valid backup of their profiles and user permission data can deploy that information directly from a Sandbox copy to the production environment,” according to CRN.
Unfortunately, since most companies do not keep independent Salesforce backups, most users needed to wait until Salesforce restored the permissions (or their admins restored them manually).
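For admins who do hold a permissions backup, one way to see what a faulty change has broadened is to compare the backup against the live settings. The following is only a minimal sketch: the snapshot shape (profile name mapped to a set of "Object:permission" strings), the function name find_broadened_access, and the sample profiles are all hypothetical and do not reflect Salesforce’s actual metadata format or API.

```python
from typing import Dict, Set

# Hypothetical snapshot shape: profile name -> set of "Object:permission" strings.
# Real Salesforce profile metadata is structured differently.
PermissionSnapshot = Dict[str, Set[str]]


def find_broadened_access(backup: PermissionSnapshot,
                          current: PermissionSnapshot) -> PermissionSnapshot:
    """Return, per profile, the permissions granted now that the backup did not grant."""
    broadened: PermissionSnapshot = {}
    for profile, granted_now in current.items():
        # A profile missing from the backup is treated as having had no permissions.
        extra = granted_now - backup.get(profile, set())
        if extra:
            broadened[profile] = extra
    return broadened


# Illustrative data only.
backup = {
    "Marketing User": {"Lead:read", "Campaign:read", "Campaign:edit"},
    "System Administrator": {"Lead:read", "Lead:edit", "Account:read", "Account:edit"},
}
current = {
    "Marketing User": {"Lead:read", "Lead:edit", "Campaign:read", "Campaign:edit",
                       "Account:read", "Account:edit"},
    "System Administrator": {"Lead:read", "Lead:edit", "Account:read", "Account:edit"},
}

for profile, extra in sorted(find_broadened_access(backup, current).items()):
    print(profile, sorted(extra))
# prints: Marketing User ['Account:edit', 'Account:read', 'Lead:edit']
```

A report like this only identifies the excess permissions. Removing them would still go through whatever deployment or manual process the admin normally uses.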
Has this ever happened before?
Salesforce has a solid record of service uptime and cloud availability, though this is not the first time an outage has occurred (and, as with all SaaS providers, it won’t be the last).
In May 2016, an outage left companies without access to their CRM data for 20 hours. That disruption was caused initially by a bug in the firmware of its storage arrays. During the resolution, the company had to move its data to another datacenter, and that led to a massive database failure. The company restored a backup, but many companies permanently lost some of their data in the process.
How to protect your Salesforce data
SaaS platforms like Salesforce allow companies to run powerful applications in the cloud, rather than installing software on premises or storing the data on on-site servers. But just because the data is stored in the cloud doesn’t mean it’s protected against data-loss events or service disruptions.
While Salesforce does offer some backup export options, these are very limited and the process to restore them is highly manual. For greater protection, organizations need a cloud-to-cloud SaaS backup solution that replicates all Salesforce data and stores it independently in other data centers. This ensures that companies can maintain business continuity when disruptions occur.
Other common data-loss incidents in Salesforce
Extended Salesforce outages may not happen all the time, but there are several other ways that users can lose their CRM data, and these events occur far more often than you might think.
· Accidental or malicious deletion by users
· Failed data migrations
· Data overwrites during third-party app integrations
To be clear, these are user-caused data-loss events that are not the fault of Salesforce. And statistically, they are far more common than service disruptions. In a survey by Aberdeen Group, 80% of companies reported losing data within SaaS apps like Salesforce, due to the reasons listed above.
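Deletions and overwrites like these are usually caught by comparing two backups taken at different times. As a rough illustration, the sketch below diffs two snapshots and reports which records disappeared and which fields changed. The snapshot shape (record ID mapped to field values), the function name diff_snapshots, and the sample records are assumptions made for this example, not the format of any particular backup product.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
# Hypothetical snapshot shape: record ID -> field values.
Snapshot = Dict[str, Record]


def diff_snapshots(previous: Snapshot,
                   latest: Snapshot) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (IDs of deleted records, changed field names per surviving record)."""
    deleted = sorted(rid for rid in previous if rid not in latest)
    changed: Dict[str, List[str]] = {}
    for rid, old in previous.items():
        new = latest.get(rid)
        if new is None:
            continue  # already reported as deleted
        # Compare every field present in either version of the record.
        fields = sorted(f for f in old.keys() | new.keys() if old.get(f) != new.get(f))
        if fields:
            changed[rid] = fields
    return deleted, changed


# Illustrative data only; the IDs and fields are made up.
monday = {
    "001A": {"Name": "Acme Corp", "Phone": "555-0100"},
    "001B": {"Name": "Globex", "Phone": "555-0101"},
    "001C": {"Name": "Initech", "Phone": "555-0102"},
}
tuesday = {
    "001A": {"Name": "Acme Corp", "Phone": "555-0100"},
    "001B": {"Name": "Globex", "Phone": ""},  # overwritten by an integration
    # "001C" was deleted
}

deleted, changed = diff_snapshots(monday, tuesday)
print("Deleted:", deleted)
print("Changed:", changed)
```

A diff like this cannot tell a legitimate edit from an accidental overwrite, so someone still needs to review the results before anything is restored.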
A better Salesforce backup
Backupify from Datto is our recommended solution for Salesforce backup.
Backupify creates daily backups of all Salesforce data, including all CRM data, files, objects and Chatter messages. The backups are independent from the Salesforce platform and stored in Datto’s own secure cloud, allowing you to access your data within Backupify’s independent interface (even if Salesforce is down).
The platform also provides seamless data restoration, allowing you to restore all data or individual objects with just a few clicks.
No business can predict when the next SaaS service outage will occur. But by backing up your SaaS data, you can ensure that your organization will be able to continue operating through the disruption.
Request more information on how Backupify can protect your Salesforce data. Sign up for a free demo or contact our business continuity professionals at Invenio IT: call (646) 395-1170 or email success@invenioIT.com.