What you should know about Salesforce Data Cloud
Author: Szymon Lewandowski
Table of Contents
- What is Data Cloud
- Why there is Data Cloud
- Data Cloud Functionalities
- Data Cloud pricing
- Data Cloud billing
Data Cloud used to shine bright in the Salesforce portfolio, but currently it has moved a little into the background. Amid the AI hype and Agentforce, we cannot forget about the data that fuels our algorithms. And this is the moment when Data Cloud comes in as the solution for providing a well-structured data space... and more.
What is Data Cloud
There are many definitions of Data Cloud. Because the tool is so flexible, it can be defined differently depending on how it is used. First, let's look at the official Salesforce definition, and then we will try to categorize the tool in a more scientific way.
Data Cloud definition
The official definition describes the core Salesforce Data Cloud feature perfectly:
Data Cloud is a data platform that unifies all of your company's data on the Salesforce Platform.
But why should we unify this data? The answer is in the next part of the sentence:
Data Cloud is a data platform that unifies all of your company's data on the Salesforce Platform, giving every team a 360-degree view of the customer to drive automation and analytics, personalize engagement, and power trusted AI.
Salesforce promotes the "360-degree view of the customer", but it achieves it quite differently from other systems.
Key Ring
Data Cloud starts from a different point of view on storing and connecting data. It does not aim for a golden record or golden ID; instead, it takes the primary keys from all the different data sources and stores them in a "Key Ring". This Key Ring is the result of the unification process.
The Key Ring is just an additional data layer, built on references to records in other data sources. Instead of consolidating the data under a single ID and building the data model around it, Data Cloud links the data and creates the unified profile as a data view. Then, following the specified rules (Match and Reconciliation Rules), the system places a given piece of data (e.g. First Name, Email, Phone) in this data view. In the system this process is called Identity Resolution; in "business" language, it is called Unification.
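To make the reference-based idea more concrete, here is a minimal Python sketch. It assumes a toy model in which every source record is identified by a (source, primary key) pair and a match step has already produced the links between records. The function names and sample data are hypothetical, and this is not how Data Cloud is implemented internally. It only illustrates the concept: key rings are groups of source keys (built here with union-find), and the unified profile is a view computed over source records that are never modified.

```python
def build_key_rings(refs, links):
    """Group (source, primary_key) references into key rings using union-find.

    refs:  every source reference known to the system
    links: pairs of references that a match step decided belong together
    """
    parent = {ref: ref for ref in refs}

    def find(ref):
        # Walk up to the root, halving the path as we go.
        while parent[ref] != ref:
            parent[ref] = parent[parent[ref]]
            ref = parent[ref]
        return ref

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    rings = {}
    for ref in refs:
        rings.setdefault(find(ref), set()).add(ref)
    return list(rings.values())


def unified_view(ring, records):
    """Compute a unified profile as a view; source records are never modified.

    Placeholder reconciliation: the first reference in sorted order wins
    each field. Real Reconciliation Rules are configurable.
    """
    profile = {}
    for ref in sorted(ring):
        for field, value in records[ref].items():
            profile.setdefault(field, value)
    return {"key_ring": sorted(ring), "profile": profile}


# Hypothetical source records, keyed by (source system, primary key).
records = {
    ("CRM", "003A"): {"first_name": "Anna", "email": "anna@example.com"},
    ("Commerce", "u-42"): {"phone": "+48 600 000 000"},
    ("Support", "c-7"): {"first_name": "Ann"},
    ("CRM", "003B"): {"first_name": "Piotr"},
}
links = [(("CRM", "003A"), ("Commerce", "u-42")), (("Commerce", "u-42"), ("Support", "c-7"))]

rings = build_key_rings(list(records), links)
anna_ring = next(ring for ring in rings if ("CRM", "003A") in ring)
anna = unified_view(anna_ring, records)
```

With these inputs, `rings` holds two key rings: one with the three linked references and one with the lone `("CRM", "003B")` record. `anna["profile"]` takes `first_name` and `email` from the CRM record and `phone` from Commerce, while every entry in `records` stays exactly as it was ingested.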
Customer Reference Data Platform
Moving to a more scientific definition of Data Cloud, we should look at the history of the tool. It started in 2020 as Customer 360 Audiences, a standard CDP-like tool with basic functionality for segmenting customers and activating them to the Marketing Cloud platform. In 2021 the tool became Salesforce CDP, followed by another name change to Marketing Cloud Customer Data Platform in 2022.
Then we saw a switch from a marketer-centered approach to empowering other Salesforce products, like Sales and Service. With the next name changes, first to Genie and then to Data Cloud, the tool evolved rapidly, offering the not-so-common approach to data unification that we have already covered. From the history of the product, we can clearly see that Data Cloud is more than a Marketing Data Warehouse, and also something other than a Data Management Platform.
So can we say that Data Cloud is a Customer Data Platform, as Salesforce called it in the past?
Well, theoretically, yes... but no... but it is complicated. 🤔
Salesforce promotes Data Cloud as something "more than just traditional CDP". Timo Kovala defined Data Cloud as an "account-based CDP", highlighting the ability to unify not only customer data but also account data.
This is also where the Key Ring comes in. Typical CDPs usually unify data around a single primary key, golden key, or similar. Data Cloud works a little differently: it is based on references instead of strict keys and relationships, which provides a more elastic approach to matching users and organizing data hierarchies from different sources. What is more, the BYOL (Bring Your Own Lake) functionality, which works without copying the data, pushes Data Cloud even further toward a reference-based approach.
On the other hand, Data Cloud is still heavily centered on customers or accounts. There are no native options to export data without the user or account context (other than the API). Salesforce is probably working hard to provide such functionality, as we can see in the new addition of full DMO export for Meta Ads (even if it is without user context).
That's why I define Data Cloud as a Customer Reference Data Platform. This name highlights the reference-focused approach while still referring to the CDP.
Why there is Data Cloud
So why did Salesforce create Data Cloud instead of adding the features to the existing CRM environment? Well, it was probably driven by several technical limitations, architectural constraints, and a vision of what the Salesforce ecosystem should look like in the future.
Technical limitations
Salesforce CRM runs on a large-scale, monolithic Java service with an Oracle database. Twenty years of development have left some technical debt, and the platform is not well optimized for processing large amounts of data, especially for AI.
Data Cloud is a microservice-based solution with some cool stuff under the hood: Amazon S3 buckets for cold storage and DynamoDB for quick data access. As a back-end layer, it uses Amazon EMR and Kubernetes. The architecture is largely based on Apache Iceberg, a table format that lets the data lake operate like a heavily boosted database system, with a SQL metadata store supporting indexing.
This completely new infrastructure allowed Salesforce to shift the burden of data integration and maintenance from the CRM databases to Data Cloud, with a shiny new tech stack optimized for current business processes.
Data integration
As we know, many Salesforce apps were acquired and rebranded. This resulted in a lot of isolated back-end architectures. What is more, enterprise customers want to connect internal data stored in custom databases, cloud databases, and third-party apps (or spreadsheets, lol).
This fragmentation made integrations between Salesforce apps and third-party software a big challenge, especially with enterprise-grade data volumes and the Salesforce CRM architecture of that time.
Data Cloud was created with these data integration challenges in mind. Salesforce provided a good architecture, a canonical data model, and native connectors for SF products and many third-party solutions, including cloud storage and popular apps.
Scalability
Current systems process terabytes of data, both structured and unstructured. Implementing AI and ML capabilities with scalability in mind on the old architecture would have been very, very hard. Add changing requirements, evolving data models, and unification rules, and integration and maintenance could become very costly.
Data Cloud provides this scalability. In July it was processing 250 trillion transactions per week across tenants. The migration from Amazon EC2 to Kubernetes provided even better scalability. Salesforce has also added support for unstructured data.
Data Cloud Functionalities
There is a lot of material about Data Cloud functionality, so let's keep this as simple as possible. I will group the functionalities into 4 categories:
- Ingest - getting the data from different sources
- Harmonize - cleaning and mapping the data
- Unify - unifying the data into one customer profile
- Act - using the data
But let's be honest here: it is a huge simplification. Data Cloud is a big tool, and new features are released monthly, so its functionality naturally evolves.
Ingest
Ingestion is the process of getting data from different sources into Data Cloud. Ingested data is stored in Data Lake Objects.
There are 3 types of ingestion:
- Batch Ingestion - standard data copy in batches. The basic way to ingest data using pre-built connectors at scheduled intervals (usually from 1 hour to 24 hours).
- Streaming Ingestion - low-latency data copy. A faster way to ingest data via API calls or from CRM apps.
- Data Federation - no-copy connection to the data source. The newest method, which queries the necessary data without copying it into Data Cloud storage. Currently, it supports Snowflake, Databricks (Beta), and Google BigQuery.
There are also two types of data that can be ingested:
- Structured Data - the standard data type that can be stored in a relational database, with the structure typical of tables, API responses, CSV, or Parquet files.
- Unstructured Data - data that doesn't have a specific format or structure. Currently supported formats are text-based PDF, TXT, and HTML, and audio/video-based MPG, MP3, MP4, MPGA, OGG, WAV, FLAC, and WebM (audio only).
Unstructured data ingestion is still a fresh feature in Data Cloud. Currently, you can ingest it from blob storage on Azure, Amazon, or Google. The other option is to upload the files manually to the Einstein Data Library.
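Before dropping files into blob storage, a small pre-flight check can flag anything that is not on the supported list above. This is a minimal sketch: the extension sets are copied from this article and will likely change over time, the function name is hypothetical, and it only inspects file extensions, so it cannot tell whether a PDF is really text-based or whether a WebM file contains only audio.

```python
from pathlib import Path

# Formats listed in this article; confirm against current Salesforce docs.
TEXT_FORMATS = {".pdf", ".txt", ".html"}
AUDIO_VIDEO_FORMATS = {".mpg", ".mp3", ".mp4", ".mpga", ".ogg", ".wav", ".flac", ".webm"}  # WebM: audio only


def classify_unstructured(filename):
    """Return 'text', 'audio/video', or None if the extension is not listed."""
    extension = Path(filename).suffix.lower()
    if extension in TEXT_FORMATS:
        return "text"
    if extension in AUDIO_VIDEO_FORMATS:
        return "audio/video"
    return None


files = ["faq.PDF", "call-0815.wav", "notes.docx", "webinar.webm"]
report = {name: classify_unstructured(name) for name in files}
```

Here `notes.docx` comes back as `None`, so it would need converting (for example to PDF or TXT) before ingestion.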
There are also several possible data connection types, including:
- Manual CSV import - ad hoc data ingestion from a CSV file. A perfect option for an MVP, demo preparation, or quickly adding data from spreadsheets.
- SFTP - encrypted transfer of CSV files from an SFTP server to Data Cloud over SSH. The limits here are 2 GB per CSV file, 4.5 GB per data stream run, and 1,000 files per scheduled run.
- Ingestion API - a REST API supporting both batch and streaming ingestion. The schema is defined in the OpenAPI format.
- Website and Mobile SDK - another API connection that lets you collect behavioral data from a selected website or mobile app, based on a prepared sitemap.
- No-Code connectors - pre-built connectors maintained by Salesforce. You can check current connector availability here.
- Mulesoft - brings additional third-party integration connectors from the Mulesoft ecosystem.
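The SFTP limits quoted above can be checked before files are uploaded. Below is a minimal planning sketch, assuming decimal gigabytes (10^9 bytes) and treating each "run" as one data stream run that must respect both the 4.5 GB and the 1,000-file limits. The function name and sample files are hypothetical, and it uses a simple greedy next-fit in input order rather than optimal bin packing.

```python
GB = 10**9  # assumption: decimal gigabytes; confirm how Salesforce measures the limits

MAX_FILE_BYTES = 2 * GB              # per CSV file
MAX_RUN_BYTES = 4_500_000_000        # 4.5 GB per data stream run
MAX_RUN_FILES = 1000                 # files per scheduled run


def plan_sftp_runs(files):
    """Split (name, size_in_bytes) pairs into runs that respect the quoted limits.

    Returns (runs, rejected): runs is a list of lists of file names;
    rejected lists files that exceed the per-file limit on their own.
    Greedy next-fit in input order, not an optimal packing.
    """
    runs, rejected = [], []
    current, current_bytes = [], 0
    for name, size in files:
        if size > MAX_FILE_BYTES:
            rejected.append(name)
            continue
        # Start a new run if adding this file would break either run limit.
        if current and (current_bytes + size > MAX_RUN_BYTES or len(current) >= MAX_RUN_FILES):
            runs.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        runs.append(current)
    return runs, rejected


files = [
    ("orders_1.csv", 1_800_000_000),
    ("orders_2.csv", 1_500_000_000),
    ("orders_3.csv", 1_400_000_000),
    ("archive.csv", 2_500_000_000),
]
runs, rejected = plan_sftp_runs(files)
```

For this input the first two files fit in one run (3.3 GB), the third starts a second run, and `archive.csv` is rejected because it would need to be split into smaller CSV files first.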
It is worth mentioning that many connectors are still in beta. This means you can test them or even use them in demo/MVP projects, but I would not recommend using them in a production environment.
You can find developer documentation of Data Cloud Integrations here.
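The Ingestion API bullet above mentions an OpenAPI-based schema. The following minimal sketch shows the client-side preparation for a streaming call: validate records against the declared field types, then build the URL and JSON body. The endpoint path `/api/v1/ingest/sources/{source}/{object}` and the `{"data": [...]}` envelope are assumptions that should be verified against the Ingestion API documentation; the schema, tenant host, and helper names are hypothetical, and authentication and the actual HTTP request are deliberately left out.

```python
import json

# Hypothetical object schema, standing in for the types declared in an OpenAPI file.
ORDER_SCHEMA = {"order_id": str, "email": str, "amount": float, "created_at": str}


def validate(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = [f"missing field: {name}" for name in schema if name not in record]
    problems += [f"unexpected field: {name}" for name in record if name not in schema]
    for name, expected_type in schema.items():
        if name in record and not isinstance(record[name], expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
    return problems


def build_streaming_request(tenant_host, source_name, object_name, records, schema):
    """Return (url, json_body) for a streaming ingestion call, without sending it.

    The path and the {"data": [...]} envelope are assumptions; check the docs.
    """
    errors = {}
    for index, record in enumerate(records):
        problems = validate(record, schema)
        if problems:
            errors[index] = problems
    if errors:
        raise ValueError(f"invalid records: {errors}")
    url = f"https://{tenant_host}/api/v1/ingest/sources/{source_name}/{object_name}"
    return url, json.dumps({"data": records})


url, body = build_streaming_request(
    "my-tenant.example.com",  # hypothetical tenant host
    "shop_api",
    "orders",
    [{"order_id": "A-1", "email": "anna@example.com", "amount": 59.9, "created_at": "2024-07-01T10:00:00Z"}],
    ORDER_SCHEMA,
)
```

Validating on the client side is optional, but it turns schema mismatches into readable local errors instead of rejected API calls.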
Harmonize
Harmonization is the process of cleaning and mapping data in Data Cloud. Harmonized data is represented in Data Model Objects. Unlike Data Lake Objects, Data Model Objects are not physical objects but virtual data views that can be remapped.
There are several options and functionalities supporting harmonization in Data Cloud:
- Formula Fields - let you create additional fields during ingestion, with calculations, specific text, concatenated keys, and more.
- Data Transforms - batch or streaming transforms, built in the UI or with SQL. Perfect for splitting, joining, and flattening Data Lake Objects before mapping.
- Data Mapping - the process of mapping data from Data Lake Objects to standard or custom Data Model Objects in the UI.
- Data Graphs - transform normalized table data from Data Model Objects into materialized JSON views of the data. Perfect for optimizing queries and for Data Cloud's real-time capabilities.
- Indexing and embedding - unstructured data is automatically mapped to Unstructured Data Model Objects and then indexed using vector embedding models to provide context for AI.
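To illustrate what a "materialized JSON view" means in the Data Graphs bullet, the sketch below pre-joins normalized, DMO-like rows into one JSON document per individual. It is a toy model only: object and field names are hypothetical, and the output does not follow the actual Data Graph format.

```python
import json

# Normalized, DMO-like rows (hypothetical object and field names).
individuals = [{"id": "ind-1", "first_name": "Anna"}, {"id": "ind-2", "first_name": "Piotr"}]
emails = [{"party_id": "ind-1", "email": "anna@example.com"}]
orders = [
    {"party_id": "ind-1", "order_id": "A-1", "amount": 59.9},
    {"party_id": "ind-1", "order_id": "A-2", "amount": 20.0},
]


def materialize_graph(individuals, emails, orders):
    """Pre-join related rows into one JSON document per individual.

    Readers then fetch a single document instead of joining three tables
    at query time, which is the point of a materialized view.
    Assumes every related row points at a known individual.
    """
    related = {ind["id"]: {"emails": [], "orders": []} for ind in individuals}
    for row in emails:
        related[row["party_id"]]["emails"].append(row["email"])
    for row in orders:
        related[row["party_id"]]["orders"].append(
            {"order_id": row["order_id"], "amount": row["amount"]}
        )
    return {ind["id"]: json.dumps({**ind, **related[ind["id"]]}) for ind in individuals}


graph = materialize_graph(individuals, emails, orders)
```

The trade-off is the usual one for materialized views: reads become a single lookup, but the documents must be refreshed whenever the underlying rows change.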
Data Cloud has a Customer 360 canonical data model that can be easily modified and extended with custom objects. Some of the pre-built objects, like Individual/Account and Contact Points, must be mapped for segmentation purposes. If you connect data from Salesforce products, you can use Data Bundles, which automatically create Data Streams and Data Mappings. You can find more information about the Data Cloud data model here.
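Conceptually, a formula field followed by a data mapping does two things: it derives extra columns on the ingested row, then routes each source field to a field on a target object. The minimal sketch below shows that flow for an Individual-like and a Contact Point Email-like target. In Data Cloud both steps are configured in the UI; the raw field names, the `MAPPING` dictionary, and the helper functions here are hypothetical, and the target field labels only loosely follow the standard data model.

```python
# One raw row as it might land in a Data Lake Object (hypothetical fields).
dlo_row = {"cust_no": "1001", "src": "shop", "fname": "Anna", "mail": " Anna@Example.com "}


def add_formula_fields(row):
    """Mimic formula fields: add derived columns without touching the originals."""
    derived = dict(row)
    derived["global_key"] = f'{row["src"]}-{row["cust_no"]}'  # concatenated key
    derived["mail_normalized"] = row["mail"].strip().lower()  # cleaned value
    return derived


# DLO field -> list of (target object, target field); labels are illustrative.
MAPPING = {
    "global_key": [("Individual", "Individual Id"), ("Contact Point Email", "Party")],
    "fname": [("Individual", "First Name")],
    "mail_normalized": [("Contact Point Email", "Email Address")],
}


def apply_mapping(row, mapping):
    """Produce one record per target object from a single DLO row."""
    targets = {}
    for source_field, destinations in mapping.items():
        for target_object, target_field in destinations:
            targets.setdefault(target_object, {})[target_field] = row[source_field]
    return targets


mapped = apply_mapping(add_formula_fields(dlo_row), MAPPING)
```

The concatenated key is mapped twice on purpose: it becomes the individual's ID and also the reference that links the email contact point back to that individual.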
Unify
Unification is the process of connecting data about individuals or accounts into a single customer or account view. This process, called Identity Resolution, builds the Key Ring in Data Cloud and prepares the data for better use in the CRM environment.
Because unification is based on references instead of a single key, you can easily create multiple Identity Resolution Rulesets, depending on your use cases.
When creating a Ruleset, you need to specify:
- Match Rules - specify which profiles you want to unify. Each rule contains one or more criteria. You can use the default match rules or create your own, using exact matching or fuzzy matching (with levels of precision).
- Reconciliation Rules - specify which value, from all the candidate source records, is used in the unified profile. You can use the last updated value, the most frequently occurring value, or a source priority order.
TLDR - you use Match Rules to reduce the number of final profiles (the more Match Rules, the more source profiles get merged into each unified profile), and Reconciliation Rules to choose which values are selected for the unified customer view.
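The two rule types can be sketched in a few lines of Python. The sketch assumes that a pair of profiles matches if any Match Rule matches and that all criteria inside one rule must hold. The fuzzy comparison uses Python's `difflib.SequenceMatcher` only as a stand-in, not Salesforce's fuzzy algorithm, and the profile fields, threshold, and source priority order are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical source profiles; ISO dates compare correctly as strings.
profiles = [
    {"source": "CRM", "email": "anna@example.com", "first_name": "Anna", "phone": "600100200", "updated": "2024-05-01"},
    {"source": "Shop", "email": "ANNA@example.com", "first_name": "Ana", "phone": "600999888", "updated": "2024-07-15"},
    {"source": "Support", "email": "anna.k@example.com", "first_name": "Anna", "phone": "600100200", "updated": "2024-06-20"},
    {"source": "Shop", "email": "piotr@example.com", "first_name": "Piotr", "phone": "700000000", "updated": "2024-07-01"},
]


# Match Rules: a pair matches if ANY rule matches; criteria inside a rule must ALL hold.
def exact_email(a, b):
    return a["email"].strip().lower() == b["email"].strip().lower()


def fuzzy_name_and_phone(a, b, threshold=0.8):
    # difflib's similarity ratio is only a stand-in for Data Cloud's fuzzy matching.
    name_ratio = SequenceMatcher(None, a["first_name"].lower(), b["first_name"].lower()).ratio()
    return a["phone"] == b["phone"] and name_ratio >= threshold


MATCH_RULES = [exact_email, fuzzy_name_and_phone]


def matches(a, b):
    return any(rule(a, b) for rule in MATCH_RULES)


# For brevity, compare everything to one seed profile; real resolution is transitive.
cluster = [p for p in profiles if matches(profiles[0], p)]


# Reconciliation Rules: pick one value per field from the matched cluster.
def last_updated(group, field):
    return max(group, key=lambda p: p["updated"])[field]


def most_frequent(group, field):
    return Counter(p[field] for p in group).most_common(1)[0][0]


def source_priority(group, field, order=("CRM", "Support", "Shop")):
    return min(group, key=lambda p: order.index(p["source"]))[field]


RECONCILIATION = {"first_name": most_frequent, "email": source_priority, "phone": last_updated}
unified = {field: rule(cluster, field) for field, rule in RECONCILIATION.items()}
```

Here the Shop profile joins the cluster through the exact email rule and the Support profile through the phone-plus-name rule, while Piotr stays separate. The reconciled profile then takes `first_name` by frequency ("Anna"), `email` by source priority (CRM), and `phone` from the most recently updated record (Shop).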
Act
With ingested, harmonized, and unified data, we can start acting. Acting means using the data in Data Cloud and sending it to other systems.
You can:
- use the data in Salesforce Flows, which opens up multiple automation opportunities in the Salesforce environment;
- enrich Salesforce CRM views with Related Lists and Copy Fields that use data from Data Cloud;
- build and export Calculated Insights, which are custom data views built with SQL queries or in the UI;
- build Segments based on unified customers in Data Cloud;
- activate segments to other systems (e.g. Marketing Cloud, advertising platforms, cloud storage);
- use Data Actions to send data in near real time to Marketing Cloud, the Salesforce Platform, and other apps;
- transfer the data to Tableau and analyze it there;
- power AI and ML models using Model Builder and Agent Builder;
- export the data with the Query API;
- share the data via zero-copy data sharing, the Python SDK, or the JDBC driver.
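Calculated Insights are defined in SQL inside Data Cloud. The sketch below runs comparable SQL against an in-memory SQLite database only to show the shape of the logic: aggregate a metric per unified individual, then filter on it the way a segment would. Table and column names are hypothetical, and Data Cloud's own SQL dialect and object names differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unified_individual (id TEXT PRIMARY KEY, first_name TEXT);
    CREATE TABLE sales_order (order_id TEXT, individual_id TEXT, amount REAL);
    INSERT INTO unified_individual VALUES ('u1', 'Anna'), ('u2', 'Piotr'), ('u3', 'Ewa');
    INSERT INTO sales_order VALUES
        ('A-1', 'u1', 120.0), ('A-2', 'u1', 80.0), ('B-1', 'u2', 30.0);
""")

# Calculated-insight-style metric: lifetime value and order count per unified individual.
CLV_SQL = """
    SELECT i.id AS individual_id,
           COALESCE(SUM(o.amount), 0) AS lifetime_value,
           COUNT(o.order_id) AS order_count
    FROM unified_individual AS i
    LEFT JOIN sales_order AS o ON o.individual_id = i.id
    GROUP BY i.id
"""
insights = {row[0]: (row[1], row[2]) for row in conn.execute(CLV_SQL)}

# Segment-style filter on top of the metric.
high_value_segment = sorted(ind for ind, (clv, _) in insights.items() if clv >= 100)
```

The `LEFT JOIN` keeps customers without orders (like `u3`) in the insight with a value of zero, which matters if a segment later targets inactive customers.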
Data Cloud pricing
Data Cloud comes with multiple licenses, depending on your requirements. There are also add-ons that provide more functionality or extend the license limits.
We know the list prices of some licenses:
- Data Cloud Provisioning - a free Data Cloud tier for customers with Salesforce Enterprise or Unlimited Edition.
- Data Cloud Starter - a basic license whose configuration depends on the specific license name (e.g. Data Cloud Starter for Marketing or for Tableau), priced at 108,000 USD per year.
- Marketing Cloud Growth and Advanced - new Marketing Cloud on Core licenses that include a Data Cloud license with fewer consumption credits than Data Cloud Starter, billed at 1,500 USD per month for Growth Edition and 3,250 USD per month for Advanced Edition.
The most important Data Cloud add-ons and their prices are:
- Data Spaces ($60,000/year/space): creates separate internal data spaces, e.g. for different brands
- Ad Audiences ($2,400/year/audience): activates segments to advertising platforms like Meta, Amazon Ads, and Google Ads
- Data Storage ($1,800/year/TB): provides additional storage beyond the basic 1TB or 5TB limit
- Data Services ($1,000/100K credits): provides additional Services Credits
- Segmentation and Activation ($1,000/100K credits): enables customer segmentation and activation, and provides additional credits
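The list prices above make it easy to rough out a first-year figure. This is a minimal sketch using only the prices quoted in this article. It assumes that credit packs are bought once within the year and that quantities mean additional spaces, audiences, TB, or 100K-credit packs. The bundle in the example is hypothetical, and real contracts are negotiated, so treat the result as an order of magnitude rather than a quote.

```python
# List prices quoted in this article (USD); real quotes are negotiated and may differ.
LICENSES_PER_YEAR = {
    "Data Cloud Starter": 108_000,
    "Marketing Cloud Growth": 1_500 * 12,
    "Marketing Cloud Advanced": 3_250 * 12,
}
ADD_ONS = {
    # name: (price in USD, unit the price applies to)
    "Data Spaces": (60_000, "space per year"),
    "Ad Audiences": (2_400, "audience per year"),
    "Data Storage": (1_800, "TB per year"),
    "Data Services": (1_000, "100K credits"),
    "Segmentation and Activation": (1_000, "100K credits"),
}


def estimate_annual_cost(license_name, add_on_quantities):
    """Sum a license and add-on quantities; credit packs counted once (assumption)."""
    total = LICENSES_PER_YEAR[license_name]
    for name, quantity in add_on_quantities.items():
        unit_price, _unit = ADD_ONS[name]
        total += unit_price * quantity
    return total


# Hypothetical bundle: 1 extra data space, 3 ad audiences, 2 extra TB, 200K service credits.
estimate = estimate_annual_cost(
    "Data Cloud Starter",
    {"Data Spaces": 1, "Ad Audiences": 3, "Data Storage": 2, "Data Services": 2},
)
```

For this hypothetical bundle the estimate comes to 180,800 USD per year, which mostly shows how quickly add-ons stack up on top of the Starter license.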
As with many other enterprise-class solutions, Data Cloud pricing information is not widely available, so for detailed pricing and bundles you should contact your Salesforce Account Executive. There are many possible licenses, including Salesforce Foundations, specific Suites, and others.
Data Cloud billing
Data Cloud is the first product in the Salesforce ecosystem to switch from lump-sum billing to usage-based billing. You pay for the license, which gives you the tool, and for so-called consumption credits. You spend consumption credits to perform operations in Data Cloud, and you can buy more credits if you need them.
And here's the catch: you have to calculate the potential costs. Every functionality has its own usage multiplier, so the cost depends on your implementation. For example, calculated insights are billed at 15 credits per 1 million rows processed, batch ingestion at 2,000 credits per 1 million rows processed, but identity resolution at 100,000 credits per 1 million rows processed.
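Using the three multipliers above, a credit estimate is simple arithmetic: rows processed, divided by one million, times the multiplier. The minimal sketch below applies it to a hypothetical month; the row counts are invented for illustration, and other operations with their own multipliers would be added to the same table.

```python
# Multipliers quoted in this article: credits per 1 million rows processed.
CREDITS_PER_MILLION_ROWS = {
    "batch_ingestion": 2_000,
    "calculated_insights": 15,
    "identity_resolution": 100_000,
}


def estimate_credits(rows_by_operation):
    """Return (credits per operation, total credits) for rows processed in one period."""
    per_operation = {
        operation: rows / 1_000_000 * CREDITS_PER_MILLION_ROWS[operation]
        for operation, rows in rows_by_operation.items()
    }
    return per_operation, sum(per_operation.values())


# Hypothetical month: ingest 5M rows, run insights over 20M rows, resolve identities for 2M rows.
per_operation, total = estimate_credits({
    "batch_ingestion": 5_000_000,
    "calculated_insights": 20_000_000,
    "identity_resolution": 2_000_000,
})
```

In this hypothetical month, identity resolution on only 2M rows costs 200,000 credits, 20 times the 10,000 credits for ingesting 5M rows, so how often you run unification deserves careful planning.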
I recommend reading my Salesforce Data Cloud Credits Guide, a complete article about billing and credits in Data Cloud.
You can also check the Data Cloud Credit Calculator made by Isaac Shaffer, which can help you estimate potential credit usage.
Source of the diagram images: A Visual Guide to Salesforce Data Cloud Capabilities | Salesforce Developers Blog