Identity Resolution Demystified: Probabilistic vs. Deterministic Identification

Probabilistic or deterministic identification: that’s the question. Or at least it should be, once marketers look at how their audiences search, read, shop, and move between all of their devices.

Consumers split their internet time across several devices, according to the Interactive Advertising Bureau (IAB): 45 percent on PCs/laptops, 40 percent on smartphones, and 15 percent on tablets.

This poses a major problem for marketers because cookies, the small data files traditionally used for aggregating and targeting online audiences, are tied to a single device. So what can you do?

The two main ways to reliably target audiences across multiple devices when cookies don’t work are probabilistic and deterministic identification. Both are gaining traction as cookieless identifiers because everyone is looking for an alternative to the third-party cookie.

Both address the same problem, but they do so in completely different ways.

What’s the difference?

To get a better sense of how these two identification methods differ, let’s start by defining each:

  • Deterministic identification, often built on first-party data, relies on personally identifiable information (PII), like an email address, that is known to be true because someone entered it themselves on a website. For this one, think logins: Facebook, WiFi, Roku, Gmail, Instagram, etc.
  • Probabilistic identification is based on probabilities and consists of individual pieces of information, like the IP address, browser version, and behavior. These signals are everything but personally identifiable information, and they are used to infer which person is behind which devices. Probabilistic matching is a key function of an identity graph: a statistical guess, based on non-PII identifiers, about who is using a given device.
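To make the deterministic side concrete, here is a minimal Python sketch of login-based matching. It assumes the identifier is an email address and that the address is normalized and hashed before use; the function names, device IDs, and sample addresses are hypothetical and not taken from any particular platform.

```python
import hashlib
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so the same address always hashes the same."""
    return email.strip().lower()


def hash_email(email: str) -> str:
    """Return a SHA-256 hex digest so the raw address is not kept as the key."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def link_devices_by_login(login_events):
    """Group device IDs by the hashed email used to log in on them."""
    devices_by_person = defaultdict(set)
    for device_id, email in login_events:
        devices_by_person[hash_email(email)].add(device_id)
    return dict(devices_by_person)


# Hypothetical login events: (device_id, email entered by the user)
events = [
    ("laptop-1", "Pat@example.com"),
    ("phone-7", " pat@example.com "),
    ("tablet-3", "sam@example.com"),
]
linked = link_devices_by_login(events)
```

In this sketch the laptop and the phone end up under the same key because the same person typed the same address on both, which is what makes the match deterministic rather than a guess.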

While deterministic data is mainly confined to logins, probabilistic data points are far more numerous. They allow brands to draw conclusions about a consumer’s identity before they log in, using things like:

  • IP address
  • Websites visited
  • Actions taken at a given time of day
  • Screen resolution
  • Browser version
  • Location data
  • WiFi network
  • Operating systems

Probabilistic identification pulls from all of these data points and connects them to a single user and the devices they use.
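One way to picture how these signals combine is a simple weighted score. The sketch below is an illustration only: the weights, the threshold of 70, and the sample values are assumptions made for the example, and it uses only signals that can plausibly agree across different device types (a phone and a laptop will rarely share a screen resolution or operating system). Real identity graphs typically rely on trained statistical models rather than hand-picked weights.

```python
# Hypothetical weights (points out of 100): how strongly a shared signal
# suggests that two devices belong to the same person.
SIGNAL_WEIGHTS = {
    "ip_address": 35,
    "wifi_network": 25,
    "location": 20,
    "active_hours": 10,
    "sites_visited": 10,
}


def match_score(device_a: dict, device_b: dict) -> int:
    """Sum the weights of the signals on which both devices agree (0 to 100)."""
    score = 0
    for signal, weight in SIGNAL_WEIGHTS.items():
        a, b = device_a.get(signal), device_b.get(signal)
        if a is not None and a == b:
            score += weight
    return score


def same_person(device_a: dict, device_b: dict, threshold: int = 70) -> bool:
    """Treat two devices as one person's when the score clears the threshold."""
    return match_score(device_a, device_b) >= threshold


# Hypothetical observations for three devices
laptop = {"ip_address": "203.0.113.5", "wifi_network": "home-net",
          "location": "Austin", "active_hours": "evening"}
phone = {"ip_address": "203.0.113.5", "wifi_network": "home-net",
         "location": "Austin", "active_hours": "morning"}
stranger = {"ip_address": "198.51.100.9", "wifi_network": "cafe-net",
            "location": "Austin", "active_hours": "noon"}

laptop_phone_score = match_score(laptop, phone)
laptop_stranger_score = match_score(laptop, stranger)
```

Here the laptop and phone agree on IP address, WiFi network, and location, so they clear the assumed threshold, while the third device shares only a city and does not.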

Pros & Cons

What each type of identification does best comes down to the level of certainty and the amount of data available.

Probabilistic identification allows you to identify which devices belong to a single person, and it also broadens your pool of prospects because you can reach nearly everyone:

  • Pro: Massive reach
  • Con: Matches can be wrong about 5 percent of the time.

Deterministic is a great option to turn to if you want to sell to someone who has already purchased something from you.

  • Pro: Extremely accurate
  • Con: Even for the top companies like Amazon, Facebook, and Google, it’s a small universe of users.

Which is better?

Deterministic is more accurate (though only by a few percentage points). But even the largest companies don’t have every global internet user logging in on all of their devices all of the time. So probabilistic has the reach that big brands, and smaller ones, need.

Think about it this way: Would you rather have data that’s 95 percent correct on nearly all of the internet — or 100 percent correct on a tiny portion of the internet? Still, the decision about which identification method to use doesn’t have to be one or the other.
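The trade-off can be put into rough numbers. In the sketch below, the accuracy figures (95 and 100 percent) come from the comparison above, while the reach figures (90 percent for "nearly all" and 20 percent for "a tiny portion") are assumptions chosen purely for illustration.

```python
def correctly_matched_share(reach: float, accuracy: float) -> float:
    """Share of the whole audience that is both reached and matched correctly."""
    return reach * accuracy


# Reach values are assumed for illustration; accuracy values follow the text.
probabilistic = correctly_matched_share(reach=0.90, accuracy=0.95)
deterministic = correctly_matched_share(reach=0.20, accuracy=1.00)
```

Under these assumed reach figures, probabilistic matching correctly covers roughly 85.5 percent of the audience, compared with 20 percent for deterministic matching alone.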

Most brands use both in their identity graphs, which gives them excellent accuracy and reach. Looking ahead, these identification tools’ roles in marketing will only grow, making it important for marketers to not only understand them, but also to find ways to begin implementing them.
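As a rough illustration of combining the two, the sketch below builds a tiny identity graph that accepts every deterministic link and only those probabilistic links above a confidence cutoff. The class, the function, the 0.9 cutoff, and the sample links are all hypothetical; this is one simple way to merge the two kinds of evidence, not a description of how any specific vendor does it.

```python
class IdentityGraph:
    """Tiny union-find that merges device IDs believed to share an owner."""

    def __init__(self):
        self.parent = {}

    def find(self, device_id: str) -> str:
        """Return the representative device for this device's cluster."""
        self.parent.setdefault(device_id, device_id)
        while self.parent[device_id] != device_id:
            # Path halving keeps lookups fast as clusters grow.
            self.parent[device_id] = self.parent[self.parent[device_id]]
            device_id = self.parent[device_id]
        return device_id

    def link(self, a: str, b: str) -> None:
        """Merge the clusters that contain devices a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)


def build_graph(deterministic_links, probabilistic_links, min_confidence=0.9):
    """Accept all login-based links, then probabilistic links above the cutoff."""
    graph = IdentityGraph()
    for a, b in deterministic_links:
        graph.link(a, b)
    for a, b, confidence in probabilistic_links:
        if confidence >= min_confidence:
            graph.link(a, b)
    return graph


# Hypothetical links: one from a shared login, two from statistical matching
graph = build_graph(
    deterministic_links=[("laptop-1", "phone-7")],
    probabilistic_links=[("phone-7", "tablet-3", 0.95), ("tablet-3", "tv-9", 0.60)],
)
```

In this example the laptop, phone, and tablet end up in one cluster, while the TV stays separate because its link falls below the assumed cutoff. Because merges are transitive, a single wrong probabilistic link can join two people's devices, which is one reason to keep the cutoff high.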
