Vol. 4, Issue 10: Concrete and Clay

Why Data Fidelity Is The Only Foundation That Will Hold

Jun 17, 2025

When I was growing up in the northern suburbs of New York City in the 1980s, there were basically 4 sets of things to listen to on what we’d now call “terrestrial radio”. If you were into popular music, you could listen to WPLJ (95.5 on your FM dial) or you could listen to Z100 (100.3). If you were more of an adult contemporary fan, there was Lite-FM all the way up on 106.7*. 104.3 played classic rock, including Won’t Get Fooled Again at least once every 2 hours.

*Lite-FM was (is?) the most-listened-to radio station in the entire country for a long time. And if you like Michael Bolton, you’re not gonna believe this …

There were other radio stations - if you were lucky and you leaned out your window far enough in my neck of the woods, you could get 92.3 WLIW, which played edgier stuff from New Wave to bands from the UK to “alternative” music*. There was 97.1, which was like WPLJ and Z100, until it became the home of hip hop in the late 80s.

*This was the radio station where I heard Bad Brains for the first time. I’m not an H.R. fan, per se, but boy can he scream.

There were other stations too, but there was only one other one that really mattered:

101.1 - CBS-FM

CBS-FM played oldies, unapologetically. And they recruited all manner of DJs from radio’s heyday in the 1950s and 1960s who worked in the New York market - guys whose names you’d either have to be really old or have listened to CBS-FM to recognize: Ron Lundy, Dan Ingram, Harry Harrison, Bill Brown*. And for a long time, their station refused to play anything released after like 1972.

*True story: Bill Brown used to solicit submissions for a playlist format that he aired during the lunch hour called “The Brown Bag” - and back in 2003 or 2004, he used my submission called the “Pleasant Conversation” Brown Bag, which was Hello It’s Me (by Todd Rundgren), How Do You Do (by Mouth and MacNeal), and I’m Doing Fine Now (by The New York City Queens). I won tickets to a Broadway show called “Anna in the Tropics” that starred Jimmy Smits.

Anyway, this is all prelude to telling you that I was listening to an oldie today called “Concrete and Clay”, whose chorus goes “the sidewalks and the streets, the concrete and the clay beneath my feet begin to crumble.” And as I think through some work I’m doing now, it seemed an apt metaphor for data, data fidelity, and the importance of knowing and being able to identify your audience.

In the world of data, we build with both concrete and clay. One gives us strength and certainty. The other gives us scale and adaptability. But only one can bear real weight.

Concrete = deterministic data (solid, foundational, fixed: email addresses, logins, telco IDs)
Clay = probabilistic data (flexible, shaped by inference, but unstable: modeled behavior, lookalikes, device graphs)

And as I said on LinkedIn a couple of weeks ago, in a world full of signal and noise, fidelity matters*. Which brings me to my thesis: More data isn’t better. Better data is better. When your source is unstable, everything downstream becomes noise.

* You guys of course remember Family Matters, the TGIF show, and of course Urkel** (and his suave counterpart Stefan Urquelle). But do you remember that the show started out with 3 kids? And the younger daughter got Chuck Cunninghammed right out of the show with zero explanation. And we caught up with her and here’s what she had to say! JK, but that would be pretty wild if I did that. Maybe another time.

**Also never forget Urkel and Bea Arthur “doing the Urkel” at the 1991 American Comedy Awards

Let’s start with this: what do we even mean when we talk about “data fidelity”? For me, I think of four things: accuracy, integrity of the source, consumer consent to use the data, and cross-channel resilience of the data. Like anything else, the recipe and mix of those things may be different for each source, but in the same way that you need flour, sugar, eggs and milk to make a cake, you need some combination of those four things to make a data, er, cake.

Let’s set the table here quickly by being super clear on what we mean:

Deterministic = logged-in user IDs, telco IDs, email-based graphs

Probabilistic = modeled identities, cross-device guesses, cookie stitching

The pervasiveness of “data” as a thing has created this sense that the more data you have, the better off you are. But the obvious problem with that is that not all data is created equal: most data is guesswork, and some data isn’t even 1:1 (one identifier, one person). And while it’s true that modeling lets probabilistic data grow faster (I’m talking about raw volume here, not actual scalability), the reality is that deterministic data performs better.

And, just because I’m a psycho for closure, here’s a case study for you. Published in 2023, it appeared in Springer Professional, a German publication from the Springer Nature network. And in true German style, it’s not only extremely clinical, it’s extraordinarily detailed. The remarkable thing about this study isn’t that deterministic data outperformed probabilistic data. Generally, that’s not a terribly surprising outcome. The surprising outcome in this specific study is that random approaches outperformed probabilistic data. I’m not saying that one exhaustively researched, peer-reviewed study is the one ring to rule them all, but as we say where I come from, it doesn’t hurt.

And all of this matters now more than ever. Third-party cookies are staying for now but are highly questionable (both from an efficacy and a privacy perspective) - and a viable replacement hasn’t emerged. We’ve already got some signal loss from Apple and Firefox and from privacy regulations around the globe. And given all of the constraints on margin, there’s increased scrutiny on ROI, incrementality, and trust. And as Billy Joel would say, it’s always been a matter of trust.

The cost of low-fidelity data may simply be too much to pay in the long run - without high data fidelity, we waste impressions, manage frequency poorly, and end up with completely misaligned measurement. There’s also the potential for brand safety issues and, ultimately, the further erosion of consumer trust.

This is the pivot*. Advertisers, publishers, platforms all have to choose: Do we keep modeling our way out of a broken system? Or do we start rebuilding on something solid?

*When you hear the word “pivot”, you scream PI-VOT in your head, right? That’s not just me?

You don’t have to ask me twice.* The future is consent-based identity. The future is high-grade signals from telcos or other deterministic sources. The future is fewer intermediaries and more transparency. The future is data that traverses channels and devices while maintaining its integrity.

*Or I guess I don’t have to ask me twice, but you get it.

Fidelity isn’t a luxury - it’s the foundation. And if we want the industry to stand up to what’s coming, we’d better start pouring ourselves some concrete.

The time has come - we’ve arrived at Issue #10 and summer is upon us, so it’s time for a little brain break over here. I’ll be honest: this is the most fun I’ve had writing in years, so I’m going to keep at it, just after the July 4th holiday. Expect Issue #11 sometime during the week following Independence Day.

Let me take a moment to earnestly thank those of you who read this thing. I don’t quite know what it is, but I know it brings me joy - and if it brings you something too, that means the world. Thank you.

That’s all for this week. Until next time, friends.
