How the EXCEL do I communicate data science in Tableau?! Or, I made the model, how do we use it?

You ran the model, you rolled it out, and now you’re trying to get people to USE the magic number. Oh, we all know it’s 42, but what does that actually mean?

More than doing data science, we’re focused on communicating it. Here’s how to use Tableau to demystify the model.

Understand the business case

Someone – maybe you, maybe someone else – ran the model. There was a reason for this, beyond the chance to play. Really, they never let us just play. There is always a reason. If there isn’t, let this one lie and don’t viz it – trust me.

Today, we’ll teleport back to the days of yore when phone companies actually cared about nighttime minutes. I think MCI was still in business and having 2 phones (including a real copper-wired landline) actually made sense! Long distance, kids! The pain was real.

In this fictional case, we want people to upgrade. What is the next best plan we can offer them? (Sidenote: I don’t upgrade. I wait until things die and then I move on painfully. My next best plan is nothing. Pinky swear.)

Dashboard showing fake client details for customer #1717: phone number, lives in Maine, customer for 3 years, no international plan, 1 customer service call, and 25 voicemails on the server. Call distribution between international, day, evening, and night is shown.

Typically, the business is used to looking at data like this. We can see, surprisingly, that this person uses more nighttime minutes but racks up heavier charges during the day. This was the era of roaming charges, so maybe that’s it.

But despite the super retro data, we’re a modern company with real data scientists building predictive models (prediction – data, video, and Disney+ subscriptions are hot). We ran this through the ML hopper and drew out predictions we want to roll out.

So, the business case here is moving people to a new plan. But how do you convince the people selling plans that this prediction is worth a hoot?

Build trust with transparency

Ask anyone on the front line of a business, and they can tell you a lot of ground truths about it. Those ground truths exist for a reason: a good part of them usually holds true. There may be nuance, but something about them sticks.

Tools like DataRobot absolutely shine here. They provide Blueprints, Feature Impact weights, and even a visual Confusion Matrix. These items help demystify what’s happening with data science models. They show how the data was processed (Blueprint), how it was weighted (Feature Impact), and how well the particular model performed against its test set (Confusion Matrix).

DataRobot samples of Blueprints, Feature Impact charts, and a Confusion Matrix

Understanding what it gets wrong, and how often, is critical. People will experience these misses, and the Dunning-Kruger effect promises “I can do it better” (I don’t know why, but I trust me a lot more than a machine).

Dig. Relentlessly understand what the model is predicting and some semblance of why. Tools like the confusion matrix provide a huge clue.

That daytime number looks MASSIVE until you realize the precision is only 0.57. If your exploratory data process relies on things that make intuitive sense, this is a large distraction.
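If you’ve never pulled precision out of a confusion matrix yourself, here’s a minimal sketch with scikit-learn standing in for DataRobot (the labels and values are made up). Precision for a class is just true positives divided by everything the model called that class.

```python
# Minimal sketch: per-class precision from a confusion matrix.
# scikit-learn stands in for DataRobot here; labels and data are invented.
from sklearn.metrics import confusion_matrix, precision_score

labels = ["no_action", "daytime", "evening", "international"]
y_true = ["daytime", "evening", "daytime", "no_action", "international", "daytime"]
y_pred = ["daytime", "evening", "evening", "daytime", "international", "no_action"]

# Rows are the true class, columns are the predicted class.
print(confusion_matrix(y_true, y_pred, labels=labels))

# average=None returns one precision per class, in `labels` order:
# true positives / everything the model called that class.
print(precision_score(y_true, y_pred, labels=labels, average=None, zero_division=0))
```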

Your end users will run into this oddity and think your model belongs in the rubbish bin. Tell them early, but also highlight where it performs well. (This model, being multiclass, could be its own discussion.)

International and evening shine here.

There are ways to handle this, depending on business priority. In this case, we just want people to change plans and one doesn’t necessarily have a priority (minus all the cats like me who change absolutely nothing).

Build guardrails to make this accessible. Here, I’ve highlighted what’s important and made a cute little sonar graphic giving an idea of precision. This also, metaphorically, helps remind people that the target keeps moving.

Speak to your audience

In interpreting, we often called this ‘register’ and it mattered a whole lot. If you came to this blog expecting a high register, you’ve been wildly disappointed. The kids doing work are probably relieved.

As you visualize, consider what that user’s world looks like. They’re likely glancing at this quickly on their way to another primary task. Your viz is secondary: there, I said it. Use familiar metaphors that support your message.

This looks a bit like my audio gear. Play with the dots and they move around in ways that help showcase how the certainty changes. The dot also helps express uncertainty. The actual number is available via tooltip, but I’m relying more on a visual understanding.

“No action” in this case has moved up quite a bit. This is definitely less certain than my earlier Enhanced Voicemail sale.
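If you’re wondering where that certainty number comes from before it ever hits a tooltip: here’s a minimal sketch of scoring customers and exporting the model’s confidence alongside each recommendation, assuming a scikit-learn-style classifier (the data, plan names, and file name are all invented).

```python
# Sketch: score customers and export the recommendation plus the model's
# confidence, so Tableau can drive the tooltip and the certainty visual.
# The model, features, and plan names are all made up for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                      # fake usage features
y = rng.choice(["no_action", "daytime", "evening"], size=200)

model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X)                     # one probability per class
pd.DataFrame({
    "customer_id": np.arange(len(X)),
    "recommended_plan": model.classes_[proba.argmax(axis=1)],
    "certainty": proba.max(axis=1),                # the number in the tooltip
}).to_csv("plan_predictions.csv", index=False)     # point Tableau at this file
```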

I’m also using other tactics to make lives easier. People working from a script can use color as a cue, sparing them from reading yet another item. I could add icons for an extra layer and better colorblind accessibility.

When the model fits into the work, it supports movement forward, rather than creating a new hurdle to clear. Smooth bumps, rather than create them.

Prediction is not truth

So often, we’re working with data that’s a retrospective of what’s already happened. Our processes may rely on things that make clear sense – they’re retrospectives and we can trace a clear path.

Predictive data is not a clear path. It’s a bramble-filled adventure waiting for some clearing. The clearing happens when the results pour in. Keep moving forward and there will always be brambles to clear.

Ideally, your predictive tools have a way of showing their work. This includes various cross-validation scores, confusion matrices, and the like that help sort what the model does and does not do. Use these!
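If your tooling doesn’t hand you these, you can generate them yourself. A minimal sketch, assuming scikit-learn and synthetic data: cross-validation returns one score per fold, and the spread across folds is as much a part of the story as the mean.

```python
# Sketch: the "show your work" numbers if you're rolling your own model
# with scikit-learn instead of DataRobot. Data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))                      # synthetic features
y = rng.choice(["no_action", "daytime", "evening", "international"], size=300)

model = RandomForestClassifier(random_state=42)

# One accuracy score per fold; the spread matters as much as the mean.
scores = cross_val_score(model, X, y, cv=5)
print(scores.round(2), "mean:", scores.mean().round(2))
```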

If you feel like you’re solving a mystery, you’re doing it correctly. Too much clarity is a sign of not enough digging with the data. Data science problems are typically complex. If you’re communicating them, embrace the complexity and work to clarify, not always simplify.

This is not a simple chart. It shows feature impact, with the most important feature at the top. From there, you can see how values distribute, both as actual values and in an indexed or normalized view. It’s a lot to take in and work through initially, but as you see the pattern again and again, it starts to make sense.
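For the curious: feature impact charts like this are commonly built from permutation importance – shuffle one feature at a time and see how much the score drops. A rough scikit-learn analogue below, with made-up telecom-ish feature names and synthetic data.

```python
# Sketch: a rough scikit-learn analogue of a feature-impact chart,
# using permutation importance. Feature names and data are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # feature 0 matters most
names = ["day_minutes", "eve_minutes", "night_minutes", "intl_minutes"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)
model = RandomForestClassifier(random_state=7).fit(X_tr, y_tr)

# Shuffle one feature at a time and measure how much the score drops.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=7)
for i in result.importances_mean.argsort()[::-1]:  # most impactful first
    print(f"{names[i]:>14}: {result.importances_mean[i]:.3f}")
```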

Fail first to gain clarity

It’s easy to think you understand what the model is putting forth. As you dig, though, you’ll become less certain.

This visualization failed as something effective to roll out, but it was essential to the journey. It also shaped the path for later work that evolved into the final version.

The final version of the explanation feature bears a lot of similarity to the one above. Design elements, register changes on labels, and a shift to treating it as supplemental material make it useful.

The front end of this dashboard – the part that’s designed for use – focuses on what the end user needs: a consumer profile from standard business data, an actionable recommendation, a sense of how probable it is, and a way to gut-check against the model. Using color to cue (stroopifying) makes it easy to look and go right to the script of choice.

Take it with you

Communicating data science effectively relies on many standard visualization practices with one key caveat: the data is an educated guess. Communicating that uncertainty with transparency is key.

Parody of an O’Reilly book: a butterfly cover titled “Data Science for Designers: Ultimate Awesome Guide”. Caption at the top: “Partitioned, MLed, and Confused.” By B. Cogley and K. Roberts

This post comes from a webinar with Dr. Kyle Roberts and yours truly. You can watch it soon.

Sidenote: none of these books are real (yet). If you absolutely want us to write them, comment below and build out the petition. Who knows, maybe O’Reilly will let us roll with the designs.