I’ve been toying with the idea of a linguistics of data visualization for some time now as a way to teach myself to build better dashboards. For once, I decided not to procrastinate and worked on my presentation early. Something about presenting to my tribe? Maybe – you kids are intimidating. To avoid embarrassing myself completely, I’ve invited more people to shoot holes in my theory. You can thank them for this mess.
What is “language” after all?
I’m using the Baker-Shenk & Cokely definition of language. Most definitions focus on speech as a part of language, which only excludes 136 sign languages used across 122 countries [data]. Sign language is language (end rant). This definition requires:
- Relatively arbitrary symbols
- Grammatical signals
- Changes across time
- Members of a community share & use it to:
  - interact with each other
  - communicate ideas, emotions, and intentions
  - transmit culture from generation to generation.
Before we go too far down the rabbit hole, let’s consider this (it’ll make sense later):
- Languages like Chinese and Japanese use characters versus alphabets. The way these are read and understood differs from languages with alphabets.
- What we know about Latin, Sumerian, and other dead languages comes strictly from their written forms.
- Plains Sign Language was used by hearing and Deaf Native American tribes across the plains. It was a shared language across a vast area and is still in use today. It’s completely different from American Sign Language, despite sharing the same area. It also has a different history.
- Sign languages are not alike and not universal (sorry, must stress this).
- Sometimes, we try to measure language “complexity” or maturity by word count. Sranan is rumored to only have 300-ish words, but deeper examination says 3,000. It’s really hard to measure since language changes so much and new words are often born from other existing words: head + ache = the pain you are feeling right now.
- No one owns a language. It’s not that people don’t try (Esperanto, Klingon, etc.); it’s that to be a true language, it must be owned and created by the community. When people try to create languages, we call the results codes in linguistics. Irony, right? Here I am trying to argue that a code (Tableau) has helped foster the evolution of a language (data viz), and some languages are really codes. Yes, today is the tomorrow you worried about yesterday.
Now that this is as clear as mud, let’s dive into more fun stuff!
Relatively arbitrary symbols
This could be a painting of a mountain. Or a chart. Ask a 4 year old. They know everything, I hear.
Grammatical signals
If there was no grammar, I could throw charts on a dashboard all day and it wouldn’t matter. But, I hear it does.
Some of these just made me dizzy. And confused. Maybe that’s dazed…
Changes across time
Flo would really hear about this one today. But, it worked at the time. And we still like and reference it, even though doing it today gets you sent out of the data viz class with epic speed. Maybe…
Unless you rock it a la Adam McCann and put Bill Murray in it…
Or look at how the line chart has evolved. William Playfair’s version, with tons of annotations:
And the Tableau 10 default – notice how much is missing compared with Playfair’s version. Our understanding has evolved and so has the chart. Fancy people call this the diachronic process.
Members of a community share and use to:
Interact with each other
I’ll be frank, this is probably the weakest part of my argument. Data visualization goes on paper or a screen. While Tableau lets us have a conversation with our data, it’s between the data and us. Do mouse-clicks count as conversation? What is the sound of 1 hand clapping? I don’t have good answers here, but go on Twitter and you’ll see raving fans of Tableau, for one, banding together. Would this rapport form without it?
Communicate ideas, emotions, and intentions
Andy Cotgreave loves to use this example, so I’ll be lazy and steal it. Same charts, but different message due to title, placement, and color.
Transmit culture from generation to generation
What about best practice?
Like your MLA or APA handbook, best practice is the guide to do it proper. They’d want me to say “properly.” But go out on the street and ‘do it proper’ is a thing. There’s how we speak (vernacular, common forms) and how we formalize (style books, grammar classes, sentence diagrams). The two are usually at odds with each other in certain areas. I’m taking a linguist’s position here and observing phenomena, not saying what’s right or wrong about this. There are loads of other places to learn to do it right.
But don’t languages have idioms, poetry, and other artistic forms?
People play with language. It’s what we do. We find what breaks it, what changes it, and what makes it tickle the ear. Jabberwocky is nonsense, but adheres to English rules and formalisms. It could fit within the language.
We break data viz too.
We make non-literal interpretations of data. This viz by Rody Zakovich required rigging the data to shape it – this pattern is created not from the data, but the intention around it.
What are the “words” of data visualization?
Charts would be like words or, in some cases, sentences. The marks (to steal Tableau terminology) are the lowest level of meaning. Going up, concepts are conveyed through dashboards or storypoints presentations. My primary focus is dashboards. Storypoints can use both charts and dashboards and are intended to be more guided. Dashboards are generally intended to be used without a guide.
If we use charts, it seems like we have a basic vocabulary. After all, how many charts are there? Does format affect it? I think so. We also sometimes combine charts in one – a table and bar chart – I’d call this compounding. The dimensions used affect it. A chart with 3 lines may be different linguistically than one with 4.
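For the code-minded, here is one rough way to picture the levels and the compounding idea described above. It’s a minimal sketch in Python, not anything Tableau actually exposes: the class names (Mark, Chart, Dashboard) and the compound helper are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mark:
    """Lowest level of meaning: one kind of drawn element (hypothetical model)."""
    kind: str


@dataclass
class Chart:
    """A 'word' (or short sentence) built from marks."""
    name: str
    marks: List[Mark] = field(default_factory=list)


@dataclass
class Dashboard:
    """A larger concept conveyed by arranging several charts."""
    title: str
    charts: List[Chart] = field(default_factory=list)


def compound(first: Chart, second: Chart) -> Chart:
    """Combine two charts into one, the way head + ache makes headache."""
    return Chart(
        name=f"{first.name}+{second.name}",
        marks=first.marks + second.marks,
    )


# A table and a bar chart compounded into a single chart on a dashboard.
table = Chart("table", [Mark("text")])
bars = Chart("bar chart", [Mark("bar")])
combo = compound(table, bars)
dashboard = Dashboard("Sales overview", [combo])
```

The compound chart keeps the marks of both parents, which is the point: like a compound word, it’s a new item in the vocabulary built from existing ones.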
What about pre-attentive attributes?
We often describe things in the words we understand. Of the 2 concepts, linguistics is the more familiar to me. I feel like there is a ton of overlap between the 2, but the linguistics model is specific to how I’ve studied dashboards, not charts. Charts and dashboards feed off each other and I think the same of Gestalt principles. The language model takes these and says they are inherent parts of grammar.
Do you really think this is a language?!
If it’s not, it’s close. It seems to me to have language-like constructs. The #1 argument against it being a language is whether we really use it with each other. But I can use a linguistic model to describe it, which is what helped me. If it helps you, great. If not, at least you can see some dashboards.
Side note: I used to keep a bag of marbles at a job. Sadly, between job changes and moves, I’ve quite literally lost my marbles. I’ll take any you have to spare.