The Shape of Stories: Analyzing Film Cutting Speed for Tableau’s IronViz

I often tell people I’m “sound and lights” when it comes to getting attention.  Given a choice, you’ll find me at the sound booth or playing with the lights.  So, when the final IronViz feeder came around, I already knew I’d focus not directly on films or shows, but on topics around them (blame my severe lack of TV-watching habits).  Copyright came to mind, but so did script optioning (the process of buying stories that are already made, such as books).  I also had remixing on my mind, courtesy of a friend’s referral.   Finding data for these ideas seemed insurmountable.

Act 1 – A Quest

[I’m frantically hitting the interwebs at this point.  The browser may or may not have 132 tabs across 4 windows.  Copyright and optioning are leading to dead ends.  I text my friend the scriptwriter to see if he has ideas.  He says no.  I keep looking with no luck in sight.]

Then I found James Cutting.  And this:

The magic lines went something like this:

In a 2010 study, Cutting found an average of 1,132 shots per film in a smaller sample of 150 movies made between 1935 and 2010; the King Kong remake, incidentally, had the most: A whopping 3,099 shots packed into 187 minutes.

A link.  To a study.  (Or really supplemental material, but I’ll take it!)  I may have gotten lost in papers for a bit.  Either way, they led me to Cinemetrics, an open research database.  I may have fallen off my chair at this.  After all my complaining about open data, here was a living, breathing researcher sharing his work and crowdsourcing ideas for additional research.  (Seriously, go give them kudos!)

Now, I couldn’t find an API to get at the data and I didn’t want to hand copy it.  Others, smarter than me, know a thing or two about web scraping.  Someday, these people will write posts and tag me on Twitter (hint, hint).  So, like a newb Jedi with a lightsaber just chilling nearby, I went for it (this seems to be a classic Hollywood formula).  I used Alteryx and made my very own Frankenstein.  It took days because I have absolutely no clue and this is the second workflow I’ve started, and the first I’ve finished.

After getting data, merging this with IMDB data and kicking out a fair bit of test and partial data, I had my data set.  I then started making sure I understood it by replicating some of the work Dr. Tsivian and others have done.  These are not my ideas.

I also played around with some variations:

Now, Joseph Campbell talks about character archetypes and we (Tableau people) sometimes discuss “style” around vizzes.  Could the camera act as a lens into this formula?  What about by genre?  Do directors favor certain cuts again and again?  Like a top 40 remix, I had to queue this up for repeat.

Act 2 – Building Early!

[Creativity is a fickle muse.  Like a toddler, it goes from jumping on the bed to demanding food with no other solution in sight.  The ideas are coming at this point and a few ideas are written on a whiteboard.  I’m both getting data still and exploring data.  This is sooooo not what you’re supposed to do, I hear.]

I was insanely curious about this data.  But, I was back and forth between getting the data, merging it with IMDB data, and analyzing it.  Naturally, this is a my-bad.

A few things of note here – I’m not a researcher.  So, for me, replicating existing work (to make sure I had the data right) was crucial.  Both of my data sets are crowd-sourced, which means I really need to check them.

To me, there are a few key parts to this data:

  • Shot length – Dr. Tsivian has focused heavily on this in his research, focusing on polynomial shapes.  Others have used mean shot length and compared these in aggregate. I want to be able to look at this more.
  • Number of shots per minute (or other interval) – this helps me get a feel for pacing and allows me to begin standardizing.  It too creates a shape, though I lose some information (read Dr. Tsivian’s work).
  • Genre – IMDB provides me a few options with films. There are several entries in each field and my data source is already huge.
  • Director – directors quite literally set the pace of the story.  Can we find patterns in their work in shot length and number of shots?  Does genre matter or influence their style?
  • Type is an interesting field, but there’s high variety in how it’s entered, if it’s entered.  It may be useful or it may not be.  I’ll find out.
  • This data is hand curated.  How much does that influence it?  Do I need to do something about it?
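
For anyone who wants to poke at the first two measures outside of Tableau, here is a minimal Python sketch.  It is not how I built the dashboard (that was Alteryx and Tableau) and it is not Cinemetrics’ own method.  It assumes you already have one film’s shot lengths in seconds, in order, and the function names are made up for illustration.

```python
from collections import Counter


def average_shot_length(shot_lengths):
    """Mean shot length in seconds for one film."""
    return sum(shot_lengths) / len(shot_lengths)


def average_shot_length_from_totals(shot_count, runtime_minutes):
    """Same measure when only the totals are known (shots and runtime)."""
    return runtime_minutes * 60 / shot_count


def shots_per_minute(shot_lengths):
    """Count how many shots START in each minute of the film.

    Returns a list where index 0 is the first minute, index 1 the second,
    and so on. Assumes shot_lengths is ordered and measured in seconds.
    """
    counts = Counter()
    start = 0.0
    for length in shot_lengths:
        counts[int(start // 60)] += 1
        start += length
    if not counts:
        return []
    return [counts.get(minute, 0) for minute in range(max(counts) + 1)]


# Hypothetical two-minute clip: three shots in minute one, two in minute two.
clip = [10, 20, 30, 5, 55]
print(average_shot_length(clip))
print(shots_per_minute(clip))

# The King Kong figures quoted earlier: 3,099 shots in 187 minutes.
print(round(average_shot_length_from_totals(3099, 187), 2))
```

The per-minute counts are the “shape” I mention above.  Binning by when a shot starts is just one choice; a different interval or a rolling window would smooth it differently.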

Act 3 – A Dead End

[It’s the weekend and the JSON API for my IMDB data won’t move.  It’s sitting at 5% for days.  This, kids, isn’t right.]

I ended up switching to OMDB API.  This meant changing up my Alteryx job and losing some data.  You do what you have to.  I started with a literal design around film, but didn’t make much headway.  It wasn’t hitting it.

Frustrated, I look at changing it up.  And I find this.

It makes me laugh and I consider an approach similar to Robert Rouse with Us vs Them.  I need to add to it, so I go into iMovie and I find an option to make trailers.  Silver screen, get ready for this dashboard debut!

Act 4 – A Surprise Ending

[Several hours later, the GIF is not a GIF.  It’s a trailer for the dashboard.  Hey, even B-rated films become cult classics, right?]

I decide to go literal, but in a different direction.  Channeling iMovie, I make its Linux cousin, Cutting Room Pro.  I played around through a few iterations.  A big thank you goes to Mike Cisneros for his keen eye!  Like any other Hollywood production, there will probably be a sequel to this.

Trailer:

Full Theatrical Release: (Warning – director edits may happen)


And in case you’re completely lost…I offer you smarter people than me.