Teaching Computers to Watch Movies¶

This notebook will demonstrate the film-watching capabilities of Moviegoer, as well as explain the film theory significance of its output. This will be a guided tour — no understanding of machine learning, or familiarity with the example movies, required!

At the most basic level, Moviegoer attempts to reverse-engineer a film and break it down into its component pieces, into data. But a movie is highly complex, packed with so many tiny details and intangible rules that make it seem impossible for a machine to understand what's happening.

We (humans) know how to watch a movie. It's so simple, we don't even think about it! But this is because we've all seen tons of movies; we know what to expect. And conversely, filmmakers don't want their movies to be incomprehensible, so they design them in a way that audiences will understand. So if audiences go into a theater with a set of expectations, and filmmakers build their movies for their audience, this means that there's an unspoken agreement, a set of rules behind how films are constructed, and how they're supposed to be watched.

If we can identify and define these rules, we can teach computers how to watch movies.

from IPython.display import Image
Image(filename='moviegoer_logo.png')

Moviegoer's Design Tenets¶

As you read through, keep these three principles of Moviegoer in mind. (They're a bit abstract; see https://moviegoer.ai for more details.)

Movies are self-labelling emotional data — they depict cause-and-effect of characters' changes in emotion
Movies are real life — they're a mirror image of how society perceives itself
Movies are "machines that generate empathy" — they're handcrafted by filmmakers to make the audience feel a specific emotional response

Moviegoer attempts to identify every single bit of information it can, to further its understanding of the film (in the context of the three tenets above). Some of these don't mean anything on their own, but they add up.

The eventual goal is to be able to reverse-engineer and determine the meaning of every single detail of a film, converting a film into data. This data can then be used to train emotional AI models, model and forecast real life, or simulate empathy.

Overview: Film, Scene, and Character¶

We're getting a little philosophical, so let's start the tour of Moviegoer's basic capabilities. The movie-watching process is entirely automated, with no human intervention. The only manual inputs are the commands you see here.

Film¶

We'll be looking at Moviegoer's three levels of analysis: Film, Scene, and Character, in the romantic comedy Take Me Home Tonight (2011). First, we generate a Film object, containing some basic statistics. This is useful for comparing with other films, or as a baseline for comparing against individual scenes.

import sys
sys.path.append('../')
import pandas as pd
from moviegoer.structure import Film
from moviegoer.character import filter_top_characters
pd.set_option('mode.chained_assignment', None)
%matplotlib inline

film_obj = Film('take_me_home_tonight_2011')
film_obj.print_info()

*Film Information*
Title: take_me_home_tonight
Release Year: 2011
File Runtime: 01:37:39
Film Runtime (No Credits): 01:32:25

*Technical Details*
Aspect Ratio: 2.4
Avg. Shot Duration: 2.96
Avg. Brightness: 45
Avg. Contrast: 37

*Dialogue Cadence*
Sentences Per Minute: 21
Words Per Sentence 4.54
Questions Per Minute: 2.66
Pct. Questions: 13%

*Emotion*
Pct. Upset Faces: 47%
Laughs Per Minute: 0.43
Profanity Per Minute: 1.24
Words Per Profanity: 78
Exclamations Per Minute: 3.68

Without any type of organization or structure, a film is just a few thousand image frames and a long audio track. It's second-nature to us, but Moviegoer needs to know how to partition a film into individual scenes. Broadly speaking, these are self-contained, contiguous portions of the film which take place in a single setting (physical location and time), with dialogue resembling a single coherent conversation. Here's a preview of Moviegoer's scene-identification algorithm — but what can we actually learn from individual scenes?

scene_objects = film_obj.generate_dialogue_scenes()
film_obj.display_scene_anchor_shots()

Scene¶

We've generated several Scene objects; let's look at the very first identified scene. Below is a sampling of the scene's frames so we can visualize the scene. Of course, Moviegoer doesn't actually need to see these frames; they're just for our benefit.

scene_obj = scene_objects[0]        # first scene identified
scene_obj.display_scene_frames()

Like the Film object, we can print numerical statistics about the Scene. It may be useful to use the film's stats as a baseline, to compare against a scene's.

scene_obj.print_info()

*Scene Information*
Scene ID: s_0474_0520
Start Frame: 474~0
End Frame: 520~0
Scene Runtime: 0:00:46

*Technical Details*
Aspect Ratio: 2.4
Avg. Shot Duration: 2.94
Avg. Brightness: 68
Avg. Contrast: 48

*Dialogue Cadence*
Sentences Per Minute: 28
Words Per Sentence 4.82
Questions Per Minute: 3.83
Pct. Questions: 14%

*Emotion*
Pct. Upset Faces: 28%
Laughs Per Minute: 0.00
Profanity Per Minute: 0.00
Words Per Profanity: 0
Exclamations Per Minute: 0.00

Moviegoer currently identifies a specific, common format of Dialogue scenes, the two-character alternating dialogue scene. These scenes are the basic building block of all cinema: two characters speaking to each other with no distractions, purely advancing the plot.

Visually, these follow a very distinct pattern: the camera shows Character A speak, then shows Character B speak, then back to A, then back to B, etc. These two shots are known as the Anchor Shots. We end up with one character on the left (facing right), and the other character on the right (facing left).
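
The A/B/A/B pattern described above can be sketched as a search over a sequence of shot labels. This is only an illustration, not Moviegoer's actual algorithm: it assumes each shot has already been given a cluster label (visually similar shots share a label), and the function name and the `min_alternations` parameter are hypothetical.

```python
from typing import List, Optional, Tuple


def find_alternating_pair(
    shot_labels: List[str], min_alternations: int = 4
) -> Optional[Tuple[int, int]]:
    """Return (start, end) shot indices of the first A/B/A/B run, or None.

    shot_labels is an assumed input: one cluster label per shot.
    min_alternations is the minimum number of cuts between the two shots.
    """
    n = len(shot_labels)
    start = 0
    while start < n - 1:
        if shot_labels[start] == shot_labels[start + 1]:
            start += 1
            continue
        end = start + 1
        # Extend the run while every shot matches the one two positions back.
        while end + 1 < n and shot_labels[end + 1] == shot_labels[end - 1]:
            end += 1
        # A run spanning k shots contains k - 1 cuts between A and B.
        if end - start >= min_alternations:
            return start, end
        start = end
    return None


labels = ["X", "A", "B", "A", "B", "A", "C"]
print(find_alternating_pair(labels))
```

The two labels inside the returned span would be the candidate anchor shots; any other label falling between the start and end of a scene would be a candidate cutaway.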

scene_obj.display_anchor_shots()

In addition to the Anchor shots, we look for Cutaway Shots. These might include closeups of objects, POV shots to indicate what a character sees, shots of another character, etc.

In this basic example, we've found one cutaway, a two-shot with both characters in frame. This type of shot is often used to physically ground the two characters — we can see where in the room, and how closely they're standing, which we wouldn't be able to tell from the anchor shots above.

scene_obj.display_cutaway_shots()

We can learn about the film's plot by analyzing the dialogue. Questions, and their responses, are very important to helping the audience understand the film's events. In particular, we can strictly identify "directed questions", where a character asks a question directly at "you". We've identified two sets of directed QnA, the first teaching us Matt's profession, and the second foreshadowing the film's events, which mostly take place at a party.
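
As a rough sketch of what "directed question" could mean in code, the filter below keeps sentences that end in a question mark and contain a second-person pronoun. The rule, the function name, and the sample lines are all assumptions made for illustration; Moviegoer's real criteria may be stricter.

```python
import re
from typing import List

# Matches "you", "your", "yours", "yourself", and contractions like "you're".
YOU_RE = re.compile(r"\byou(?:r|rs|rself)?\b", re.IGNORECASE)


def is_directed_question(sentence: str) -> bool:
    """True if the sentence is a question that mentions "you"."""
    s = sentence.strip()
    return s.endswith("?") and YOU_RE.search(s) is not None


def directed_questions(sentences: List[str]) -> List[str]:
    """Keep only the directed questions from a list of subtitle sentences."""
    return [s for s in sentences if is_directed_question(s)]


# Made-up lines, not taken from the film.
sample = [
    "What do you do these days?",
    "Is the party tonight?",
    "You know the answer.",
    "Are you're friends coming?",
]
print(directed_questions(sample))
```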

scene_obj.display_qna_frames()

(Throughout the presentation, we'll be displaying movie frames that correspond to when dialogue was spoken. Usually, the frame displayed will show the character speaking, but this isn't always the case — sometimes the camera will show the other character, if seeing their reaction is more important.)

Though primarily known for his work on evolutionary biology, Charles Darwin also studied how humans express emotion, and later psychologists (notably Paul Ekman) proposed six basic emotions. We can identify those emotions (and a seventh, "neutral") in characters by analyzing their facial expressions. Below, we've identified four in this scene, though they're all represented in the following chart.

scene_obj.chart_anchor_emotions()

Aggregated Scenes¶

Now that we've taken a look at the first scene, let's compare all the film's scenes against one another. The following charts are meant to help us identify scenes with high emotional content. We start by plotting scenes' facial expressions (normalized within each Scene). Scenes are identified by their Scene ID created from their start and end frame.
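
"Normalized within each Scene" can be read as converting each scene's raw expression counts into proportions that sum to 1, so that long and short scenes are comparable. A minimal sketch under that reading follows; the label strings and the function name are assumptions (only "angry", "sad", "fear", "disgust", and "neutral" appear elsewhere in this notebook).

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed label set: six basic emotions plus "neutral".
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")


def normalize_emotions(face_labels: Iterable[str]) -> Dict[str, float]:
    """Turn per-face emotion labels for one scene into proportions."""
    counts = Counter(face_labels)
    total = sum(counts[e] for e in EMOTIONS)
    if total == 0:
        # No recognized faces in this scene: report all zeros.
        return {e: 0.0 for e in EMOTIONS}
    return {e: counts[e] / total for e in EMOTIONS}


scene_faces = ["happy", "happy", "neutral", "sad"]
print(normalize_emotions(scene_faces))
```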

film_obj.chart_all_scene_normalized_emotions()

Next, we look for indicators of heightened emotion: laughter, profanity, and exclamations. For each scene, we plot the rate of how many occur per minute.

film_obj.chart_all_scene_emotional_indicators()

Again, strictly for clarity (for us eyeball-having humans), we can visualize examples of all three, each from a different individual scene.

film_obj.get_scene_object('s_2600_2666').display_exclamations()

film_obj.get_scene_object('s_3983_4116').display_profanities()

film_obj.get_scene_object('s_2264_2388').display_laughs()

Scene Shape¶

We've just looked at exclamations, profanities, and laughs, which are individual events — we were able to pinpoint the individual frames when each occurred. Next we'll generate charts which analyze a few different dialogue and cinematography traits, to give us a feel for the intangible "shape" of each scene.

To understand the significance of each of these traits, we'll comment on the scenes with the highest and lowest values of each. (These are slightly harder to visualize, because they aren't individual events like laughs, but we'll do our best.)

Words Per Sentence — average number of words in each sentence of dialogue
Sentences Per Minute — average number of sentences of dialogue per minute
Shot Duration — average length of each shot, or unbroken set of frames before the camera cuts away
Upset Face Percentage — percentage of characters' faces that are one of the following: angry, sad, fear, or disgust
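
The four traits above reduce to simple averages and ratios. The sketch below computes them from assumed inputs (a list of sentence strings, a runtime in seconds, a list of shot lengths in seconds, and a list of per-face emotion labels); the function names are hypothetical and whitespace splitting is a simplification of real tokenization.

```python
from statistics import mean
from typing import List

# The four "upset" labels listed above.
UPSET = {"angry", "sad", "fear", "disgust"}


def words_per_sentence(sentences: List[str]) -> float:
    """Average number of whitespace-separated words per sentence."""
    return mean(len(s.split()) for s in sentences)


def sentences_per_minute(sentences: List[str], runtime_seconds: float) -> float:
    """Number of sentences divided by the scene length in minutes."""
    return len(sentences) / (runtime_seconds / 60.0)


def avg_shot_duration(shot_lengths_seconds: List[float]) -> float:
    """Average length of a shot, in seconds."""
    return mean(shot_lengths_seconds)


def upset_face_pct(face_labels: List[str]) -> float:
    """Percentage of detected faces whose label is in UPSET."""
    if not face_labels:
        return 0.0
    return 100.0 * sum(label in UPSET for label in face_labels) / len(face_labels)
```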

film_obj.chart_all_scene_shape()

Words Per Sentence¶

The fourth scene has the lowest WPS of 2.45, depicting an awkward conversation early in the party: the two characters' first one-on-one interaction. They struggle to make any meaningful conversation, and get stuck in inconsequential small talk with short, clipped sentences and one-word responses.

# lowest Words Per Sentence (2.45)
film_obj.get_scene_object('s_2093_2119').display_qna_frames()

The seventh scene has the highest WPS of 7.94. Two characters have lots to say, unleashing pent-up frustrations and lecturing the other about their life decisions. Later in the argument, each character declares assumptions about the other like "You're jealous" and "You're so scared". Below, a sampling of the scene's second-person addresses, where either character declares something about "you".

# highest Words Per Sentence (7.94)
film_obj.get_scene_object('s_2600_2666').display_second_p_address_frames()

Sentences Per Minute¶

The second scene has the highest SPM of 36, where a fast-talking Wall Street trader heckles our main character, lobbing insult after insult in rapid succession.

# highest Sentences Per Minute (36)
film_obj.get_scene_object('s_1461_1549').display_profanities()

(We'll look at the scene with the lowest SPM further below, when we look at Upset Face Percentage.)

Shot Duration¶

The eighth scene, with the longest average shot duration of 4.24 seconds, shows two characters playing truth-or-dare, with Tori questioning Matt about his feelings for her when they were in high school. It's a slow scene filled with long shots of Tori: the audience is supposed to interpret her silent reactions to these revelations. It's also the scene with the second-lowest Sentences Per Minute (17).

# longest Shot Duration (4.24 seconds)
film_obj.get_scene_object('s_3746_3817').display_scene_frames()

The third scene has the shortest average shot duration of 2.67 seconds, showing Barry frantically searching for cocaine and quickly being rebuffed by Matt. Below, the scene's first-person declarations (mostly Barry just jonesing for coke).

# shortest Shot Duration (2.67 seconds)
film_obj.get_scene_object('s_1603_1642').display_first_p_sentence_frames()

Upset Face Percentage¶

The sixth scene, with the highest UFP of 72%, has Barry nursing his wounds after a fistfight, still feeling vulnerable when a new character introduces herself.

# highest Upset Face Percentage (72%)
film_obj.get_scene_object('s_2538_2574').display_emotion_frames('fear')

Our movie-script ending scene, where the hero finally gets the girl, has the lowest UFP (12%); all non-upset facial expressions (88%) are "neutral". Both characters reveal very little emotion, keeping the audience in anticipation and heightening the emotional payoff when Tori finally accepts Matt's propositions. Similarly, Tori's suspenseful silence causes this scene to have the lowest Sentences Per Minute (16).

# lowest Upset Face Percentage (12%)
film_obj.get_scene_object('s_5231_5315').display_emotion_frames('neutral')

Question Proportion¶

Finally, we plot Sentences Per Minute and Questions Per Minute, calculating the proportion of sentences which are questions.
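
Because both rates share the same runtime, the question proportion is simply Questions Per Minute divided by Sentences Per Minute. The worked example below uses hypothetical counts chosen only for illustration, and the helper names are assumptions rather than part of Moviegoer.

```python
def per_minute(count: int, runtime_seconds: float) -> float:
    """Convert a raw count into a per-minute rate."""
    return count / (runtime_seconds / 60.0)


def question_proportion(n_questions: int, n_sentences: int) -> float:
    """Fraction of sentences that are questions (0.0 for an empty scene)."""
    return n_questions / n_sentences if n_sentences else 0.0


# Hypothetical counts for a 47-second scene, chosen only for illustration.
runtime_seconds, n_sentences, n_questions = 47, 22, 3

spm = per_minute(n_sentences, runtime_seconds)
qpm = per_minute(n_questions, runtime_seconds)
proportion = question_proportion(n_questions, n_sentences)

print(f"Sentences Per Minute: {spm:.0f}")
print(f"Questions Per Minute: {qpm:.2f}")
print(f"Pct. Questions: {proportion:.0%}")
```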

film_obj.chart_all_scene_question_proportion()

By default, we filter for directed questions, which specifically ask "you" (though it appears Carlos enjoys answering his own questions).

film_obj.get_scene_object('s_1461_1549').display_qna_frames()

Characters¶

With some structure around the film, we can begin to identify characters, tracking their emotions and pinpointing how individual characters interact with one another. Future work will involve understanding characters' wants and motivations, which in turn will help us identify what causes changes in emotions or drives character interactions.

Below, we guess the names of the primary characters, and find any time they're introduced onscreen: either introducing themselves, or being introduced by others.
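
One simple way to spot introductions in subtitle text is pattern matching on phrases like "I'm X" and "This is X". The sketch below is an assumption about how this could be done, not Moviegoer's implementation; the patterns and function name are hypothetical, and a capitalized non-name after "I'm" would be a false positive.

```python
import re
from typing import Dict, List

# Case-sensitive on purpose: the captured word must start with a capital letter.
SELF_INTRO = re.compile(r"\b(?:I'm|I am|My name is|my name is)\s+([A-Z][a-z]+)")
OTHER_INTRO = re.compile(r"\b[Tt]his is\s+([A-Z][a-z]+)")


def find_intros(sentences: List[str]) -> Dict[str, List[str]]:
    """Collect lowercase names from self- and other-introductions."""
    found: Dict[str, List[str]] = {"self": [], "other": []}
    for sentence in sentences:
        match = SELF_INTRO.search(sentence)
        if match:
            found["self"].append(match.group(1).lower())
        match = OTHER_INTRO.search(sentence)
        if match:
            found["other"].append(match.group(1).lower())
    return found


lines = ["I'm Matt Franklin!", "I'm so tired.", "This is Barry..."]
print(find_intros(lines))
```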

film_obj.get_primary_names()

['matt', 'barry', 'tori', 'wendy', 'dad', 'kyle']

film_obj.display_self_intros()

I'm Matt Franklin!

I'm Barry!

film_obj.display_other_intros()

This is Barry...

We create Character objects by identifying and recognizing faces from their appearances in each scene, and then assign a name.

character_objects = film_obj.generate_characters_from_faces()
top_characters = filter_top_characters(character_objects, desired_characters=3)
char_obj = top_characters[0]     # most prominent character
char_obj.probable_name

'matt'

Character Frames¶

We've identified Matt, the star of the film, and the character who appears most often in our identified scenes (which we display below).

char_obj.display_anchor_scenes(scene_objects)

To get a sense of his emotions, we can select facial expressions to display.

char_obj.display_emotion_frames('disgust')

char_obj.display_emotion_frames('angry')

Character Interactions¶

A film is built on characters interacting with one another. We can identify Matt's interactions by locating when people directly address him by name. This has the added benefit of identifying sentences which are a little more emotionally charged — addressing someone by name gives the sentence a little more weight, such as a parent addressing his son by his full name ("We had a deal" vs. "Matthew, we had a deal.")
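
A direct address usually shows up in text as a name set off by a comma at the start or end of a sentence. The following is a minimal sketch of that heuristic; the function name is hypothetical, and names that appear mid-sentence between two commas are deliberately not handled.

```python
import re


def is_direct_address(sentence: str, name: str) -> bool:
    """True if `name` opens the sentence before a comma or closes it after one."""
    n = re.escape(name)
    pattern = rf"(^{n}\s*,)|(,\s*{n}\s*[.!?]*$)"
    return re.search(pattern, sentence.strip(), re.IGNORECASE) is not None


print(is_direct_address("Matthew, we had a deal.", "Matthew"))
print(is_direct_address("We had a deal.", "Matthew"))
```

Requiring the comma keeps plain mentions such as "I saw Matthew yesterday." out of the results, which is the distinction drawn between addresses and mentions in this section.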

char_obj.display_direct_addresses()

We can also locate whenever anyone mentions him (or he mentions himself in the third-person). These are sentences where he's the subject or object, and can be used to eventually build an understanding of character-related plot.

char_obj.display_mentions()

Here's how his facial expressions vary throughout each scene.

char_obj.chart_scene_normalized_emotions(film_obj)

Detailed Analysis, and Additional Examples¶

Above was the basic walkthrough of Moviegoer's broad capabilities through the creation of Film, Scene, and Character objects. Below, we'll go a little deeper into all three.

Film Style¶

We've previously looked at charts which attempt to quantify emotional indicators based on dialogue and facial expressions. Next, we focus on style, specifically cinematography and editing. These are a little more abstract, but we can still infer some clues about the film's intended emotional impact. We search for these style indicators across the entire film, not in individual scenes.

Long Takes¶

Long takes are uncut, unbroken shots that last longer than the typical 3-6 second shot. As the shot continues, tension builds as the audience holds their breath until the next cut. Onscreen events unfold in real-time and feel raw and realistic: without cuts, conversations can't be cut and edited into a specific cadence. This has the effect of making everything feel more authentic.
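
Once shot boundaries are known, finding long takes amounts to filtering shots by duration. The sketch below assumes shots are given as start and end times in seconds; the `Shot` class, the function name, and the 20-second threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    start: float  # seconds from the start of the film
    end: float    # seconds from the start of the film

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_long_takes(shots: List[Shot], min_seconds: float = 20.0) -> List[Shot]:
    """Return the shots that run at least `min_seconds` without a cut."""
    return [shot for shot in shots if shot.duration >= min_seconds]


shots = [Shot(0.0, 3.0), Shot(3.0, 8.5), Shot(8.5, 40.0), Shot(40.0, 44.0)]
for shot in find_long_takes(shots):
    print(shot, shot.duration)
```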

Ford v Ferrari (2019) is a motorsport drama with hyper-kinetic scenes of auto racing. The racing scenes are edited to emphasize the fast action, constantly cutting between close and wide shots of the cars, reactions of the drivers and pit crew, close-ups of the shifters and pedals, etc. But the film's pace slows down for dramatic scenes; we've identified a handful of long takes, including the following:

Matt Damon's character recounts his personal racing career, slowly opening up about the challenges and dangers he faced as an endurance racer
In a risky gambit, Damon's character tries to intimidate the Ford CEO. The CEO stays silent and poker-faced for some time, as Damon's character anxiously awaits his reaction
Christian Bale's character takes his son to the track to describe his mental state during a race. Later in the scene (in another long take) he connects with his son as he describes the mythical "perfect lap"
Bale's character confronts Damon's character, wondering if they still have the support of the Ford CEO. Damon does his best to project confidence, but Bale seems unconvinced
After Damon's character silently reflects on the film's events, he drives into the sunset as the end-credits music begins and the epilogue text appears onscreen

film_obj = Film('ford_v_ferrari_2019')
film_obj.display_long_take_shots()

Long takes add emotional weight to monologues, allowing the audience to watch as characters develop their thoughts in real-time, as well as to dialogues, keeping the audience in anticipation for characters' reactions. Though the nature of the racing scenes' cinematography and editing didn't allow for any long takes, action scenes can also benefit from this style: think of the tension in Children of Men (2006)'s unbroken six-minute long take of Clive Owen's character running through a tank battle to save a child, with certain death lurking around every corner.

Color Shots¶

For the next two style analyses, we'll look at Booksmart (2019), a high school party comedy. First, we'll look for shots which aren't RGB balanced: they're skewed toward one of the primary or secondary colors. Below, we've found color shots from three scenes, all dialogue-free set pieces.

A dream-sequence in which a character dances with her crush as the room transforms into a stage flooded with colored lights
An underwater chase scene where a character tries to follow someone through a swimming pool
A karaoke scene where the characters let down their guard and have fun with one another

Shots might be color-skewed for style purposes (e.g. emphasizing the fantastic nature of a dream sequence, or adding pizazz to a light-hearted set piece), or just as part of the natural scene context (e.g. if there's a lot of water or fire in a shot).
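
One way to test whether a frame is "RGB balanced" is to compare the average of each color channel. In the sketch below, a frame is a list of (r, g, b) pixel tuples with values from 0 to 255; one dominant channel suggests a primary color and two dominant channels suggest a secondary color. The function name and the `min_gap` threshold are assumptions, not Moviegoer's actual method.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]

# Which channels are "high" determines the primary or secondary color name.
COLOR_NAMES = {
    frozenset("r"): "red",
    frozenset("g"): "green",
    frozenset("b"): "blue",
    frozenset("rg"): "yellow",
    frozenset("gb"): "cyan",
    frozenset("rb"): "magenta",
}


def color_skew(pixels: List[Pixel], min_gap: float = 40.0) -> Optional[str]:
    """Return the dominant color name, or None if the frame looks balanced."""
    if not pixels:
        return None
    n = len(pixels)
    means = {c: sum(p[i] for p in pixels) / n for i, c in enumerate("rgb")}
    lo, hi = min(means.values()), max(means.values())
    if hi - lo < min_gap:
        return None  # channels are close together: treat as balanced
    midpoint = (hi + lo) / 2
    high = frozenset(c for c, m in means.items() if m > midpoint)
    return COLOR_NAMES[high]


print(color_skew([(10, 20, 200)] * 4))
print(color_skew([(120, 125, 130)] * 4))
```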

film_obj = Film('booksmart_2019')
film_obj.display_color_shots()

Non-Conforming Aspect Ratios¶

Here we'll look for shots with aspect ratios which don't match the rest of the film. Booksmart is a contemporary film, and our teenage characters spend some time on their phones watching what's happening at the party. All of the non-conforming shots identified (aside from some opening title and end-credit shots) are of phone footage, including one selfie-style video in portrait orientation.
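
If each shot has a measured aspect ratio, the film's own ratio can be taken as the most common value, and anything far from it flagged. This is a minimal sketch under that assumption; the function name, the rounding to one decimal place, and the `tolerance` value are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def find_nonconforming(
    shot_ratios: List[float], tolerance: float = 0.05
) -> Tuple[float, List[int]]:
    """Return (film_ratio, indices of shots whose ratio differs from it)."""
    rounded = [round(r, 1) for r in shot_ratios]
    # The most common rounded ratio is treated as the film's aspect ratio.
    film_ratio = Counter(rounded).most_common(1)[0][0]
    odd = [i for i, r in enumerate(shot_ratios) if abs(r - film_ratio) > tolerance]
    return film_ratio, odd


# 0.56 is roughly 9:16 (portrait phone video); 1.78 is roughly 16:9.
ratios = [2.4, 2.39, 2.4, 0.56, 2.4, 1.78]
print(find_nonconforming(ratios))
```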

film_obj.display_nonconform_aspect_ratio_shots()

Aside from phone footage, one other common usage of non-conforming aspect ratios is showing a "film-within-a-film", using a more widescreen ratio to indicate the film's characters are themselves watching something.

Profanities, Exclamations, and Laughter¶

We can identify profanity, exclamations, and laughter throughout the film, to locate points of heightened emotion. Here are a few examples from Step Brothers (2008), a Will Ferrell grown-men-yelling comedy.

film_obj = Film('step_brothers_2008')
film_obj.display_profanities()

film_obj.display_exclamations()

film_obj.display_laughs()

Scene Frames¶

Scene Identification¶

The scene-identification algorithm is capable of working across films of all genres and cinematography styles. Here are examples of the first scene identified from a few different films.

film_obj = Film('booksmart_2019')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('step_brothers_2008')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('ford_v_ferrari_2019')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('moneyball_2011')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('no_strings_attached_2011')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('paper_moon_1973')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('erin_brockovich_2000')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('easy_a_2010')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

film_obj = Film('the_holiday_2006')
scene_objects = film_obj.generate_dialogue_scenes()
scene_objects[0].display_scene_frames()

Cutaway Shots¶

Cutaways are any shots that aren't the two anchor shots. We previously looked at an example of a scene with a single cutaway, a two-shot used as an establishing shot (showing the physical placement of the two characters).

Below, in Moneyball (2011), we start with the two anchor shots, and then identify two cutaways. Both cutaways are used to show other characters in the room. They may have a line or two of dialogue, but the bulk of the conversation is conducted between the two characters in the anchor shots.

film_obj = Film('moneyball_2011')
scene_objects = film_obj.generate_dialogue_scenes()
scene_obj = film_obj.get_scene_object('s_0843_0939')
scene_obj.display_anchor_shots()

scene_obj.display_cutaway_shots()

Another common usage of cutaways is the point-of-view shot. These are shots where the camera cuts to something a character is looking at, from a similar perspective (angle), giving the sensation that the audience is seeing what she's seeing. Here's an example of a POV shot from The Holiday (2006).

film_obj = Film('the_holiday_2006')
scene_objects = film_obj.generate_dialogue_scenes()
scene_obj = film_obj.get_scene_object('s_4319_4349')
scene_obj.display_anchor_shots()

We find two cutaway shots: the first is Jack Black's character preparing to look at the papers in Kate Winslet's character's hands, and the other is a shot of the actual papers.

scene_obj.display_cutaway_shots()

To better illustrate this example, we look at a few frames before and after the cutaway starts. Before the first cutaway, we see Kate Winslet holding an envelope (this shot wasn't successfully identified as part of the scene, otherwise it would also be a cutaway), and then Jack Black looks at her hands.

Right before the second cutaway, we see Kate Winslet looking down. This is a POV shot — the angle at which we see the papers roughly matches up with what Kate Winslet would see from her perspective.

scene_obj.display_cutaway_surroundings()

Scene Dialogue¶

First-Person Declarations¶

First-person declarations are sentences in which a character is speaking about themselves, as the sentence's subject or object. Continuing the above example, these first-person declarations help us learn a bit more about the cutaway (the papers in her hand), as well as a few other tidbits about the plot.
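
A crude text-only approximation of this definition is to keep non-question sentences that contain a first-person pronoun. The sketch below does exactly that; it does not parse grammar, so it cannot confirm the pronoun is really the subject or object, and the function name and sample lines are made up for illustration.

```python
import re
from typing import List

# "I" also matches inside contractions such as "I'm" and "I've".
FIRST_PERSON_RE = re.compile(r"\b(?:I|me|my|mine|myself)\b", re.IGNORECASE)


def first_person_declarations(sentences: List[str]) -> List[str]:
    """Keep statements (not questions) that mention the speaker."""
    return [
        s
        for s in sentences
        if not s.strip().endswith("?") and FIRST_PERSON_RE.search(s)
    ]


# Made-up lines, not taken from any of the films.
sample = [
    "I got the job.",
    "Did you call me?",
    "The papers are here.",
    "He gave it to me.",
]
print(first_person_declarations(sample))
```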

scene_obj.display_first_p_sentence_frames()

As is common in many romantic comedies, No Strings Attached (2011) features a scene near the film's end where a character confesses her love for another. Here, the first-person declarations focus on the character's internal emotions, instead of the events and details of her plot arc.

film_obj = Film('no_strings_attached_2011')
scene_objects = film_obj.generate_dialogue_scenes()
scene_obj = film_obj.get_scene_object('s_5834_5879')
scene_obj.display_first_p_sentence_frames()

Second-Person Addresses¶

Second-person addresses are sentences where a character is speaking directly to "you". We had previously seen second-person addresses in an argument scene, where characters accused the other of various affronts. ("You're jealous", "You're so scared")

Near the end of Moneyball (2011), the owner of a rival baseball team tries to poach Brad Pitt's character. The owner summarizes the plot of the film, in the form of second-person addresses to Pitt. By recapping events in this manner, he attempts to build a stronger rapport with Pitt, showing that he personally understands the sacrifices behind his success.

film_obj = Film('moneyball_2011')
scene_objects = film_obj.generate_dialogue_scenes()
scene_obj = film_obj.get_scene_object('s_6925_7121')
scene_obj.display_second_p_address_frames()

Not all second-person addresses are declarative — they may be imperative as well, commanding another character to do something. In Paper Moon (1973), Ryan O'Neal's character is preparing his nine-year-old partner-in-crime for an upcoming scam.

film_obj = Film('paper_moon_1973')
scene_objects = film_obj.generate_dialogue_scenes()
scene_obj = film_obj.get_scene_object('s_1343_1375')
scene_obj.display_second_p_address_frames()

Questions and Answers¶

As we've seen, questions and their answers often teach us about characters and plot.

Sometimes characters will ask a question, and then an immediate follow-up question. "Are you out of your mind? What's wrong with you?" We should treat follow-ups as part of the "question chain", and not designate the response until the chain is over.
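
The question-chain rule can be sketched as a single pass over the sentences: consecutive questions accumulate into a chain, and the first non-question sentence after them is designated the response. The function name is hypothetical, and the response line in the example is made up.

```python
from typing import List, Tuple


def pair_question_chains(sentences: List[str]) -> List[Tuple[List[str], str]]:
    """Group consecutive questions and pair each group with its response."""
    pairs: List[Tuple[List[str], str]] = []
    chain: List[str] = []
    for sentence in sentences:
        if sentence.strip().endswith("?"):
            chain.append(sentence)  # still inside the question chain
        elif chain:
            pairs.append((chain, sentence))  # chain is over: this is the response
            chain = []
    return pairs


dialogue = [
    "Are you out of your mind?",
    "What's wrong with you?",
    "Nothing.",  # made-up response for illustration
    "Okay.",
]
print(pair_question_chains(dialogue))
```

A chain that reaches the end of the scene without a non-question sentence after it is left unpaired, since there is no response to attach.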

film_obj = Film('no_strings_attached_2011')
scene_objects = film_obj.generate_dialogue_scenes()
scene_obj = film_obj.get_scene_object('s_4234_4285')
scene_obj.display_qna_frames()

Until now, we've been looking for directed questions, which address "you". Directed questions are usually answered with first-person responses (i.e. questions about "you" have answers about "I"), which often reveal something about the responding character.

We can also identify all questions regardless of the sentence's implied subject. Non-directed questions may pertain to anything, as opposed to just something personal about the responding character. In this scene from Erin Brockovich (2000), these non-directed questions and their responses advance the plot.

film_obj = Film('erin_brockovich_2000')
scene_objects = film_obj.generate_dialogue_scenes()
scene_obj = film_obj.get_scene_object('s_6592_6629')
scene_obj.display_qna_frames(directed_only=False)        # all questions, not just directed questions

Character Identification¶

Here are two more examples of character identification, from Easy A (2010), a high school comedy.

film_id = 'easy_a_2010'
film_obj = Film(film_id)
scene_objects = film_obj.generate_dialogue_scenes(pair_alternations=4)
character_objects = film_obj.generate_characters_from_faces(scene_objects)
top_characters = filter_top_characters(character_objects, desired_characters=2)
char_obj = top_characters[0]     # most prominent character
char_obj.probable_name

'olive'

We've had a lot of luck with both the scene identification and character identification algorithms. Olive is the film's main character, and we find her in most of the film's scenes.

char_obj.chart_scene_normalized_emotions(film_obj)

char_obj.display_anchor_scenes(scene_objects)

char_obj.display_direct_addresses()

char_obj.display_mentions()

The next identified character is a guidance counselor. Since most of the other characters are students, they address her with the honorific "Mrs.", and it's the only name that the audience knows. This becomes her canonical name — we don't know her first name, or if she even has one.

char_obj = top_characters[1]
char_obj.display_direct_addresses()

char_obj.display_mentions()

char_obj.probable_name

'mrs. griffith'

char_obj.display_anchor_scenes(scene_objects)

The End¶

Thanks for making it to the end of the demo! I hope all the examples were clear, and that sufficient context was provided behind every design decision. I welcome suggestions on both the current development of the project, as well as engineering for future functionality.

And finally, for the technical crowd: If you plan on getting hands-on with Moviegoer (and you've installed the proper libraries and organized the directory structure of the pre-serialized data) you should be able to open this as a notebook and execute Kernel -> Restart and Run All to reproduce these results. Aside from this demonstration notebook, there's also an experimentation notebook, with tips for using different parameters. You can also use your own movie files and serialize them into dataframes, then use the experimentation notebook to analyze them.