Introduction

In this post I apply some standard ecological statistics to the music of Star Wars. Doing so allows me to combine multiple things that I love. First, I’ll walk the reader through the background and the podcast that gave me a new appreciation for the music of Star Wars. Then I give some background on the type of stats I will apply and work through a toy example. Then I go to town and do my best to explain the results in plain language. I don’t assume any knowledge of statistics, but I do assume that you know what Star Wars is and that it has music. If I have failed to describe any of this in plain language, please let me know so that I can edit the post and make it more approachable.

A short aside: I created this page using the program R and the RMarkdown package in RStudio. There is a Table of Contents on the left-hand side of your screen that you can use to move around the post. Click on a name in the Table of Contents and the page will navigate to that section, which may expand to reveal sub-sections. For those curious, all of the code and data used to create this exact post are freely available through this project’s GitHub repository.

Star Wars Oxygen

I love Star Wars. I love the storytelling and fantasy, but I especially love the music. John Williams is amazing. There was a podcast out there called Star Wars Oxygen that covered the music of Star Wars and it was one of my favorite podcasts of all time. Jimmy Mac hosted while voice actor, musician, and composer David W. Collins broke down the scores for the films we know and love in a way that gave me a new appreciation for the films. I say there was a podcast because it went dark following the release of Rogue One. After 38 wonderful volumes the podcast just wasn’t updated any more, and we the fans have not heard anything about why they stopped producing the show.

Species diversity

I also love statistics and ecology, which is the study of how organisms relate to each other and their environments. One exciting area of research deals with species diversity, which is how many species are found in and among sites. We can use statistics to figure out how many things live in a certain area and compare how habitats are similar or different from one another. In order to conduct an analysis like this you need a count matrix, which has habitats in the rows and species in the columns. The cells are filled in with counts of how many individuals of each species are found in each habitat. An example of a count matrix could look like this, where butterfly species are in columns and different habitats are in rows:

Table 1: Example of a count matrix where each row represents a habitat and each column represents a species. The cells are filled with counts of the number of each species observed at each habitat.

| Habitat | Danaus plexippus | Vanessa cardui | Adelpha bredowii |
|---|---|---|---|
| Donner Pass | 5 | 6 | 0 |
| Sierraville | 4 | 2 | 0 |
| Davis | 0 | 2 | 3 |

In this example, we can see that Donner Pass and Sierraville are similar to each other because they share two species (D. plexippus and V. cardui). Davis and Sierraville are only somewhat similar to each other because they have just one species in common (V. cardui). If we were going to group these habitats based on similarity, Donner Pass and Sierraville would be more similar to each other than to Davis.
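One common way to put a number on this kind of similarity is the Bray-Curtis dissimilarity, which compares two rows of a count matrix. The post does not say which measure sits behind its figures, so treat the choice of Bray-Curtis as an assumption. This is a small Python sketch of the idea, not the R code that built this page.

```python
# Toy count matrix from Table 1: habitat -> counts for
# (D. plexippus, V. cardui, A. bredowii).
counts = {
    "Donner Pass": [5, 6, 0],
    "Sierraville": [4, 2, 0],
    "Davis": [0, 2, 3],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 = identical counts, 1 = no shared species."""
    count_gap = sum(abs(x - y) for x, y in zip(a, b))
    return count_gap / (sum(a) + sum(b))


pairs = [
    ("Donner Pass", "Sierraville"),
    ("Sierraville", "Davis"),
    ("Donner Pass", "Davis"),
]
for left, right in pairs:
    value = bray_curtis(counts[left], counts[right])
    print(f"{left} vs {right}: {value:.2f}")
# Donner Pass vs Sierraville: 0.29
# Sierraville vs Davis: 0.64
# Donner Pass vs Davis: 0.75
```

Smaller numbers mean more similar habitats, so under this measure Donner Pass and Sierraville come out as the closest pair.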

If we apply some statistics and then plot these relationships in the form of a “tree” (Figure 1), where similar habitats are connected by a “branch,” we see that Donner Pass and Sierraville are most similar to each other (they are connected by a branch). This makes sense, because Donner Pass and Sierraville are about 50 km apart, while both of those sites are ~160 km from Davis.

Figure 1: Cluster plot of the toy example referred to above.
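For readers who want to see how a tree like this can be built, here is a minimal Python sketch of agglomerative clustering on the toy habitats. It assumes Bray-Curtis dissimilarity and average linkage. Both are my assumptions for illustration, since the post does not name the distance or linkage used, and the function names are made up for this example.

```python
from itertools import combinations

# Toy count matrix from Table 1.
counts = {
    "Donner Pass": [5, 6, 0],
    "Sierraville": [4, 2, 0],
    "Davis": [0, 2, 3],
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two rows of counts."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def cluster_gap(left, right):
    """Average dissimilarity between every habitat in `left` and in `right`."""
    gaps = [bray_curtis(counts[a], counts[b]) for a in left for b in right]
    return sum(gaps) / len(gaps)


def average_linkage(names):
    """Repeatedly merge the two closest clusters and record each merge."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2),
                          key=lambda pair: cluster_gap(*pair))
        merges.append((left, right, round(cluster_gap(left, right), 3)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


merges = average_linkage(counts)
for left, right, height in merges:
    print(left, "+", right, "joined at height", height)
```

The first merge joins Donner Pass and Sierraville, and Davis only joins at a greater height, which is the branch pattern described above.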

Star Wars musical ecology

During the Star Wars Oxygen podcast, David W. Collins began what he called his “theme tracker,” which was essentially a spreadsheet of the number of times a theme played per film. Each theme was a column. Each film was a row.

David W. Collins made a count matrix.

We can use statistics on a count matrix.

We can apply statistics to Star Wars!!!! Oh happy day!!!

For the purposes of this exercise, the theme will take the place of species. What is a theme? A theme, also known as a leitmotif, is a piece of music that is used to represent a character. Remember the music from A New Hope that was playing when Luke Skywalker was looking into the binary sunset of Tatooine? The music that played during that scene was called Obi Wan Kenobi’s theme. In all films since A New Hope that music has been called The Force theme. The Imperial March is a theme, as is the eponymous Rey’s theme that plays almost every time you see Rey. The fact that a theme plays (almost) every time you see a character is key. So, Yoda has a theme, Kylo Ren has two themes, and Leia has a theme.

The data

I was unable to get my very own copy of Mr. Collins’s theme tracker. No worries. To reverse engineer the theme tracker I listened back through all of the Star Wars Oxygen episodes with pencil and paper ready. I made a note of how often a theme was played during a particular film every time Mr. Collins mentioned it. In some instances I had to get a bit of help, so I watched the films and made notes of all the times I heard a theme. I also read the breakdowns and threads from these sites:

This was especially helpful when going through Attack of the Clones, which had a lot of music edits.

Then I attempted my own impression of David W. Collins and Star Wars Oxygen and went through Rogue One three times and counted each instance of what I thought was a “theme.” At the time of writing this (2017-12-22) The Last Jedi has been out for a week and I have seen it four times. For the last two viewings I took a piece of paper and a pencil with me to note every time I heard a theme. I have almost certainly missed some things because I am not a trained musician, and I could have considered themes to be separate entities when they were actually part of the same leitmotif. Regardless, I stand behind the numbers I present for Rogue One and The Last Jedi, but please reach out if you find errors.

The data I ended up with, and which are used here, had:

  • 9 rows - one for each film (“ecosystem”)
  • 52 columns - one for each theme (“species”)

These data could be incomplete and could benefit from outside assistance. I am slightly concerned by the lack of “rare” themes in the data set. Rare things can be important in ecology but likely won’t have a big impact on the analyses I conducted. With that said, I could still use some help! Please contribute to the theme tracker. There are a few ways you could contribute:

General Plots

All the themes

Let’s make a bar chart of the total number of appearances each theme makes in the saga.

Why did I make this plot?

Well, seeing how many times a theme appears in all of the Star Wars films can give us an idea of what the major themes are in the series (Figure 2). For example, The Force / Obi Wan Kenobi’s theme is used 135 times in the films, which is 42 more times than we hear the Main title / Luke’s theme. To really investigate the plot I made, hover your cursor over each bar to see what it represents. You can also click and drag over an area to zoom in (double-click to zoom out).

Figure 2: Plot of all theme appearances
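The two totals quoted above can be checked by adding up a column of the count matrix. This short Python sketch copies the Main_title and Force_theme columns from Table 2 at the end of the post. It is only a check of the arithmetic, not the R code used to draw the figure.

```python
# One entry per film, in the row order of Table 2.
films = ["EI", "EII", "EIII", "EIV", "EV", "EVI", "EVII", "EVIII", "ROne"]
main_title = [6, 3, 4, 17, 28, 18, 13, 2, 2]
force_theme = [11, 8, 20, 18, 14, 19, 13, 25, 7]

# A theme's saga-wide total is the sum of its column.
totals = {"Main_title": sum(main_title), "Force_theme": sum(force_theme)}
difference = totals["Force_theme"] - totals["Main_title"]

print(totals)      # {'Main_title': 93, 'Force_theme': 135}
print(difference)  # 42
```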

Themes by Film

It may also be informative to look at the distribution of themes within each film. I’ve made a plot where each film is represented by a bar that is filled according to the frequency of the themes in that movie. To explore this figure, hover your cursor over a bar to see the theme and number of times it appeared in that film. Try clicking on “compare data on hover” to see all the themes at once. The color for each theme is consistent across all the films.

Figure 3: Themes by film

I knew it! The Last Jedi has the most thematic appearances, at over 150 (Figure 3)! I bet if we look at the diversity metrics below the same will be true for the total number of themes present in the film.

Analysis

Clustering

Now we’ll make a tree depicting the relationships between the eight films of the Star Wars saga and Rogue One, just as we did in the toy example above.

First, a prediction about the clustering analysis: I postulate that the three original trilogy films will cluster separately from the prequel trilogy films (which will also cluster together). I also predict that the films of the new trilogy will be more similar to the original trilogy than to the prequels.

Adding the data from Rogue One allows us to see where that film lies in relation to the others. Michael Giacchino rooted the music for Rogue One firmly within Star Wars. He used parts from A New Hope to form the themes used in Rogue One. For example, Jyn Erso’s Suite was based on “the Message,” which plays in the background when Obi-Wan says “You must learn the ways of the Force….” It is also the only Star Wars film to share “Darth Vader’s” theme with A New Hope.

I have also added data from two viewings of The Last Jedi, which I think is a masterful score. I picked up on themes for Finn and Rose, as well as the return of both of Kylo Ren’s themes, Rey’s theme, Poe’s theme, The March of the Resistance, and so much more. Not going to lie, I choked up when I heard Luke and Leia’s theme in the final scene.

Figure 4: Clustering of the Star Wars films based on their musical theme counts.

This plot (Figure 4) shows that the prequel trilogy films do indeed cluster together, and that the original trilogy films cluster together. When I include the data from The Last Jedi we see it falls out right next to The Force Awakens, suggesting that the new trilogy films have a lot in common musically. Notice that Rogue One is right there in between the prequel trilogy and the original trilogy? That makes a lot of sense to me because it shares some themes with the original trilogy but is really a uniquely scored film.

Jost’s D

This metric is a way of counting how many things there are in a certain habitat. The cool thing about Jost’s D is that you can consider how many things there are while accounting for how rare they are (that is the q on the bottom of Figure 5). Here we count the number of different themes by film and see how that number changes as we change the weight given to “rarity.”
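For the mathematically curious, and assuming the metric plotted here is the “effective number of species” (a Hill number) that Lou Jost’s work is known for, the calculation for one film can be written as follows. The symbols \(S\) (the number of themes heard in the film) and \(p_i\) (the share of that film’s thematic appearances that belong to theme \(i\)) are my own notation, not the post’s.

\[
{}^{q}D = \left( \sum_{i=1}^{S} p_i^{\,q} \right)^{1/(1-q)}, \qquad q \neq 1
\]

When \(q = 0\) every \(p_i^{\,0}\) equals 1, so \({}^{0}D = S\), a plain count of themes. As \(q\) grows, themes with small \(p_i\) contribute less and less to the sum. At \(q = 1\) the formula is undefined and is replaced by its limit, \(\exp\left(-\sum_{i=1}^{S} p_i \ln p_i\right)\).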

Figure 5: Plot of the effective number of themes by Star Wars film.

To read this plot, look at the y (vertical) axis to see the effective number of themes. The Greek letter alpha (\(\alpha\)) is the ecological shorthand for diversity within a single site, which here means within a single film. Along the x (horizontal) axis we have the different weights we place on “rarity,” the q that I mentioned above. A weight of 0 means that every theme counts equally no matter how often it plays, so the value at q = 0 is the total number of themes present in each film. As we move right along the x-axis the effective number of themes drops because rare themes get less and less weight. All the way to the right (q = 5) rare themes hardly affect the number at all.

Note that The Empire Strikes Back actually has the fewest themes in total (q = 0) at 8, followed by A New Hope with 9. When we get to The Last Jedi there are 18 different themes that appear in the film! Rogue One actually has the highest total number of themes at 20, but when we care less about rare themes (q = 5) The Last Jedi is a bit higher. One way to think about this is that Rogue One had more themes that were only played a few times. One other takeaway from this analysis is that every Star Wars film has ~5 themes that we hear frequently, but we can’t say whether these themes are shared among all the films.
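To make the role of q concrete, here is a hedged Python sketch that computes an effective number of themes for two films. It assumes Jost’s D is the Hill number of order q, and it is not the R code behind Figure 5. The counts are the non-zero entries of the EV and EVIII rows of Table 2, and `hill_number` is a name I made up for this example.

```python
from math import exp, log

# Non-zero theme counts copied from Table 2 (zeros do not change the result).
empire_strikes_back = [28, 14, 3, 6, 19, 37, 14, 11]
last_jedi = [2, 25, 10, 9, 10, 2, 23, 1, 12, 13, 4, 3, 3, 1, 17, 4, 7, 9]


def hill_number(counts, q):
    """Effective number of themes at order q (q = 0 is a plain count)."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula divides by zero at q = 1, so use its limit.
        return exp(-sum(p * log(p) for p in shares))
    return sum(p ** q for p in shares) ** (1 / (1 - q))


for q in (0, 1, 2, 5):
    print(f"q = {q}: "
          f"Empire {hill_number(empire_strikes_back, q):.1f}, "
          f"Last Jedi {hill_number(last_jedi, q):.1f}")
```

At q = 0 this returns 8 for The Empire Strikes Back and 18 for The Last Jedi, matching the totals above. Larger values of q give smaller numbers because themes that are rarely heard contribute less.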

One last note of geekery. The colors from that plot were made with an R package called spaceMovie, which uses colors from the Star Wars franchise.

NMDS

Lastly, I employ another method of visualization called NMDS (Non-Metric MultiDimensional Scaling) which plots the locations of each “habitat” in space. In this case, each film appears on the plot in a place relative to the other films based on similarity. That is to say, similar things should be closer together than dissimilar things.

Figure 6: NMDS Ordination plot of the Star Wars films.
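The “non-metric” part of NMDS means the plot only tries to preserve the rank order of the dissimilarities: the most different pair should end up farthest apart, and so on. The Python sketch below checks that rule for the three toy habitats from Table 1. It is not an NMDS implementation, since a real one searches for the layout iteratively. The Bray-Curtis values are an assumed choice of measure and the 2-D positions are invented purely for illustration.

```python
from itertools import combinations
from math import dist

# Bray-Curtis dissimilarities for the toy habitats in Table 1
# (assumed measure; the post does not name the one it used).
dissimilarity = {
    ("Donner Pass", "Sierraville"): 5 / 17,
    ("Sierraville", "Davis"): 7 / 11,
    ("Donner Pass", "Davis"): 12 / 16,
}

# Hypothetical 2-D positions, invented for illustration only.
layout = {
    "Donner Pass": (0.0, 0.0),
    "Sierraville": (0.3, 0.1),
    "Davis": (0.9, -0.2),
}


def preserves_rank_order(dissimilarity, layout):
    """True if more dissimilar pairs are also farther apart on the plot."""
    pairs = list(dissimilarity)
    plotted = {(a, b): dist(layout[a], layout[b]) for a, b in pairs}
    return all(
        (dissimilarity[p] < dissimilarity[r]) == (plotted[p] < plotted[r])
        for p, r in combinations(pairs, 2)
    )


print(preserves_rank_order(dissimilarity, layout))  # True
```

If Davis were moved right next to Donner Pass, the check would fail, because the most dissimilar pair would no longer be the farthest apart.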

Think about which films you could draw an ellipse around without including any other films (Figure 6). We could have the computer draw an ellipse around the prequel trilogy so that it only contains the prequels, suggesting that these films are more similar to each other than they are to other films. We can also have the computer draw an ellipse around the original trilogy. Lastly, The Force Awakens and The Last Jedi are off by themselves, and I predict that once Episode IX comes out it will fall out with them. Right now the computer can’t draw an ellipse around two points. As in the clustering analysis, Rogue One is off doing its own thing. All together, these findings are consistent with the clustering “tree” we saw earlier.

Conclusions

I have four big takeaways about the music of the Star Wars films based on this exercise:

  1. The original trilogy films are most similar to each other.
  2. The prequel trilogy films are most similar to each other.
  3. The Force Awakens and The Last Jedi are grouping together but appear close to the original trilogy.
  4. Rogue One is its own thing, but more similar to the original trilogy than anything else.

These results make a lot of sense to me. I interpret this to mean that John Williams kept similar themes throughout each of the two trilogies, and that the new trilogy is building off of the original trilogy. I also see that Michael Giacchino used themes found in A New Hope to ground Rogue One in the Star Wars musical universe, but made the score his own.

Before The Last Jedi premiered, I predicted that Episode VIII would be closely related to The Force Awakens. I’m glad to see that I was right. If you don’t believe that I predicted this, go through the “history” in the code repository that houses this page and see for yourself.

Data Table

In case you didn’t want to follow the links to view the data I used for this post, below is a copy you can peruse:

Table 2: Data used in the present analysis
Episode Main_title Force_theme Vaders_theme Leia_theme Death_star Rebel_fanfare March_resistance Han_Leia Reys_theme Imperial_march Kylo_1 Kylo_2 Poes_theme Falcon_theme Scherzo Jedi_steps Battle_heroes Emperor_theme Across_stars Greivous_theme Arena_monsters Trade_federation Anakin_theme Yoda_theme Duel_fates Droid_Empire Jaba_theme Luke_Leia Droid_Jedi Jar_Jar Qui_Gon Jangos_escape Separatist_conspiracy Camino Tusken_slaughter Rogue_theme The_Message Jyns_theme Krennics_theme Guardian_Whills Jedha_Saw Battle_preparations The_rebels Rebel_action Troopers_moving Master_switch Scarif_battle Hope Rose_theme TIE_fighter_attack Snoke Finn
EI 6 11 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 6 0 0 0 1 3 1 1 0 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EII 3 8 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 25 0 6 2 3 2 1 0 0 0 0 0 0 1 11 6 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EIII 4 20 0 2 1 3 0 0 0 13 0 0 0 0 0 0 8 6 6 3 2 1 1 1 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EIV 17 18 17 8 7 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
EV 28 14 0 3 0 6 0 19 0 37 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EVI 18 19 0 4 0 9 0 9 0 20 0 0 0 0 0 0 0 3 0 0 0 0 0 4 0 0 2 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EVII 13 13 0 4 0 8 5 3 25 2 10 2 4 8 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
EVIII 2 25 0 10 0 9 10 2 23 1 12 13 4 0 0 3 0 0 0 0 0 0 0 3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 4 7 9
ROne 2 7 2 0 3 6 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 23 20 7 9 4 13 3 1 4 10 3 0 0 0 0