Data Analysis Sample

Overview of Strikes: 1993 - 2023

Author

Danny Holt


Code
library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

# read in data
strike <- read_excel("_data/monthly-listing.xlsx", skip=1)

# filter out footnotes
strike <- head(strike, -6)

# remove redundant and unnecessary variables
strike <- strike %>%
  select(
    "Organizations involved",
    "States",
    "Ownership",
    "Union acronym",
    "Work stoppage beginning date",
    "Work stoppage ending date",
    "Number of workers[2]",
    "Days idle, cumulative for this work stoppage[3]"
  )

# rename variables
strike <- strike %>%
  rename(
    "Employer"="Organizations involved",
    "Union"="Union acronym",
    "Start date"="Work stoppage beginning date",
    "End date"="Work stoppage ending date",
    "Workers"="Number of workers[2]",
    "Days struck"="Days idle, cumulative for this work stoppage[3]"
  )

Introduction

How have strikes changed over the past 30 years? My goal with this analysis is to gain a better understanding of labor in the U.S. by looking at changes in work stoppages—a significant part of organized labor’s activity and a shaping force in worker-employer relations.

This analysis utilizes the monthly-listing.xlsx dataset from the Bureau of Labor Statistics, which contains information on work stoppages (strikes) from 1993 to 2023. Each case represents a distinct strike, with a total of 629 strikes.

Code
# view data
strike

Outline of significant variables

Employer

There are 485 distinct employers in the dataset.

Here are some of the employers with the most strikes in the period:

Code
# create factor levels (by occurrences)
em_levels <- names(sort(table(strike$Employer), decreasing = TRUE))

# sort employers
count_em <- strike %>%
  mutate(Employer = factor(Employer, em_levels))

# view
top_empl <- data.frame(Employer=head(em_levels))
top_empl

We can visualize the frequency distribution of employers, with employers sorted from most strikes in the period to least. The bar for each distinct employer is colored to show distinct unions that have struck there.

Code
# spacing for graph
spaces_em <- function(x) x[seq_along(x) %% 80 == 0]

# bar graph
ggplot(count_em, aes(Employer)) +
  geom_bar(aes(fill=Union)) +
  scale_y_continuous(n.breaks=20) +
  labs(title="Employers in descending order of strikes",subtitle="1993 - 2023",x="Employers",y="Strikes") +
  scale_x_discrete(breaks=spaces_em,labels=c("80","160","240","320","400","480")) + 
  theme_minimal() +
  guides(fill = "none")

The above chart shows that a few employers were struck many times, but most were struck only once or twice.
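To put numbers on this shape, we can count how many employers fall at each strike total. The following is a minimal sketch on made-up rows, not the BLS data; `toy_strike` and `strikes_per_employer` are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Toy stand-in for the cleaned `strike` data (hypothetical rows, not BLS data)
toy_strike <- data.frame(
  Employer = c("A", "A", "A", "B", "B", "C", "D", "E")
)

# Count strikes per employer, then count employers at each strike total
strikes_per_employer <- toy_strike %>%
  count(Employer, name = "Strikes") %>%
  count(Strikes, name = "Employers") %>%
  arrange(Strikes)

strikes_per_employer
```

Run against the real `strike` data frame, the same two `count()` calls would show how many employers were struck once, twice, and so on.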

The colors show that most employers with many strikes were struck by multiple different unions. One thing this can signal is a high level of militancy across different types of workers employed by the employer. For instance, an employer like the University of California, which had the third-most strikes in the period, employs many different types of workers belonging to different unions, including CWA, the UAW, AFSCME, and the IBT. See here:

Code
strike %>%
  filter(Employer == "University of California")
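The multi-union pattern can also be checked directly by counting distinct unions per employer. This is a minimal sketch on made-up rows, not the BLS data; `toy_strike` and `unions_per_employer` are hypothetical names, and it assumes one union per row (cells that list several unions would need to be split first, for example with `separate_rows()`).

```r
library(dplyr)

# Toy stand-in for the cleaned `strike` data (hypothetical rows, not BLS data)
toy_strike <- data.frame(
  Employer = c("Univ X", "Univ X", "Univ X", "Hospital Y", "Hospital Y", "Plant Z"),
  Union    = c("CWA", "UAW", "UAW", "SEIU", "CNA", "UAW")
)

# For each employer: total strikes and number of distinct unions involved
unions_per_employer <- toy_strike %>%
  group_by(Employer) %>%
  summarize(Strikes = n(), Unions = n_distinct(Union)) %>%
  arrange(desc(Strikes), Employer)

unions_per_employer
```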

However, the presence of strikes by multiple unions at a single employer can also signal turf disputes. This is the case with Kaiser Permanente, the employer with the second-most strikes in the period, which saw strikes from separate healthcare workers’ unions, including SEIU, CNA, and NUHW (in addition to UFCW, which represents non-healthcare workers). See here:

Code
strike %>%
  filter(Employer == "Kaiser Permanente")

Code
# break out entries with multiple states
strike_st <- separate_rows(strike,States,sep=", ")
strike_st <- separate_rows(strike_st,States,sep=",")

# filter out data without valid state information
strike_st <- strike_st %>%
  filter(!grepl("Interstate", States)) %>%
  filter(!grepl("East Coast States", States)) %>%
  filter(!grepl("Nationwide", States))

# convert states to factors
st_levels <- c("WY", "WI", "WV", "WA", "VA", "VT", "UT", "TX", "TN", "SD", "SC", "RI", "PA", "OR", "OK", "OH", "ND", "NC", "NY", "NM", "NJ", "NH", "NV", "NE", "MT", "MO", "MS", "MN", "MI", "MA", "MD", "ME", "LA", "KY", "KS", "IA", "IN", "IL", "ID", "HI", "GA", "FL", "DE", "CT", "CO", "CA", "AR", "AZ", "AK", "AL")
strike_st <- strike_st %>%
  mutate(States=factor(States,levels=st_levels))

# drop NA
strike_st <- strike_st %>%
  drop_na(States)

Ownership type

Shows what type of entity the employer is: private industry, state government, local government, or a combination of state and local government. This variable is categorical.

Here is a visualization of the frequency of ownership types in strikes:

Code
# break out state and local government into state government and local government
strike_own <- separate_rows(strike,Ownership,sep=" and ")
strike_own <- strike_own %>%
  mutate(Ownership=case_when(
    Ownership == "State" ~ "State government",
    Ownership == "State government" ~ "State government",
    Ownership == "Local government" ~ "Local government",
    Ownership == "local government" ~ "Local government",
    Ownership == "Private industry" ~ "Private industry"),
    # code colors for ggplot
    Color=case_when(
      Ownership == "State government" ~ "darkred",
      Ownership == "Local government" ~ "darkblue",
      Ownership == "Private industry" ~ "darkgreen",
    )
    )

# filter out data without valid information
strike_own <- strike_own %>%
  drop_na(Ownership)

# convert ownership types to factors
own_levels <- c("Private industry", "State government", "Local government")
strike_own <- strike_own %>%
  mutate(Ownership=factor(Ownership,levels=own_levels))

# bar graph
ggplot(strike_own, aes(Ownership, fill=Color)) +
  geom_bar() +
  # use the color names stored in Color as the actual fill colors (no legend is drawn)
  scale_fill_identity() +
  labs(title="Number of strikes by employer ownership type",subtitle="1993 - 2023",x="Ownership type",y="Strikes") +
  theme_minimal()

This chart shows, more than anything else, the size of the private sector compared with the public sector in the U.S. That the bars for local and state government strikes are as large as they are reflects a much higher unionization rate in the public sector than in the private sector: 33.1% versus 6.0% (Bureau of Labor Statistics 2023a).
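The bar heights can also be expressed as shares of all strikes. This is a minimal sketch on made-up counts, not the BLS data; `toy_own` and `own_share` are hypothetical names used only for illustration.

```r
library(dplyr)

# Toy stand-in for `strike_own` (hypothetical counts, not BLS data)
toy_own <- data.frame(
  Ownership = c(rep("Private industry", 6),
                rep("Local government", 3),
                "State government")
)

# Number of strikes per ownership type and each type's share of the total
own_share <- toy_own %>%
  count(Ownership, name = "Strikes") %>%
  mutate(Share = Strikes / sum(Strikes)) %>%
  arrange(desc(Strikes))

own_share
```

Because `strike_own` splits combined state and local government entries into two rows, shares computed from it are shares of ownership entries rather than of distinct strikes.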

Union

Shows the union to which the striking workers belonged. There are 131 distinct unions in the dataset. Here are some of the unions with the most strikes in the period:

Code
# separate cells with multiple unions
strike_un <- separate_rows(strike,Union,sep=", ")
strike_un <- separate_rows(strike_un,Union,sep="; ")
# sep is a regular expression, so the period must be escaped to match a literal "."
strike_un <- separate_rows(strike_un,Union,sep="\\. ")

# filter out separate entries for union local numbers (redundant)
strike_un <- strike_un %>%
  filter(!grepl("234", Union)) %>%
  filter(!grepl("1594", Union))

# drop NA
strike_un <- strike_un %>%
  drop_na(Union)

# create factor levels (by occurrences)
un_levels <- names(sort(table(strike_un$Union), decreasing = TRUE))

# sort unions
strike_un <- strike_un %>%
  mutate(Union = factor(Union, levels=un_levels))

# view top unions
top_union <- data.frame(Union=head(un_levels))
top_union

We can visualize the frequency distribution, with unions sorted from most strikes in the period to least. The bar for each distinct union is colored to show distinct employers struck.

Code
# spacing for graph
spaces_un <- function(x) x[seq_along(x) %% 26 == 0]

# bar graph
ggplot(strike_un, aes(Union)) +
  geom_bar(aes(fill=Employer)) +
  labs(title="Unions in descending order of strikes",subtitle="1993 - 2023",x="Unions",y="Strikes") +
  theme_minimal() +
  guides(fill = "none") +
  scale_x_discrete(breaks=spaces_un,labels=c("26","52","78","104","130"))

The graph shows us a few things. Perhaps most apparent is that most unions in the dataset only engaged in one strike over these three decades.

We also see that most unions with multiple strikes, like SEIU and the UAW, struck multiple employers. This may seem obvious—unions with more striking bargaining units will have more total strikes. However, the presence of strikes in one bargaining unit, I suspect, is likely to encourage action in other units (at other employers) in the same union.
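Both observations can be expressed as simple proportions. This is a minimal sketch on made-up rows, not the BLS data; `toy_un`, `union_summary`, `share_single`, and `share_multi_employer` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Toy stand-in for `strike_un` (hypothetical rows, not BLS data)
toy_un <- data.frame(
  Union    = c("SEIU", "SEIU", "SEIU", "UAW", "UAW", "U1", "U2", "U3"),
  Employer = c("E1", "E2", "E2", "E3", "E4", "E5", "E6", "E7")
)

# For each union: total strikes and number of distinct employers struck
union_summary <- toy_un %>%
  group_by(Union) %>%
  summarize(Strikes = n(), Employers = n_distinct(Employer))

# Share of unions with exactly one strike
share_single <- mean(union_summary$Strikes == 1)

# Among unions with several strikes, share that struck more than one employer
multi <- filter(union_summary, Strikes > 1)
share_multi_employer <- mean(multi$Employers > 1)

share_single
share_multi_employer
```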

Workers

Shows the number of workers who went on strike. This column contains numerical, discrete, ratio data.

Here are some measures of central tendency:

Code
# pivot to create new date type variable
strike_date <- strike %>%
  pivot_longer(c(`Start date`,`End date`),names_to="Type",values_to="Date")

# median
strike %>%
  summarize(Median=median(Workers, na.rm=TRUE))

Code
# mean
strike %>%
  summarize(Mean=mean(Workers, na.rm=TRUE))

Here, we see that the median sits far below the mean: the many relatively small strikes hold the median down, while a few very large strikes pull the mean up.
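A small made-up example shows how a single very large strike separates the two measures. The numbers below are invented for illustration and are not taken from the BLS data.

```r
# Hypothetical strike sizes: six ordinary strikes and one very large one
workers <- c(1000, 1200, 1500, 2000, 2500, 3000, 75000)

# With the large strike included
med_all  <- median(workers)
mean_all <- mean(workers)

# With the large strike removed
med_trim  <- median(workers[workers < 75000])
mean_trim <- mean(workers[workers < 75000])

c(median = med_all, mean = mean_all)
c(median = med_trim, mean = mean_trim)
```

Removing the one large value barely moves the median but cuts the mean sharply, which is the pattern a right-skewed distribution of strike sizes produces.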

Days struck

Shows the cumulative number of days idle for the strike, that is, the total days of labor that workers withheld.

Here are some measures of central tendency:

Code
# median
strike %>%
  summarize(Median=median(`Days struck`, na.rm=TRUE))

Code
# mean
strike %>%
  summarize(Mean=mean(`Days struck`, na.rm=TRUE))

Code
# select relevant variables
strike_yr <- strike %>%
  select(
    "Start date",
    "Workers",
    "Days struck",
    "Ownership",
    "States")

# truncate start date to year
strike_yr$"Start date" <- as.numeric(format(strike_yr$"Start date", "%Y"))
  
# rename start date to year
strike_yr <- strike_yr %>%
  rename("Year"="Start date") %>%
  rename("Days" = "Days struck")
  
# condense rows by year and remove problematic rows
strike_yr_sum <- strike_yr %>%
  filter(Year > 1990)

# create condensed view: one row per year
strike_yr_sum_cond <- strike_yr_sum %>%
  group_by(Year) %>%
  summarize(
    Workers = sum(Workers, na.rm = TRUE),
    `Days Struck` = sum(Days, na.rm = TRUE)
  )

# create less condensed view: one row per year and ownership type,
# so each area in the charts shows that ownership type's own total
strike_yr_sum <- strike_yr_sum %>%
  group_by(Year, Ownership) %>%
  summarize(
    Workers = sum(Workers, na.rm = TRUE),
    `Days Struck` = sum(Days, na.rm = TRUE),
    .groups = "drop"
  )

Here, too, the median sits far below the mean: the many relatively short and small strikes hold the median down, while a few very long or large strikes pull the mean up.

Changes in annual strike magnitude

How have the number of workers on strike and the number of days struck changed over the past 30 years?

First, see the years in order from most workers on strike to least.

Code
strike_yr_sum_cond %>%
  arrange(desc(Workers))

Next, see the years in order from most days struck to least.

Code
strike_yr_sum_cond %>%
  arrange(desc(`Days Struck`))

We can also visualize the change in workers on strike each year over time.

Code
ggplot(strike_yr_sum,aes(x=Year, y=Workers,fill=Ownership)) +
  geom_area() +
  labs(title="Change in number of striking workers by year",subtitle="1993 - 2023",x="Year",y="Workers") +
  scale_x_continuous(breaks = seq(1993,2023,by=3),
                     minor_breaks = seq(1993, 2023, by = 1)) +
  theme_minimal()

Now, we can observe the change in days struck by year.

Code
ggplot(strike_yr_sum,aes(x=Year, y=`Days Struck`, fill=Ownership)) +
  geom_area() +
  labs(title="Change in number of days struck by year",subtitle="1993 - 2023",x="Year",y="Days struck") +
  scale_x_continuous(breaks = seq(1993,2023,by=3),
                     minor_breaks = seq(1993, 2023, by = 1)) +
  theme_minimal()

In both graphs, we see relatively significant activity through the 1990s, until a sharp drop in 1999, followed by a large spike in 2000. Note that the 2000 spike is particularly massive in the Days struck visualization, suggesting the presence of many very long strikes in that year.

After 2000, we see a period of decline with relatively little activity through the mid-2010s. Then, however, we find the largest spike in the Workers graph in 2018 and 2019, with a large but not quite as significant spike at that time in the Days struck graph, suggesting the presence of very many relatively short strikes in these years. From there, we see a sharp drop in 2020—likely the result of the COVID-19 pandemic—followed by rises in the years after.
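One rough way to test the reading that some years saw long strikes and others saw many short ones is to divide each year's days idle by its striking workers. This is a minimal sketch on invented yearly totals, not the BLS figures; `toy_year` and `days_per_worker` are hypothetical names, and the calculation assumes that days idle are counted in worker-days (workers multiplied by days off the job).

```r
library(dplyr)

# Toy stand-in for `strike_yr_sum_cond` (invented totals, not BLS data)
toy_year <- data.frame(
  Year          = c(2000, 2018, 2019),
  Workers       = c(100000, 400000, 300000),
  `Days Struck` = c(5000000, 2000000, 1800000),
  check.names   = FALSE
)

# Average days idle per striking worker, assuming days idle are worker-days
days_per_worker <- toy_year %>%
  mutate(`Days per worker` = `Days Struck` / Workers) %>%
  arrange(desc(`Days per worker`))

days_per_worker
```

A year with a high ratio is dominated by long strikes, while a year with many workers but a low ratio is dominated by short ones.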

Highlight: 2000

To reiterate, 2000 ranks high in both views. To look at the most significant contributors, we can view strikes in 2000 with more than 5,000 workers, sorted from most to least workers involved.

Code
# create new data frame
strike00 <- strike

# change date to year
strike00$"Start date" <- as.numeric(format(strike00$"Start date", "%Y"))

# filter to 2000, large strikes
strike00 <- strike00 %>%
  rename("Year" = "Start date") %>%
  filter(Year==2000,Workers>5000) %>%
  select(Union,Employer,States,Workers,"Days struck")

# sort from most to least workers
strike00workers <- arrange(strike00, desc(Workers))

# view data
strike00workers

By far the largest strike in 2000, we see, was the commercial actors strike by the Screen Actors Guild (SAG) and the American Federation of Television and Radio Artists (AFTRA) against the American Association of Advertising Agencies, across most of the country.

The next largest strike in 2000 was by the Communications Workers of America (CWA) and the International Brotherhood of Electrical Workers (IBEW) against Verizon in the Northeast and Mid-Atlantic regions of the U.S.

Highlight: 2018 - 2019

We can do the same thing we did for 2000 with 2018 and 2019.

Code
# create new data frame
strike1819 <- strike

# change date to year
strike1819$"Start date" <- as.numeric(format(strike1819$"Start date", "%Y"))

# filter to 2018 and 2019, large strikes
strike1819 <- strike1819 %>%
  rename("Year" = "Start date") %>%
  filter(Year %in% c(2018, 2019),Workers>5000) %>%
  select(Union,Employer,States,Workers,"Days struck")

# sort from most to least workers
strike1819workers <- arrange(strike1819, desc(Workers))

# view data
strike1819workers

We see that many of the largest strikes in 2018 and 2019 were against state legislatures—particularly those of North Carolina, Arizona, Colorado, Oklahoma, West Virginia, Kentucky, Oregon, and South Carolina. These strikes, along with the strike against the Los Angeles Unified School District, made up the “#RedForEd” wave of teacher strikes in these years, named for the social media hashtag that accompanied it, referring to the red shirts striking teachers wore.

Changes in individual strike magnitude

A scatter plot of the number of workers in each strike by start date shows that mid-size to large strikes were more concentrated before the early 2000s than after.

Code
ggplot(strike, aes(x=as.Date(`Start date`))) +
  geom_point(aes(y=Workers),color="red") +
  scale_x_date(limits = as.Date(c("1993-01-01", "2023-12-31"))) +
  labs(title="Change in size of individual strikes by number of workers",subtitle="1993 - 2023",x="Year",y="Workers")
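The visual impression can be backed up by counting larger strikes in each period. This is a minimal sketch on made-up rows, not the BLS data; `toy_strike` and `large_by_period` are hypothetical names, and both the 2003 cutoff for "the early 2000s" and the 5,000-worker threshold for a larger strike are assumptions chosen for illustration.

```r
library(dplyr)

# Toy stand-in for the cleaned `strike` data (hypothetical rows, not BLS data)
toy_strike <- data.frame(
  `Start date` = as.Date(c("1994-05-01", "1998-03-15", "2001-08-01",
                           "2009-06-10", "2018-04-02", "2021-10-05")),
  Workers      = c(12000, 8000, 6000, 1500, 20000, 2000),
  check.names  = FALSE
)

# Assumed cutoff year (2003) and size threshold (5,000 workers)
large_by_period <- toy_strike %>%
  mutate(
    Year   = as.numeric(format(`Start date`, "%Y")),
    Period = if_else(Year <= 2003, "1993-2003", "2004-2023"),
    Large  = Workers >= 5000
  ) %>%
  group_by(Period) %>%
  summarize(
    Strikes         = n(),
    `Large strikes` = sum(Large),
    `Share large`   = mean(Large)
  )

large_by_period
```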

Private vs. public sector

How have strikes changed in size within the private sector and within the public sector (state government and local government), viewed separately?

Code
ggplot(strike_own, aes(x=as.Date(`Start date`))) +
  geom_point(aes(y=Workers),color="red") +
  scale_x_date(limits = as.Date(c("1993-01-01", "2023-12-31"))) +
  facet_wrap(vars(Ownership)) +
  labs(title="Change in size of individual strikes by number of workers",subtitle="Separated by employer ownership type; 1993 - 2023",x="Year",y="Workers")

There appears to be a decrease in larger strikes in the private sector in the late 2000s, along with a sharp increase in larger strikes in state government in the late 2010s.

References

Bureau of Labor Statistics, U.S. Department of Labor. 2023a. “Union Members Summary - 2022 A01 Results.” January 19, 2023. https://www.bls.gov/news.release/union2.nr0.htm.
———. 2023b. “Work Stoppages, Detailed Monthly Listing, 2023.” https://www.bls.gov/wsp/publications/monthly-details/XLSX/work-stoppages-2023.xlsx.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2023. “Readxl: Read Excel Files.” https://readxl.tidyverse.org.
Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science. O’Reilly. https://r4ds.had.co.nz/index.html.