Abstract
Linguistic laws are increasingly used as markers of efficiency in non-human communication, but it remains unclear how rapidly these patterns can emerge. In this re-analysis of experimental data from James and Sakata (1), I assessed whether zebra finches tutored with random songs that have flat frequency distributions and fixed lengths transform them to exhibit three linguistic laws associated with efficiency in human language. Menzerath’s law and Zipf’s rank-frequency law are present in the learned songs to a similar extent as in human language, while there is only weak support for Zipf’s law of abbreviation. These results suggest that some measures of language-like efficiency can emerge extremely rapidly, while others may require iterated social learning.
#load libraries
library(openxlsx)
library(brms)
library(ggplot2)
library(cowplot)
library(gtools)
library(rphylopic)
library(ggdendro)
library(cluster)
library(flextable)
library(dplyr)
library(ftExtra)
library(officer)
#set working directory
setwd("/Users/masonyoungblood/Documents/Work/Summer 2025/zebra_finch_efficiency/docs")
Efficiency in non-human communication systems is increasingly assessed using three linguistic laws found in human language (2–5). Zipf’s law of abbreviation and Menzerath’s law, thought to reflect reduced production costs, predict that common items should be shorter and that longer sequences should be composed of shorter items, respectively (6, 7), while Zipf’s rank-frequency law, thought to reflect learnability, predicts a power-law relationship between the frequencies of items and their ranks (i.e., first most common, second most common, etc.) (7). It is unclear, however, how rapidly these patterns can emerge in communication systems.
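Concretely, the rank-frequency relationship is assessed below (see Materials & Methods) using Mandelbrot’s generalization of Zipf’s law, in which frequency decays as a power of rank:
\[ \text{frequency} = \frac{1}{(\text{rank} + \beta)^{\alpha}} \]
where \(\alpha\) sets the steepness of the power-law decay and \(\beta\) is a shift parameter.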
In a 2017 study, James and Sakata (1) tutored zebra finches with random songs that had flat frequency distributions and fixed lengths and found that learning biases caused convergence in syntactic features. Here, I re-analyze their experimental data to assess whether birds also transform these random songs to exhibit language-like efficiency, measured using the three linguistic laws described above.
James and Sakata (1) tutored juvenile male zebra finches with synthesized songs composed of five common syllable types found in wild zebra finch populations—denoted as “a” to “e”—separated by short silent gaps of around 5 ms. Each syllable type appeared only once in each song, leading to 120 possible song types (e.g., “abcde”, “abced”, etc.) that were completely neutral with respect to the three linguistic laws being assessed (see top row of Figure 1).
After being tutored with these random songs, the zebra finches produced songs that exhibit patterns consistent with both Menzerath’s law and Zipf’s rank-frequency law—longer sequences are composed of shorter syllables (Est.: -0.157, 95% CI: -0.284 to -0.033) (second column of Figure 1), and the relationship between the ranks and frequencies of syllable types fits a power law with an \(R^2\) of 0.861 (95% CI: 0.569 to 0.911) (third column of Figure 1). There is only weak evidence for Zipf’s law of abbreviation, with a negative but uncertain relationship between frequency and syllable duration (Est.: -0.612, 95% CI: -1.422 to 0.299) (first column of Figure 1). More details about the models are given in Table 1.
Figure 1: Distributions for tutor songs (top) and learned songs (bottom), showing syllable frequencies vs. durations (left), sequence lengths vs. syllable durations (middle), and syllable ranks vs. frequencies (right).
| Model | Param. | Est. | L-2.5% | U-97.5% | R-hat |
| --- | --- | --- | --- | --- | --- |
| ZLA | Intercept | 0.034 | -0.778 | 0.833 | 1.001 |
| | Frequency | -0.612 | -1.422 | 0.299 | 1.001 |
| ML | Intercept | 0.068 | -0.641 | 0.775 | 1.000 |
| | Seq. length | -0.157 | -0.284 | -0.033 | 1.000 |
| ZRFL | α | 1.111 | 1.004 | 1.410 | 1.001 |
| | β | 2.063 | 1.247 | 3.132 | 1.001 |

Table 1: Posterior estimates, 95% credible intervals, and R-hat values for the three models (ZLA: Zipf’s law of abbreviation; ML: Menzerath’s law; ZRFL: Zipf’s rank-frequency law).
The main results for both of Zipf’s laws, which make predictions about types, exclude the 30% of syllables that could not be assigned to the five common syllable types (i.e., “a” to “e”) by James and Sakata (1). Importantly, the results are qualitatively the same when novel syllable types are imputed by applying hierarchical clustering to their acoustic features, a standard approach in birdsong research (4) (see Materials & Methods). For Zipf’s law of abbreviation, the effect of frequency on syllable duration is -0.278 (95% CI: -0.832 to 0.276) at the highest granularity (lowest cut height), -0.312 (95% CI: -0.791 to 0.174) at the middle granularity, and -0.307 (95% CI: -0.796 to 0.195) at the lowest granularity (highest cut height). The power law also fits the rank-frequency distribution to a similar extent at all three granularities (highest: \(R^2\) = 0.911, middle: \(R^2\) = 0.919, lowest: \(R^2\) = 0.933).
Zebra finches that are tutored with random songs transform them to exhibit both Menzerath’s law and Zipf’s rank-frequency law. Both the effect size for Menzerath’s law and the \(R^2\) for Zipf’s rank-frequency law fall squarely within the ranges observed in human language (5, 8). However, there is only weak evidence for Zipf’s law of abbreviation, which has a wide credible interval that overlaps zero and an effect size that only borders the weaker end of the range observed in human language (5).
This discrepancy—strong support for only two of the three laws—suggests that they may have different origins. Menzerath’s law, for example, may be primarily physical—a previous study found that experimentally deafened zebra finches and canaries exhibit the law (2), and it is stronger in spoken human language than it is in writing (9). Zipf’s law of abbreviation, on the other hand, may result from iterated social learning. The strength of the law increased over iterations in an artificial language experiment (10), and recent modeling work suggests that pressure for brevity over many rounds of social learning can produce it (11). Another study showed that when zebra finches raised in isolation act as the “seeds” for new song lineages, the duration of some syllable types decreases over several generations (12). That being said, Zipf’s rank-frequency law is also linked to social learning—in that it is thought to enhance learnability (3) and increased over iterations in an artificial language experiment (13)—and yet it appears alongside Menzerath’s law. Future iterated learning experiments could illuminate the origins of these three linguistic laws in non-human animal communication.
The data analyzed in this study were collected by James and Sakata (1) for an experiment on how learning biases impact vocal sequencing. Juvenile male zebra finches (n = 51) were raised with their fathers for up to five days post hatch (dph), and then with their mother or a foster mother until reaching nutritional independence (~35-50 dph). At that point, the juveniles were housed individually and tutored with synthesized songs for at least 30 days. After tutoring, James and Sakata (1) recorded and analyzed spontaneously produced songs with stereotyped motifs (n = 45 songs, containing a total of n = 200 syllables).
#load data
data <- read.xlsx("data/james_sakata_2017.xlsx")
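As a quick check that the loaded data match the sample sizes reported above, the songs and syllables can be tabulated directly. This is a minimal sketch that assumes the column names (BirdID, Position, and SyllableID) used later in this script:
#sanity checks on the loaded data
nrow(data) #total number of syllables (200 per the methods above)
sum(data$Position == "end") #number of songs, counting one "end"-position syllable per song
length(unique(data$BirdID)) #number of tutored birds represented in the recordings
mean(data$SyllableID == "n") #proportion of novel syllables (~30% per james and sakata 2017)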
James and Sakata (1) found that 30% of syllables were novel and did not appear in the tutor songs. The main analysis excludes these novel syllables, but I conducted supplementary analyses to ensure that their exclusion did not impact the main conclusions. To impute types for these novel syllables, I applied Ward’s hierarchical clustering algorithm to four features—the identity of the singer, the duration, the mean frequency, and the mean amplitude—using Gower’s distance to account for the inclusion of both categorical (singer identity) and continuous variables. Then, I used three cut heights to generate clusters for downstream analysis: 0.1 (lowest cut height, highest granularity), 0.3 (middle), and 0.5 (highest cut height, lowest granularity) (see Figure 2).
#apply hierarchical clustering to features of novel syllables, alongside identity of birds
clust_feats <- scale(data[which(data$SyllableID == "n"), 5:7])
clust_feats <- cbind(data.frame(clust_feats), factor(data[which(data$SyllableID == "n"), ]$BirdID))
dist_mat <- daisy(clust_feats, metric = "gower")
heir_clust <- hclust(dist_mat, method = "ward.D2")
dendro <- dendro_data(heir_clust, type = "rectangle")
#get clusters at cut heights and add to data
clust <- data.frame(cutree(heir_clust, h = c(0.1, 0.3, 0.5)))
clust[, 1] <- as.character(clust[, 1])
clust[, 2] <- as.character(clust[, 2])
clust[, 3] <- as.character(clust[, 3])
data$syl_id_imp_lo <- data$SyllableID
data$syl_id_imp_md <- data$SyllableID
data$syl_id_imp_hi <- data$SyllableID
data$syl_id_imp_lo[which(data$SyllableID == "n")] <- clust[, 1]
data$syl_id_imp_md[which(data$SyllableID == "n")] <- clust[, 2]
data$syl_id_imp_hi[which(data$SyllableID == "n")] <- clust[, 3]
#check cut heights
png("plots/cut_heights.png", width = 2.4, height = 2, units = "in", res = 1200)
ggplot(segment(dendro)) +
geom_hline(yintercept = 0.1, color = "#0072B2") +
geom_hline(yintercept = 0.3, color = "#D55E00") +
geom_hline(yintercept = 0.5, color = "#009E73") +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
xlab("Novel syllables") +
ylab("Cut height") +
theme_linedraw(base_family = "Avenir") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
dev.off()
Figure 2: The results of hierarchical clustering applied to the singer identity and acoustic features of the novel syllables, with the three cut heights used to identify clusters.
#add sequence id
seq_id <- c(1)
seq <- 1
for(i in 2:nrow(data)){
if(data$Position[i - 1] == "end"){
seq <- seq + 1
}
seq_id <- c(seq_id, seq)
}
data$seq_id <- seq_id
#reformat and un-log duration
data <- data[, c(1, 11, 3, 4, 8, 9, 10, 6)]
colnames(data) <- c("ind", "seq_id", "pos", "syl_id", "syl_id_imp_lo", "syl_id_imp_md", "syl_id_imp_hi", "dur")
data$dur <- 10^(data$dur)
#store median syllable durations
syl_durs <- c(
round(median(data$dur[which(data$syl_id == "a")])),
round(median(data$dur[which(data$syl_id == "b")])),
round(median(data$dur[which(data$syl_id == "c")])),
round(median(data$dur[which(data$syl_id == "d")])),
round(median(data$dur[which(data$syl_id == "e")]))
)
#get permuted sequences as per james and sakata 2017
rand_seqs <- permutations(n = 5, r = 5, v = 1:5)
rand_seqs <- unlist(lapply(1:120, function(x){rand_seqs[x, ]}))
#combine into data frame matching real data
rand_data <- data.frame(
ind = NA,
seq_id = rep(1:120, each = 5),
pos = rep(1:5, 120),
syl_id = c("a", "b", "c", "d", "e")[rand_seqs],
dur = syl_durs[rand_seqs],
freq = 120,
len = 5
)
rand_data$freq <- rand_data$freq/sum(rand_data$freq)
#add frequencies and ranks of syllables
freqs <- data.frame(table(data$syl_id))
colnames(freqs) <- c("syl_id", "freq")
freqs$rank <- c(6-rank(freqs$freq[1:5]), NA)
freqs$freq[6] <- NA
freqs$freq <- freqs$freq/sum(freqs$freq, na.rm = TRUE)
data$freq <- freqs$freq[match(data$syl_id, freqs$syl_id)]
data$freq[which(data$syl_id == "n")] <- NA
data$rank <- freqs$rank[match(data$syl_id, freqs$syl_id)]
#add frequencies and ranks of imputed syllables
freqs_imp_lo <- data.frame(table(data$syl_id_imp_lo))
freqs_imp_md <- data.frame(table(data$syl_id_imp_md))
freqs_imp_hi <- data.frame(table(data$syl_id_imp_hi))
colnames(freqs_imp_lo) <- c("syl_id", "freq")
colnames(freqs_imp_md) <- c("syl_id", "freq")
colnames(freqs_imp_hi) <- c("syl_id", "freq")
freqs_imp_lo$rank <- nrow(freqs_imp_lo)+1-rank(freqs_imp_lo$freq, ties.method = "first")
freqs_imp_md$rank <- nrow(freqs_imp_md)+1-rank(freqs_imp_md$freq, ties.method = "first")
freqs_imp_hi$rank <- nrow(freqs_imp_hi)+1-rank(freqs_imp_hi$freq, ties.method = "first")
freqs_imp_lo$freq <- freqs_imp_lo$freq/sum(freqs_imp_lo$freq)
freqs_imp_md$freq <- freqs_imp_md$freq/sum(freqs_imp_md$freq)
freqs_imp_hi$freq <- freqs_imp_hi$freq/sum(freqs_imp_hi$freq)
data$freq_imp_lo <- freqs_imp_lo$freq[match(data$syl_id_imp_lo, freqs_imp_lo$syl_id)]
data$freq_imp_md <- freqs_imp_md$freq[match(data$syl_id_imp_md, freqs_imp_md$syl_id)]
data$freq_imp_hi <- freqs_imp_hi$freq[match(data$syl_id_imp_hi, freqs_imp_hi$syl_id)]
data$rank_imp_lo <- freqs_imp_lo$rank[match(data$syl_id_imp_lo, freqs_imp_lo$syl_id)]
data$rank_imp_md <- freqs_imp_md$rank[match(data$syl_id_imp_md, freqs_imp_md$syl_id)]
data$rank_imp_hi <- freqs_imp_hi$rank[match(data$syl_id_imp_hi, freqs_imp_hi$syl_id)]
#add lengths of sequences to the learned song data (rand_data already includes len)
lens <- data.frame(table(data$seq_id))
colnames(lens) <- c("seq_id", "len")
data$len <- lens$len[match(data$seq_id, lens$seq_id)]
Bayesian statistical modeling was conducted in Stan using the brms package in R, with slight modifications of the specifications from Youngblood (4, 5). All models were run for 10,000 iterations in each of 10 Markov chain Monte Carlo chains (100,000 iterations in total) with z-scored variables. Zipf’s law of abbreviation was assessed with a lognormal model (implemented as a Gaussian model of z-scored log duration) with syllable duration as the outcome variable and frequency as the predictor variable. Menzerath’s law was assessed with the same kind of model with syllable duration as the outcome variable and sequence length as the predictor variable. Both of these models included the same three varying intercepts—individual, sequence nested within individual, and syllable type—as well as the same priors—Gaussian distributions with \(\mu = 0\) and \(\sigma = 1\) for the intercepts, slopes, and standard deviations of the varying intercepts and error distribution (with lower bounds of 0 for the standard deviations). Zipf’s rank-frequency law was assessed with a nonlinear model based on Mandelbrot’s generalization (8): \(\text{frequency} = 1/(\text{rank} + \beta)^\alpha\). The priors for the rank-frequency model were a Gaussian distribution with \(\mu = 1\), \(\sigma = 1\), and a lower bound of 1 for \(\alpha\); a Gaussian distribution with \(\mu = 2.7\), \(\sigma = 1\), and a lower bound of -1 for \(\beta\); and a Gaussian distribution with \(\mu = 0\), \(\sigma = 1\), and a lower bound of 0 for the \(\sigma\) of the error distribution (8, 14).
#set priors for zipf's brevity and menzerath's laws
brev_menz_prior <- prior(normal(0, 1), class = "Intercept") +
prior(normal(0, 1), class = "b") +
prior(normal(0, 1), lb = 0, class = "sd") +
prior(normal(0, 1), lb = 0, class = "sigma")
#run model of zipf's brevity law
brev_model <- brm(scale(log(dur)) ~ scale(freq) + (1|ind/seq_id) + (1|syl_id),
data[which(data$syl_id != "n"), ], prior = brev_menz_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
save(brev_model, file = "models/brev_model.RData")
#run model of zipf's brevity law on imputed clusters
brev_model_imp_lo <- brm(scale(log(dur)) ~ scale(freq_imp_lo) + (1|ind/seq_id) + (1|syl_id_imp_lo),
data, prior = brev_menz_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
brev_model_imp_md <- brm(scale(log(dur)) ~ scale(freq_imp_md) + (1|ind/seq_id) + (1|syl_id_imp_md),
data, prior = brev_menz_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
brev_model_imp_hi <- brm(scale(log(dur)) ~ scale(freq_imp_hi) + (1|ind/seq_id) + (1|syl_id_imp_hi),
data, prior = brev_menz_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
save(brev_model_imp_lo, file = "models/brev_model_imp_lo.RData")
save(brev_model_imp_md, file = "models/brev_model_imp_md.RData")
save(brev_model_imp_hi, file = "models/brev_model_imp_hi.RData")
#run model of menzerath's law
menz_model <- brm(scale(log(dur)) ~ scale(len) + (1|ind/seq_id) + (1|syl_id),
data, prior = brev_menz_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
save(menz_model, file = "models/menz_model.RData")
#set priors for zipf's rank-frequency law
#means from 10.3758/s13423-014-0585-6
#lower bounds from 10.1007/s11192-006-0052-x
rank_freq_prior <- prior(normal(1, 1), lb = 1, nlpar = "a") +
prior(normal(2.7, 1), lb = -1, nlpar = "B") +
prior(normal(0, 1), lb = 0, class = "sigma")
#run model of zipf's rank-frequency law
rank_freq_model <- brm(bf(freq ~ 1/((rank + B)^a), a + B ~ 1, nl = TRUE),
unique(data[, which(colnames(data) %in% c("freq", "rank"))])[-6, ],
prior = rank_freq_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
save(rank_freq_model, file = "models/rank_freq_model.RData")
#run model of zipf's rank-frequency law on imputed clusters
rank_freq_model_imp_lo <- brm(bf(freq_imp_lo ~ 1/((rank_imp_lo + B)^a), a + B ~ 1, nl = TRUE),
unique(data[, which(colnames(data) %in% c("freq_imp_lo", "rank_imp_lo"))]),
prior = rank_freq_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
rank_freq_model_imp_md <- brm(bf(freq_imp_md ~ 1/((rank_imp_md + B)^a), a + B ~ 1, nl = TRUE),
unique(data[, which(colnames(data) %in% c("freq_imp_md", "rank_imp_md"))]),
prior = rank_freq_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
rank_freq_model_imp_hi <- brm(bf(freq_imp_hi ~ 1/((rank_imp_hi + B)^a), a + B ~ 1, nl = TRUE),
unique(data[, which(colnames(data) %in% c("freq_imp_hi", "rank_imp_hi"))]),
prior = rank_freq_prior,
chains = 10, iter = 10000, cores = 10, seed = 12345)
save(rank_freq_model_imp_lo, file = "models/rank_freq_model_imp_lo.RData")
save(rank_freq_model_imp_md, file = "models/rank_freq_model_imp_md.RData")
save(rank_freq_model_imp_hi, file = "models/rank_freq_model_imp_hi.RData")
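The parameter estimates in Table 1 and the Bayesian \(R^2\) values reported for the rank-frequency fits can be extracted from the fitted model objects; a minimal sketch using brms’s summary() and bayes_R2() functions (not necessarily the exact calls used to generate the reported values) is below.
#posterior estimates, 95% credible intervals, and r-hats for the fixed effects (as in table 1)
summary(brev_model)$fixed
summary(menz_model)$fixed
summary(rank_freq_model)$fixed
#bayesian r-squared (posterior estimate and 95% CI) for the rank-frequency fits
bayes_R2(rank_freq_model)
bayes_R2(rank_freq_model_imp_lo)
bayes_R2(rank_freq_model_imp_md)
bayes_R2(rank_freq_model_imp_hi)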
#get best fit data for brevity
freq_vals <- seq(round(min(data[which(data$syl_id != "n"), ]$freq), 3), round(max(data[which(data$syl_id != "n"), ]$freq), 3), 0.001)
dur_vals_scaled <- fitted(brev_model, data.frame(freq = freq_vals), re_formula = NA, summary = TRUE)
dur_vals <- exp(dur_vals_scaled*sd(log(data[which(data$syl_id != "n"), ]$dur)) + mean(log(data[which(data$syl_id != "n"), ]$dur)))
brev_fit <- data.frame(dur = dur_vals[, 1], freq = freq_vals, low = dur_vals[, 3], upp = dur_vals[, 4])
#plot brevity (on both datasets)
brev_plot <- ggplot() +
geom_ribbon(data = brev_fit, aes(x = freq, ymin = low, ymax = upp), fill = "#0072B2", alpha = 0.2) +
geom_point(data = data[which(data$syl_id != "n"), ], aes(x = freq, y = dur), color = "#0072B2") +
geom_line(data = brev_fit, aes(x = freq, y = dur), color = "#0072B2") +
ylab("Syllable duration (ms)") +
xlab("Frequency") +
coord_cartesian(ylim = c(25, 205)) +
theme_linedraw(base_family = "Avenir") +
theme(axis.text.y = element_text(angle = 90, hjust = 1))
rand_brev_plot <- ggplot() +
geom_jitter(data = rand_data, aes(x = freq, y = dur), color = "#0072B2", width = 0.00005, height = 3) +
geom_vline(xintercept = rand_data$freq[1], color = "#0072B2") +
ylab("Syllable duration (ms)") +
xlab("Frequency") +
coord_cartesian(xlim = c(0, 0.0032), ylim = c(25, 170)) +
theme_linedraw(base_family = "Avenir") +
theme(axis.text.y = element_text(angle = 90, hjust = 1))
#get best fit data for menzerath
len_vals <- seq(round(min(data$len), 1), round(max(data$len), 1), 0.1)
dur_vals_scaled <- fitted(menz_model, data.frame(len = len_vals), re_formula = NA, summary = TRUE)
dur_vals <- exp(dur_vals_scaled*sd(log(data$dur)) + mean(log(data$dur))) #un-scale using the full dataset that menz_model was fit on
menz_fit <- data.frame(dur = dur_vals[, 1], len = len_vals, low = dur_vals[, 3], upp = dur_vals[, 4])
#plot menzerath (on both datasets)
menz_plot <- ggplot() +
geom_ribbon(data = menz_fit, aes(x = len, ymin = low, ymax = upp), fill = "#D55E00", alpha = 0.2) +
geom_point(data = data, aes(x = len, y = dur), color = "#D55E00") +
geom_line(data = menz_fit, aes(x = len, y = dur), color = "#D55E00") +
xlab("Sequence length") +
ylab("Syllable duration (ms)") +
theme_linedraw(base_family = "Avenir") +
theme(axis.text.y = element_text(angle = 90, hjust = 1))
rand_menz_plot <- ggplot() +
geom_jitter(data = rand_data, aes(x = len, y = dur), color = "#D55E00", width = 0.1, height = 3) +
geom_vline(xintercept = 5, color = "#D55E00") +
xlab("Sequence length") +
ylab("Syllable duration (ms)") +
xlim(2, 8) +
theme_linedraw(base_family = "Avenir") +
theme(axis.text.y = element_text(angle = 90, hjust = 1))
#get best fit for rank-frequency law
rank_vals <- seq(1, 5, 0.1)
freq_vals <- fitted(rank_freq_model, data.frame(rank = rank_vals))
rank_freq_fit <- data.frame(freq = freq_vals[, 1], rank = rank_vals, low = freq_vals[, 3], upp = freq_vals[, 4])
rand_rank_freq_data <- data.frame(freq = 120, rank = 1:5)
rand_rank_freq_data$freq <- rand_rank_freq_data$freq/sum(rand_rank_freq_data$freq)
#plot rank-frequency
rank_freq_plot <- ggplot() +
geom_ribbon(data = rank_freq_fit, aes(x = rank, ymin = low, ymax = upp), fill = "#009E73", alpha = 0.2) +
geom_point(data = unique(data[, which(colnames(data) %in% c("freq", "rank"))])[-6, ], aes(x = rank, y = freq), color = "#009E73") +
geom_line(data = rank_freq_fit, aes(x = rank, y = freq), color = "#009E73") +
xlab("Rank") +
ylab("Frequency") +
scale_x_log10() +
scale_y_log10() +
theme_linedraw(base_family = "Avenir") +
theme(axis.text.y = element_text(angle = 90, hjust = 1))
rand_rank_freq_plot <- ggplot() +
geom_point(data = rand_rank_freq_data, aes(x = rank, y = freq), color = "#009E73") +
geom_hline(yintercept = rand_rank_freq_data$freq[1], color = "#009E73") +
xlab("Rank") +
ylab("Frequency") +
scale_x_log10() +
scale_y_log10() +
coord_cartesian(ylim = c(0.03, 1)) +
theme_linedraw(base_family = "Avenir") +
theme(axis.text.y = element_text(angle = 90, hjust = 1))
#create top and bottom rows
top_row <- plot_grid(rand_brev_plot, rand_menz_plot, rand_rank_freq_plot, nrow = 1, labels = c("a", "b", "c"), label_fontfamily = "Avenir Next", label_fontface = "bold")
bottom_row <- plot_grid(brev_plot, menz_plot, rank_freq_plot, nrow = 1, labels = c("d", "e", "f"), label_fontfamily = "Avenir Next", label_fontface = "bold")
#create titles
top_row_w_title <- plot_grid(
ggdraw() + draw_label("Distributions in tutor songs", fontfamily = "Avenir Next", fontface = "bold", size = 14, x = 0, hjust = 0), top_row,
ncol = 1, rel_heights = c(0.1, 1)
)
bottom_row_w_title <- plot_grid(
ggdraw() + draw_label("Distributions in learned songs", fontfamily = "Avenir Next", fontface = "bold", size = 14, x = 0, hjust = 0), bottom_row,
ncol = 1, rel_heights = c(0.1, 1)
)
column_titles <- plot_grid(
ggdraw() + draw_label(" Zipf's Law of Abbrev.", fontfamily = "Avenir Next", fontface = "bold", color = "#0072B2"),
ggdraw() + draw_label(" Menzerath's Law", fontfamily = "Avenir Next", fontface = "bold", color = "#D55E00"),
ggdraw() + draw_label(" Zipf's Rank-Freq. Law", fontfamily = "Avenir Next", fontface = "bold", color = "#009E73"),
nrow = 1
)
#combine elements
main_plot <- plot_grid(top_row_w_title, bottom_row_w_title, ncol = 1)
final_plot <- plot_grid(main_plot, column_titles, ncol = 1, rel_heights = c(1, 0.04))
#get finch image
finch <- get_phylopic(uuid = "7bf8ca33-0fef-4985-a4d4-a147bff6e98c")
#save plot
png("plots/distributions.png", width = 8, height = 5.4, units = "in", res = 1200)
plot(1)
final_plot
add_phylopic_base(finch, fill = "#0072B2", height = 0.08, x = 1.44, y = 1.585)
dev.off()
The data for this study can be found in the repository of James and Sakata (1) (https://doi.org/10.17632/sywzmd483f.1), and the code is available on GitHub (https://github.com/masonyoungblood/zebra_finch_efficiency).
I would like to thank Logan S. James and Jon T. Sakata for providing open data for their 2017 study.