Figure 1: The distribution of element durations and inter-onset intervals from the whale vocal sequences included in this analysis. The times are z-scored within each study to enable direct comparison.
To assess whether the strength of Menzerath’s law in human language is similar when computed from element durations and inter-onset intervals, I conducted a supplementary analysis with a corpus of spoken language data that was collected to study efficiency in information encoding (1). DoReCo could not be used for this supplementary analysis because it includes the short gaps between phonemes in its measurements (2), making it impossible to analyze element durations and inter-onset intervals separately without a reanalysis of the raw data. Coupe et al.’s dataset is composed of 2,288 recordings of speech, ranging from three to five sentences in length, and representing 17 different languages (1). For each recording, they collected the number of syllables, the overall duration including vocalizations and silence, and the duration only including silence, making it possible to compute the average element duration and inter-onset interval of each recording.
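As a rough illustration of how the two averages can be derived from the three quantities recorded per recording, the sketch below assumes that phonation time is the total duration minus the silent time, and that the mean inter-onset interval can be approximated by the total duration divided by the number of syllables. The function name `mean_timing` and its arguments are hypothetical and are not taken from the original analysis code.

```python
def mean_timing(n_syllables, total_duration, silence_duration):
    """Return (mean element duration, mean inter-onset interval) for one recording.

    Assumptions (not from the source): phonation time = total - silence,
    and the mean inter-onset interval is approximated by total / n_syllables.
    """
    if n_syllables <= 0:
        raise ValueError("n_syllables must be positive")
    if not 0 <= silence_duration <= total_duration:
        raise ValueError("silence_duration must lie between 0 and total_duration")
    phonation = total_duration - silence_duration
    mean_element_duration = phonation / n_syllables
    mean_inter_onset_interval = total_duration / n_syllables
    return mean_element_duration, mean_inter_onset_interval


# Hypothetical recording: 10 syllables, 5 s overall, 1 s of which is silence.
element, interval = mean_timing(10, 5.0, 1.0)
```

Under these assumptions the inter-onset interval is always at least as long as the element duration, since it also absorbs the silent gaps.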
Each sequence in Coupe et al.’s data is a set of sentences from a single speaker (1). Analyzing Menzerath’s law in utterances above the sentence level is unusual but has precedent (3, 4). It would be ideal to conduct this analysis at the same level of the linguistic hierarchy as DoReCo (e.g., phonemes within words, words within sentences), but Coupe et al.’s dataset is the only one I found that separates phonation from silence (1).
For each sequence, I analyzed (1) the length in syllables, (2) the average duration of syllables, and (3) the average inter-onset interval between syllables. These data were analyzed using the same linear model as in the main text, excluding the varying intercept because each point corresponds to the average duration/interval within a sequence:
\[ \ln(\textrm{duration}) \sim \ln(\textrm{length}) \tag{1} \]
The results indicate that the strength of Menzerath’s law for syllables in series of sentences is quite similar when computed from element durations (estimate = -0.216, 95% CI: [-0.256, -0.176]) and inter-onset intervals (estimate = -0.292, 95% CI: [-0.331, -0.253]).
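The log-log regression in Equation 1 can be sketched with ordinary least squares, as below. This is a minimal illustration rather than the analysis code: the helper name `fit_log_log` and the synthetic data are hypothetical, any standardization used in the main text is omitted, and the interval uses a normal approximation (1.96 standard errors) that may differ from the one behind the reported values.

```python
import numpy as np


def fit_log_log(lengths, durations):
    """Fit ln(duration) ~ ln(length) by OLS; return (slope, standard error)."""
    x = np.log(np.asarray(lengths, dtype=float))
    y = np.log(np.asarray(durations, dtype=float))
    X = np.column_stack([np.ones_like(x), x])      # intercept + slope
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance
    return float(coef[1]), float(np.sqrt(cov[1, 1]))


# Synthetic (made-up) data: durations shrink as sequences get longer.
rng = np.random.default_rng(0)
lengths = rng.integers(10, 80, size=200)
durations = 0.3 * lengths ** -0.2 * np.exp(rng.normal(0, 0.05, size=200))

slope, se = fit_log_log(lengths, durations)
ci = (slope - 1.96 * se, slope + 1.96 * se)        # approximate 95% CI
```

A negative slope whose interval excludes zero is the pattern described as consistent with Menzerath’s law.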
Group | Species | Type | Effect | 2.5% | 97.5% |
|---|---|---|---|---|---|
Mysticete | Blue Whale | Elements | -0.255 | -0.331 | -0.178 |
Mysticete | Bowhead Whale | Elements | -0.184 | -0.318 | -0.051 |
Mysticete | Common Minke Whale | Elements | -0.278 | -0.294 | -0.261 |
Mysticete | Humpback Whale | Intervals | -0.678 | -0.692 | -0.665 |
Mysticete | North Pacific Right Whale | Elements | 0.309 | 0.278 | 0.340 |
Mysticete | Sei Whale | Elements | -0.194 | -0.329 | -0.059 |
Odontocete | Bottlenose Dolphin | Intervals | -0.242 | -0.347 | -0.138 |
Odontocete | Commerson's Dolphin | Intervals | 0.221 | 0.087 | 0.356 |
Odontocete | Heaviside's Dolphin | Intervals | -0.119 | -0.320 | 0.083 |
Odontocete | Hector's Dolphin | Intervals | -0.008 | -0.274 | 0.258 |
Odontocete | Killer Whale | Elements | 0.121 | 0.003 | 0.239 |
Odontocete | Narrow-Ridged Finless Porpoise | Intervals | -0.304 | -0.338 | -0.270 |
Odontocete | Peale's Dolphin | Intervals | -0.333 | -0.489 | -0.177 |
Odontocete | Risso's Dolphin | Intervals | -0.420 | -0.448 | -0.392 |
Odontocete | Sperm Whale | Intervals | -0.234 | -0.241 | -0.226 |
Group | Species | Type | Length effect | Length 2.5% | Length 97.5% | Position effect | Position 2.5% | Position 97.5% |
|---|---|---|---|---|---|---|---|---|
Mysticete | Blue Whale | Elements | -0.255 | -0.331 | -0.178 | -0.064 | -0.087 | -0.041 |
Mysticete | Bowhead Whale | Elements | -0.179 | -0.313 | -0.046 | -0.751 | -0.789 | -0.713 |
Mysticete | Common Minke Whale | Elements | -0.278 | -0.294 | -0.261 | -0.017 | -0.023 | -0.011 |
Mysticete | Humpback Whale | Intervals | -0.678 | -0.691 | -0.665 | -0.193 | -0.201 | -0.186 |
Mysticete | North Pacific Right Whale | Elements | 0.309 | 0.278 | 0.340 | 0.107 | 0.096 | 0.119 |
Mysticete | Sei Whale | Elements | -0.194 | -0.329 | -0.058 | -0.101 | -0.132 | -0.070 |
Odontocete | Bottlenose Dolphin | Intervals | -0.242 | -0.346 | -0.138 | -0.084 | -0.111 | -0.056 |
Odontocete | Commerson's Dolphin | Intervals | 0.221 | 0.087 | 0.356 | -0.106 | -0.118 | -0.095 |
Odontocete | Heaviside's Dolphin | Intervals | -0.119 | -0.320 | 0.083 | 0.019 | 0.010 | 0.027 |
Odontocete | Hector's Dolphin | Intervals | -0.008 | -0.274 | 0.258 | -0.001 | -0.010 | 0.008 |
Odontocete | Killer Whale | Elements | 0.121 | 0.021 | 0.221 | 0.528 | 0.428 | 0.628 |
Odontocete | Narrow-Ridged Finless Porpoise | Intervals | -0.305 | -0.339 | -0.271 | 0.168 | 0.151 | 0.185 |
Odontocete | Peale's Dolphin | Intervals | -0.333 | -0.489 | -0.177 | -0.013 | -0.017 | -0.009 |
Odontocete | Risso's Dolphin | Intervals | -0.420 | -0.448 | -0.392 | -0.196 | -0.200 | -0.192 |
Odontocete | Sperm Whale | Intervals | -0.234 | -0.241 | -0.226 | 0.028 | 0.026 | 0.031 |
Language | Effect | 2.5% | 97.5% |
|---|---|---|---|
Anal | -0.104 | -0.113 | -0.095 |
Arapaho | 0.030 | 0.020 | 0.040 |
Asimjeeg Datooga | -0.063 | -0.073 | -0.053 |
Baïnounk Gubëeher | -0.102 | -0.110 | -0.093 |
Beja | -0.066 | -0.073 | -0.059 |
Bora | -0.127 | -0.138 | -0.116 |
Cabécar | -0.110 | -0.120 | -0.100 |
Cashinahua | -0.100 | -0.108 | -0.091 |
Daakie | -0.131 | -0.141 | -0.121 |
Dalabon | -0.079 | -0.091 | -0.066 |
Dolgan | -0.130 | -0.139 | -0.121 |
English (Southern England) | -0.053 | -0.066 | -0.040 |
Evenki | -0.101 | -0.109 | -0.092 |
Fanbyak | -0.091 | -0.101 | -0.080 |
French (Swiss) | -0.050 | -0.063 | -0.037 |
Goemai | -0.124 | -0.138 | -0.110 |
Gorwaa | -0.125 | -0.136 | -0.114 |
Hoocąk | -0.099 | -0.109 | -0.090 |
Jahai | -0.062 | -0.073 | -0.050 |
Jejuan | -0.093 | -0.104 | -0.083 |
Kakabe | -0.123 | -0.135 | -0.111 |
Kamas | -0.096 | -0.111 | -0.082 |
Komnzo | -0.081 | -0.091 | -0.071 |
Light Warlpiri | -0.144 | -0.154 | -0.133 |
Lower Sorbian | -0.078 | -0.089 | -0.067 |
Mojeño Trinitario | -0.147 | -0.156 | -0.137 |
Movima | -0.014 | -0.023 | -0.006 |
Nafsan (South Efate) | -0.072 | -0.082 | -0.062 |
Nisvai | -0.064 | -0.074 | -0.053 |
Nllng | -0.098 | -0.111 | -0.086 |
Northern Alta | -0.069 | -0.079 | -0.059 |
Northern Kurdish (Kurmanji) | -0.022 | -0.033 | -0.011 |
Pnar | -0.021 | -0.035 | -0.007 |
Resígaro | -0.079 | -0.089 | -0.069 |
Ruuli | -0.039 | -0.048 | -0.030 |
Sadu | -0.124 | -0.136 | -0.112 |
Sanzhi Dargwa | -0.135 | -0.148 | -0.122 |
Savosavo | -0.052 | -0.062 | -0.042 |
Sümi | -0.090 | -0.101 | -0.079 |
Svan | -0.066 | -0.074 | -0.057 |
Tabaq (Karko) | -0.213 | -0.222 | -0.203 |
Tabasaran | -0.138 | -0.151 | -0.125 |
Teop | -0.170 | -0.181 | -0.159 |
Texistepec Popoluca | -0.070 | -0.081 | -0.059 |
Urum | -0.126 | -0.134 | -0.118 |
Vera'a | -0.152 | -0.163 | -0.141 |
Warlpiri | -0.147 | -0.157 | -0.137 |
Yali (Apahapsili) | -0.191 | -0.202 | -0.179 |
Yongning Na | -0.128 | -0.142 | -0.114 |
Yucatec Maya | -0.067 | -0.079 | -0.056 |
Yurakaré | -0.198 | -0.204 | -0.191 |
Language | Length effect | Length 2.5% | Length 97.5% | Position effect | Position 2.5% | Position 97.5% |
|---|---|---|---|---|---|---|
Anal | -0.105 | -0.114 | -0.096 | 0.100 | 0.091 | 0.109 |
Arapaho | 0.030 | 0.020 | 0.039 | 0.099 | 0.090 | 0.109 |
Asimjeeg Datooga | -0.063 | -0.074 | -0.053 | 0.186 | 0.177 | 0.196 |
Baïnounk Gubëeher | -0.102 | -0.110 | -0.093 | 0.096 | 0.088 | 0.104 |
Beja | -0.066 | -0.073 | -0.059 | 0.065 | 0.058 | 0.072 |
Bora | -0.127 | -0.138 | -0.116 | -0.131 | -0.140 | -0.123 |
Cabécar | -0.110 | -0.120 | -0.100 | 0.021 | 0.012 | 0.031 |
Cashinahua | -0.100 | -0.108 | -0.091 | -0.019 | -0.028 | -0.010 |
Daakie | -0.131 | -0.141 | -0.122 | 0.164 | 0.155 | 0.173 |
Dalabon | -0.079 | -0.091 | -0.066 | 0.165 | 0.153 | 0.177 |
Dolgan | -0.130 | -0.139 | -0.121 | 0.043 | 0.034 | 0.052 |
English (Southern England) | -0.053 | -0.066 | -0.040 | 0.050 | 0.038 | 0.062 |
Evenki | -0.101 | -0.109 | -0.092 | 0.042 | 0.033 | 0.050 |
Fanbyak | -0.091 | -0.101 | -0.081 | 0.161 | 0.151 | 0.171 |
French (Swiss) | -0.050 | -0.063 | -0.037 | 0.160 | 0.151 | 0.169 |
Goemai | -0.124 | -0.138 | -0.110 | 0.063 | 0.052 | 0.074 |
Gorwaa | -0.125 | -0.136 | -0.114 | 0.009 | 0.000 | 0.018 |
Hoocąk | -0.099 | -0.109 | -0.090 | 0.141 | 0.132 | 0.151 |
Jahai | -0.062 | -0.073 | -0.050 | 0.142 | 0.131 | 0.153 |
Jejuan | -0.093 | -0.104 | -0.083 | 0.038 | 0.028 | 0.049 |
Kakabe | -0.123 | -0.135 | -0.111 | 0.103 | 0.091 | 0.115 |
Kamas | -0.096 | -0.111 | -0.082 | 0.003 | -0.012 | 0.017 |
Komnzo | -0.081 | -0.091 | -0.071 | 0.026 | 0.018 | 0.035 |
Light Warlpiri | -0.144 | -0.154 | -0.133 | 0.078 | 0.067 | 0.088 |
Lower Sorbian | -0.078 | -0.089 | -0.067 | 0.046 | 0.037 | 0.056 |
Mojeño Trinitario | -0.147 | -0.156 | -0.137 | -0.074 | -0.083 | -0.065 |
Movima | -0.014 | -0.023 | -0.006 | -0.053 | -0.062 | -0.045 |
Nafsan (South Efate) | -0.072 | -0.082 | -0.062 | 0.094 | 0.085 | 0.103 |
Nisvai | -0.064 | -0.074 | -0.054 | 0.151 | 0.144 | 0.159 |
Nllng | -0.098 | -0.111 | -0.086 | 0.155 | 0.142 | 0.167 |
Northern Alta | -0.069 | -0.079 | -0.059 | 0.091 | 0.081 | 0.101 |
Northern Kurdish (Kurmanji) | -0.022 | -0.033 | -0.011 | 0.033 | 0.023 | 0.042 |
Pnar | -0.021 | -0.035 | -0.007 | 0.087 | 0.076 | 0.099 |
Resígaro | -0.079 | -0.089 | -0.069 | 0.012 | 0.003 | 0.022 |
Ruuli | -0.039 | -0.048 | -0.030 | 0.069 | 0.061 | 0.078 |
Sadu | -0.124 | -0.136 | -0.112 | 0.200 | 0.189 | 0.212 |
Sanzhi Dargwa | -0.135 | -0.148 | -0.122 | 0.012 | 0.000 | 0.023 |
Savosavo | -0.052 | -0.062 | -0.042 | -0.125 | -0.134 | -0.117 |
Sümi | -0.090 | -0.101 | -0.080 | 0.109 | 0.098 | 0.120 |
Svan | -0.066 | -0.074 | -0.057 | -0.026 | -0.035 | -0.017 |
Tabaq (Karko) | -0.213 | -0.223 | -0.203 | -0.071 | -0.080 | -0.061 |
Tabasaran | -0.138 | -0.151 | -0.125 | -0.025 | -0.037 | -0.012 |
Teop | -0.170 | -0.181 | -0.159 | 0.072 | 0.063 | 0.081 |
Texistepec Popoluca | -0.070 | -0.081 | -0.059 | 0.020 | 0.011 | 0.030 |
Urum | -0.126 | -0.134 | -0.118 | -0.007 | -0.015 | 0.001 |
Vera'a | -0.152 | -0.163 | -0.141 | 0.169 | 0.160 | 0.177 |
Warlpiri | -0.147 | -0.157 | -0.137 | -0.026 | -0.035 | -0.017 |
Yali (Apahapsili) | -0.192 | -0.203 | -0.180 | 0.140 | 0.130 | 0.150 |
Yongning Na | -0.128 | -0.142 | -0.114 | 0.098 | 0.085 | 0.112 |
Yucatec Maya | -0.067 | -0.079 | -0.056 | -0.075 | -0.084 | -0.066 |
Yurakaré | -0.198 | -0.204 | -0.191 | -0.035 | -0.040 | -0.029 |
Figure 2: The 95% confidence intervals for the effect of sequence length (top) and position (bottom) on element durations and inter-onset intervals for the 16 whale species and 51 human languages. The human language data consist of words within sentences. The colors correspond to the taxonomic group and whether the data are element durations (ED) or inter-onset intervals (IOI).
James et al. (5) recently found that Menzerath’s law can be detected in pseudorandom sequences of birdsong syllables that are forced to match the durations of real songs. They interpret their model as approximating simple motor constraints, so that stronger effects in the real data would indicate additional mechanisms (e.g., communicative efficiency through behavioral plasticity). I originally planned to compare the strength of Menzerath’s law in the real data with simulated data from the model of James et al. (5), as I recently did for house finch song (6), but analyses of language data suggest that it is far too conservative a null model: none of the 51 languages in the DoReCo dataset exhibit Menzerath’s law to a greater extent than the simulated data. Even though many whale species exhibit Menzerath’s law to a greater extent than simulated data from the null model of James et al. (5) (75%; 12 out of 16 species), I do not want to over-interpret this result given the pattern in the human data. Upon further reflection, I think that the fundamental assumption of James et al. (5), that sequence durations are governed by motor constraints alone, is unlikely to apply to many species with more complex communication systems. In humpback whales and sperm whales, for example, there appears to be significant inter-individual variation in song and coda length depending on social context (7, 8). More details about this analysis are below.
The production constraint model of James et al. (5) works as follows. In each iteration of the model, a pseudorandom sequence is produced for each real song in the dataset. Syllables are randomly sampled (with replacement) from the population until the duration of the random sequence exceeds the duration of the real song. If the difference between the duration of the random sequence and the real song is < 50% of the duration of the final syllable, then the final syllable is kept in the sequence. Otherwise, it is removed. Each iteration of the model therefore produces a set of random sequences with approximately the same distribution of durations as the real data.
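The sampling procedure described above can be sketched as follows. This is an illustrative reading of the verbal description, not the code of James et al. (5) or of this study: the function name `simulate_sequence` is hypothetical, each sampled value is treated as a single syllable duration, and the handling of the edge case where the only sampled syllable is dropped (leaving an empty sequence) is an assumption.

```python
import random


def simulate_sequence(target_duration, syllable_pool, rng):
    """Build one pseudorandom sequence matched to a real song's duration.

    syllable_pool: positive syllable durations pooled across the population
    (hypothetical input format). Returns the list of sampled durations.
    """
    if not syllable_pool or min(syllable_pool) <= 0:
        raise ValueError("syllable_pool must contain positive durations")
    sequence, total = [], 0.0
    # Sample with replacement until the sequence is longer than the real song.
    while total <= target_duration:
        syllable = rng.choice(syllable_pool)
        sequence.append(syllable)
        total += syllable
    # Keep the final syllable only if the overshoot is under half its duration.
    overshoot = total - target_duration
    if overshoot >= 0.5 * sequence[-1]:
        sequence.pop()
    return sequence


# One model iteration over made-up song durations and a made-up syllable pool.
rng = random.Random(1)
pool = [0.08, 0.12, 0.15, 0.20, 0.31]
real_song_durations = [1.4, 2.2, 0.9]
simulated = [simulate_sequence(d, pool, rng) for d in real_song_durations]
```

Because the final syllable is kept or dropped depending on which choice lands closer to the target, each simulated duration differs from the real one by at most half of that syllable's duration.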
For each species, I generated 100 simulated datasets from (1) the random sequence model and (2) the production constraint model. Then, I fit Menzerath’s law separately to each of the 100 simulated datasets and pooled the model estimates for \(a\) and \(b\) using Rubin’s rules as implemented in the mice package in R. The results can be seen in Figure 3.
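For readers unfamiliar with the pooling step, the core of Rubin’s rules is sketched below in Python. This is not a reproduction of the mice package: the function name `pool_rubin` and the example numbers are hypothetical, and the degrees-of-freedom adjustment used for interval estimation is omitted.

```python
import math


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m fitted models; return (estimate, standard error)."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    pooled = sum(estimates) / m                                   # mean estimate
    within = sum(se ** 2 for se in std_errors) / m                # mean sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # variance across fits
    total = within + (1 + 1 / m) * between                        # total variance
    return pooled, math.sqrt(total)


# Made-up slope estimates and standard errors from three simulated datasets.
slope, slope_se = pool_rubin([-0.21, -0.25, -0.19], [0.02, 0.03, 0.02])
```

The between-fit term widens the pooled standard error whenever the simulated datasets disagree with one another, which a simple average of the standard errors would not capture.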
Most importantly, the estimated effects from the production constraint model tend to be more negative than those from the real human language data, suggesting that this null model is far too conservative to be informative about “language-like” efficiency.
Figure 3: The point estimates (points) from the real data alongside 95% confidence intervals (bars) from 10 simulated datasets from the production constraint model, for the effect of sequence length on element durations and inter-onset intervals for the 16 whale species and 51 human languages. The human language data consist of phonemes within words. The colors correspond to the taxonomic group and whether the data are element durations (ED) or inter-onset intervals (IOI).
Figure 4: The point estimates from the original datasets (orange) compared to median-interpolated datasets (blue). Interpolating sequences with the median inter-onset interval of each phoneme appears to systematically shift model estimates towards zero (in over 90% of cases).
Martin et al. (9) noticed that Heaviside’s dolphins sometimes produce temporally patterned burst pulses with much more rhythmic variation, especially during social interactions. I analyzed 27 patterned burst pulses provided by Martin et al. (9) and found that they adhere to Menzerath’s law: there is a negative relationship between sequence length and inter-onset intervals (estimate = -0.186, 95% CI: [-0.308, -0.063]).
Figure 5: The baleen whale (Mysticete) species included in the study (left), alongside the distribution of element durations or inter-onset intervals and sequence lengths (middle) and the slope of Menzerath’s law (right). The x- and y-axes have been log-transformed and z-scored to match the structure of the statistical model. Each point in the distribution plots (middle) marks the mean element duration or inter-onset interval, but the slopes on the right were computed from the full set of elements/intervals. The bars in the slope plots (right) mark the 95% confidence intervals around the point estimates.
Figure 6: The toothed whale (Odontocete) species included in the study (left), alongside the distribution of element durations or inter-onset intervals and sequence lengths (middle) and the slope of Menzerath’s law (right). The x- and y-axes have been log-transformed and z-scored to match the structure of the statistical model. Each point in the distribution plots (middle) marks the mean element duration or inter-onset interval, but the slopes on the right were computed from the full set of elements/intervals. The bars in the slope plots (right) mark the 95% confidence intervals around the point estimates.
Group | Species | Effect | 2.5% | 97.5% |
|---|---|---|---|---|
Mysticete | Blue Whale | -0.102 | -0.139 | -0.065 |
Mysticete | Bowhead Whale | 0.368 | -0.386 | 1.121 |
Mysticete | Humpback Whale | -0.696 | -0.943 | -0.450 |
Mysticete | Sei Whale | 0.249 | -0.422 | 0.921 |
Odontocete | Killer Whale | -0.114 | -0.314 | 0.086 |
Language | Effect | 2.5% | 97.5% |
|---|---|---|---|
Anal | -2.104 | -2.613 | -1.594 |
Arapaho | -1.438 | -1.704 | -1.172 |
Asimjeeg Datooga | -2.295 | -2.952 | -1.638 |
Baïnounk Gubëeher | -1.936 | -2.380 | -1.492 |
Beja | -1.455 | -1.813 | -1.097 |
Bora | -1.983 | -2.394 | -1.572 |
Cabécar | -1.642 | -1.935 | -1.349 |
Cashinahua | -2.115 | -2.568 | -1.663 |
Daakie | -1.390 | -1.788 | -0.992 |
Dalabon | -1.957 | -2.426 | -1.487 |
Dolgan | -1.412 | -1.731 | -1.094 |
English (Southern England) | -1.065 | -1.315 | -0.815 |
Evenki | -1.578 | -1.799 | -1.357 |
Fanbyak | -1.230 | -1.524 | -0.936 |
French (Swiss) | -0.938 | -1.141 | -0.736 |
Goemai | -0.955 | -1.238 | -0.671 |
Gorwaa | -2.120 | -2.681 | -1.559 |
Hoocąk | -1.314 | -1.539 | -1.088 |
Jahai | -1.786 | -2.138 | -1.435 |
Jejuan | -1.618 | -1.947 | -1.289 |
Kakabe | -1.992 | -2.539 | -1.446 |
Kamas | -1.108 | -1.376 | -0.840 |
Komnzo | -1.656 | -2.085 | -1.227 |
Light Warlpiri | -1.624 | -2.168 | -1.079 |
Lower Sorbian | -1.343 | -1.688 | -0.999 |
Mojeño Trinitario | -1.436 | -1.758 | -1.113 |
Movima | -1.783 | -2.134 | -1.432 |
Nafsan (South Efate) | -1.565 | -1.870 | -1.259 |
Nisvai | -2.496 | -3.049 | -1.943 |
Nllng | -1.859 | -2.408 | -1.309 |
Northern Alta | -1.648 | -1.981 | -1.315 |
Northern Kurdish (Kurmanji) | -1.412 | -1.904 | -0.919 |
Pnar | -1.713 | -2.278 | -1.149 |
Resígaro | -1.271 | -1.609 | -0.934 |
Ruuli | -1.789 | -2.048 | -1.530 |
Sadu | -0.838 | -1.178 | -0.497 |
Sanzhi Dargwa | -2.009 | -2.598 | -1.420 |
Savosavo | -1.506 | -1.990 | -1.022 |
Sümi | -1.395 | -1.680 | -1.110 |
Svan | -1.695 | -2.095 | -1.294 |
Tabaq (Karko) | -1.727 | -2.101 | -1.352 |
Tabasaran | -1.506 | -2.085 | -0.928 |
Teop | -1.693 | -2.145 | -1.242 |
Texistepec Popoluca | -1.480 | -1.755 | -1.205 |
Urum | -1.822 | -2.043 | -1.600 |
Vera'a | -1.618 | -1.933 | -1.302 |
Warlpiri | -1.962 | -2.662 | -1.262 |
Yali (Apahapsili) | -1.414 | -1.667 | -1.162 |
Yongning Na | -0.930 | -1.281 | -0.578 |
Yucatec Maya | -1.056 | -1.257 | -0.855 |
Yurakaré | -2.145 | -2.537 | -1.753 |
Figure 7: The 95% confidence intervals for the effect of count on element duration for the five whale species and 51 human languages. The human language data consist of words. The colors correspond to the taxonomic group and whether the data are element durations (ED) or inter-onset intervals (IOI).
Figure 8: The whale species included in the study (left), alongside the distribution of element durations and counts (middle) and the slope of Zipf’s law of abbreviation (right). The x-axis has been z-scored, and the y-axis has been z-scored and log-transformed, to match the structure of the statistical model. Each point in the distribution plots (middle) marks the mean duration of elements, but the slopes on the right were computed from the full set of elements. The bars in the slope plots (right) mark the 95% confidence intervals around the point estimates.