After Scansion: Visualizing, Deforming, and Listening to Poetic Prosody

Scansion, for generations of American students, has been the dominant method of studying prosody in poetry. How and why did this happen? What if scansion had never become dominant? What alternative methods for understanding poetic prosody have been passed over? How might reliance on scansion of the text on the page—at the expense of other approaches to listening to and analyzing the prosody of recorded poems—have helped oversimplify the history of poetry performance, and of the evolution of poetic forms?

These are some questions we want to explore. First, we encourage readers to listen to the 12-minute podcast, AFTER SCANSION, which explores the goals and limits of scansion in the classroom and introduces vocal deformance. Vocal deformance is a playful strategy of defamiliarization that involves manipulating vocal qualities of a recording, such as pitch values, vocal tract size, and speaking rate, to draw attention to the subjective nature of speech perception and thus of vocal performance styles, and to imagine alternative histories of and futures for poetic performance. Next, we discuss the goals and limitations of scansion as typically practiced in the literature classroom in more detail (SCANSION IN THE CLASSROOM). Then we sketch some relevant history of the study of prosody across disciplines (A LITTLE HISTORY OF PROSODY ACROSS DISCIPLINES) and explain our methods (EXPLORING POETIC PROSODY: VISUALIZING INTONATION AND TIMING AND PRACTICING VOCAL DEFORMANCE), and introduce some simple open-source tools that support alternative heuristics for the teaching and study of prosody in poetry (THREE SIMPLE OPEN-SOURCE TOOLS FOR EXPLORING POETIC PROSODY: DRIFT, GENTLE AND TANDEM-STRAIGHT). These include a pitch-tracker (for tracing intonation patterns, to create pitch contours) that works especially well on noisy recordings often found in the poetry audio archive, a forced aligner (for tracking speaking rate and tempo by word and pause duration, and aligning lines of poetry with pitch contours), and a speech synthesis tool that we use for vocal deformance. These tools will, we hope, be useful in sound studies and literature courses, and in novel research on the vast and growing archive of recorded poetry and other performative speech.

Finally, in a spirit of serious play, we offer a casual history and some counterfactual examples of prosody-in-performance through brief samples from sixteen modern and contemporary poets and pitch and timing data about those recordings (SAMPLING THE PROSODY OF SIXTEEN POETS). We hope that poets and students of poetry at all levels will explore these recordings and, in listening to the original and deformed versions, and considering our visualizations and preliminary analysis of the data, attain a deeper understanding of prosody-in-performance and appreciate its centrality to the study of poetry. Readers may also want to see our article with linguist Georgia Zellou, “Beyond Poet Voice: Sampling the (Non)-Performance Styles of 100 American Poets,” published in The Journal of Cultural Analytics in April 2018, in which we analyze recordings of 100 modern and contemporary American poets and compare them with 20 conversational speakers.


Many readers probably remember the exercise in a literature course—reading a poem in silence, counting syllables and poetic feet, marking stresses or accents. Maybe chewing on a pencil in uncertainty, muttering the odd line out loud. But what does scansion try to capture? What is it good for? Let’s recall the classroom scenario in a little more detail.

With the goal of teaching close reading skills, our instructor might write a formal poem on the board, such as William Butler Yeats’s “The Lake Isle of Innisfree.” Then she, or a brave student or two, would read it aloud. For a while, the class would discuss the poem’s speaker (a disaffected urbanite), the natural imagery and lilting word choice (the cozy cabin, the bees and the beans, the lapping Lake, the sounds of crickets and linnets), the syntax (some reinforcing parallelism and repetition), and the tone—resolute, yet wistful.

And then, the class would take on the poem’s form together, applying terms developed for ancient Greek verse to Yeats’s poem in English, with a binary symbolic system of marks that vastly oversimplifies the prosody of spoken English. (Three of these marks, it’s worth noting, are not featured on most keyboards, nor in ASCII, the American Standard Code for Information Interchange.)



The class would agree that “The Lake Isle of Innisfree” is written in quatrains, and, perhaps with some mild hectoring, that each quatrain begins with three long lines of loosely iambic hexameter, and ends with a shorter line of iambic tetrameter, though only one of the long lines has fourteen syllables; most have thirteen, one has fifteen. And only the last line of the poem has eight syllables; the two other short lines have nine. Our instructor would then lead the class toward consensus, for each line, as to where the stresses actually fall (which syllables are stressed, which unstressed) and how much the pattern of stress matches or departs from the metrical expectations. Discrepancies in the number of expected syllables for many lines would be resolved by finding substitutions of three-syllable feet here and there: anapests, dactyls, or amphibrachs. And trochees would occasionally be found, substituted for iambs.

Ideally, in further discussion, the interplay between metrical expectations and the apparent rhythms of the poem would test and reinforce various interpretations. One student might argue that Yeats’s speaker is resolute, determined to escape some unnamed unsatisfactory locale; “I WILL aRISE and GO[,]” he declares iambically! On this interpretation, the metrical expectation of iambs, in the opening line, would coincide with the emphatic rhythm, reinforcing the speaker’s ardent wish. And the class might agree that the closing quatrain, which shifts into the present tense as the wish becomes imaginative reality, ends with three stressed syllables—against the metrical expectation of iambs—to emphasize how real the dream is to the speaker, how powerful his vision of escape, the “DEEP HEART’S CORE.” Plausible enough.

Inevitably, though, after such an exercise, some students would conclude that they are simply bad at scansion: that because they did not understand or agree with the class consensus about the rhythms of the poem, they lack a certain sensitivity to rhythm and its semantic significance, and that they therefore cannot really appreciate poetry.

Scansion can, unfortunately, reinforce the idea that a single, dominant interpretation of a poem is correct—usually the instructor’s—because it demonstrates one way, or one way that is more correct than other ways, to hear its rhythms. And little wonder if scansion turns students off, since critical comments like these, about the importance and difficulty of scansion, can be found in many guides to rhythm and meter, from proponents of both open and closed, or experimental and traditional, forms:

“[T]he readers are numerous who hear nothing when they read silently and who are helpless in their efforts to read aloud: some of them have defective sensibilities; some have merely never been trained[.]”

– Yvor Winters, “The Audible Reading of Poetry” (1951), in The Structure of Verse: Modern Essays on Prosody, ed. Harvey Seymour Gross (1979)

“[There is a] shocking inability of many students who have only read non-traditional poetry to scan traditional poetry properly. That evidence of aural insensitivity is something we all deplore; but when non-traditionally formed poems are equally mangled, it goes unrecognized and undeplored. The inability to scan traditional poems correctly is due to ignorance and laziness: if an individual has no natural ‘ear’ he or she can at least learn to read correctly. The inability to scan non-traditional forms is due sometimes to inflexibilities of expectation on the part of those who are skilled in traditional prosody, but more often the refusal of non-traditional practitioners to adopt and utilize a common terminology and a common recognition of the functions of the various techniques they use.”

– Denise Levertov, “On the Need for New Terms,” New & Selected Essays (1986)

Other students in our hypothetical class on Yeats’s “The Lake Isle of Innisfree” might reasonably decide that the problem lies in scansion, not in their allegedly defective ears. They might, if not in so many words, argue that scansion has some serious limitations as a useful heuristic for the study of poetry, which, as we all know, has been primarily an oral form for most of history.

After class, the poem would be left speechless on the board, trapped in a lattice of lines and marks that bears too little relation to how the poet, or any other person, might actually perform the poem, in terms of intonation and stress.



How can we liberate the poem, and hear it anew?

In teaching poetry, one of us (MacArthur) shares the system of scansion with students as a rather primitive curiosity, only to encourage them to believe in and cultivate their intuitive sense of the rhythms of speech and the significance of tone of voice, in conversation and performance. A favorite assignment asks them to eavesdrop on, transcribe, and turn a conversation into a poem, which they might then perform, to begin to appreciate the role of intonation in expression and interpretation.

A highly anecdotal survey of early and mid-career poetry scholars, trained at a range of graduate programs in the 1990s and 2000s, reveals that some of us simply avoid teaching prosody in poetry—despite the fact that, in our reading lives and in our research, we care about it. Perhaps we have reached a consensus on the limitations of scansion, which feels unfashionably New Critical and just a little too geeky and specialized for today’s undergraduates—many of whom, nevertheless, enthuse about poetry slams, love rap and hip-hop, and otherwise hint at potential untapped interest in the performance of poetry. Teaching rap and hip-hop as poetry can’t be the only way (though it is a viable one) to meet them where they are.

One thing most English instructors do use in the classroom now is recordings of poets reading their work. After we listen, what then?

Let’s listen to Yeats reading the poem’s opening, in a 1936 recording.

“What??” Students might say. “That’s crazy. Why does he read it like that? Is he trying to put us to sleep, or enchant us with his fantasy? Is he a priest or something?”

How could we capture the differences between a conventional scansion of the poem on the page, and Yeats’s intonation patterns when he reads it?

If we consider the stress and pitch of his reading separately, we discover that Yeats’s near-monotone pitch patterns work against the stress he gives to accented syllables by increasing intensity (or volume). He’s deliberately flattening his pitch, avoiding contrastive intonation.



One thing our example of Yeats’s reading shows—and helps us hear—is that pitch and intensity are two independent acoustic qualities. Scansion tries to capture imagined intensity patterns, but it basically ignores pitch. We modulate pitch independently from intensity, for the purposes of expression. And when people talk, human listeners pay a lot of attention to changes in pitch. Arguably, if we had to choose between them, pitch matters more than intensity.
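The independence of pitch and intensity can be checked with a toy example. The sketch below is only an illustration under simplifying assumptions: it synthesizes pure sine tones rather than speech, uses root-mean-square amplitude as a stand-in for intensity, and estimates pitch by counting zero crossings, a crude method that works on clean tones but not on real voices. The function names and the sample rate are choices made for this sketch, not part of any tool discussed here.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def tone(freq_hz, amplitude, seconds=0.5):
    """Synthesize a pure sine tone as a list of samples."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def rms(samples):
    """Root-mean-square amplitude, a rough stand-in for intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rough_pitch(samples):
    """Estimate frequency in Hz by counting upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * SAMPLE_RATE / len(samples)


# Vary frequency and amplitude separately and measure both properties.
for freq_hz in (100, 200):
    for amplitude in (0.2, 0.8):
        samples = tone(freq_hz, amplitude)
        print(f"freq={freq_hz} Hz, amp={amplitude}: "
              f"pitch ~{rough_pitch(samples):.0f} Hz, rms {rms(samples):.2f}")
```

In this toy setting, changing the amplitude leaves the pitch estimate untouched, and changing the frequency leaves the RMS value essentially untouched, which is the sense in which a speaker can raise one without raising the other.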

So let’s listen to what Yeats sounds like, with his pitch and intensity patterns more closely aligned.

How does this version change our perception of the speaker’s tone? To me, the speaker suddenly sounds more determined, like he’s ready to hike out of town immediately, instead of enchanting the reader to join in his fantasy, with a kind of incantatory tone.

So that’s one example of how changing the intonation patterns, but preserving the poet’s other vocal qualities, can help us recognize the importance of pitch patterns in the performance and interpretation of poetry. At the same time, these changes in pitch can help us imagine different interpretations.

Rae Armantrout, reading the first stanza of her poem “The Subject” (from Next Life [2007]), uses much more varied intonation, and her pitch accents line up more closely with her intensity patterns than Yeats’s do. Yet the sliding between pitch values, the difference between rising and falling intonation, and the pauses, for emphasis and contrast, would clearly not be captured by scansion. Listen.




Between the extremes of our intonation examples here—Yeats and Armantrout, whom we might characterize as near-monotone and highly expressive—an enormous amount of variety exists in poetry performance.


This complexity in how we perceive prosody highlights the fact that poetry is, after all, stylized communication. It is naturalistic, calling on deep, evolutionarily old neural pathways. These old pathways support language as practiced naturally, as all spoken and signed languages are, but not language as written.

Writing is a recent kludge, an add-on, not a deep, inevitable function of the human brain like speech. We cannot ignore this if we really want to understand the experience of poetry. In fact, from a bio/behavioral perspective, scansion as inferred from the written text seems a particularly inapt tool for poetic interpretation.

Bringing greater attention to the experience of listening to poetry, rather than limiting the study of poems to mostly silent reading and textual interpretation, requires that students know what to listen for. This naturally invites perspectives from psychology, linguistics, and the neuroscience of prosody. Understanding how our minds and brains actually process prosody could help students recognize it when they listen to poetry and speech, to parse it, analyze it, manipulate it. As a happy side-effect, they might also become more aware of their own prosodic tendencies in speech, which could help them develop skills in effective public speaking and presentations.

There are at least two challenges, however, the first historical. Though the aesthetics of prosody have enjoyed scholarly attention for centuries, the science of prosody is rather young. Hints about its neural bases evidently arose in the late 1800s, around the time Paul Broca first identified the left-hemisphere dominance for fluent speech production. But while psycholinguistics flourished mid-20th-century, along with Noam Chomsky and the “cognitive revolution,” prosody as a neuroscientific topic appears to have languished until renewed interest in the 1970s.

It’s illustrative that aphasias (disorders in the use of propositional language) have been well characterized since Broca’s time, but the term aprosodias (disorders in the use and interpretation of emotional prosody, for instance in people with autism spectrum disorders) was coined only in 1981. Because prosody has been scientifically underappreciated, neuroscientific studies of it are dwarfed in number by those addressing lexical, structural (syntax), and categorical aspects of spoken language. We just don’t know as much about it.

The second challenge for a student of prosody is conceptual. Prosody, as Yeats illustrates, is not one thing.

In typical use, the term “prosody” conflates many acoustically independent features, namely the modulations in pitch, intensity (volume), speaking rate or tempo, and rhythm, which endow natural speech with meaning beyond the words themselves—paralinguistic meaning. As we noted earlier, the accent mark used in scansion attempts to mark peaks in intensity to roughly characterize rhythmic patterns. What scansion does not attempt to capture is pitch accents or speaking rate or tempo, since it takes as its object of study the poem on the page, as read silently and sometimes aloud, but not primarily as heard or performed.
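Speaking rate and pausing, unlike stress marks, fall out directly from word-level timing. As a minimal sketch, assume a list of words with start and end times in seconds. The timings below are invented for illustration and are not measured from any recording, and the tuple layout and function names are assumptions of this sketch rather than the output format of any particular aligner.

```python
# Invented timings (seconds) for illustration only; not measured from any recording.
# Each entry is (word, start_time, end_time).
aligned_words = [
    ("I", 0.00, 0.20),
    ("will", 0.20, 0.55),
    ("arise", 0.55, 1.20),
    ("and", 1.60, 1.80),
    ("go", 1.80, 2.40),
    ("now", 2.40, 3.00),
]


def pauses(aligned, min_gap=0.1):
    """Silent gaps between consecutive words lasting at least min_gap seconds."""
    gaps = (nxt[1] - cur[2] for cur, nxt in zip(aligned, aligned[1:]))
    return [gap for gap in gaps if gap >= min_gap]


def speaking_rate(aligned):
    """Words per second over the whole span, pauses included."""
    span = aligned[-1][2] - aligned[0][1]
    return len(aligned) / span


print("words per second:", round(speaking_rate(aligned_words), 2))
print("pauses (s):", [round(p, 2) for p in pauses(aligned_words)])
```

Two readings of the same line could share a scansion and still differ sharply on both of these measures, which is one reason tempo deserves separate attention.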

Using one term or one mark for several acoustic features makes it difficult to tease apart their possibly independent contributions to the poetic listening experience. Worse, these cues serve more than one role in communication, both linguistic (especially lexical and syntactic) and paralinguistic (especially affective). And our affective responses to acoustic, non-verbal qualities of speech matter tremendously to our interpretation of verbal semantics, of the meaning of the words spoken. We routinely recognize when a change in intensity and pitch is signaling a phrase boundary or word cue—HOT dog vs. hot DOG!, denoting processed meat in a bun or a state of excitement unrelated to food—or when it’s conveying an affective attitude or shift (sadness, sarcasm, anger, and so on), or both.

According to voice perception research in Foundations of Voice Studies: An Interdisciplinary Approach to Voice Production and Perception by Jody Kreiman and Diana Sidtis, when we listen to speech, “[s]ome authors … have claimed that normal adults usually believe the tone of voice rather than the words…. For example, the contrast in ‘I feel just fine’ spoken in a tense, tentative tone might be politely ignored, while, ‘I’m not angry’ spoken in hot anger would not” (304). In other words, we pick up on the affective meaning of a speaker’s tone of voice, and weigh it against the semantic meaning of the words spoken. While Kreiman and Sidtis argue that tone cannot be reduced to intonation patterns, “the fundamental frequency of the human voice [pitch] … heads the list of important cues for emotional meanings” (311). Pitch manipulation, then, changes the affective meaning of speech. Tone of voice is also influenced by other acoustic features, including speaking rate or tempo, rhythm, and intensity. In poetry recordings and live poetry readings, the poet’s tone of voice can dramatically influence the listener’s interpretation of a poem.

Much of this complexity of speech perception is reflected in our contemporary neuroscientific understanding of prosody. In the early days, the consensus held that prosody means pitch, pitch conveys emotion, and emotion is processed in the right hemisphere. Period. This view was accurate in a coarse way, but woefully incomplete. With the wider availability of techniques such as EEG (electroencephalogram) and neuroimaging, particularly fMRI (functional Magnetic Resonance Imaging) since the 1990s, our understanding of prosody has become more nuanced.

Affective prosody is indeed processed preferentially in the right cerebral cortex, located a bit above the right temple. This fits ample evidence, confirmed with non-speech sounds, that (in most people) the right hemisphere is biased toward frequency, pitch, and longer-duration acoustic features, whereas the left hemisphere is biased toward more rapid, temporal information. So far, so good.

However, emotional prosody relies not just on one right-hemisphere brain location for one instant but a vast, dynamic neural network, with activity developing over most of a second. The stress and pitch alone will register in early auditory brain areas, but then need to be contextualized in the utterance, compared to the semantics or meaning of the conversation, even the social context and speaker identity. So when a father tells his son in a falling tone, “Nice work, tiger,” he’s being supportive, but when a peer uses the exact same falling tone, he’s teasing. That is, the context and speaker alone render the same intonation pattern with the same words ironic or sarcastic, rather than sincere. And that emotional prosody brain network is substantially different from the one supporting linguistic prosody (indicating phrase boundary, or question vs statement), which is mostly left lateralized instead.

Finally, and crucially, which networks are recruited to attend to and parse an utterance is determined not so much by the acoustics as by the listener’s intentions, which can vary according to many contextual factors. For tonal language speakers, a given pitch contour will be processed in the left hemisphere for lexical (word) decisions but in the right hemisphere for frequency decisions.

Interestingly, one critic who might be thought responsible for the propagation of scansion, I.A. Richards, demonstrated an understanding of rhythm that far exceeds scansion’s insights. In “Rhythm and Meter,” a chapter from his seminal work Principles of Literary Criticism (1926), he wrote:

Rhythm and its specialized form, metre, depend upon repetition, and expectancy. Equally where what is expected recurs and where it fails, all rhythmical and metrical effects spring from anticipation. As a rule this anticipation is unconscious. Sequences of syllables both as sounds and as images leave the mind ready for certain further sequences rather than for others. Our momentary organization is adapted to one range of possible stimuli rather than to another. Just as the eye reading print unconsciously expects the spelling to be as usual, and the fount of type to remain the same, so the mind after reading a line or two of verse, or half a sentence of prose, prepares itself ahead for any one of a number of possible sequences, at the same time negatively incapacitating itself for others. The effect produced by what actually follows depends very closely upon this unconscious preparation and consists largely of the further twist which it gives to expectancy. It is in terms of the variation in these twists that rhythm is to be described.

The more that science discovers about perception and the auditory-linguistic brain, the more insightful this passage from Richards seems.

For decades, psychologists have appreciated how we carry a sort of buffer of recent history, an implicit memory, such that specific words (or words representing related concepts) repeated over time are processed differently (faster).

This is called “priming,” and it fits with the growing recognition that the brain is perhaps best described as a tremendously complex statistical prediction machine. Of course prediction is especially important for language, which is necessarily temporally structured and extended.

One recent study of temporal sound perception summarizes the presently influential view: “Brain function can be conceived as a hierarchy of generative models that optimizes predictions of sensory inputs and minimizes ‘surprise.’” In other words, the brain generates and constantly chooses among many models that predict the sensory inputs it is most likely to receive, with an apparent goal of optimal accuracy and minimal surprise, or error. The study goes on to show how we are continuously, and as Richards says unconsciously, tracking and predicting sound timing across multiple dimensions such as frequency, intensity, duration, and silent gaps or pauses: precisely the features that distinguish prosody.

Even without paying attention, our brains “know” within a fraction of a second (150 ms) that an acoustic prediction was violated; the same goes for violations of semantic and syntactic predictions. When recorded via EEG/MEG, this response is called a “mismatch negativity” or MMN. And rhythm or strict temporal regularity (isochrony, or what we might call strict adherence to metrical expectation in a poem) is only one type of pattern. The brain also automatically maintains expectations about more complex patterns in time or pitch, even holding more than one prediction at a time in memory.

In one way, a deep way, we are programmed from birth to test our expectations, in this case expectations of prosodic patterns, seeking experiences that are neither too predictable nor too surprising. This phenomenon is known in cognitive neuroscience as the “Goldilocks effect.” Too little complexity bores us. Too much confuses us and isn’t rewarding. This compulsion to learn is codified in the mathematics of information theory, developed by Shannon and Weaver in the mid-20th century, which shows that a totally predictable pattern is also a totally uninformative one. So when we encounter patterns that repeat over and over, both our subjective engagement and our neural responses tend to wane, a process called habituation.
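The information-theoretic point can be made concrete with a small calculation. The sketch below is a simplification chosen for illustration: it treats a line as a string of stressed ("/") and unstressed ("x") syllables and measures how many bits of surprise each syllable carries given only the syllable before it (a first-order conditional entropy). Both stress strings are hypothetical, and the listener model is an assumption of this sketch, not a claim about how the brain computes predictions.

```python
import math
from collections import Counter


def conditional_entropy(symbols):
    """Average surprise, in bits, of each symbol given the one just before it."""
    pair_counts = Counter(zip(symbols, symbols[1:]))
    context_counts = Counter(symbols[:-1])
    total_pairs = len(symbols) - 1
    bits = 0.0
    for (prev, _nxt), count in pair_counts.items():
        p_pair = count / total_pairs
        # log2(1 / P(next | prev)), written so the result is never negative
        bits += p_pair * math.log2(context_counts[prev] / count)
    return bits


# Hypothetical stress strings: "x" = unstressed syllable, "/" = stressed.
strict_iambs = "x/" * 8             # perfectly regular alternation
varied_line = "x/x//xx/x/xx/x//"    # same length, less regular

print("strict iambs:", conditional_entropy(strict_iambs), "bits per syllable")
print("varied line: ", conditional_entropy(varied_line), "bits per syllable")
```

Under this model the strictly alternating line scores zero, since every syllable is fully predicted by the one before it, while the varied line scores above zero. A listener with richer expectations would call for a richer model, but the direction of the effect is the same.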

Listen to the first few minutes of Ginsberg’s Howl (here in the KPFA 1956 recording at PennSound), included in the sixteen poets sampled. While he uses the same monotone cadence over many long lines, we are lulled—pleasantly or unpleasantly—and we probably attend less to what he is saying, because the prosody is uninformative. At the same time, his vocal pitch level is gradually rising, and that may help keep our attention, as it suggests change or intensification in mood. As Robert Hass put it in “Listening and Making,” an essay on rhythm in free verse in Twentieth Century Pleasures (2000), “Repetition makes us feel secure and variation makes us feel free.”

The principle spans age and behavioral domain, from infant attention to language learning to computer-based “brain-training” games. But though we all share this heuristic, we are still individuals. Having different brains with different experiential histories means that we engage the world with different statistical predictions.

So when we listen to a recorded poem, what counts as boring or too complex depends on our internal models, our predictions, which are in turn based on our previous experiences of listening to and reading poetry. Monotonous or otherwise highly repetitive prosodic patterns in poetry reading may comfort some listeners and bore others, while more idiosyncratic prosodic patterns in poetry reading may annoy the former and reward the latter. The sixteen poets in Sampling the Prosody of Sixteen Poets can be used as test cases to develop a listener’s sense of their own preferences.

In attending to the complex perceptual experience of listening to poetry in more detail, we are actually honoring the initial intent of I.A. Richards, whose work on poetry was deeply influenced by his collaboration with the linguist C.K. Ogden. He continued in “Rhythm and Meter” by urging poetry critics to pay more attention to the role of pitch and to “movement.” By this, we think he means speaking rate, or tempo, as influenced by word choice, syntax, punctuation and line breaks. And he also warned against the oversimplification of rhythm through metrical analysis, which scansion has encouraged:

The whole conception of metre as ‘uniformity in variety,’ a kind of mental drill in which words, those erratic and varied things, do their best to behave as though they were all the same, with certain concessions, licences and equivalences allowed, should nowadays be obsolete. It is a survivor which is still able to do a great deal of harm to the uninitiated…. it is as difficult to kill as Punch. Most treatises on the subject, with their talk of feet and of stresses, unfortunately tend to encourage it, however little this may be the aim of the authors.

As with rhythm so with metre, we must not think of it as in the words themselves or in the thumping of the drum. It is not in the stimulation, it is in our response. Metre adds to all the variously fated expectancies which make of rhythm a definite temporal pattern and its effect is not due to our receiving a pattern in something outside us, but to our becoming patterned…

The notion that there is any virtue in regularity or in variety, or in any other formal feature, apart from its effects upon us, must be discarded before any metrical problem can be understood.

A more serious omission is the neglect by the majority of metrists of the pitch relations of syllables. The reading of poetry is of course not a monotonous and subdued form of singing. There is no question of definite pitches at which the syllables must be taken, nor perhaps of definite harmonic relations between different sounds. But that a rise and fall of pitch is involved in metre and is as much a part of the poet’s technique as any other feature of verse, as much under his control also, is indisputable. [our italics]


These particular warnings and insights from Richards about the oversimplifications of metrical analysis appear to have fallen on deaf ears, if the subsequent dominance of scansion is any measure. It is true that, in 1926, the available tools did not make it easy to study these phenomena empirically. That is no longer the case. Whatever the reason, the dominant system of graphic scansion, set out in many guides to versification, rhythm and meter, has encouraged many of the hazards Richards warned against, and has largely ignored pitch.

As recently as the April 2017 issue of Poetry, we find the eminent poet James Longenbach, in “The Music of Poetry,” insisting that in considering tone in poetry, we must not consider pitch, and that, following Richards, we must focus on the poem on the page, not as read or heard aloud: “the influential literary critic I.A. Richards defined poetic tone as speaker’s ‘attitude to his listener.’ But it’s important to remember that Richards uses the words speaker and listener metaphorically; he’s talking about the written text, not an oratorical performance. What he refers to (equally metaphorically) as a poem’s tone is generated by the material characteristics of rhythm and echo [repeated sonic patterns, such as alliteration and assonance]” (80).

Longenbach’s anti-oral reading of Richards has a long history. As one example of this tendency, we offer the influential work Poetic Meter and Poetic Form (1965) by Paul Fussell. Early on, Fussell defends graphic scansion against two alternative systems, which he calls “the musical” and “the acoustic.”


Prosodists use one of three systems of signs for scanning English verse: the graphic, the musical, and the acoustic…. Musical scansion does have the advantage of representing more accurately than graphic certain delicate differences in degree of stress: it is obvious to anyone that an English line has more than two prosodic kinds of syllables in it, and yet graphic scansion, preferring convenience to absolute accuracy, seems to give the impression that any syllable in a line is either clearly stressed or unstressed. But musical scansion has perhaps a greater disadvantage than this kind of oversimplification; it is not only complex, but even worse tends to imply that poetry somehow follows musical principles….


The third method of scansion, the acoustic, translates poetic sounds into the marks on graph paper produced by such machines as the kymograph and the oscillograph. Like musical scansion, this system has the advantage of accuracy, especially in its representations of many of the empirical phenomena of verse when it is actually spoken aloud; its disadvantages are its complexity, its novelty, and its incapacity to deal with rhythms which no speaker enunciates but which every silent reader feels. Musical scansion may do no harm to those already learned in musical theory; acoustic scansion may be useful to the linguist and the scientist of language; but graphic scansion is best for those who intend to become not merely accurate readers but also intelligent critics of English poetry [our italics].


More recent critics, such as T.V.F. Brogan, in The New Princeton Encyclopedia of Poetry and Poetics (1993), also caution against attempts to bring into the study of prosody in poetry anything but a binary system of marks: “all such efforts exceed the boundary of strict metrical analysis, moving into descriptions of linguistic rhythm, and thus serve to blur or dissolve the distinction between meter and rhythm. Strictly speaking, scansion marks which syllables are metrically prominent—i.e. ictus and nonictus—not how much. Scansions which take account of more levels of metrical degree than two, or intonation, or the timing of syllables are all guilty of overspecification” (1118).

Though we shrink from undue complexity and “overspecification” as much as the next person, Fussell’s easy dismissal of “the advantage of accuracy … in ... represent[ing] … the empirical phenomena of verse when it is actually spoken aloud” strikes us as bizarre. What are “rhythms which no speaker enunciates but which every silent reader feels,” if not one more New Critical assertion of the primacy of the text, of silent reading, over orality, aurality and performance? (On silent reading and poetry, see this recent issue of Thinking Verse, and also Christopher Grobe’s recent essay “On Book: The Performance of Reading” in New Literary History.)

If it ever did, it no longer seems crucial to prioritize analysis of meter over the actual rhythms of performed poetry, especially given that so much poetry written and performed today is in open forms, or varieties of so-called free verse. While we sympathize with Denise Levertov’s frustration, when she complains that “All discussion of contemporary poetics is vitiated by the lack of a more precise terminology,” new terms alone do not seem to help us appreciate prosody with more nuance. From the accentual-syllabic system of scansion to William Carlos Williams’s variable foot, throwing more words at the problem of close listening may result in new taxonomies, but not necessarily new insights.

Nor are more complex methods of graphic scansion helpful. Their goal is to increase perception of subtlety and nuance in intonation and stress. In practice, they can overwhelm and confuse even expert readers of poetry. Wimsatt and Beardsley, for instance, in their system of four degrees of relative stress, attempt to capture subtleties of accent with more confusing visual complexity than a simple intonation or intensity contour, while insisting that their object of study is not the audible reading of a poem.

Of course, scansion cannot be held responsible for the neglect of the aural and oral experience of poetry, of the audio archive, and ignorance of complementary linguistic methods of studying prosody. But it has not helped matters.

Some metricists, notably Harold Whitehall, have attempted to bring pitch into the discussion of meter, noting in Structural Essentials of English (1956) that “[pitch is] not usually regarded as important in prosody. Nonetheless, the higher pitches usually occur at points of primary stress and reinforce the stress peaks in both the metrical and isochronic line” (419). Though this is often true, pitch is more variable and complex than a stress peak; intonation has shape, not simply highs and lows, and whether pitch is rising or falling, and how steeply, influences auditory perception and cognitive interpretation. Alan Holder, in Rethinking Meter: A New Approach to the Verse Line (1995), takes particular exception to the metrical foot and begins to seriously investigate the role of pitch in metrical accent. He puts it nicely: “Perusing the critical literature of prosody … the sad fact is that the bulk of that literature is wrong-headed or irrelevant, forever warming up dubious pieties. It appears to be a law of metrical studies that once a notion is advanced, it persists, as stubbornly as Richard Nixon, in never going completely away” (19).

We suggest that not only “the linguist and the scientist of language,” but also the student of poetry, interested in performance history and the aural dimensions of poetry, might find stimulating and insightful, and clarifying—rather than too complex—some methods of acoustic analysis. These methods allow us to attend to two crucial paralinguistic qualities, namely pitch and timing. We would like to mention here some crucial early and recent work on intonation in recorded poetry, including G. Burns Cooper’s Mysterious Music: Rhythm and Free Verse (which also made use of spectrographs), the work of Reuven Tsur (who has also contributed to this Colloquy), and all of the contributors (particularly Lacy Rumsey) to the recent issue of Thinking Verse, edited by David Nowell Smith and Natalie Gerber.

Our contribution offers a different emphasis: we want to democratize access to simple open-source tools for the study and teaching of poetic prosody and the history of poetic performance that do not require a great deal of technical skill or linguistic training. This is not to diminish linguistic expertise—the understanding and interpretation of prosodic patterns requires considerable study—but to let students and scholars explore viable alternatives to scansion.

In a recent article in PMLA, “Monotony, the Churches of Poetry Reading, and Sound Studies,” one of us (MacArthur) outlined a rough history of poetry reading styles. As with other modes of performance, and in rough parallel with the evolution of acting, styles of poetry reading in the U.S. have gone through fashions, displaying an unresolved conflict between sincerity and theatricality, both before the institutionalization of the reading in the academy and since.

In the late nineteenth and early twentieth centuries in the U.S. and England, the ideal form of poetry reading was formal, dramatic recital, emphasized through elocution training and sometimes used, misleadingly, to demonstrate literacy. As Lesley Wheeler notes in Voicing American Poetry: Sound and Performance from the 1920s to the Present, “many modernist-era poets were schooled in the presumption that the poet would rarely be the best oral interpreter of her own poetry, … that superior recitation required unusual skill and sensitivity” (4).

While we might agree with these presumptions—many talented songwriters and playwrights do not perform their songs and plays—modern and contemporary poets have long been expected to read their work to audiences, as audio recordings and (sometimes recorded) poetry readings help them gain recognition and supplement income from teaching. Robert Frost, who according to Allen Ginsberg invented the contemporary poetry-reading circuit, imagined in his anonymous youth that he might make a little side money declaiming Shakespeare (see Jay Parini, Robert Frost: A Life, 319; Lawrance Thompson, Robert Frost: The Early Years, 1874-1915, 154). Frost maintained that both a poem and a poetry reading are performances, that a poet ought to be able to “act / Without his being taken for an actor” (The Notebooks of Robert Frost 544).

When Frost gave readings, he chatted about poetry, art and politics, passing down an anecdotal, pedagogical reading style exemplified by popular contemporary poets such as Billy Collins. The antithesis of Frost’s approachable manner was, allegedly, the austere, influential reading style of high modernist poets (e.g., T.S. Eliot, Wallace Stevens)—which was overthrown in turn by the Beats and the confessional poets at midcentury (Christopher Grobe, in “The Breath of the Poem: Confessional Print/Performance Circa 1959,” calls this “poetry’s performative turn”), inspired by various sources, from the theatrical bard and “charismatic drunkard” Dylan Thomas to Method acting (Grobe 216, 225; Wheeler 131).

An understated style of academic poetry reading arose in response to, without entirely displacing, the perceived histrionics of the confessional reading style, which Donald Hall criticizes in 1985: “the poet’s performance substitutes an actorly texture (pitch, volume, gestures; screaming, jumping, singing) for the real sound of words [italics mine]” (Hall 76). By 1998, Charles Bernstein (whose aesthetic allegiances otherwise differ from Hall’s) characterizes as “anti-expressivist … the common dislike, among poets, of actors’ readings of poems… [when] the ‘acting’ takes precedence over letting the words speak for themselves (or worse, eloquence compromises, not to say eclipses, the ragged music of the poem) [italics mine]” (Close Listening 10).

Wheeler ascertains that a neutral style is now the norm; after attending an exhausting number of readings at the Associated Writing Programs (AWP) conference in 2006, she concluded that as a rule “poets perform the fact that they are not performers…. poets [do] not display emotions at their readings but instead ten[d] to manifest intellectual detachment, if not in the poem’s words then through carefully neutral delivery” (Wheeler 140). A Poets & Writers grant application separates poets into two categories: “poet” and “performance poet” (“Funding for Readings/Workshops”).

As vague and deceptively neutral as the anti-expressivist alternative might sound (what is the “real sound of words”?) it is a fact that an almost Puritan avoidance of theatricality—justified by the implicit belief that an understated style implies sincerity, an anti-confessional, even spiritual humility, and/or skepticism about the coherence of the self—now typifies the reading style of a wide range of academically sponsored contemporary poets.

We need, however, to interrogate both this notion of neutrality and the opposition between the theatrical and the sincere, or the expressivist and the ostensibly neutral. It can be argued with equivocal logic that traumatic subject matter necessitates theatricality, or that theatricality sensationalizes trauma (how should Sylvia Plath’s “Lady Lazarus” be read?). Bernard Williams, who in Truth and Truthfulness: An Essay in Genealogy designates “Accuracy and Sincerity” as “the two basic virtues of truth,” suggests that a person sounds sincere if, in stating a belief, she sounds “spontaneous and uninhibited” (Williams 1, 193). Yet we cannot define a “spontaneous and uninhibited” speaking style, except to call it unrehearsed—intensity, affect, and inhibition depend on the speaker’s character and mood. Good Method acting, in short.

How can we get beyond characterizing an allegedly neutral or allegedly expressive reading style as sincere? How can we be more precise about what we mean by expressive or neutral?


What scansion misses or diminishes in our perception of prosody, in contemporary and historical performances of poetry, might well be restored with two approaches to the study and teaching of recorded poetry alongside the text.

The first approach is, simply put, data visualization and analysis of pitch and timing. Not the complex acoustic spectrograms that Fussell might have feared, and which require considerable linguistic training to understand and analyze, but simple pitch contours, showing the intonation patterns of the voice in a recorded poem (see Sampling the Prosody of Sixteen Poets), including breaks in the pitch contour and the waveform that show moments of silence, or pauses (caesuras) within lines and between lines (except with very noisy older recordings, when background noise obscures the patterns of silence or pauses in the waveform, as with Whitman). Related data about pitch and timing can also be statistically analyzed, for instance comparing pitch range (in octaves) and pitch velocity (in octaves per second) as a measure of expressivity, and applying quantitative measures of timing to explore rhythmic patterns of stress or accent and speaking rate or tempo. As mentioned below, timing information can also be used to verify whether a poet pauses at line breaks and caesuras marked with punctuation.

The second approach, which we call vocal deformance (introduced in our 2016 piece in Sounding Out!, which we draw on here), involves manipulating canonical recordings of poems to reflect on actual and counterfactual examples of prosody in poetry performance with greater attention to the subjective nature of the perception of speech, and thus of performance styles. In practice, vocal deformance is machine-assisted manipulation of vocal recordings, which is very common in experimental and pop music production, from Laurie Anderson to Justin Bieber. But aside from the work of some experimental poets, vocal deformance has not been applied much to poetry recordings or the teaching of poetry. And there is a lot we can learn from it. For our purposes, vocal deformance is essentially a playful strategy of defamiliarization that reminds us, in many ways, of the subjective, creative, even arbitrary nature of interpretation.

On the page, we might say that a poem is underdetermined, because its various aural potentials, and thus interpretations, aren’t realized. And even though, without deformance, we can listen to multiple performances of the same poem, and perform it in different ways ourselves, it can be difficult to break out of our own habitual speaking patterns. Or to find different recordings of the same poet reading with drastically different intonation patterns, much less as a different gender.

The concept of deformance dates to a 1999 essay by Jerome McGann and Lisa Samuels. They take inspiration from Emily Dickinson, who sometimes liked to read poems backward, for the potential insights of reading against the form, scrambling the original sequence, and so on. As an interpretive practice, vocal deformance opens up new possibilities for testing assumptions about performance, poetic authority and gender, and, potentially, about race, class, education, region, and canonicity.

Certainly, as Jennifer Stoever writes, “listening [is] an interpretive site where racial difference is coded, produced, and policed” (62). The same is true of gender difference and many other identity markers and cultural factors related to authority and authenticity. As Shai Burstyn notes in the article “In Quest of the Period Ear,” about attempts to imagine how contemporary audiences experienced medieval music, “culture plays a highly significant—though not exclusive—role in shaping the cognitive skills of its members” (695). If it is remarkably difficult to escape our stereotypical expectations and perceptions of what a person’s voice “should” sound like, that is partly because our brain uses such expectations to make predictions about our sonic experience. As well as refining our sense of our preferences in performance styles, vocal deformance may help us to viscerally realize some of the insights of language ideology research, such as Rosina Lippi-Green’s work demonstrating the role of accent and other linguistic markers, including intonation patterns, in civil rights cases of discrimination.

An unusual or unfamiliar manner of speaking or reading a poem created through vocal deformance—particularly when it manipulates a known voice, as with canonical poets, or a familiar way of speaking, as with conventional poetry reading styles—waves a red flag at the brain. Change wakes up the quiescent, habitual brain to something new and potentially informative, because the voice does not fit our expectations for what the person would or should sound like. By corollary, monotone performance is—at least acoustically—terrifically uninformative for the brain.

Two fundamental intonation patterns are rising and falling pitch. In American English, relatively high or relatively low pitch at the end of an utterance, compared to the beginning and middle, seems to carry distinct meanings, as demonstrated by Janet Pierrehumbert and Julia Hirschberg. They developed the ToBI (Tones and Break Indices) system for marking the prosody or intonation of speech. Rising intonation can make any utterance sound like a question, whether it is one or not.

A relatively high pitch at the end of an utterance, or at the end of a line of poetry (called a high boundary tone), can make the speaker sound less confident or assertive, and more open to others’ opinions. Rising intonation implies that more is to come, that the utterance is not conclusive or concluded, that it should be understood in connection to the next utterance, and sometimes, that the speaker seeks the listener’s agreement before proceeding. Uptalk, which often has negative associations in the media with young women’s voices (also see Paul Warren, “Credibility Killer and Conversational Anthrax: Uptalk in the Media,” in Uptalk: The Phenomenon of Rising Intonation (2016)), may be taken to indicate “uncertainty, continuation, deference, verification, facilitation, checking, grounding, negotiation, implication, and lack of confidence” (Warren 47), depending on the listener and the context.

In Sampling the Prosody of Sixteen Poets, we offer deformances of recordings of canonical and contemporary poets, including examples of changing the perceived biological gender of the poets and changing falling intonation patterns, the sound of declarative confidence, into uptalk, to investigate how much ideas of poetic authority and interpretation of a poem might be changed by such differences in vocal performance style.


The three tools we use to visualize and track pitch and timing patterns, and to practice vocal deformance, are Drift, Gentle and TANDEM-STRAIGHT. These tools will, we hope, be useful in sound studies and literature courses, and in novel research on the vast and growing archive of recorded poetry and other performative speech.

The first two, Drift and Gentle, work together. (These tools are currently undergoing further development thanks to a 2018 NEH Digital Humanities Advancement grant for Tools for Listening to Text-in-Performance, with a team of 20 user-testers in the U.S. and abroad. See a fuller explanation of the tools in Jacket2, “Introducing Drift and Gentle...”).

Gentle, developed in 2015 by Robert Ochshorn and Max Hawkins, is a powerful forced aligner that lines up a given transcript with an audio recording, word by word. It can be downloaded for free and installed on Macs. Gentle is built on top of an open-source speech recognition toolkit developed at Johns Hopkins University, Kaldi, which uses modern neural network-based acoustic modeling, trained on thousands of hours of recorded telephone conversations. Gentle was designed specifically to function with more flexibility than FAVE (Forced Alignment and Vowel Extraction), a tool developed in the Linguistics Lab at the University of Pennsylvania and commonly used by linguists, to be “easier to install and use…. handle noisy and complicated audio … and … be unusually accommodating to long or incomplete transcripts.” Gentle also works well with some musical recordings, particularly hip-hop and rap.

Here is a screenshot of Gentle’s interface.





A user simply uploads an audio file (in this case, the opening of Yeats’s “The Lake Isle of Innisfree”), with or without a transcript (in this case, with the transcript), and clicks “Align.” After a few seconds or a minute or so (depending on the length of the recording), Gentle produces a playable transcript of the audio file, with options to download the transcript with precise timing information as a CSV or JSON:




The “Conservative” and “Include disfluencies” options ask Gentle either to ignore disfluencies like “uh” and “um,” or to include them. On noisier recordings, including disfluencies sometimes produces more accurate results; in some very noisy older recordings (as with Whitman reading “America,” included in Sampling the Prosody of Sixteen Poets), Gentle has trouble recognizing some words, but it still performs better than other forced aligners on such recordings.

Gentle can also produce a rough transcript of a vocal recording from scratch, which can then be corrected and aligned with the recording. This feature has great advantages in research on poetry recordings and other audio common in humanistic research, as transcripts for many recordings do not exist or are not easily accessible, in part because of copyright law.

Here is an example of Gentle’s CSV output for the Yeats snippet, with pauses calculated between words.



Clearly the most significant, audible pauses occur between the second “and” and “go” in the first line, and between “Innisfree” and “And,” marking the line break. In cases when Gentle could not recognize and calculate the duration of some words, either because of the quality of the recording or because the words were not in its lexicon (e.g., “Innisfree” and “wattles,” here), we made corrections to the timing information using Audacity.
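For readers who want to derive a pause column like this themselves, here is a minimal Python sketch. It assumes word-level alignment records with `word`, `start`, and `end` fields in seconds, roughly what a forced aligner exports; the field names are an assumption, and the timing values are invented for illustration rather than measured from the Yeats recording.

```python
# Hypothetical word timings in seconds; NOT measured from the Yeats recording.
words = [
    {"word": "I", "start": 0.50, "end": 0.70},
    {"word": "will", "start": 0.70, "end": 0.95},
    {"word": "arise", "start": 0.95, "end": 1.60},
    {"word": "and", "start": 1.75, "end": 1.95},
    {"word": "go", "start": 2.45, "end": 2.90},
    {"word": "now", "start": 2.90, "end": 3.40},
]


def pauses_between(words):
    """Return (previous word, next word, gap in seconds) for each adjacent pair."""
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        # If two aligned words overlap slightly, treat the gap as no pause.
        gap = max(0.0, nxt["start"] - prev["end"])
        gaps.append((prev["word"], nxt["word"], round(gap, 3)))
    return gaps


pauses = pauses_between(words)
longest = max(pauses, key=lambda p: p[2])

for before, after, gap in pauses:
    print(f"{before} -> {after}: {gap:.2f} s")
print("Longest pause:", longest)
```

Sorting or filtering the resulting list by gap length is then a quick way to find the pauses a listener is most likely to hear.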

Such data may allow us to test some insights and questions about the interplay between poetic form and prosody in performance. For instance, Mary Kinzie, in A Poet’s Guide to Poetry (1999), observes that “the line is the conservative force and the sentence is the anarchist.” In other words, the sentence often wants to continue to its end, over a line break, but the unit of the line can curb it. One thing we find in Sampling the Prosody of Sixteen Poets is that poets, whether they write formal or freer verse, exercise considerable latitude over whether they audibly pause at the line break, or let the syntactic momentum carry over into the next line without pausing.

Drift, the second tool we would like to introduce, was prototyped in 2016 by Ochshorn and Hawkins with support from MacArthur’s ACLS Digital Innovations Fellowship. It is a highly accurate pitch-tracker that also incorporates the forced alignment features of Gentle, visualizing a pitch trace over time and aligning it with a transcript. Using an algorithm developed by Byung Suk Lee and Daniel P. W. Ellis at Columbia University to work accurately on the noisy, low-quality vocal recordings common in the audio archive, Drift measures what human listeners perceive as vocal pitch (the fundamental frequency, the vibration of the vocal cords, as measured in hertz) every 10 milliseconds in a given recording. Drift can also be downloaded and installed on a Mac; to function, it needs to be running on a High Sierra system, and Gentle must be opened first.

Here is a screenshot of Drift’s current interface, after it has analyzed the Yeats recording and aligned the transcript. The features allow the recording to be played from the text on the left by clicking on a word, or by moving the blue dot in the play bar at the top.



Here is the beginning of the CSV output for Drift, which works with Gentle to align the transcript with the pitch contour:



The data from Drift and Gentle, along with some additional data from TANDEM-STRAIGHT, is what we use to calculate and characterize patterns related to pitch and timing for the samples of sixteen poets below.
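To make the idea of aligning a pitch contour with a transcript concrete, here is a minimal Python sketch. It assumes one pitch value every 10 milliseconds, with 0 marking unvoiced frames or silence, plus word start and end times in seconds. This data layout is an assumption for illustration, not a description of Drift's actual output columns, and all values are invented.

```python
FRAME_STEP = 0.01  # seconds between pitch samples (10 ms)

# Hypothetical f0 track in Hz, one value per frame; 0.0 marks unvoiced frames.
f0_track = [0.0, 0.0, 110.0, 112.0, 115.0, 0.0, 0.0, 130.0, 128.0, 126.0]

# Hypothetical word intervals in seconds, as a forced aligner might report them.
words = [
    {"word": "arise", "start": 0.02, "end": 0.05},
    {"word": "and", "start": 0.07, "end": 0.10},
]


def mean_pitch_per_word(f0_track, words, step=FRAME_STEP):
    """Average the voiced f0 samples that fall inside each word's time span."""
    result = []
    for w in words:
        first = round(w["start"] / step)  # index of the first frame in the word
        last = round(w["end"] / step)     # index just past the last frame
        voiced = [f for f in f0_track[first:last] if f > 0]
        mean = sum(voiced) / len(voiced) if voiced else None
        result.append((w["word"], mean))
    return result


per_word = mean_pitch_per_word(f0_track, words)
for word, mean in per_word:
    print(word, None if mean is None else round(mean, 1))
```

A word with no voiced frames is reported as `None` rather than zero, so that silence is not mistaken for very low pitch.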

The third tool, TANDEM-STRAIGHT, is a state-of-the-art open-source voice synthesis program developed by Hideki Kawahara at Wakayama University in Japan, with the Advanced Telecommunications Research Institute and the Auditory Brain Project. It can be downloaded and installed for free for academic use. It basically works by applying signal processing algorithms based on human auditory processing to create a rich model of a recorded voice, which can then be manipulated. This screenshot of TANDEM-STRAIGHT’s Graphic User Interface (GUI) shows the spectrograph and pitch contour for Yeats reading the opening of “The Lake Isle of Innisfree.” One can draw in a different shape for the pitch contour, as well as change the size of the vocal tract (Size) and speaking rate (Duration), click “Synthesize” to listen, and save the manipulated recording as a WAV file.



Kawahara’s home page at Wakayama University includes several tutorials and guides to TANDEM-STRAIGHT. This one-minute movie walks through the basic steps, and this three-minute movie shows how to manipulate the different aspects of the Graphic User Interface, including f0 (pitch), duration (ratio), and so on. TANDEM-STRAIGHT is compatible with Matlab, but the GUI can also function without it, and the user can save manipulated files as WAV files.

NOTE: In 2016, some of Kawahara's colleagues released another speech synthesis tool, called World, that may prove as useful as TANDEM-STRAIGHT or more so. We are exploring it for future research.


Let’s imagine that, in an introductory course in poetry or literature, we want to emphasize the performative, the oral and the aural in modern and contemporary poetry in the U.S. Rather than beginning the course with an overview of literary movements or the evolution of genres or literary forms, we would present students with sixteen poems, a sample of the varieties of poetic performance, with recordings of those poems (the originals and deformances) and related data about pitch and timing in the original recordings, all to listen to and explore. Assignments in close listening and analysis would engage students in questions about prosody-in-performance and poetic interpretation, questions about gender and poetic authority, and the topic of opening the canon to new voices and new forms.

We chose these poets, and aimed to choose a representative recording of each, because of their influence as poets and/or performers of poetry. Because of copyright we cannot include here the entire recordings of the poems, which are hyperlinked, but we do include the shorter recordings (again, originals and deformances) with each sample.

Walt Whitman – “America” (1889/90)

W.B. Yeats – “The Lake Isle of Innisfree” (1936)

T.S. Eliot – “The Waste Land” (1946?)

Robert Frost – “The Road Not Taken” (1956)

Allen Ginsberg – “Howl” (1956)

John Ashbery – “At North Farm” (1987)

Amiri Baraka – “Poem for Half-White College Students” (1965)

Frank Bidart – “An American in Hollywood” (2008)

Edna St. Vincent Millay – “Love Is Not All” (1941)

Anne Sexton – “Music Swims Back to Me” (1974)

Sylvia Plath – “Lady Lazarus” (1962)

June Jordan – “A Poem about Intelligence for My Brothers and Sisters” (1992)

Louise Glück – “The Wild Iris” (1992)

Rae Armantrout – “The Subject” (2004)

Eileen Myles – “Each Defeat” (2009)

Trace Peterson – “After Before and After” (2015)


These sixteen samples, which give a little history of prosody-in-performance in themselves, will help students directly encounter the poetry of earlier historical periods, and also reflect on their own response to varied vocal performance styles.

Before listening to the original recordings, we offer a counterfactual history of poetry performance by deforming the originals. What if our biologically female poets were male? And what if our canonical male poets, whose voices are often more familiar to us, were female instead?

Here, for the sake of comparison, are the originals in two files, including Trace Peterson, a transgender poet whom we included in the sixteen and whose voice we did not deform.

These manipulations can help dramatize questions about poetry in performance, and about speech perception, that sometimes feel theoretical or abstract. For instance, do we ascribe more authority to male-sounding poets than to female-sounding poets, or to male-sounding speakers in general, or to any human speaker with lower average pitch? We might not want to think we do, but when we laugh at a female Eliot, is it just because we’re used to hearing him as male? Or are we also laughing at the idea of taking an elderly woman’s tremulous voice seriously, as a voice of poetic authority?

And now, what if we turned Eliot’s authoritative, confident falling intonation into uptalk?

Of course it’s fun to make Eliot use uptalk, but it’s not only fun. Because the second version sounds more speculative, we might feel invited by the speaker to consider whether such moments (“The awful daring of a moment’s surrender”) do define our existence or not, even to challenge him on that point. The Waste Land so often feels like a closed case. Perhaps this sort of deformance can open it up again.

In Table 1, we present a selection of data about the poets, the poems, and the recordings for each sample (from eight to sixteen seconds long) and the full recorded poem (or long section of a longer poem, in the case of Ginsberg and Eliot). The data was generated by analyzing the output from Drift and Gentle, as well as from TANDEM-STRAIGHT, in RStudio and Matlab. Simple formulas in Excel can be used to generate some of this data, such as average pitch, pause duration, etc.



IMPORTANT NOTE: The data should not be used to generalize about a poet’s vocal performance style, as the context of a recording (for instance, a studio recording versus a live reading, an enthusiastic versus a lukewarm audience, and age versus youth, among other factors) influences a vocal performance style on a given occasion. And crucially, the recording format or medium matters. With older recordings, such as the wax cylinder used to record (probably) Whitman, wider pitch range and higher pitch might have been used simply to make sure his voice was recorded effectively. This is not to mention the evolution of an individual poet’s performance style over an entire career. What such data does provide, along with the visualizations and deformances for each sample, are ways of testing, refining and adding precision to our observations about the vocal performance style at the level of the individual performed poem.

Table 2 includes Words Per Minute (WPM), Pause Counts (the number of pauses of at least 100 milliseconds and of at least 250 milliseconds in length), and Average Pause Length. The Standard Deviation of Pause Duration is one approach to estimating the regularity of the speaking rate or tempo; the higher the standard deviation, the more the pause lengths differ. Rhythmic Complexity is what it sounds like, as a higher value suggests a less predictable, and thus less repetitive, pattern of pauses and pause length. It is meaningful only in comparison among different speakers. For instance, of our Sixteen Poets, Allen Ginsberg has the lowest rhythmic complexity, while Edna St. Vincent Millay has the highest. (Rhythmic complexity is calculated here using the Lempel-Ziv algorithm to estimate Kolmogorov complexity; Lempel-Ziv methods are also used for compression, as with GIF or ZIP files. The idea is to find any and all repeated temporal patterns. The more easily one can reconstruct the data with a set of repeated patterns, in this case rhythmic patterns, the simpler it is, i.e. the lower its complexity.)
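As an illustration of how such timing measures can be computed from word-level alignment data, here is a minimal Python sketch. It is not the analysis used for Table 2. The word timings are invented, the three-way coding of pauses into symbols is an assumption made only for this example, and the complexity function is a simple LZ78-style phrase count, one of several Lempel-Ziv variants.

```python
import statistics

# Hypothetical word timings in seconds; not taken from any of the sixteen recordings.
words = [
    {"word": "one", "start": 0.0, "end": 0.4},
    {"word": "two", "start": 0.4, "end": 0.8},
    {"word": "three", "start": 1.0, "end": 1.5},
    {"word": "four", "start": 2.0, "end": 2.5},
    {"word": "five", "start": 2.5, "end": 3.0},
]


def gaps_between(words):
    """Silence in seconds between each pair of adjacent words."""
    return [max(0.0, b["start"] - a["end"]) for a, b in zip(words, words[1:])]


def timing_summary(words, thresholds=(0.10, 0.25)):
    """Words per minute, pause counts per threshold, mean and SD of pause length."""
    gaps = gaps_between(words)
    pauses = [g for g in gaps if g >= min(thresholds)]
    duration = words[-1]["end"] - words[0]["start"]
    return {
        "wpm": len(words) / duration * 60,
        "pause_counts": {t: sum(g >= t for g in gaps) for t in thresholds},
        "mean_pause": statistics.mean(pauses) if pauses else 0.0,
        "sd_pause": statistics.stdev(pauses) if len(pauses) > 1 else 0.0,
    }


def pause_symbols(words, short=0.10, long=0.25):
    """Code each gap as 's' (no real pause), 'm' (medium), or 'l' (long).
    This three-way coding is an assumption made for illustration."""
    return "".join(
        "l" if g >= long else "m" if g >= short else "s" for g in gaps_between(words)
    )


def lempel_ziv_phrases(symbols):
    """Count phrases in an LZ78-style parse: each phrase is the shortest
    chunk not seen before. Fewer phrases means a more repetitive sequence."""
    seen, phrase, count = set(), "", 0
    for s in symbols:
        phrase += s
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)  # count any unfinished final phrase


summary = timing_summary(words)
pattern = pause_symbols(words)
print(summary)
print(pattern, lempel_ziv_phrases(pattern))
```

As with the values in Table 2, a phrase count of this kind is only meaningful when compared across sequences of similar length.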

The Mean Pitch, or f0, is given in Hertz, and Pitch Range is calculated in octaves. Note: octaves are a logarithmic measure, as each octave doubles the frequency in Hertz. When we graph the pitch contour on a linear Hertz scale for a speaker with lower relative pitch, he or she will appear to use a narrower pitch range. Calculating pitch range in octaves counters this misleading visual effect. (Converting linear Hertz to log Hertz before graphing is another option.) Pitch Velocity indicates how quickly vocal pitch changes, and whether the direction is rising (positive) or falling (negative). In very short segments, positive pitch velocity may be associated with questions, or a questioning, leading tone (including many of the same semantic possibilities as uptalk). All of these pitch measures, like rhythmic complexity, are of primary interest in comparing speakers.
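A short Python sketch may help clarify these measures. It assumes a list of f0 values in Hertz sampled at a fixed interval, with 0 marking unvoiced frames. The averaging used for velocity here (a simple mean over consecutive voiced frames) is an assumption for illustration and may differ from the calculation behind our tables.

```python
import math


def pitch_range_octaves(f0_hz):
    """Distance between the lowest and highest voiced pitch, in octaves."""
    voiced = [f for f in f0_hz if f > 0]
    return math.log2(max(voiced) / min(voiced))


def mean_pitch_velocity(f0_hz, step=0.01):
    """Mean rate of pitch change in octaves per second, computed over
    consecutive voiced frames; positive means rising, negative falling."""
    rates = [
        math.log2(b / a) / step
        for a, b in zip(f0_hz, f0_hz[1:])
        if a > 0 and b > 0
    ]
    return sum(rates) / len(rates) if rates else 0.0


# Two invented voices, each spanning exactly one octave:
low_voice = [100.0, 0.0, 200.0]   # a span of 100 Hz
high_voice = [200.0, 0.0, 400.0]  # a span of 200 Hz
print(pitch_range_octaves(low_voice), pitch_range_octaves(high_voice))

rising = [200.0, 210.0, 220.0]
falling = [220.0, 210.0, 200.0]
print(mean_pitch_velocity(rising), mean_pitch_velocity(falling))
```

The two invented voices cover different spans in Hertz but the same range in octaves, which is why a linear graph can make the lower voice look less expressive than it is.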

Figures 11 and 12 visualize the pitch range for the samples and the full recordings.



Here we present the makings of sixteen close listenings, if you will. Each sample includes an MP3 that begins with the original recording, followed by one or more deformances, with shifts in gender, inverted pitch patterns (which usually turn a declarative original into uptalk), a more expressive or neutral style mimicking another poet, etc. Listening to the same lines read in different styles and genders, which may still recognizably have the poet’s unique vocal quality (sometimes called timbre, like a violin versus a flute, which is independent of pitch and timing variables), should help listeners reflect on and articulate how they respond to the original recording. A possible assignment would be to analyze one of the deformances, and compare it to the original.
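For readers curious about what inverting a pitch pattern can mean numerically, here is one plausible definition, sketched in Python: each voiced pitch value is mirrored around the contour's average pitch on a logarithmic (octave) scale, so that falls become rises of the same size. This is an assumption offered for illustration. The deformances here were made with TANDEM-STRAIGHT, and this sketch only transforms a list of invented f0 values; it does not resynthesize audio.

```python
import math


def invert_contour(f0_hz):
    """Mirror each voiced f0 value around the contour's mean log-pitch.
    Unvoiced frames (0) are left as 0."""
    voiced = [f for f in f0_hz if f > 0]
    center = sum(math.log2(f) for f in voiced) / len(voiced)
    return [2 ** (2 * center - math.log2(f)) if f > 0 else 0.0 for f in f0_hz]


# Hypothetical falling, declarative contour in Hz (0.0 = unvoiced frame).
original = [200.0, 0.0, 150.0, 100.0]
inverted = invert_contour(original)
print([round(f, 1) for f in inverted])
```

Because the mirror point is the average log-pitch, the inverted contour keeps the same overall pitch level and range in octaves as the original, and inverting it a second time recovers the original values.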

SAMPLE 1 (1941)

Edna St. Vincent Millay, “Love Is Not All” (ll. 12-14)


“I might be driven to sell your love for peace,

Or trade the memory of this night for food.

It well may be. I do not think I would.”


The MP3 begins with the original sequence, followed by a deformance of Millay’s voice as male and a deformance that inverts her pitch contours.





Edna St. Vincent Millay, in this studio recording of “Love Is Not All” (Sonnet XXX), reads the final lines of the sonnet. A popular performer of her own poetry, Millay uses here a measured pitch range that would be above average for the male poets, at 1.48 octaves for the sample, but which is on the narrow side for the female poets, as she sounds restrained and calm about a subject of intense importance to her. She also has a fairly high and positive pitch velocity, at 1.52 octaves per second, implying frequent rising, leading intonation, as the speaker offers several ways of valuing other things over love, only to reject the possibility. Her sonnet is the most formal poem we consider, in iambic pentameter rhymed ABAB, and she reads fairly slowly (110 WPM), with frequent long pauses, and the highest rates of rhythmic complexity for all sixteen poets. While the pitch accents generally follow the metrical expectation, the pitch contour captures two subtle effects: she gives “your” and “this” higher pitch accents than we might expect, emphasizing the great value of “YOUR love” and “THIS night” with rising intonation, each word given a slightly higher pitch than the metrical expectation would suggest, in contrast, for instance, to the lower pitch given to “food.”

SAMPLE 2 (1974)

Anne Sexton, “Music Swims Back to Me” (ll. 11-16)


“Imagine it. A radio playing

and everyone here was crazy.

I liked it and danced in a circle.

Music pours over the sense

and in a funny way

music sees more than I.”


The MP3 begins with the original sequence, followed by a deformance of Sexton’s voice as male and a deformance that inverts her pitch contours.






One of the best-known female poets of the Confessional movement, Anne Sexton is an interesting exemplar of one dramatic reading style. In this sample from a studio recording of “Music Swims Back to Me,” she uses an especially wide pitch range of 2.42 octaves, and she also uses a fairly fast, negative pitch velocity, at -1.4 octaves per second. The negative pitch velocity correlates with the strong falling intonation of the opening phrases, which use highly contrastive pitch to register surprise, or to surprise the reader; the high pitch accent on “CRAzy,” with falling intonation, directly contrasts with the poet—after a long pause—insisting that “I LIKED it.” Sexton’s speaking rate is moderate, at 132 WPM, and she pauses infrequently, but at length. The three significant pauses in the recording coincide with periods: after “it” (.39 seconds), after “crazy” (.65 seconds), and after “circle” (.52 seconds). Two of those also happen to be line breaks. This might make the form of the poem highly audible, for a free verse poem, or it might suggest that for Sexton, a conversational and dramatic reader, the momentum of syntax matters most. Only Millay, of the female poets, has longer pauses on average than Sexton.
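Speaking rate and pause lengths of the kind reported in these samples can be derived from word-level timings such as a forced aligner produces. The following is a minimal sketch under stated assumptions: the `(token, start, end)` tuple layout, the function names, and the 0.1-second pause threshold are ours for illustration, the rate is computed over the whole span including pauses, and the timings are invented rather than taken from any recording discussed here.

```python
from typing import List, Tuple

# Hypothetical layout for one aligned word: (token, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def words_per_minute(words: List[Word]) -> float:
    """Speaking rate over the whole span, pauses included (an assumption)."""
    duration = words[-1][2] - words[0][1]
    return len(words) / duration * 60.0


def pauses(words: List[Word], min_pause: float = 0.1) -> List[Tuple[str, str, float]]:
    """List silent gaps between consecutive words that last at least min_pause."""
    found = []
    for (before, _, end), (after, start, _) in zip(words, words[1:]):
        gap = start - end
        if gap >= min_pause:
            found.append((before, after, round(gap, 2)))
    return found


# Invented timings, for illustration only.
toy = [
    ("imagine", 0.00, 0.50),
    ("it", 0.50, 0.75),
    ("a", 1.25, 1.50),
    ("radio", 1.50, 2.00),
]

rate = words_per_minute(toy)  # 4 words over 2 seconds
gaps = pauses(toy)            # one half-second gap, after "it"
```

Whether a pause counts as significant depends on the threshold chosen, so the same recording can yield different pause counts under different settings.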

SAMPLE 3 (1962)

Sylvia Plath, “Lady Lazarus” (ll. 23-29)


“The peanut-crunching crowd

Shoves in to see

Them unwrap me hand and foot—

The big strip tease.

Gentlemen, ladies

These are my hands

My knees.”


The MP3 begins with the original sequence, followed by a deformance of Plath’s voice as male and a deformance that inverts her pitch contours.







Sylvia Plath, also known of course as a Confessional poet, in this recording of “Lady Lazarus” reads a bit faster than Sexton, and like Sexton, pauses significantly for punctuation, after “foot—” (.41 seconds) and after “tease” (.69 seconds). Other short pauses are mostly inaudible, and so the form of the tercet in short lines would not be highly audible if not for the rhyme scheme. Plath, compared to Sexton, has a much narrower pitch range, at 1.16 octaves for this sample, but the widest range of all sixteen poets for the entire poem, at 3.74 octaves. This may suggest that she changes pitch range gradually, rather than shifting dramatically within a short passage. She has a similar pitch velocity to Sexton, at -1.5 octaves per second, which again suggests frequent falling, declarative intonation, which carries an air of authority, as in the line: “These are my HANDS, my KNEES.” The pitch accents on those words carry an ironic emphasis, as does the elaborate politeness of the pitch-accented “GENTlemen” after the steely complaint, in a narrower pitch range, of the preceding lines.


SAMPLE 4 (1992)

June Jordan, “A Poem about Intelligence for My Brothers and Sisters” (ll. 30-36)


“ ‘How you doin,’ she answer me,

sideways, like she don’t

want to let on she know I ain’

combed my hair yet and here it is

Sunday morning but still I have the nerve

to be bothering serious work with these crazy questions about

‘E equals what you say again, dear?’ ”


The MP3 begins with the original sequence, followed by a deformance of Jordan’s voice as male, a deformance that inverts her pitch contours, and a deformance that intends to mimic Sexton’s vocal performance style.




June Jordan, in this recording of lines 30-36 from the very free verse “A Poem about Intelligence for My Brothers and Sisters,” uses the widest pitch range of the sixteen poets in a short sample, at 2.62 octaves. The wide pitch range may relate to her use of and contrast between two different characters, the voice of the speaker and the voice of “Mrs. Johnson” in the poem. Jordan, affiliated with the Black Arts movement, uses a pitch velocity similar to most of the other female poets, at -1.35 octaves per second. One thing that’s interesting about the lines selected from Sexton’s, Plath’s and Jordan’s poems is that they are clearly humorous, though Plath’s humor is heavily ironic. We might speculate whether humor in contemporary poetry performance sometimes correlates with rapid pitch velocity and falling intonation, as in the phrase with a high pitch accent on “STILL”; that is, “STILL I have the nerve...” Jordan also reads very fast, at 246 WPM, which marks the poem as more conversational than formal, and she hardly pauses significantly at all. The longest pause, of .25 seconds, occurs after “yet,” lingering on the humor of that line.


SAMPLE 5 (1992)

Louise Glück, “Wild Iris” (ll. 18-20)


“I tell you I could speak again: whatever

returns from oblivion returns

to find a voice”


The MP3 begins with the original sequence, followed by a deformance of Glück’s voice as male and a deformance intended to mimic Armantrout’s vocal performance style.



Louise Glück, reading from “Wild Iris,” provides us with our singular example of Poet Voice, or monotonous incantation, as one of us (MacArthur) calls it. Her pitch range is not especially narrow, but she has a very low pitch velocity, at -0.68 octaves per second for the sample, and -0.48 for the whole poem, which suggests relatively slow changes between pitch values, and a subtly falling intonation pattern. We might speculate that a roughly iambic rhythm occurs in these lines, but as with Yeats, the pitch accents are flattened so that that pattern is less audible. Glück speaks fairly slowly here, at 129 WPM, with one significant pause—of .31 seconds—after “again:[.]” This coincides with a colon and creates some suspense, before “whatever[.]” There are two brief pauses after “oblivion” (.14 seconds) and “returns” (.18 seconds) as well.


SAMPLE 6 (2006)

Rae Armantrout, “The Subject” (ll. 1-7)


“It’s as if we’ve just been turned human

in order to learn

that the beetle we’ve caught

and are now devouring

is our elder brother

and that we

are a young prince.”


The MP3 begins with the original sequence, followed by a deformance of Armantrout’s voice as male, a deformance that inverts her pitch contours, and a deformance intended to mimic Glück’s vocal performance style.



Rae Armantrout, a poet affiliated early in her career with the L=A=N=G=U=A=G=E school, here reads the first seven lines of her poem, “The Subject.” She uses a fairly wide pitch range, of 2.08 octaves, and a very fast pitch velocity, at -2.49 octaves per second. This is audible in her rapid sliding from high to low pitch values; like Jordan’s and Sexton’s poems, Armantrout’s is clearly humorous. She speaks relatively quickly, at 169 WPM, but also has a number of long pauses, all of them to mark line breaks, after “human” (.41 seconds), “learn” (.32 seconds), “caught” (.32 seconds), “devouring” (.41 seconds), and “we” (.15 seconds). Is this surprising—that line breaks might matter so much to a contemporary poet clearly writing in free verse that she troubles to make them highly audible here—or predictable, in a poet who might include in her lineage William Carlos Williams and Denise Levertov?


SAMPLE 7 (2009)

Eileen Myles, “Each Defeat” (ll. 11-22)


“let me fry an egg

on your ass

& I’ll pick up

the mail.

I feel your

absence in

the morning

& imagine your

instant mouth

let me move

in with you—”



The MP3 begins with the original sequence, followed by a deformance of Myles’s voice as male, a deformance that inverts her pitch contours, and a deformance intended to mimic Glück’s vocal performance style.




Sometimes called a second-generation New York School poet, Eileen Myles rivals Jordan and Sexton in her pitch range, of 2.25 octaves, perhaps resembling the former’s conversational style, yet with a pitch velocity similar to the moderate Millay’s, at 1.4 octaves per second. The high pitch accents on “let ME,” “FEEL your absence,” “in the MORNing,” and “move IN” stand out in the otherwise flatter intonation, creating a clear repetitive cadence, as the speaker rapid-fires a series of seductive and flattering proposals. Myles reads at the fastest rate of all sixteen poets, at 266 WPM, only slightly faster than Jordan, and she has just two significant pauses, after “mail” (.36 seconds) and “mouth” (.21 seconds). The first coincides with a period, a line break and a stanza break, but the second pause seems merely to linger on “mouth.”



SAMPLE 8

Trace Peterson, “After Before and After” (ll. 11-18)


“And from here
I can find the edge
of the cunning, supposedly
clear window that
divides us from the world
of Michael Kors, that
divides a kiss from
its aftertaste.”

The MP3 begins with the original sequence, followed by a deformance that inverts Peterson’s pitch contours.



In these lines from “After Before and After,” Trace Peterson, a young transgender poet, favors a fairly moderate speaking rate, at 149 WPM, with long pauses to mark some line breaks and create suspense. For instance, she pauses for 1.36 seconds after “here,” as the speaker contemplates what this new prospect reveals, and uses a semantically apt pause of .45 seconds between “that” and “divides.” With a fairly narrow pitch range of 1.33 octaves in the sample, and 2.12 octaves for the whole poem, and a pitch velocity of -1.08 in the sample, her falling intonation sounds confident. The notable pitch accents on “find” and “clear,” however, seem to call into question whether she can “find” the edge, whether the window is in fact clear. And the falling pitch on “aftertaste” seems to underscore the possibly unpleasant, or simply ambivalent, sense of the kiss’s “aftertaste.”


SAMPLE 9 (1889-90)

Walt Whitman, “America” (ll. 1-2)


“Centre of equal daughters, equal sons,

All, all alike endear’d, grown, ungrown, young or old.”


The MP3 begins with the original sequence, followed by a deformance of Whitman’s voice as female and a deformance that inverts his pitch contours.





The 1889-90 recording of (probably) Whitman reading these opening lines of “America,” a six-line free verse poem with lines ranging from eight to twelve syllables, provides an example of our slowest dramatic reading style, at 63 WPM. Because of the noisy quality of the recording, we could not measure precisely the length or number of pauses, but he certainly pauses at the line break. Whitman also employs a fairly wide pitch range (1.45 octaves for the sample, and 1.92 octaves for the full poem), and uses highly contrastive pitch with repeated steeply falling intonation patterns, which serve to reinforce the parallel syntactic structure and perhaps to emphasize the theme of equality. His pitch velocity is -1.37 octaves per second in this sample, which indicates rapidly shifting pitch; the negative value indicates that, overall, his tendency is for falling pitch, which corresponds with more declarative, confident intonation, rather than questioning intonation. This is reflected in Whitman’s use of pronounced falling intonation, or low boundary tones, at the end of each utterance.


As we can see in the pitch contour, he says the word “equal” with rising pitch both times, and says the words “daughters” and “sons” with falling intonation. Conventional scansion of these lines might identify a roughly trochaic pattern—CENter of EQual DAUGHters, EQual SONS—and thus anticipate that the first syllable in “EQ” would be emphasized, as in typical pronunciation of the word. But Whitman goes against these expectations with his pitch accents. Similarly, he makes his longest pauses not at the line break (between “sons” and “All”), but between “daughters” and “equal” (a full second) and between “grown” and “ungrown” and “ungrown” and “young,” to highlight his theme of equality amid difference.


SAMPLE 10 (1930)

William Butler Yeats, “The Lake Isle of Innisfree” (ll. 1-2)


“I will arise and go now, and go to Innisfree,

And a small cabin build there, of clay and wattles made;”

The MP3 begins with the original sequence, followed by a deformance of Yeats’s voice as female, a deformance that inverts his pitch contours, and a deformance that aligns his pitch patterns with his intensity patterns.





In this studio recording, which we discuss in the “After Scansion” podcast, Yeats uses a near-monotone intonation pattern, with a pitch range of 1.33 octaves and a pitch velocity of -1.05. The flattened pitch patterns work against the stress he gives to accented syllables by increasing intensity (or volume), avoiding contrastive intonation. He speaks at a moderate pace, of 140 WPM. Clearly the most significant, audible pauses occur between the second “and” and “go” in the first line (perhaps emphasizing the speaker’s determination and briefly pausing, literally, before he sets out) and between “Innisfree” and “And,” marking the line break.


SAMPLE 11 (1956)

Robert Frost, “The Road Not Taken” (ll. 13-15)


“Oh, I kept the first for another day!

Yet knowing how way leads on to way,

I doubted if I should ever come back.”


The MP3 begins with the original sequence, followed by a deformance of Frost’s voice as female, a deformance that inverts his pitch contours, and a deformance that aligns his pitch patterns with his intensity patterns.



Frost, in this recording of “The Road Not Taken” at his home, reads much faster than Whitman, at 174 WPM, pauses very little, and uses a narrower pitch range, of .96 octaves in the sample and 1.29 octaves in the entire recording. Though this is a formal poem—written in five five-line stanzas of roughly iambic tetrameter / pentameter, with an ABAAB rhyme scheme—Frost does not pause for line breaks in this sample; the only perceptible pause (of .27 seconds) is between “doubted” and “if,” to enact the speaker’s feeling that he is unlikely ever to return and try the other road.

Though Frost famously said on many occasions that writing free verse is like playing tennis without a net, in his performance of this poem, at least, the rhythms of the sentence clearly take precedence over the formality, if ignoring line breaks is any indication. Favoring syntactic momentum over formal poetic rhythm would seem to fit Frost’s affinity for colloquial speech patterns, from which he developed his poetics early on. Because Frost’s average pitch is lower than that of any other male poet we sampled (85 Hz for the sample, compared to Whitman’s 137 Hz), and his range is narrow (.95 octaves), the pattern of falling intonation is not as visually obvious. (Vocal pitch, as measured in linear Hertz, increases exponentially, so that an octave above 80 Hertz is 160, for instance, while an octave above 160 Hertz is 320, which can make speakers at lower pitches appear as if they use less contrastive intonation on a graph.) However, Frost’s pitch is similarly contrastive to Whitman’s, with a pitch velocity of 1.29 octaves per second, but the fact that the pitch velocity is a positive value, both for the sample and the full poem (for which it is 1.14 octaves per second), indicates a slight preference for rising, and thus leading statement or questioning intonation. His confident, declarative intonation in line 15, however (“I doubted if I should ever come back”), is evident in the ease with which flipping the pitch contour turns it into a question, as is audible in the first deformance of Frost.
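The point about linear Hertz can be checked with a little arithmetic. As a minimal sketch, assuming only the standard definition of an octave as a doubling of frequency, the helper below (the name `octaves_between` is ours) converts a pair of frequencies into an interval in octaves:

```python
import math


def octaves_between(f_low_hz: float, f_high_hz: float) -> float:
    """Interval between two frequencies in octaves: log base 2 of their ratio."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(f_high_hz / f_low_hz)


# The same musical interval spans 80 Hz in a low voice and 160 Hz in a higher one.
low_voice = octaves_between(80.0, 160.0)    # one octave
high_voice = octaves_between(160.0, 320.0)  # also one octave
```

On a graph with a linear Hertz axis, the second interval is drawn twice as tall as the first, although the two are the same size in octaves. Plotting pitch on a logarithmic axis removes that visual bias against lower voices.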



SAMPLE 12

T.S. Eliot, The Waste Land (ll. 403-405)


“The awful daring of a moment’s surrender

Which an age of prudence can never retract

By this, and this only, we have existed”


The MP3 begins with the original sequence, followed by a deformance of Eliot’s voice as female, a deformance that inverts his pitch contours, and a deformance that aligns his pitch patterns with his intensity patterns.



Eliot, in reading three lines from the last section of The Waste Land, uses a slightly narrower pitch range than Frost, at .83 octaves, a slightly slower speaking rate, at 152 WPM, and a slightly slower pitch velocity, at -1.17 for the sample and .97 for the last two minutes of the poem, implying a fairly consistent pitch speed, yet with a declarative intonation tendency in the sample, with perhaps more rising, leading or questioning intonation over the two minutes. Interestingly, Eliot makes slightly more and longer pauses than Frost, but like Frost, he ignores one of the line breaks, between “surrender” and “Which,” letting the momentum of the sentence matter more than the line break. His two long pauses occur at the second line break, between “retract” and “By” (.65 seconds), and between “only” and “we” (.52 seconds), to emphasize how much, in the speaker’s view, that moment of surrender defines our lives.

While of course some sections of The Waste Land become more formal – as Eliot wrote in “Reflections on Vers Libre” (1917), “the ghost of some simple metre should lurk behind the arras in even the ‘freest’ verse; to advance menacingly as we doze, and withdraw as we rouse”—this section, in response to the question “Datta: What have we given?” (ll. 402-409), consists of mostly long lines, from 11 to 14 syllables, though the first and last are shorter (7 and 5 syllables, respectively). The rhythm, a conventional scansion might point out, briefly approaches iambic with “The AWful DARing of a MOment’s surRENder.” More interesting, in terms of intonation pattern, is that Eliot’s pitch subtly rises toward, and then falls away from, the peak pitch of “surrender,” implying building and falling tension, and then ends with low boundary tones on “retract” and “existed,” to make the declaration confidently.

In remaking verse in the Modernist period, Eliot and Frost, for all their differences, share a significant preference in performance styles in these samples: they favor the momentum of the sentence’s rhythm, or the continuity of conversational intonation, over the formal patterning of the poem on the page, using pauses more for semantic emphasis within lines than to mark line breaks. Of course, the syntax of these few lines of their poems lacks the parallelism that Whitman emphasizes with his intonation patterns and pauses, and the lines are not the short parcels of information Armantrout offers. It may be that in reading lines of poetry with parallel syntax, or simply reading shorter lines, Frost and Eliot might pause more regularly at line breaks.


SAMPLE 13 (1956)

Allen Ginsberg, Howl (l. 4)


“who poverty and tatters and hollow-eyed and high sat up smoking in the supernatural darkness of cold-water flats floating across the tops of cities contemplating jazz”


The MP3 begins with the original sequence, followed by a deformance of Ginsberg’s voice as female, and a deformance that aligns his pitch patterns with his intensity patterns.



Allen Ginsberg reads the fourth line of Howl, in this radio studio recording, in a near monotone, reminiscent of Yeats, but using a narrower pitch range of roughly one octave (1.08 for the sample, though he uses a range of 1.7 octaves over the first 2:46 minutes of Howl, as his pitch gradually rises), and with a very low pitch velocity (.52 octaves per second for the sample and -.86 octaves per second for the first 2:46 of Howl). Though this is a free verse poem, it has a clear pattern of long lines using anaphora and parallelism. While he uses the same monotone cadence over many long lines, we may be lulled—pleasantly or unpleasantly—and we probably attend less to what he is saying, because the prosody is literally uninformative. At the same time, his gradually rising vocal pitch may help keep our attention, as it suggests change or intensification in mood. (Many political and religious figures, notably the Reverend Martin Luther King, Jr. in his “I Have a Dream” speech, use a similar rhetorical progression with monotone cadence and rising pitch. According to Jason Miller, Dr. King composed and delivered an earlier version of the speech as a poem, titled “Psalm of Brotherhood,” in Detroit on June 23, 1963.)

Ginsberg’s speaking rate is very close to Eliot’s, at 154 WPM, and like Frost he pauses infrequently, for semantic emphasis. The only significant pauses in this sample occur between “poverty” and “and” (.33 seconds) and “tatters” and “and” (.38 seconds), and “high” and “sat” (.31 seconds). That parallelism, in the simple list of nouns and adjectives-treated-as-nouns, resembles Whitman’s slow and emphatic reading of the parallel syntax in the opening lines of “America,” though without Whitman’s high pitch accents or contrastive intonation. As noted in the Whitman sample, Ginsberg’s rhythmic complexity, over the first 2:46 minutes of Howl, is the lowest of any of the male poets sampled, at 2.03, suggesting a repetitive rhythmic pattern. All of these effects make Howl sound like a very formal poem.


SAMPLE 14 (1985)

John Ashbery, “At North Farm” (ll. 1-3)


“Somewhere someone is traveling furiously toward you,

At incredible speed, traveling day and night,

Through blizzards and desert heat, across torrents”


The MP3 begins with the original sequence, followed by a deformance of Ashbery’s voice as female, a deformance that inverts his pitch contours, and a deformance that aligns his pitch patterns with his intensity patterns.




In this recording of a live reading, John Ashbery reads the opening lines of his free verse poem “At North Farm.” The intonation patterns here show a consistently falling rhythm, in each phrase and at the end of each line, with the higher pitch accents usually visibly lining up with the syllables we would expect to be stressed in conventional scansion. He has, with Amiri Baraka, the narrowest pitch range, at .75 octaves, for this sample, yet his pitch range for the whole poem is wider, at 1.6 octaves, which may suggest that it widens as the poem develops. Ashbery’s pitch velocity is similar to Ginsberg’s, at -0.9 octaves per second in the sample and almost as slow, at -1.2 octaves per second, over the entire poem, which indicates falling, declarative intonation, despite the fact that the poem includes several questions addressed to the reader about the identity of the mysterious traveler—the younger self? a potential or former lover?—and the reader’s thoughts about him. We may notice the falling pattern over and over in the pitch contours of the sample, in each phrase and at the end of the first two lines. Ashbery reads a bit slower than most other male poets sampled here, aside from Whitman and Frank Bidart, at 132 WPM, and he makes just two significant pauses, between “speed” and “traveling” (.24 seconds) and “night” and “Through” (.95 seconds). The first long pause coincides with a comma—what we might call a caesura, in conventional scansion—and the second with a line break.


SAMPLE 15 (1965)

Amiri Baraka, “Poem for Half-White College Students” (ll. 9-11)


“How do you sound, your words, are they

yours? The ghost you see in the mirror, is it really

you, can you swear you are not an imitation greyboy”


The MP3 begins with the original sequence, followed by a deformance of Baraka’s voice as female and a deformance that inverts his pitch contours.




Amiri Baraka, reading three lines from “Poem for Half-White College Students” at San Francisco State in 1965, includes a series of questions, though only the first ends with a question mark. In this poem of direct address, Baraka uses the same narrow pitch range as Ashbery in this sample, at .75 octaves, but over the whole poem, his range is the widest of all the male poets we sampled, at 2.25 octaves, similar to Frank Bidart’s. He also uses highly contrastive intonation, visible in the pitch contour; his pitch velocity is -1.25 octaves per second for the sample, with falling intonation, perhaps because his questions are rhetorical questions—challenges to disagree more than open questions—and 1.23 octaves per second for the whole poem, suggesting frequently rising intonation. He also places high pitch accents on “YOURS” and “YOU,” the final words in the questions. This marks them as questions, and makes them sound more challenging; another effect is to mark the line breaks differently. Between lines 9 and 10, rather than simply pausing after the line break, Baraka emphasizes the beginning of new lines, as “YOURS” and “YOU” occur as the first words in lines 10 and 11. This emphasis is reinforced by the only significant pauses, which fall between “yours” and “the” (.57 seconds), between “really” and “YOU” (.26 seconds) and, immediately following, between “YOU” and “can” (.25 seconds). Like Eliot and Ashbery, Baraka does not always pause at a line break, but does when it suits semantic emphasis, as between “really” and “YOU.” Like them, he sometimes lets the sentence’s momentum carry over the line, but unlike them, he uses a strong pitch accent at the start of each line for both semantic and rhythmic reasons.

Beginning several lines in a row with both a pitch and intensity accent on the first syllable—what would be called, in scansion, a substitution of a trochaic foot in an iambic line—and preceding the last word in questions with a pause, combine to create a relatively surprising rhythmic pattern that uses delay to create suspense and emphasis both. The jazz pianist and composer Vijay Iyer in “Embodied Mind, Situated Cognition, and Expressive Microtiming in African-American Music” has noted the pattern of a backbeat delay in 20th century African and African-American music, in which, for instance, “the snare drum is very often played ever so slightly later than the midpoint between two consecutive pulses” (407). Though this typically occurs in music in the interplay between two different instruments, not a single voice delaying an accent in a rhythmic pattern, it would be interesting to investigate whether the slightly surprising placement of pauses, followed by a pitch accent, recurs over the entire poem.



SAMPLE 16

Frank Bidart, “An American in Hollywood” (ll. 7-9)


“Crazy narratives—that lend what is merely

in you, and therefore soon-to-be-repeated,

the fleeting illusion of logic and cause.”


The MP3 begins with the original sequence, followed by a deformance of Bidart’s voice as female, a deformance that inverts his pitch contours, and a deformance intended to mimic the vocal performance style of Yeats.



Frank Bidart, in reading lines 7-9 from “An American in Hollywood,” audibly stands out, with Baraka, for a distinctively dramatic performance style. His pitch range in the sample is the widest of these eight male poets, at 1.83 octaves, and 2.2 octaves over the entire poem, and his pitch velocity is similar to Baraka’s, at -1.33 octaves per second for the sample and -1.55 for the entire poem, suggesting a tendency toward falling, declarative intonation. He speaks more slowly than anyone but Whitman and Millay, at 99 WPM, and uses frequent long pauses and highly contrastive intonation. In the sample, one of these pauses marks a line break, but primarily his pauses provide emphasis within lines, like Eliot’s and Baraka’s. The four long pauses occur between “narratives” and “that” (1.23 seconds), “you” and “and” (1.04 seconds), “repeated” and “the” (1.03 seconds), and “logic” and “cause” (.55 seconds). One notable point about Bidart’s intonation patterns in this sample is that the second line ends with rising intonation—more is to come. To say “what is merely in you” with falling intonation would sound conclusive, finished, as if that were the end of the story, when in fact the speaker’s point is that we ascribe rational motives to behavior that simply arises from within us, without explaining itself, from who we are.


The development of Drift was supported by an ACLS Digital Innovations Fellowship in 2015-16. Grateful thanks to Dave Cerf for composing music based on our intonation patterns for the accompanying podcast, “After Scansion,” recorded at the Center for Mind and Brain at the University of California, Davis, and to Gillian White (University of Michigan), Ben Lee (University of Tennessee, Knoxville), Ben Glaser (Yale University), Christopher Grobe (Amherst College) and Steve Evans (University of Maine, Orono) for their helpful reading and listening suggestions related to scansion and poetry performance, and to the HiPSTAS (High Performance Sound Technologies for Access and Scholarship) research community led by Tanya Clement (University of Texas at Austin) and inspired by Charles Bernstein and PennSound. Thanks also to Robert Ochshorn and Max Hawkins for their work developing Gentle and Drift, and to MacArthur’s undergraduate research assistants at CSU Bakersfield (David Stanley and Mateo Lara) and at the ModLab at UC Davis (Pavel Kuzkin).

Marit MacArthur is a poetry scholar and a poet. She is associate professor of English at California State University, Bakersfield, and a research associate in Cinema and Digital Media at the University of California, Davis.

Lee M. Miller is a cognitive neuroscientist and bioengineer who studies the neural bases of speech perception, especially in noisy environments (and in the context of hearing loss, hearing aids, and cochlear implants). He is associate professor of neurobiology and technical director of the Center for Mind and Brain at the University of California, Davis.


Prosody: Alternative Histories

What are the historical stakes of prosody, and why should we ask? ‘Prosody’ refers both to the patterning of language in poetry and to the formal study of that patterning. In both senses, it is roughly synonymous with ‘versification.’ Like many terms in the modern study of poetics, ‘prosody’ derives from a Greek word of much wider application (prosōdía, ‘song; tone’). In Modern English, ‘prosody’ additionally designates a branch of linguistics concerned with the intonational and rhythmical patterning of speech.

The multiple meanings of ‘prosody’ hint at the historical perplexities of the term. One major difficulty is the qualitative difference between prosodic theory and practice—often itself a historical difference. In English literature, for example, the practice of meter predates metrical theory by 900 years. Between the composition of the Old English poem Cædmon’s Hymn (late seventh century) and the publication of George Gascoigne’s Certayne Notes of Instruction Concerning the Making of Verse or Ryme in English (1575), poets practiced but evidently did not theorize English prosody. (Modern poets’ continuous proselytizing letters, essays, and talks promulgating their prosodic theories have now more than made up for this gap!) Nonetheless, the medieval centuries are notable for metrical experimentation, from twelfth-century forays into syllabic verse to Geoffrey Chaucer’s invention of the French- and Italian-inspired iambic pentameter in the fourteenth century. This experimentation is incomprehensible without situating English in a cross-linguistic context, one that includes, at minimum, French, Italian, Latin, Norse, and Welsh, each with its own complex history.

The study of prosody in the centuries since Gascoigne has presented any number of historical complications, and the present era is no exception. Even as it enjoys a resurgence of interest, spurred by concurrent discoveries in sound studies, cognition, performance, psycholinguistics, and new technologies, verse prosody remains a problematic field. The linguistic turn of the twentieth century, for example, has meant that many prosodists have focused on developing, and refining, metrical theories, i.e., descriptive systems that account for the match or ‘fit’ between the phonological structure of the language and the aesthetic structure of the verse. This approach, originally sponsored not by a linguist but by a literary critic—that “every language has the prosody which it deserves”[1]—has certainly advanced a fundamental understanding of technique, but it has done so at significant cost: the assumption of verse’s artificiality as a transparent stylization of natural language, with an attendant, and surprising, lack of curiosity about the historical factors conditioning these outcomes.

Following the linguistic turn, literary scholars have endeavored to describe metrical traditions and to coordinate metrical histories and historical prosodic theories with cultural, intellectual, material, and social histories. Yet what is the status of such description and coordination, given the gap between practice and theory, or between cultural production and cultural analysis? Do early theories of prosody, from Pāṇini to Snorri Sturluson to Gascoigne, clarify the nature of verse or entail new epistemological problems? Do later approaches, from generative metrics to cognitive poetics to historical poetics, represent research progress or just add terminological complication? Can the historical practice of prosody be disentangled from the history of prosodic study—and if not, whence prosody?

Contemporary poets at all levels face an analogous gap between practice and theory: to what extent can the researches of prosodists influence or be of use to poets? What utility could there possibly be, given the outright inaccuracies of meters in most poetics handbooks? (Here, a reverse historical dilemma: practice may continue to outstrip theory, but theory outstrips primers.) Does the textbooks’ persistence in oversimplifying and misrepresenting metrical study only prove the point that the academic pursuit of verse prosody is immaterial to practice?

Prosody thus traverses a set of vexing historical oppositions—between structuralist and poststructuralist, or formalist and historicist, or empirical and theoretical, methodologies; between departments in the twenty-first-century university—especially the languages, linguistics, cognitive sciences, and comparative literature; not to mention between poets and critics, the producers and analysts of prosody. Hoping to move past these artificial divides, this Colloquy brings together work in multiple media across disciplines, all considering reciprocal relationships between prosody and history, variously defined. The goal of the discussion is to inspire the kinds of productive disagreements that can move prosody closer to Donald Wesling’s vision of a unified field: “When literary criticism can complete linguistic metrics, and when it can in turn be completed by being deepened with a cognitive psychology of the reader, and when it can be fully historicized, then we shall have a prosody adequate to the greatness and range of poetry in English.”[2] This Colloquy shows that verse rhythm and aesthetic pleasure always exist in a dialectic relationship with many histories.

[1] George Saintsbury, A History of English Prosody, 3 vols. (London: Macmillan, 1906–10), vol. 1, 371.

[2] Donald Wesling, The Scissors of Meter: Grammetrics and Reading (Ann Arbor: Univ. of Michigan Press, 1996), 22.
