Following up on my earlier post on Nantucket, as well as all the discussion here about Ngrams, I want to offer a few spontaneous speculations based on a new paper that's the talk of at least a couple of neighborhoods in the town. That paper challenges (though is consistent with) what's called Zipf's Law: George Kingsley Zipf's principle that word-length tends to be inversely correlated with frequency of usage.
Steven T. Piantadosi, Harry Tily, and Edward Gibson -- through some very careful considerations of the Ngrams made possible by Google and by other databases as well -- argue that in fact word lengths are better predicted by their information content. Such a relation between length and information content ministers to linguistic efficiency: you get a more constant stream of information if word length is thus correlated, that is, you get more uniform information density. As they explain: "A constant information rate can make optimal use of the speech channel by maximizing the amount of information conveyed, without exceeding the channel capacity of speech or our cognitive systems. Thus, lexical systems that assign length according to information content can be communicatively more efficient than those that use frequency."
Now they accept that frequency might very well be related with the amount of information conveyed: though they don't mention Markov chains (if you've ever used the Kant or Hegel generator, or if you're a decent Scrabble player, you know how they work: they grade the probability of the next item in a list based on the previous one. Thus in English an initial S is almost never followed by an R, and a Q is almost always followed by a U), they have something similar in mind when they suggest that predictions of a common next word would mean that word would be both very frequent and not very informative.
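For the Markov-chain idea, here's a toy sketch (my own, not from the paper): estimate the probability of each next letter from bigram counts over a small, made-up word list. With real corpora the same counting shows what the parenthesis claims, that Q is almost always followed by U and an initial S almost never by R.

```python
from collections import Counter, defaultdict

# A first-order Markov chain over letters: estimate P(next | current)
# from bigram counts. (Illustrative only; the word list is made up, and
# the paper's models condition on preceding words, not letters.)
words = ["queen", "quick", "quiet", "square", "stone", "sting",
         "snake", "shore", "spell", "swift"]

counts = defaultdict(Counter)
for w in words:
    for a, b in zip(w, w[1:]):
        counts[a][b] += 1

def p_next(a, b):
    """Estimated probability that letter b follows letter a."""
    total = sum(counts[a].values())
    return counts[a][b] / total if total else 0.0

print(p_next("q", "u"))  # 1.0 in this toy list: q is always followed by u
print(p_next("s", "r"))  # 0.0: no "sr" bigram occurs here
```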
To come up with an example that I think makes their point: If you hem you very frequently haw. Now no one haws very frequently compared to all the other things we do, but we do haw an awful lot after we've hemmed. So the word haw is a short one despite its low frequency in the wild because of its high frequency after hemming. And if you ask why "hem" in turn is a short word (since we don't actually spend a lot of time hemming), well the answer is that you rarely hem without hawing, so that the unit of meaning from the point of view of hemming is actually a trisyllabic one. You may find yourself going to and fro on this, but then consider that to is a very frequent word that contains almost no information, whereas fro is an extremely infrequent word that is nevertheless invariably preceded by to and.
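To put rough numbers on the hem-and-haw point (the counts below are entirely made up, purely for illustration): "haw" is rare overall, so its out-of-context information content -- the surprisal, -log2 of its probability -- is high; but after "hem" it is nearly certain, so its in-context information is tiny, which on Piantadosi et al.'s account is why it can afford to be short.

```python
import math
from collections import Counter

# Made-up toy counts, not real corpus data.
unigrams = Counter({"the": 50000, "to": 30000, "hem": 40, "haw": 38})
total = sum(unigrams.values())
after_hem = Counter({"haw": 36, "and": 4})  # words observed after "hem"

def surprisal(p):
    return -math.log2(p)  # information content in bits

p_haw = unigrams["haw"] / total
p_haw_after_hem = after_hem["haw"] / sum(after_hem.values())

print(f"haw out of context: {surprisal(p_haw):.1f} bits")        # large
print(f"haw after 'hem':    {surprisal(p_haw_after_hem):.2f} bits")  # tiny
```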
Now naturally my own interest in these matters has to do with what light they might cast on poetic form, in particular on rhyme and meter. Piantadosi et al. suggest that the way words tend to get shortened over time is that they're spoken more quickly and truncated when they don't contain much information. As a fan of za with shrooms I don't think swallowing an idea like that's infra dig either: anyhow I'll take it on spec.
But what I want to try out here is a way that this claim might illuminate some aspects of poetic meter, at least in English. In an English iambic pentameter line, you'll find (I'm back-of-the-enveloping) that the number of syllables appearing in polysyllabic words gravitates around a mode and probably a mean of about four per line (especially if you leave out feminine endings as essentially moments of breathing); put otherwise, you'd expect to find about six monosyllabic words per line. I can think of examples to quote easily: "Of man's first disobedience and the fruit...." "How loved, how honored once, avails thee not." "Yet faithful how they stood, their glory withered." "With naked foot stalking in my chamber." "O, there is blessing in this gentle breeze." "A gentle knight was pricking on a plain." "That's my last duchess painted on the wall." "And thee returning with thy silver wheels." Yes, I'm quoting from memory, but the point here is that these lines are memorable. More technically, it seems to be the case that in all languages almost all poetic lines are fewer than ten words long (hence Pope's parodic "And ten low words oft creep in one dull line"), which means that a ten-syllable line usually has at least one polysyllable.
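A crude way to check the back-of-the-envelope claim: tally the monosyllables in a couple of the quoted lines. The syllable heuristic below is my own and deliberately naive (it counts runs of vowel letters, so it will miscount silent-e words like "stone"), but it is good enough for a rough count.

```python
import re

# Naive heuristic: each run of vowel letters counts as one syllable.
# Miscounts silent-e words; fine for a rough monosyllable tally.
def syllables(word):
    return len(re.findall(r"[aeiouy]+", word.lower()))

def monosyllables(line):
    return sum(1 for w in line.split() if syllables(w) == 1)

for line in ["That's my last duchess painted on the wall",
             "A gentle knight was pricking on a plain"]:
    print(line, "->", monosyllables(line), "monosyllables")
    # both lines come out at six, matching the estimate above
```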
Let's stipulate (for argument and in this context) that the bi- and polysyllabics convey more information than the monosyllables, which I think is true: disobedience, honored, avails, faithful, glory, naked, stalking, blessing, gentle, gentle, pricking, duchess, painted, returning, silver. They're not the only important words, but they do look important. Now, the thing about English is that most two-syllable words are trochaic. It's hard to think of an iambic one offhand, if you disallow what are clearly prefixes which almost function as separate words. (When they don't, you get words like Almost, since our minds tend to trochaize words, whether this is an overt (!) process or not, over (see?) time.) Trochee's a trochee and so is iamb. And dactyl, spondee, and pyrrhic. So we can say or hazard that the more information-bearing words in a line will be trochaic, though each syllable will carry roughly the same amount of information as its fellows.
But in poetry stress matters too. Unstressed syllables tend to be spoken faster than stressed ones (there's some relation between stress and quantity), so that stressed monosyllables should carry more information than unstressed monosyllables. This seems borne out by poetic form, since it's almost definitional of what we call rhyme that it begins with and focuses on a stressed syllable. (Wyatt rhymes "appeareth" and "fleeth," as Saintsbury complains, but that's the rule-proving exception, a kind of exception Dickinson will make her stock in trade. But both of them still rely on assonance in the stressed syllables.) So rhyme presents an interesting phenomenon in the context of the information theory at issue here.
I'll return to rhyme in a minute, but pause to say that it's not the only interesting question. Meter is older and more universal, so let's consider meter in English. If most bisyllables are trochaic, why are most lines iambic? Because the monosyllables fill out the lines. They frequently begin lines, as articles or pronouns or conjunctions or prepositions or even stative verbs: "A gentle knight was pricking on a plain"; "A little more than kin and less than kind"; "My mistress' eyes"; "My first thought was, he lied in every word"; "I am so lated in the world that I..."; "The glory and the freshness of a dream"; "That afternoon they came upon a land"; "Is this the region, this the soil, the clime...?" "And frost performs in these what fire in those." A line will naturally tend to start either with an unstressed syllable or with trochaic inversion, which is the most common form of variation in iambic pentameter. (More spaciously, the most common form of the first four syllables of an iambic line is a choriamb, in which a trochee is followed by an iamb: "Hail, holy light, offspring of heav'n first born"; "Whether 'tis nobler in the mind"; "Season of mists and mellow fruitfulness"; "Swift as a spirit, hastening to his task"; "Down to a sunless sea".)
All Indo-European verse seems to follow the rule of loose onsets, strict endings. "After great pain a formal feeling comes," and after "After great pain" a formal feeling comes with the words "a formal feeling comes." Lines are maximally free in their first feet, and have little freedom (except the choice of a feminine ending) in the last foot or even two feet. Why is this?
I can think of two reasons in the context of poetic information. Poetry (like music) is about the orchestration of different effects with and in counterpoint to each other. Trochaic words overlap iambic feet. But the importance of this overlap increases towards the end of the line as semantic information gives way to metrical information. If we lose the beat, we need it back, and the line has to make sure to give it back to us as it approaches its ending. The end of the line is its most crucial poetic component. (NB: A line ending may be defined as a place where word and foot-ending coincide with near invariable regularity.) At line ending, the lexical and the metrical converge, as foot and word-ending correspond. In end-stopped lines, there's grammatical convergence as well, both syntactic and semantic; whereas in enjambment there's a further dimension put into play with and against the others, and we have to wait for the end of the stanza or even of the poem. Free verse, contrariwise, breaks the connection between word- and foot-ending. Anyhow, this convergence allows for the combination of different kinds of information, and produces the same pleasure of sudden economy that Freud saw at work in jokes.
In rhymed poetry the fact that certain rhymes telegraph their resolutions ("chimes" telegraphs the "sure returns of still expected rhymes") would seem to reduce the information carried by what seems to many the defining characteristic of poetry. We know what Nantucket will rhyme with. Why should that rhyme matter?
I have ideas about this, which I hope to explore later, having to do with the kind of willing that a hearer or reader directs towards a literary work. The rhyme is a sort of ratification of that will: it gives a sense that the bowling ball curved true and yielded a strike because of our body English, our active expectation that the meter would settle into a groove and the rhyme would come. In this context, though, all I want to say is that rhyming and meter smooth the information transfer too. We get to rest our interpretive abilities for a moment and coast along on the rhythmical resolution to the line. If the end-words are expected (whether as actual rhyme words or as rimes, the rhyming part of the word, or even, in blank verse, as metrical qualities), we pay all the more attention to the way we get to those words.
Maybe it would be correct to say that the smoothing of information in a poem takes place more or less in the equivalence of lines as bearers of information. Each line describes a little hyperbolic curve, discounted heavily as to meter at the start, discounted heavily as to referential information at the end. The two discount curves (forwards and backwards) produce a fairly stable average across the line, and from line to line. At any rate, this brings me to my second idea, which is that the rhyme (or last word or whatever) acts as a sort of error-checking digit, like the final character (0 through 9, or a Roman-numeral X for ten) of a ten-digit ISBN, which represents a calculation on all the previous digits and flags a mistake in any particular one.
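For the curious, the ISBN-10 check digit works like this (a minimal sketch): each of the first nine digits is weighted 10 down to 2, and the final character is chosen so the grand total is divisible by 11, with X standing in for ten. Change any single digit and the check fails -- which is the sense in which the rhyme word "ratifies" everything that came before it.

```python
# ISBN-10 check digit: weight the first nine digits 10 down to 2,
# then pick the character (0-9, or X for ten) that makes the
# weighted sum divisible by 11.
def isbn10_check_digit(first_nine):
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), first_nine))
    r = (11 - total % 11) % 11
    return "X" if r == 10 else str(r)

print(isbn10_check_digit("030640615"))  # -> "2" (0-306-40615-2)
```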
This description, bloodless as it sounds, actually conforms to a lot of our naive experience of reading poetry: rereading the line to give the right metrical value to various syllables, which in turn can redound upon the meanings of words, and vice versa. I think all of what I've said should feel non-controversial, except for my lightly communications-scientific jargon. (Readers of deconstruction, I've maintained for a long time, should learn a little communications theory, which challenges the central deconstructive tenet that meaning is almost infinitely fragile. It's not.) The point is that there are a lot of different and studiously independent parameters of meaning that converge at the end of a poetic period (line or stanza or whole poem) and that convergence governs our understanding of what's come before. Different kinds of information mesh, and they do so not arbitrarily (even if rhymes are arbitrary) but through a kind of declaration that you now have all you need. The rhyme can be monosyllabic because there's really not much left to say except that the rhyme word is nailing the meter and the line, and it's the experience of nailing it that gives stability to the whole fabric of the poem.