Ministry of Prose: making words work for your business

One in a semi-regular series of ponderings, musings and contemplations on the interaction of words and psychology in business.

Computers can't read

January 2015

If you're involved in writing for the web these days, you've probably come across reading score tools. These are websites or apps that will 'read' your content and give you a readability score or reading age, sometimes breaking things down into individual sentences.

Sounds great, doesn't it? Computers have finally reached the stage where they can read and comprehend our writing, and even give us marks for it! Except they haven't. Even Google's internal teams haven't got that far yet, and not for want of trying.

The algorithms behind these reading engines are basic, to put it mildly. Here's a description from one such site: "Coleman Liau and ARI rely on counting characters, words and sentence. The other indices consider number of syllables and complex words (polysyllabics - with 3 or more syllables) too. Opinions vary on which type are the most accurate. It is more difficult to automate the counting of syllable as the English language does not comply to strict standards!"

So, no attempt to infer context. No means of grouping vocabulary by subject matter. No method of determining the target audience. No actual 'reading' at all, in fact. Just basic syllable processing if you're lucky, letter counting if you're not.

It's plain old statistical analysis dressed up in new clothes. Give me an hour and I'll knock out a 10-line Perl script to do the same thing. Not that these apps claim to do anything else. I'm not blaming their authors. It's the way such tools are used that's the problem.
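For the curious, here's roughly what that counting looks like. This is a minimal sketch in Python rather than Perl, and it isn't taken from any of the tools mentioned: the function names and the crude word and sentence splitting are assumptions made for illustration. The two formulas are the published Automated Readability Index (ARI) and Coleman-Liau index, the pair that the description above says rely on counting characters, words and sentences.

```python
import re


def counts(text):
    """Return (letters, words, sentences) using deliberately crude rules."""
    # A "word" is any run of letters, digits, apostrophes or hyphens.
    words = re.findall(r"[A-Za-z0-9'-]+", text)
    # Count only letters and digits, so hyphens and apostrophes don't add length.
    letters = sum(ch.isalnum() for word in words for ch in word)
    # A "sentence" ends at any run of '.', '!' or '?' (assume at least one).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return letters, len(words), sentences


def ari(text):
    """Automated Readability Index: characters per word, words per sentence."""
    letters, words, sentences = counts(text)
    return 4.71 * letters / words + 0.5 * words / sentences - 21.43


def coleman_liau(text):
    """Coleman-Liau index: letters and sentences per 100 words."""
    letters, words, sentences = counts(text)
    letters_per_100 = 100 * letters / words
    sentences_per_100 = 100 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    sample = "The sun rises in the east."
    print(round(ari(sample), 2), round(coleman_liau(sample), 2))
```

Feed it "The sun rises in the east." and both indices come out below zero. Nothing in the code knows or cares what the sentence means.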

Context is vital when calculating true readability, and so is the target audience. An example? OK...

Consider this sentence: "The dual-slit diffraction experiment demonstrates the quantum superposition properties of photons." Any reading engine would call that hard. One of them says "This page has an average grade level of about 19. Ooh, that's probably a bit too complicated. Have you thought about using smaller words and shorter sentences?"

Yet to a physicist that sentence is no more complex than "The sun rises in the east." It would be possible for me to rewrite it to make it 'good' for a reading engine, but that would expand it to about five sentences - and any physicist reading it would think I was a moron.

This type of software still has its uses, of course. If your CEO has a habit of using ridiculously long words in an attempt to appear clever, then showing them a reading score may... no, that's not going to help, is it? It'll only encourage them.

Still, if you're writing for a general audience it can be useful to get a rough idea of the level at which your writing is pitched. But a rough idea is all you'll get, since no two readability tools agree with each other.

I ran a 1,500-word article through two different online reading score tools. One of them said it contained several hard-to-read sentences. The other claimed it was suitable for 13-year-olds. Both were wrong. The software was unable to take account of the subject matter and the context of each sentence. So the scores that emerged were well presented, convincingly authoritative and essentially meaningless.

Here's another example: "Consider the word procrastination. It means to put off doing stuff."

A simple algorithm would tag that first sentence as hard to read, since 50 percent of its words contain three or more syllables and the average word length is 7.5 letters. One reading engine even gives it a negative score! The second sentence would be scored as easy to read, by the same arithmetic working in the other direction.

In reality, neither sentence makes sense without the other: they can't be read in isolation, so they can't be scored in isolation. You can't change the 'hard' sentence without making the 'easy' one meaningless.
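To see how a negative score can happen, here's a hedged sketch of the arithmetic, assuming the Flesch Reading Ease formula. I can't say which formula the reading engine in question actually used, so treat this as one plausible candidate. The function name is hypothetical and the syllable counts are entered by hand, since counting syllables automatically is the part these tools find difficult.

```python
def flesch_reading_ease(syllables_per_word):
    """Flesch Reading Ease for ONE sentence, given a list of syllable counts."""
    words = len(syllables_per_word)
    syllables = sum(syllables_per_word)
    # Standard formula: 206.835 - 1.015 * (words / sentences)
    #                           - 84.6 * (syllables / words)
    # With a single sentence, words / sentences is just the word count.
    return 206.835 - 1.015 * words - 84.6 * syllables / words


# Hand-counted syllables (an assumption of this sketch, not tool output).
first = {"Consider": 3, "the": 1, "word": 1, "procrastination": 5}
second = {"It": 1, "means": 1, "to": 1, "put": 1, "off": 1, "doing": 2, "stuff": 1}

for label, sentence in (("first", first), ("second", second)):
    score = flesch_reading_ease(list(sentence.values()))
    print(label, round(score, 1))
```

On these counts the first sentence scores below zero and the second scores above 100, even though the two only make sense as a pair.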

Then there's the more subtle fact that good writing depends upon sound as much as content. There's a poetry to good prose, a musical art in producing text that reads well. If you start hacking that around because statistical analysis tells you some of your words are too long, out goes readability in its truest sense: the enjoyment of reading. And away go your readers - even the 13-year-olds.

It takes a long time and a lot of thought to craft good writing. It can no more be enhanced in this way than you could improve the Mona Lisa by asking Leonardo to make her smile a bit more emphatic. And yes, hyperbole is part of my craft. Sometimes.

Good writers add spice to the dullest of subjects. They bring their writing alive with carefully chosen words and phrases that make the conscious and unconscious mind sit up and pay attention. They use cadence and tone, context and punctuation, rhythm and vocabulary, and all of it to Get Inside Your Head (or your customer's).

One day computers will be able to understand that, and maybe even do it themselves. But not yet. Not today.

If enough people start using these basic tools as the primary guide to the quality of their written output, my job will be secure for many years to come. Computers can't read, so slavishly following their rules leads to prose that no intelligent human would want to.

Alex Cruickshank has been a professional writer since 1994 and has a higher reading age than his PC.


All content © copyright 2013-onward Ministry of Prose. Reproduction in whole or in part is illegal under international law.
