Hyphenation on the Web

Nobody seems to have cared about hyphenated text since information moved from printed media, like newspapers and books, to the screen that is now in front of you. Typography is the same everywhere, or at least it should be, but some typographic elements, like hyphenation, do not work on screen. In this article I'll try to cover the importance of hyphenation and some of the methods to cure this problem on the web.

If you look closely at old printed books and newspapers, you'll notice that columns are perfectly justified to the page margins while keeping even spacing between words. Now look at any website with newspaper-style columns (The New York Times would be a perfect example) and you'll see that lines are aligned to the left margin, while the right side is left "as is". It is perfectly readable, but imagine how much more "analog" and neat it would be with hyphenated and justified text, just as you see in your favorite newspaper.

Justifying Text

Yes, there is a CSS style that will justify text for you automatically, but without hyphenation you'll end up with ugly typesetting and huge spaces between long words, especially in narrow columns. Personally, I think that using text-align: justify; without proper hyphenation is a mortal sin. It makes text unreadable and utterly ugly to look at. So, for justified text to work, word hyphenation has to be done automatically by the browser.

Word Hy-phen-a-tion by Com-put-er

This is a complex issue even for printing giants, but there is always a human being who knows the language and knows how to hyphenate a word correctly. The second great thing about printed text is that it's static: words won't change places, because nobody resizes a book or a newspaper column, so there is no need to rethink hyphenation. On the screen, though, it's possible to resize the window you are reading in, so you cannot write text with hyphen signs in it, because you do not know which word will end up at the end of a line. Even if you use fixed-width columns, no two browsers will render the text exactly the same way.

We've finally reached the moment when hyphenation has to be done by a computer, in our case by a browser. That is a very complex task because it depends mainly on the language. For example, English words are generally shorter than German ones, and some languages, like Thai, have no spaces between words. Fortunately, there is a pattern-based algorithm that finds about 90% of the hyphenation points in words. Even it has problems, though, such as the word "recover", which when hyphenated could become "re-cover".
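To make the pattern-based idea a bit more concrete, here is a minimal sketch in JavaScript. It assumes the usual scheme: each pattern carries digits between its letters, the highest digit wins for every gap in the word, and an odd digit marks a possible break. The names hyphenate, parsePattern and DEMO_PATTERNS are my own, and the tiny pattern list is hand-picked so that it only covers the demo word; real pattern files hold thousands of entries per language.

```javascript
// Hand-picked demo patterns (an assumption for illustration only).
const DEMO_PATTERNS = ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'];

// Split a pattern such as "hy3ph" into its letters ("hyph") and the
// weight of every gap around those letters ([0, 0, 3, 0, 0]).
function parsePattern(pattern) {
  let letters = '';
  const weights = [0];
  for (const ch of pattern) {
    if (ch >= '0' && ch <= '9') {
      weights[weights.length - 1] = Number(ch);
    } else {
      letters += ch;
      weights.push(0);
    }
  }
  return { letters, weights };
}

// Return the word split at every allowed hyphenation point.
// minLeft and minRight keep very short fragments off either end.
function hyphenate(word, patterns = DEMO_PATTERNS, minLeft = 2, minRight = 3) {
  // Dots mark the word boundaries so patterns can anchor to them.
  const dotted = '.' + word.toLowerCase() + '.';
  // gaps[k] is the weight of the gap in front of dotted[k].
  const gaps = new Array(dotted.length + 1).fill(0);

  for (const pattern of patterns) {
    const { letters, weights } = parsePattern(pattern);
    if (!letters) continue; // skip malformed, digits-only patterns
    let at = dotted.indexOf(letters);
    while (at !== -1) {
      // The highest weight seen for a gap wins.
      weights.forEach((w, j) => {
        gaps[at + j] = Math.max(gaps[at + j], w);
      });
      at = dotted.indexOf(letters, at + 1);
    }
  }

  const parts = [];
  let start = 0;
  for (let i = minLeft; i <= word.length - minRight; i++) {
    // word[i] sits at dotted[i + 1]; an odd weight allows a break before it.
    if (gaps[i + 1] % 2 === 1) {
      parts.push(word.slice(start, i));
      start = i;
    }
  }
  parts.push(word.slice(start));
  return parts;
}

console.log(hyphenate('hyphenation').join('-'));
```

With this demo list, only break points covered by the patterns are found; a word that matches no pattern comes back in one piece.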

Hyphenation in Today's Web

Even though today's browsers cannot hyphenate text themselves, they support some hyphenating "tricks" and "hacks", like the &shy; (soft hyphen) placeholder, which an editor can insert into words wherever hyphenation might occur. For example, this

anti&shy;dis&shy;est&shy;ab&shy;lish&shy;ment&shy;arian&shy;ism
in your browser will look like anti­dis­est­ab­lish­ment­arian­ism anti­dis­est­ab­lish­ment­arian­ism anti­dis­est­ab­lish­ment­arian­ism. As you can see, browsers are smart enough to detect at which placeholder the line ended and insert a hyphen there for us. But let's be honest: no one except lunatics will insert those strange symbols into words by hand.
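If the break points of a word are already known, putting the placeholders in is easy to script. The sketch below is only an illustration: withSoftHyphens and toEntities are names I made up, and the syllable list is simply the split used in the example above. It joins the pieces with the soft hyphen character (U+00AD) and can also print the result in the &shy; form you would type into HTML.

```javascript
// U+00AD is the character that the &shy; entity stands for.
const SOFT_HYPHEN = '\u00AD';

// Join already-split word pieces with invisible soft hyphens.
function withSoftHyphens(pieces) {
  return pieces.join(SOFT_HYPHEN);
}

// Show the soft hyphens as HTML entities, e.g. for pasting into markup.
function toEntities(text) {
  return text.split(SOFT_HYPHEN).join('&shy;');
}

const word = withSoftHyphens(['anti', 'dis', 'est', 'ab', 'lish', 'ment', 'arian', 'ism']);
console.log(toEntities(word));
```

The string in word looks like a normal word until the browser has to break it at the end of a line.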

Automatic Hyphenation Using JavaScript

There is a brilliant project hosted on Google Code called Hyphenator.js. What it does is simple and very smart. Using the algorithm from Liang's thesis, it detects where words can be hyphenated and then inserts &shy; soft hyphens at those positions. As a result, we get perfectly hyphenated text in our browser, even if we resize the column width.

How It Works

1. Download a current version of Hyphenator.js and copy it to your server (make sure to copy the folder called patterns, too).
2. Prepare your HTML documents by
  • Encoding them in UTF-8 (not absolutely necessary, but highly recommended).
  • Setting the appropriate lang attributes (e.g. <html lang="en">).
  • Adding class="hyphenate" to the elements whose text should be hyphenated (children inherit this setting). Hyphenation can be stopped by adding class="donthyphenate".
3. Include the script by adding the following code to your HTML document:
<script src="http://yourdomain.com/path/Hyphenator.js" type="text/javascript">
</script>
4. Invoke the script:
<script type="text/javascript">
    Hyphenator.run();
</script>
Done. There are many interesting and useful settings you can change before you invoke the script. See the documentation for more details.

Disadvantages

Even though Hyphenator.js works beautifully, there are some disadvantages to JavaScript web hyphenation in general.
  • Copy-paste is a real pain in the arse and behaves very strangely. Consider adding a toggle switch for your hyphenator of choice to make text easy to copy.
  • On Macs the dictionary behaves incorrectly: control + command + D tries to find a definition for part of the word instead of the whole word.
  • It's a hack nevertheless, and not the proper way web hyphenation should be done.
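One possible workaround for the copy-paste problem is to clean the text at the moment it is copied. The sketch below is my own assumption of how that could look, not something taken from Hyphenator.js: stripSoftHyphens is a hypothetical helper, and the listener relies on the standard copy event and its clipboardData object, so that part only runs in a browser.

```javascript
// Matches every soft hyphen (U+00AD) in a string.
const SOFT_HYPHEN_PATTERN = /\u00AD/g;

// Remove all soft hyphens so copied words are whole again.
function stripSoftHyphens(text) {
  return text.replace(SOFT_HYPHEN_PATTERN, '');
}

// Browser-only part: replace what lands on the clipboard.
if (typeof document !== 'undefined') {
  document.addEventListener('copy', (event) => {
    const selected = String(window.getSelection());
    event.clipboardData.setData('text/plain', stripSoftHyphens(selected));
    // Stop the browser from copying the original, hyphenated selection.
    event.preventDefault();
  });
}

console.log(stripSoftHyphens('hy\u00ADphen\u00ADation'));
```

Note that this sketch puts only plain text on the clipboard, so any formatting in the selection is lost when pasting.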

The Future

CSS3! Well, fully supported CSS3 is the future, not the one in your current browser. The thing is that the W3C's CSS3 drafts already define a hyphenation property (called hyphens in the CSS Text module; an earlier draft named it hyphenate), and it looks like this:
p { hyphens: auto; /* "none" and "manual" are the other values it can take */ }
Unfortunately, at the time of writing, current browsers do not support it at all, but they have to, and someday they will. It will do exactly the same thing as the JavaScript hack does, but without any of its disadvantages.

If you enjoy reading, you should follow @totocaster on Twitter. If you have something to add or want to provide feedback, feel free to write to toto@totocaster.com. For more articles and linked resources, subscribe to my RSS feed.