Typography

Correct quotation marks per language in HTML

Apostrophes, quotation marks, foot marks, inch marks and primes have been mixed up forever, and I guess they will continue to be for a long time. Why? Read Handwriting keyboards and the devaluation of typography and you will see why. The software, browsers and hardware we use today might change and improve a bit, but that won’t solve the problem of faulty micro-typography, which has bothered me for a while now.

So what can be done to automatically improve our everyday typography? Since redesigning the keyboard is … well, not an ideal solution … another way of getting correct quotation marks into a text would be to fix them programmatically. With my OpenTypography framework I’ve come a bit closer, but it currently covers only English and a few other languages. So there is a need for a solution for quotation marks in languages such as French, German, Danish and others that don’t use “ and ”.

Language-based quotation marks

Here is a first draft of a solution that I’m working on; if it works, it will be included in the OpenTypography framework. The steps are pretty straightforward but far from watertight:

  1. Check what language a text is written in.
  2. Based on the language get quotation rules from a reference list.
  3. Replace incorrect quotation marks with the correct ones.

So I’m doing some tests with the HTML lang attribute. Basically, I’m getting the language code from it and then matching that against a list of language codes and their corresponding quotation characters.

Let’s say the lang attribute is set to «fr»; we then know the text is in French (or can at least assume that it is). Once we know that, we can look up the correct quotation marks and use them in the text.
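Here is a minimal sketch of that first step in Python, using only the standard library. It assumes the language is declared on the html element, and the names LangFinder and primary_language are made up for this illustration; they are not part of OpenTypography.

```python
from html.parser import HTMLParser
from typing import Optional


class LangFinder(HTMLParser):
    """Remembers the lang attribute of the first <html> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lowercase.
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    self.lang = value


def primary_language(html_text: str) -> Optional[str]:
    """Return the lowercased primary subtag, e.g. 'fr' for lang="fr-CA"."""
    finder = LangFinder()
    finder.feed(html_text)
    if finder.lang is None:
        return None  # no usable lang attribute: the language is unknown
    code = finder.lang.strip().replace("_", "-").split("-")[0].lower()
    return code or None
```

Only the part before the first hyphen is kept, so a regional value such as fr-CA is treated as plain French. Whether regional variants should get their own quotation rules is left open here.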

I could not find an existing list, so I took the Unicode CLDR one and ran some find-and-replace on it to get a decent JSON array with the data I needed. Thanks to Carl Jeffrey and Carl Morris for helping me find it.

The data in the reference list looks like this:

{
  "Country code": "fr",
  "Language": "French",
  "Standard Primary": "«…»",
  "Standard Secondary": "‹…›",
  "Style": "French",
  "Alternative Primary": "“…”",
  "Alternative Secondary": "‘…’"
}
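A record like this could be turned into a lookup table roughly as follows. This is a sketch, not the actual OpenTypography code: the field names follow the sample record, only the French entry is included because it is the only one shown here, and QuotePair, split_pair and load_rules are hypothetical helper names.

```python
import json
from typing import Dict, NamedTuple

# One-element stand-in for the full reference list (sample record only).
REFERENCE_JSON = """
[
  {
    "Country code": "fr",
    "Language": "French",
    "Standard Primary": "«…»",
    "Standard Secondary": "‹…›",
    "Style": "French",
    "Alternative Primary": "“…”",
    "Alternative Secondary": "‘…’"
  }
]
"""


class QuotePair(NamedTuple):
    opening: str
    closing: str


def split_pair(pattern: str) -> QuotePair:
    """Split a pattern such as '«…»' at the ellipsis into its two marks."""
    opening, closing = pattern.split("…")
    return QuotePair(opening, closing)


def load_rules(raw: str) -> Dict[str, Dict[str, QuotePair]]:
    """Index the reference list by lowercased language code."""
    rules: Dict[str, Dict[str, QuotePair]] = {}
    for record in json.loads(raw):
        rules[record["Country code"].lower()] = {
            "primary": split_pair(record["Standard Primary"]),
            "secondary": split_pair(record["Standard Secondary"]),
            "alt_primary": split_pair(record["Alternative Primary"]),
            "alt_secondary": split_pair(record["Alternative Secondary"]),
        }
    return rules


RULES = load_rules(REFERENCE_JSON)
```

With this in place, RULES["fr"]["primary"] gives the opening and closing guillemets as a pair, which is the shape a replacement step needs.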

The good thing is that it contains both the standard and the alternative quotation characters, for primary and secondary quotations, in each language.

Both pieces of information are then used in the drafted solution to replace incorrect quotes with the correct quotation marks.
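The replacement step might look something like the sketch below. It is deliberately narrow and is not the drafted solution itself: it only handles primary quotations, it assumes quotes come in matched pairs on either side of a quotation, and PRIMARY_QUOTES is a hypothetical cut-down table holding just the French entry from the sample record.

```python
import re
from typing import Dict, Tuple

# Hypothetical cut-down rule table; only the French entry comes from the sample record.
PRIMARY_QUOTES: Dict[str, Tuple[str, str]] = {"fr": ("«", "»")}

# Either a pair of straight double quotes or a pair of English curly quotes.
_QUOTED = re.compile(r'"([^"]*)"|“([^“”]*)”')


def fix_quotes(text: str, lang: str) -> str:
    """Swap paired double quotes for the primary marks of the given language."""
    pair = PRIMARY_QUOTES.get(lang.split("-")[0].lower())
    if pair is None:
        return text  # unknown language: leave the text untouched
    opening, closing = pair

    def swap(match):
        # Exactly one of the two groups matched; use whichever it was.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        return f"{opening}{inner}{closing}"

    return _QUOTED.sub(swap, text)
```

Something like fix_quotes('Il a dit "bonjour".', "fr") then yields the sentence with guillemets. A stray inch mark, as in 5" of rain, would throw the pairing off, and single quotes are skipped entirely because they collide with apostrophes, so this is exactly the kind of thing that is far from watertight.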

Pros

You get pretty precise quotation characters for every language covered by the reference list.

Cons

It relies on one specific attribute: if the lang attribute is missing from the document, we are pretty much in the dark.