Character Tags
The character set tag for most European languages looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
There is a newer recommendation to use "ISO-8859-15", which adds the euro sign, instead of "ISO-8859-1", but I haven't seen anybody using it yet (Aug. 2006).
The "charset" tags for HTML 4.0, XML, and XHTML are a little bit different:
- HTML 4.0 uses <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- XML uses, at the start of a document or an entity: <?xml version="1.0" encoding="utf-8"?>. XML describes documents, not web pages, and is therefore widely used to make content available on more than just the Web (Internet). To make a web page with XML you must therefore include display instructions for the browser (.css or HTML 4.0).
- XHTML uses <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>. Note the end slash (/).
The Unicode character sets (UTF) should make HTML browsers and XML processors behave as if they used Unicode internally, without requiring that documents be transferred in Unicode.
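As a minimal sketch, a complete HTML 4.01 page with the charset tag in place could look like the following. The title and the sample names are only illustrations, and the file itself is assumed to be saved as UTF-8 in your editor; otherwise the declaration and the actual bytes will not agree.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<!-- The charset tag should come early in the head, before any text content -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Charset example</title>
</head>
<body>
<p>Jürgen, Zoë, Åsa</p>
</body>
</html>
```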
Info on character set codes:
Unicode Consortium
Special Characters for Different Languages
Some languages, especially East European ones, have their own alphabets with very special accented characters. These require special HTML codes (character entities) to display correctly. Not all browsers recognize every code, so you need to check in the browsers your visitors are using (for that you need "Visitor Statistics").
Whenever possible use the "Friendly code". See list below:
When you use any of the "numbered codes", remember these always have the same format, i.e. &#123; (an ampersand, the number sign, the number, and a semicolon). Sometimes you must add the number sign (#) yourself.
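To illustrate the format, here is a small sketch that writes a few characters both ways. The choice of characters is only an example; each pair should display the same character in the browser.

```html
<!-- Friendly (named) code first, numbered code second -->
<p>&eacute; and &#233; both display é</p>
<p>&ntilde; and &#241; both display ñ</p>
<p>&copy; and &#169; both display ©</p>
<!-- Without the number sign this is not a numbered code -->
<p>&233; is normally shown as plain text</p>
```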
Please note: Russian and Turkish characters are part of the Unicode character set. You need to declare this in the <HEAD> section of your web page:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Don't copy this; type it exactly as it is.
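As a rough sketch of how this could be used, the fragment below shows a Russian word typed directly and again with numbered codes, plus three Turkish letters. The sample words and the title are my own illustrations, and the directly typed text assumes the file is saved as UTF-8.

```html
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Unicode example</title>
</head>
<body>
<!-- Typed directly; only works if the file is saved as UTF-8 -->
<p>Привет</p>
<!-- The same word written with numbered codes -->
<p>&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;</p>
<!-- Turkish letters: s with cedilla, g with breve, dotless i -->
<p>&#351; &#287; &#305;</p>
</body>
```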
Example: If you write the German name "Jürgen" without the special code, i.e. as "Jurgen", it's not the same name anymore. I don't think you would be happy if someone misspelled your own name. Would you?
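For the name in this example, the special code could be written in either of the two ways sketched here. Both lines should display the same name.

```html
<!-- Friendly code for u with umlaut -->
<p>J&uuml;rgen</p>
<!-- Numbered code for the same character -->
<p>J&#252;rgen</p>
```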
Symbol Codes
When you want to display symbols that are also used in the HTML code, it's recommended that you use the "HTML equivalent" code. When these are not enough, use the numbers from the ISO standards list. See below in the list of special characters.
&lt; < (less than)
&gt; > (more than)
&quot; " (quotation mark)
&amp; & (and)
&#35; # (number sign)
&laquo; « (left angle quote)
&raquo; » (right angle quote)
&#61; = (equals)
&#47; / (forward slash)
&#92; \ (back slash)
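As a short sketch of why these codes matter, the line below displays HTML symbols as text instead of letting the browser read them as code. The sentence itself is only an illustration.

```html
<!-- Displays: Use <b> for bold & "quotes" in «angle quotes» -->
<p>Use &lt;b&gt; for bold &amp; &quot;quotes&quot; in &laquo;angle quotes&raquo;</p>
```

If the less-than sign were typed directly, the browser would try to read the following text as a tag.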
When you use several text-specifying tags together, the sequence is the same as in many Asian languages, i.e. you go from less specific to more specific.
Example: "house, big, yellow" or in Computerese:
<font color="#FFF700"><b><u>HELLO !</u></b></font>
which gives:
HELLO !
Note the end tags are always in the opposite order.
B = Bold
U = Underlined
Font = Character
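The rule about end tags can be sketched by putting a correct and an incorrect version side by side. Many browsers will still try to display the second one, but the tags overlap instead of nesting, so it should be avoided.

```html
<!-- Correct: the end tags come in the opposite order of the start tags -->
<font color="#FFF700"><b><u>HELLO !</u></b></font>

<!-- Incorrect: the end tags overlap instead of nesting -->
<font color="#FFF700"><b><u>HELLO !</b></u></font>
```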
The code is always in English with American spelling (i.e. "color" = American spelling, "colour" = British spelling). The #-sign tells the browser to use the hexadecimal code for that colour. You can, of course, write the colour name, but it's not recommended.
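A small sketch of the three cases mentioned here, using red as an arbitrary example colour:

```html
<!-- Hexadecimal code: two digits each for red, green and blue -->
<font color="#FF0000">Red text</font>
<!-- Colour name: works, but the hexadecimal code is the recommended form -->
<font color="red">Red text</font>
<!-- British spelling is not a known attribute, so the browser ignores it -->
<font colour="#FF0000">Not red</font>
```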
Language Tags
The language tag looks like this:
<meta http-equiv="Content-Language" content="en">
"en" is for English. If you use one or several full sentences in another language, mixed with English, then it's better to include that language as well. For instance, if the page is in both English and German, your language tag should contain content="en,de", where "de" is the code for German. See the list of language codes.
There is a new recommendation from W3C to specify a primary language in the meta tags and then specify the secondary language or languages in the text where you are using them. If you are using XML, XHTML, or CSS you can get more specific details on how to do it.
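One way this recommendation might be followed in HTML 4.0 is sketched below, using the lang attribute to mark the secondary language where it occurs. The title and the sample sentence are only illustrations.

```html
<html lang="en">
<head>
<!-- Primary language of the page -->
<meta http-equiv="Content-Language" content="en">
<title>Language example</title>
</head>
<body>
<!-- Secondary language marked only where it is used -->
<p>The German greeting is <span lang="de">Guten Tag</span>.</p>
</body>
</html>
```

In XHTML the xml:lang attribute is normally written alongside lang.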