diacritic adj : capable of distinguishing; "students having superior diacritic powers"; "the diacritic elements in culture"- S.F.Nadel [syn: diacritical] n : a mark added to a letter to indicate a special pronunciation [syn: diacritical mark]
- A special mark added to a letter to indicate a different pronunciation, stress, tone, or meaning.
A diacritic or diacritical mark is a small sign added to a letter to alter pronunciation or to distinguish between similar words. The term derives from Greek διακριτικός (diakritikos, "distinguishing"). "Diacritic" is both adjective and noun, whereas "diacritical" is only an adjective. Some diacritical marks are commonly called accents: the grave and the acute are, for example, but the cedilla is not.
A diacritical mark can appear above or below a letter, or in some other position. Its main usage is to change the phonetic value of the letter to which it is added, but it may also be used to modify the pronunciation of a whole word or syllable, like the tone marks of tonal languages, to distinguish between homographs, to make abbreviations, such as the titlo in old Slavic texts, or to change the meaning of a letter, such as denoting numerals in numeral systems like early Greek numerals.
A letter which has been modified by a diacritic may be treated as a new, individual letter, or simply as a letter-diacritic combination, in orthography and collation. This varies from language to language, and in some cases from symbol to symbol within a single language.
Types of diacritic
- accent marks (thus called because the acute, the grave and the circumflex accent were originally used to indicate different types of pitch accents, in the polytonic orthography of Greek)
- ( . ) dot (Indic anusvara)
- ( ˚ ) ring (Czech kroužek)
- macron or line
- curls above
- curls below
- ( ː ) colon, used in the International Phonetic Alphabet to mark long vowels.
Some of these marks are sometimes diacritics, but also have other uses: tilde, dot, comma, titlo, apostrophe, bar and colon.
Diacritics specific to non-Latin alphabets
Arabic: see Arabic alphabet
Greek: see Greek diacritics
Hebrew: see Hebrew alphabet
- ( ׳ ) Geresh
Some non-alphabetic scripts also employ symbols that function essentially as diacritics.
- Non-pure abjads (such as Hebrew and Arabic script) and abugidas use diacritics for denoting vowels. Hebrew and Arabic also indicate consonant doubling and change with diacritics; Hebrew and Devanagari use them for foreign sounds. Devanagari and related abugidas also use a diacritical mark called a virama to mark the absence of a vowel. In addition, Devanagari uses the moon-dot chandrabindu ( ँ ).
Alphabetization or collation
Different languages use different rules to put diacritic characters in alphabetical order. French treats letters with diacritical marks the same as the underlying letter for purposes of ordering and dictionaries.
The Scandinavian languages, by contrast, treat the characters with diacritics ä, ö and å as new and separate letters of the alphabet, and sort them after z. Usually ä is sorted as equal to æ (ash) and ö is sorted as equal to ø (o-slash). Also, aa, when used as an alternative spelling to å, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ü is frequently sorted as y.
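The rule of sorting the extra letters after z can be sketched as a sort key. This is a minimal illustration, assuming the Swedish order å, ä, ö given later in this article; the names `SWEDISH_ALPHABET` and `swedish_key` are hypothetical, and the sketch ignores the finer points mentioned above (aa as å, ü sorted as y).

```python
import unicodedata

# Assumed alphabet order: a-z followed by å, ä, ö (Swedish order).
SWEDISH_ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzåäö")
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    """Hypothetical sort key: rank each letter by its place in the alphabet."""
    # Normalize first so that a letter typed as base + combining mark
    # is looked up as the single precomposed letter.
    text = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort last (an illustrative choice).
    return [RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in text]


words = ["öl", "zebra", "ärta", "apa", "åka"]
print(sorted(words, key=swedish_key))
```

A plain code-point sort would place ärta before åka, which is why an explicit alphabet table is used here.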
Languages that treat accented letters as variants of the underlying letter usually alphabetize words with such symbols immediately after similar unmarked words. For instance, in German where two words differ only by an umlaut, the word without it is sorted first in German dictionaries (e.g. schon and then schön, or fallen and then fällen). However, when names are concerned (e.g. in phone books or in author catalogues in libraries), umlauts are often treated as combinations of the vowel with a suffixed e; Austrian phone books now treat characters with umlauts as separate letters (immediately following the underlying vowel).
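The two German conventions described above can be contrasted with two sort keys. This is only a sketch under stated assumptions: `dictionary_key` and `phonebook_key` are hypothetical names, ß is not handled, and real collation standards cover many more cases.

```python
import unicodedata


def dictionary_key(word):
    """Dictionary style: compare base letters first, unmarked word first on ties."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop combining marks to get the underlying letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The decomposed form breaks ties: a plain letter sorts before letter + mark.
    return (base, decomposed)


# Phone-book style: an umlaut counts as the vowel followed by e.
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def phonebook_key(name):
    return unicodedata.normalize("NFC", name.lower()).translate(PHONEBOOK)


print(sorted(["schön", "fällen", "schon", "fallen"], key=dictionary_key))
print(sorted(["Muller", "Müller", "Mugler"], key=phonebook_key))
```

With the phone-book key, Müller is compared as "mueller" and therefore precedes Mugler, whereas the dictionary key would place it directly after Muller.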
In Spanish, the grapheme ñ is considered a new letter different from n and collated between n and o, as it denotes a different sound from that of a plain n. But the accented vowels á, é, í, ó, ú are not separated from the unaccented vowels a, e, i, o, u as the acute accent in Spanish only modifies stress within the word, not the sound of a letter.
For a comprehensive list of the collating orders in various languages, see Collating sequence.
Generation with computers
Modern computer technology was developed mostly in English-speaking countries, so data formats, keyboard layouts, etc. were developed with an English bias, that is, for a "simple" alphabet without diacritical marks. This has led to fears internationally that the marks and accents may become obsolete to facilitate the worldwide exchange of data. Efforts have been made to create internationalized domain names that extend beyond the English alphabet, e.g. "pokémon.com".
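As an illustration of how such a domain name can coexist with ASCII-only infrastructure, the sketch below converts the example name to an ASCII-compatible form and back. It assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules; current registries may apply newer rules, so this is not a statement of how any particular registry behaves.

```python
# The example domain from the text, with é written as an escape for clarity.
domain = "pok\u00e9mon.com"

# Encode each label to its ASCII-compatible ("xn--") form.
ascii_form = domain.encode("idna")
print(ascii_form)

# Decoding restores the original name, diacritic included.
print(ascii_form.decode("idna"))
```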
Depending on the keyboard layout, which differs amongst countries, it is more or less easy to enter letters with diacritics on computers and typewriters. Some have their own keys, some are created by first pressing the key with the diacritic mark followed by the letter to place it on. Such a key is sometimes referred to as a dead key, as it produces no output of its own, but modifies the output of the key pressed after it.
In modern Microsoft Windows operating systems, the keyboard layout US International allows one to type almost all diacritics directly: "+e gives ë, ~+o gives õ, etc. On Apple Macintosh computers, there are keyboard shortcuts for the most common diacritics: Option-e followed by a vowel places an acute accent, Option-u followed by a vowel gives an umlaut, Option-c gives a cedilla, etc. Diacritics can be composed in most X Window System keyboard layouts.
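The dead-key behaviour described above can be modelled as a small state machine. The following is a simplified sketch, not the actual logic of any operating system: the `DEAD_KEYS` table and the function `type_keys` are hypothetical, and composition is delegated to Unicode normalization rather than to a real layout table.

```python
import unicodedata

# Assumed dead keys, each mapped to a Unicode combining mark.
DEAD_KEYS = {
    '"': "\u0308",  # combining diaeresis
    "~": "\u0303",  # combining tilde
    "'": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    "^": "\u0302",  # combining circumflex accent
}


def type_keys(keys):
    """Return the text produced by a sequence of key presses."""
    out = []
    pending = None  # the dead key waiting for its base letter, if any
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # a dead key produces no output by itself
            else:
                out.append(key)
            continue
        composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
        # Use the composed letter if one exists, else emit both keys as typed.
        out.append(composed if len(composed) == 1 else pending + key)
        pending = None
    if pending is not None:
        out.append(pending)  # a trailing dead key is emitted literally
    return "".join(out)


print(type_keys('"e'), type_keys("~o"), type_keys('na"ive'))
```

Real layouts compose only the combinations listed in their own tables, so the set of letters reachable this way is an artefact of the sketch.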
Whether certain diacritics can be used on a computer also depends on the available code pages. Unicode solves this problem by assigning every known character its own code; if this code is known, most modern computer systems provide a method to input it. With Unicode it is also possible to combine diacritical marks with most characters.
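Because Unicode offers both precomposed letters and combining marks, the same visible letter can be stored in more than one way. The sketch below, using Python's standard `unicodedata` module, shows the two encodings of é and how normalization makes them comparable; the helper name `same_text` is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
combined = "e\u0301"    # e followed by a combining acute accent

# The two strings look alike but are not equal code point by code point.
print(precomposed == combined)
print([unicodedata.name(ch) for ch in combined])


def same_text(a, b):
    """Compare two strings after composing them to a common form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(precomposed, combined))
```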
Languages with letters containing diacritics
The following languages have letters which contain diacritics.
- Danish and Norwegian use additional characters such as ash (æ), o-slash (ø) and a-circle (å). These letters are collated after z, in the order æ, ø, å.
- Faroese uses acute accents, digraphs, and other special letters. All are considered separate letters, and have their own place in the alphabet: á, ð, í, ó, ú, ý, æ and ø.
- Icelandic uses acute accents, digraphs, and other special letters. All are considered separate letters, and have their own place in the alphabet: á, ð, é, í, ó, ú, ý, æ, ö and þ.
- Among the Scandinavian languages, Danish and Norwegian have long used ash (æ, actually a ligature) and o-slash (ø), but have more recently incorporated a-ring (å) after Swedish example. Historically the å has developed from a ligature by writing a small a on top of the letter a; if an å character is unavailable, some Scandinavian languages allow the substitution of a doubled a. The Scandinavian languages collate these letters after z, but have different collation standards. Danish and Norwegian both follow the order æ, ø, å.
- Swedish uses characters identical to a-diaeresis (ä) and o-diaeresis (ö) in the place of ash and o-slash in addition to the a-circle (å). Historically the diaeresis for the Swedish letters ä and ö, like the German umlaut, has developed from a small gothic e written on top of the letters. These letters are collated after z, in the order å, ä, ö.
- Galician: as in Spanish, the character ñ is a letter and collated between n and o.
- Romanian uses a breve on the letter a (ă) to indicate the sound schwa /ə/, as well as a circumflex over the letters a (â) and i (î) for the sound /ɨ/. Romanian also writes a comma below the letters s (ș) and t (ț) to represent the sounds /ʃ/ and /ʦ/, respectively. These characters are collated after their non-diacritic equivalent.
- Spanish: the character ñ is considered a letter, and collated between n and o.
- Bosnian and Croatian have the symbols ć, č, đ, š and ž, which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order. Bosnian and Croatian also have one digraph including a diacritic, dž, which is also alphabetised independently, and follows d and precedes đ in the alphabetical order. The Serbian Latin alphabet contains the same letters, but the Serbian Cyrillic alphabet has no diacritics.
- The Czech alphabet contains 27 graphemes (letters) when written without diacritics and 42 graphemes when written including them. Czech uses the acute (á é í ó ú ý), the háček (č ď ě ň ř š ť ž), and for one letter (ů) the ring.
- Polish has the following letters: ą ć ę ł ń ó ś ź ż. These are considered to be separate letters. Each of them is placed in the alphabet right after its Latin counterpart (e.g. ą between a and b); ź and ż are placed after z, in this order.
- The Slovak alphabet uses the acute (á é í ó ú ý ĺ ŕ), caron (č ď ľ ň š ť ž), umlaut (ä) and circumflex accent (ô).
- Slovenian has the symbols č, š and ž, which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order.
- Latvian has the following letters: ā ē ī ū ŗ ļ ķ ņ ģ š ž č.
- Lithuanian. In general usage, where letters appear with the caron (č, š and ž) they are considered as separate letters from c, s or z and collated separately; letters with the ogonek (ą, ę, į and ų), the macron (ū) and the superdot (ė) are considered as separate letters as well, but not given a unique collation order.
- Estonian has a distinct letter õ which contains a tilde. Estonian "dotted vowels" ä, ö, ü are similar to German, but these are also distinct letters, not like German umlauted letters. All four have their own place in the alphabet, between w and x. Carons in š or ž appear only in foreign proper names and loanwords. These are also distinct letters, placed in the alphabet between s and t.
- Finnish uses dotted vowels (ä and ö). As in Swedish and Estonian, these are regarded as individual letters, rather than vowel + umlaut combinations (as happens in German). It also uses the characters å, š and ž in foreign names and loanwords. In the Finnish alphabet, å, ä and ö collate as separate letters after z, the others as variants of their base letter.
- Hungarian uses the umlaut, the acute and double acute accent (unique to Hungarian): ö ü, á é í ó ú and ő ű. The acute accent indicates the long form of a vowel (in case of i/í, o/ó, u/ú) while the double acute performs the same function for ö and ü. The acute accent can also indicate a different sound (more open, like in case of a/á, e/é). Both long and short forms of the vowels are listed separately in the Hungarian alphabet but members of the pairs a/á, e/é, i/í, o/ó, ö/ő, u/ú and ü/ű are collated in dictionaries as the same letter.
- Livonian has the following letters: ā, ä, ǟ, ḑ, ē, ī, ļ, ņ, ō, ȯ, ȱ, õ, ȭ, ŗ, š, ț, ū, ž.
- Azerbaijani includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü.
- Crimean Tatar includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü. Unlike Standard Turkish (but like Cypriot Turkish), Crimean Tatar also has the letter Ñ.
- Gagauz includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü. Unlike Turkish, Gagauz also has the letters Ä, Ê and Ţ. Ţ is derived from the Romanian alphabet for the same sound.
- Turkish uses a G with a breve (Ğ), two letters with a diaeresis (Ö and Ü, representing two rounded front vowels), two letters with a cedilla (Ç and Ş, representing the affricate /tʃ/ and the fricative /ʃ/), and also possesses a dotted capital İ (and a dotless lowercase ı representing a high unrounded back vowel). In Turkish each of these is a separate letter, rather than a version of another letter; dotted capital İ and lowercase i are the same letter, as are dotless capital I and lowercase ı. Typographically, Ç and Ş are often rendered with a subdot, as in ṣ; when a hook is used, it tends to have more of a comma shape than the usual cedilla. The new Azerbaijani, Crimean Tatar, and Gagauz alphabets are based on the Turkish alphabet and its diacriticized letters, with some additions.
- Albanian has two special letters, Ç and Ë, in upper and lower case. They are placed next to the most similar letters in the alphabet, c and e respectively.
- Esperanto has the symbols ŭ, ĉ, ĝ, ĥ, ĵ and ŝ, which are included in the alphabet, and considered separate letters.
- Hawaiian uses the kahakō or macron over vowels, although there is some disagreement over considering them as individual letters. The kahakō over a vowel can completely change the meaning of a word that is otherwise spelled the same.
- Maltese uses a C, G, and Z with a dot over them (Ċ, Ġ, Ż), and also has an H with an extra horizontal bar. For uppercase H, the extra bar is written slightly above the usual bar. For lowercase H, the extra bar is written crossing the vertical, like a t, and not touching the lower part (Ħ, ħ). The above characters are considered separate letters. The letter 'c' without a dot has fallen out of use due to redundancy. 'Ċ' is pronounced like the English 'ch' and 'k' is used as a hard c as in 'cat'. The digraph 'għ' (called għajn after the Arabic letter name ʻayn for ع) is considered separate, and sometimes ordered after 'g', whilst in other volumes it is placed between 'n' and 'o' (the Latin letter 'o' originally evolved from the shape of Phoenician ʻayin which was traditionally collated after Phoenician nūn).
- Vietnamese uses the horn diacritic for the letters ơ and ư; the circumflex for the letters â, ê, and ô; the breve for the letter ă; and a bar through the letter đ.
- Belarusian has a letter ў.
- Belarusian, Bulgarian, Russian and Ukrainian have the letter й.
- Belarusian and Russian have the letter ё. In Russian, this letter is usually replaced in print by е, although it has a different pronunciation. Ё is still used in children's books and in handwriting. A minimal pair is все (vse, "all" pl.) and всё (vsio, "everything" n. sg.). In Belarusian, ё is a distinct letter, and replacement by е is a mistake.
- Ukrainian has the letter ї.
- Macedonian has the letters ќ and ѓ.
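The Turkish entry in the list above has a practical consequence for software: the default, language-independent case mapping pairs I with i, which is wrong for Turkish text. The sketch below shows one way to handle this in Python, whose built-in `str.lower()` and `str.upper()` are not locale-aware; the function names `turkish_lower` and `turkish_upper` are hypothetical, and the sketch covers only the two I letters.

```python
def turkish_lower(text):
    """Lowercase with Turkish pairing: İ -> i and I -> ı."""
    # Map the two capital I letters explicitly before the default lowercasing.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


def turkish_upper(text):
    """Uppercase with Turkish pairing: i -> İ and ı -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


print("I".lower())                     # default mapping gives dotted i
print(turkish_lower("ISPARTA İZMİR"))  # dotless ı and dotted i kept apart
print(turkish_upper("ısparta izmir"))
```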
Languages with diacritics that do not produce new letters
The following is a list of languages with letter-diacritic combinations that are not considered independent letters.
- Afrikaans uses diaeresis to mark vowels that are pronounced separately and not as one would expect where they occur together, for example voel (to feel) as opposed to voël (bird). The circumflex is used in ê, î, ô and û generally to indicate long close-mid, as opposed to open-mid vowels, for example in the words wêreld (world) and môre (morning, tomorrow). The acute accent is used to add emphasis in the same way as underlining or writing in bold or italics in English, for example Dit is jóú boek (It is your book). The grave accent is used to distinguish between words that are different only in placement of the stress, for example appel (apple) and appèl (appeal) and in a few cases where it makes no difference to the pronunciation but distinguishes between homophones. The two most usual cases of the latter are in the sayings òf... òf (either... or) and nòg... nòg (neither... nor) to distinguish them from of (or) and nog (again, still).
- Aymara uses a diacritical horn over p, q, t, k, ch.
- Catalan has the following composite characters: à, ç, é, è, í, ï, ó, ò, ú, ü, l·l. The acute and the grave accent indicate stress and vowel height, the cedilla marks the result of a historical palatalization, the diaeresis mark indicates either a hiatus, or that the letter u is pronounced when the graphemes gü, qü are followed by e or i, the interpunct (·) distinguishes the different values of ll/l·l.
- Czech has the following composite characters: á, ď, é, ě, í, ň, ó, ť, ú, ů, ý.
- Dutch uses the diaeresis. For example in ruïne it means that the u and the i are separately pronounced in their usual way, and not in the way that the combination ui is normally pronounced. Thus it works as a separation sign and not as an indication for an alternative version of the i. Diacritics can be used for emphasis (érg koud for very cold) or for disambiguation between a number of words that are spelled the same when context doesn't indicate the correct meaning (één appel = one apple, een appel = an apple; vóórkomen = to occur, voorkómen = to prevent). Grave and acute accents are used on a very small number of words, mostly loanwords. The ç also appears in some loanwords.
- English is one of the few European languages that do not regularly use diacritical marks. Exceptions are unassimilated foreign loanwords, including borrowings from French and increasingly Spanish; however, the diacritic is also often omitted from such words. Loanwords that frequently appear with the diacritic in English include café, résumé (a usage that helps distinguish it from the verb resume, though the former is often miswritten resumé), and naïveté (see List of English words with diacritics). In older practice (and even among some orthographically conservative modern writers) one may see examples such as élite and rôle. English once used the diaeresis more often than not in words such as coöperate and zoölogy, but this practice has become far less common (The New Yorker is one of the few major publications whose house style retains this feature, and various individual writers still use it). The acute and grave accents are occasionally used in poetry and lyrics: the acute to indicate stress overtly where it might be ambiguous (rébel vs. rebél) or nonstandard for metrical reasons (caléndar), the grave to indicate that an ordinarily silent or elided syllable is pronounced (warnèd, parlìament). In certain personal names such as Renée and Zoë, the diacritical marks are included more often than omitted.
- Faroese. Non-Faroese accented letters are not added to the Faroese alphabet. These include é, ö, ü, å and recently also letters like š, ł, and ć.
- French uses the grave accent (accent grave), the acute accent (accent aigu), the circumflex (accent circonflexe), the cedilla (cédille) and the diaeresis (tréma).
- Galician vowels can bear an acute accent (á, é, í, ó, ú) to indicate stress or to distinguish two words otherwise written the same (é, '(he/she) is' vs. e, 'and'), but the trema is used only with ï and ü to show diaeresis in pronunciation. Only in foreign words may Galician use other diacritics, such as ç (widely used in the Middle Ages), ê or à.
- German uses the three so-called umlauts ä, ö and ü. These diacritics indicate vowel changes. For instance the word Ofen /'o:fən/ (English: oven) has the plural Öfen /'ø:fən/ (ovens). The sign originated in a superscript e; a handwritten Sütterlin e resembles two parallel vertical lines, like an umlaut. German also has the letter ß, the so-called "Es-Zett", which is a ligature rather than a diacritic. In some positions, it is used instead of ss. For example the verb essen /'ɛsn/ (to eat) has the past tense form aß /'aːs/ (ate). The sign likewise originated in older German writing, which had two different forms of s. In Switzerland ß is not used and is replaced by ss.
- The International Phonetic Alphabet uses diacritic symbols and diacritic letters to indicate phonetic features or secondary articulations.
- Maltese sometimes uses diacritics on some vowels to indicate stress or long vowels, but this is restricted to pronunciation assistance in dictionaries.
- Occitan has the following composite characters: á, à, ç, é, è, í, ï, ó, ò, ú, ü, n·h, s·h. The acute and the grave accent indicate stress and vowel height, the cedilla marks the result of a historical palatalization, the diaeresis mark indicates either a hiatus, or that the letter u is pronounced when the graphemes gü, qü are followed by e or i, the interpunct (·) distinguishes the different values of nh/n·h and sh/s·h.
- Portuguese has the following composite characters: à, á, â, ã, ç, é, ê, í, ó, ô, õ, ú, ü. The acute and the circumflex accent indicate stress and vowel height, the grave accent indicates the crasis, the tilde represents nasalization, and the cedilla marks the result of a historical palatalization. In Brazilian Portuguese, the diaeresis mark indicates that the letter u is pronounced when the graphemes gü, qü are followed by e or i.
- Slovak has the acute (á, é, í, ĺ, ó, ŕ, ú, ý), the caron (č, ď, dž, ľ, ň, š, ť, ž), the circumflex (only above o - ô) and the diaeresis (only above a - ä).
- Spanish uses the acute accent and the diaeresis. The acute is used on a vowel in a stressed syllable in words with irregular stress patterns. It can also be used to "break up" a diphthong as in tío (pronounced /'tio/, rather than /tjo/ as it would be without the accent). Moreover, the acute can be used to distinguish words that otherwise are spelt alike, such as si ("if") and sí ("yes"), and also to distinguish interrogative and exclamative pronouns from homophones with a different grammatical function, such as donde/¿dónde? ("where"/"where?") or como/¿cómo? ("as"/"how?"). The diaeresis is used only over u (ü) so that it is pronounced /w/ in the combinations gue and gui (where u is normally silent), for example ambigüedad. In poetry, the diaeresis may be used on i and u as a way to force a hiatus. Very rarely, the "dotted l" may also be found, especially in loanwords, in order to indicate that the letters "ll" should not be pronounced as the Castilian letter "elle," but rather as the letter "ele," in much the way that a diaeresis might be used to "break" a diphthong in other languages. This mark, however, derives from other Iberian languages.
- Swedish uses the acute accent to show non-standard stress, for example in kafé (café) and resumé (résumé). This occasionally helps resolve ambiguities, such as ide (hibernation) versus idé (idea). In these words, the acute accent is not optional. Some proper names use non-standard diacritics, such as Carolina Klüft and Staël von Holstein. For foreign loanwords the original accents are strongly recommended, unless the word has been infused into the language, in which case they are optional. Hence crème fraîche but ampere.
- Welsh uses the circumflex, diaeresis, acute and grave accents on its seven vowels a, e, i, o, u, w, y. The most common is the circumflex (which it calls to bach, meaning "little roof", or acen grom "crooked accent", or hirnod "long sign") to denote a long vowel, usually to disambiguate it from a similar word with a short vowel. The rarer grave accent has the opposite effect, shortening vowel sounds which would usually be pronounced long. The acute accent and diaeresis are also occasionally used, to denote stress and vowel separation respectively. The w-circumflex and the y-circumflex are among the most commonly accented characters in Welsh, but unusual in languages generally, and were until recently very hard to obtain in word-processed and HTML documents.
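For the languages in this list, where a marked letter is a variant of its base letter, software often needs to compare text while ignoring the marks, for instance to match café with cafe. A common approach, sketched here with Python's standard `unicodedata` module, is to decompose each letter and drop the combining marks. The function name `strip_marks` is hypothetical, and the technique is unsuitable for languages in which the marked letter is a letter in its own right (it would turn Spanish ñ into n).

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks, leaving the underlying base letters."""
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so those are dropped.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("café résumé naïveté"))
print(strip_marks("ø"))  # unchanged: ø has no decomposition into o + mark
```

Letters such as ø or ł, whose stroke is part of the letter rather than a separate combining mark, pass through unchanged.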
Several languages which are not written with the Roman alphabet are transliterated, or romanized, using diacritics. Examples:
- Sanskrit, as well as many of its descendants, like Hindi and Bengali, uses a lossless transliteration system for representing words in the Roman alphabet. This includes several letters with diacritical markings, such as horizontal lines above vowels (ā, ī, ū), dots above and below consonants (ṛ, ḥ, ṃ, ṇ, ṣ, ṭ, ḍ) as well as a few others (ś, ñ).
- Alphabets derived from the Latin
- Latin alphabet
- Collating sequence
- Combining character
- Heavy metal umlaut
- List of English words with diacritics
- List of Latin letters
- List of U.S. cities with diacritics
- Typing accents