History
Figure: Interlinear text in Toussaint–Langenscheidt Spanisch, a Spanish-language textbook for German speakers, 1910
Interlinear glosses have been used for a variety of purposes over a long period of time. One common usage has been to annotate bilingual textbooks for language education. This sort of interlinearization serves to help make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language.
Such annotations have occasionally been expressed not through interlinear layout, but rather through enumeration of words in the object language and the metalanguage. One such example is Wilhelm von Humboldt's annotation of Classical Nahuatl:[1]
1    2      3       4     5    6     7        8    9
ni-  c-     chihui  -lia  in   no-   piltzin  ce   calli
1    3      2       4     5    6     7        8    9
ich  mache  es      für   der  mein  Sohn     ein  Haus
This "inline" style allows examples to be included within the flow of text, and lets the gloss be written in a word order that approximates the syntax of the target language. (In the gloss here, mache es is reordered from the corresponding source order to approximate German syntax more naturally.) Even so, this approach requires readers to "re-align" the correspondences between source and target forms.
More modern 19th- and 20th-century approaches took to glossing vertically, aligning the same sort of word-by-word content in such a way that the metalanguage terms were placed vertically below the source language terms. In this style, the given example might be rendered thus (here English gloss):
ni-  c-  chihui  -lia  in      no-  piltzin  ce  calli
I    it  make    for   to-the  my   son      a   house
"I made my son a house."
Here word ordering is determined by the syntax of the object language.
Finally, modern linguists have adopted the practice of using abbreviated grammatical category labels. A 2008 publication which repeats this example labels it as follows:[2]
ni-c-chihui-lia             in   no-piltzin     ce   calli
1SG.SUBJ-3SG.OBJ-mach-APPL  DET  1SG.POSS-Sohn  ein  Haus
This approach is denser and also requires effort to read, but it is less reliant on the grammatical structure of the metalanguage for expressing the semantics of the target forms.
In computing, special text markers are provided in the Specials Unicode block to indicate the start and end of interlinear glosses.
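As a small illustration, the code points in question are U+FFF9, U+FFFA and U+FFFB. The Python sketch below looks up their official names and wraps a base text and its gloss in them. The helper name `annotate` is hypothetical, and how (or whether) a given program displays these characters is outside the scope of the sketch.

```python
import unicodedata

ANCHOR = "\ufff9"      # precedes the annotated (base) text
SEPARATOR = "\ufffa"   # precedes the annotating text (the gloss)
TERMINATOR = "\ufffb"  # closes the annotation

def annotate(base: str, gloss: str) -> str:
    """Hypothetical helper: wrap a base text and its gloss in the three markers."""
    return f"{ANCHOR}{base}{SEPARATOR}{gloss}{TERMINATOR}"

# Print the code point and Unicode name of each marker.
for ch in (ANCHOR, SEPARATOR, TERMINATOR):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

marked = annotate("calli", "house")
```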
Structure
Though there is no formal specification for the interlinear glossed text (IGT) format, the Leipzig Glossing Rules[3] are a set of guidelines that aim to standardize the format as much as possible.
An interlinear text for linguistics will commonly consist of some or all of the following, usually in this order, from top to bottom:
the original orthography (typically in italic or bold italic),
a conventional transliteration into the Latin alphabet,
a phonetic transcription,
a morphophonemic transliteration,
a word-by-word or morpheme-by-morpheme gloss, where morphemes within a word are separated by hyphens or other punctuation,
and finally a free translation, which may be placed in a separate paragraph or on the facing page if the structures of the languages are too different for it to follow the text line by line.
As an example, the following Taiwanese Minnan clause has been transcribed with five lines of text:
1. the standard pe̍h-ōe-jī transliteration,
2. a gloss using tone numbers for the surface tones,
3. a gloss showing the underlying tones in citation form (before undergoing tone sandhi),
4. a morpheme-by-morpheme gloss in English, and
5. an English translation:[4]
(1.)  goá   iáu-boē    koat-tēng    tang-sî    boeh   tńg-khì
(2.)  goa1  iau1-boe3  koat2-teng3  tang7-si5  boeh2  tng1-khi3.
(3.)  goa2  iau2-boe7  koat4-teng7  tang1-si5  boeh4  tng2-khi3.
(4.)  I     not-yet    decide       when       want   return.
(5.) "I have not yet decided when I shall return."
Word-by-word alignment. According to the Leipzig Glossing Rules, it is standard to left-align the words in the object language with the corresponding words in the metalanguage; this alignment can be seen between lines (1-3) and line (4).
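The left-alignment described here can be sketched in a few lines of Python. This is only an illustration under simple assumptions: words are separated by whitespace, every line has the same number of words, and the function name `align` is made up for the example. It pads each word to the width of the widest entry in its column.

```python
def align(*lines: str) -> list[str]:
    """Left-align the words of several parallel lines into shared columns."""
    rows = [line.split() for line in lines]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all lines must contain the same number of words")
    # Column width = longest entry in that column.
    # Note: len() counts code points, so combining diacritics can skew the padding.
    widths = [max(len(word) for word in column) for column in zip(*rows)]
    return [
        "  ".join(word.ljust(width) for word, width in zip(row, widths)).rstrip()
        for row in rows
    ]

aligned = align(
    "ni- c- chihui -lia in no- piltzin ce calli",
    "I it make for to-the my son a house",
)
for line in aligned:
    print(line)
```

Each gloss then starts in the same column as the word it belongs to, which is the property the alignment rule asks for.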
Morpheme-by-morpheme correspondence. At the sub-word level, segmentable morphemes are separated by hyphens, both in the example and in the gloss. There should be the same number of hyphens in the example and in the gloss, as shown in the following example:
Gila  abur-u-n      ferma  hamišaluǧ  güǧüna  amuqʼ-da-č
now   they-OBL-GEN  farm   forever    behind  stay-FUT-NEG
'Now their farm will not stay behind forever.'
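The hyphen-count requirement lends itself to a mechanical check. The following Python sketch assumes whitespace-separated words and treats only the plain hyphen as a morpheme boundary; the function name `morpheme_mismatches` is hypothetical. It reports every word whose gloss has a different number of hyphens.

```python
def morpheme_mismatches(text: str, gloss: str) -> list[tuple[str, str]]:
    """Return (word, gloss) pairs whose hyphen counts differ."""
    words, glosses = text.split(), gloss.split()
    if len(words) != len(glosses):
        raise ValueError("text and gloss must contain the same number of words")
    return [
        (word, item)
        for word, item in zip(words, glosses)
        if word.count("-") != item.count("-")
    ]

text = "Gila abur-u-n ferma hamišaluǧ güǧüna amuqʼ-da-č"
gloss = "now they-OBL-GEN farm forever behind stay-FUT-NEG"

well_formed = morpheme_mismatches(text, gloss)             # nothing to report
faulty = morpheme_mismatches("amuqʼ-da-č", "stay-FUT")     # gloss lacks one morpheme
```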
Grammatical category labels. In amuqʼ-da-č, the stem (amuqʼ) is translated with the corresponding English lexeme (stay), while the inflectional affixes (-da) and (-č) mark future tense and negation. These affixes are glossed as FUT and NEG; a list of standard abbreviations for grammatical categories that are widely used in linguistics can be found in the Leipzig Glossing Rules.
One-to-many correspondences. When a single object-language element corresponds to several metalanguage elements, they are separated by periods.[3] E.g.,
çık-mak
come.out-INF
'to come out'
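Read the other way round, the two separators define a two-level structure: hyphens delimit morphemes, and periods delimit the metalanguage elements belonging to one morpheme. A minimal Python sketch, assuming only these two separators occur (the function name `gloss_elements` is made up for the example):

```python
def gloss_elements(gloss_word: str) -> list[list[str]]:
    """Split a gloss word into morphemes (on '-') and each morpheme into elements (on '.')."""
    return [morpheme.split(".") for morpheme in gloss_word.split("-")]

infinitive = gloss_elements("come.out-INF")
# The first morpheme needs two English words; the second is a single category label.
```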
Non-overt elements. If the morpheme-by-morpheme gloss (middle line) contains an element that does not correspond to an overt element in the example, a standard strategy is to include an overt "ø" in the object-language text,[3] which is separated by a hyphen like an overt element would be:
puer-ø
boy-NOM
'boy'
Reduplication is treated similarly to affixation but with a tilde (instead of the standard hyphen) that connects the copied element to the stem:[3]
bi~bili
IPFV~buy
'is buying'
Punctuation
In interlinear morphological glosses, various forms of punctuation separate the glosses. Typically, the words are aligned with their glosses; within words, a hyphen is used when a boundary is marked in both the text and its gloss, a period when a boundary appears in only one. That is, there should be the same number of words separated with spaces in the text and its gloss, as well as the same number of hyphenated morphemes within a word and its gloss. This is the basic system, and can be applied universally. For example:
Odadan hızlı çıktım.
oda-dan    hız-lı      çık-tı-m
room-ABL   speed-COM   go.out-PFV-1SG
room-from  speed-with  go_out-perfective-I
(Turkish)
'I left the room quickly.'
An underscore may be used instead of a period, as in go_out-PFV, when a single word in the source language happens to correspond to a phrase in the glossing language, though a period would still be used for other situations, such as Greek oikíais house.FEM.PL.DAT 'to the houses'.
However, sometimes finer distinctions may be made. For example, clitics may be separated with a double hyphen (or, for ease of typing, an equal sign) rather than a hyphen. A French example:
Je t'aime.
je⹀te⹀aime
I⹀you⹀love
(French)
'I love you.'
Affixes which cause discontinuity (infixes, circumfixes, transfixes, etc.) may be set off by angle brackets, and reduplication with tildes, rather than with hyphens:
sulat, susulat, sumulat, sumusulat (verbal declensions) (Tagalog)
sulat  su~sulat                  s⟨um⟩ulat                  s⟨um⟩u~sulat
write  contemplative mood~write  ⟨agent trigger.past⟩write  ⟨agent trigger⟩contemplative~write
(See affix for other examples.)
Morphemes which cannot be easily separated out, such as umlaut, may be marked with a backslash rather than a period:
unser-n     Väter-n
our-DAT.PL  father\PL-DAT.PL
(German)
'to our fathers' (the singular of Väter 'fathers' is Vater)
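The separators introduced in this section can be told apart mechanically. The Python sketch below labels each piece of a gloss word with the kind of boundary that precedes it. The function name `segment` and the label names are assumptions made for the illustration, and angle-bracketed infixes are deliberately left unhandled.

```python
import re

# Separator -> label for the boundary it marks (labels are illustrative only).
BOUNDARY = {
    "-": "affix",
    "=": "clitic",
    "⹀": "clitic",
    "~": "reduplication",
    ".": "one-to-many",
    "_": "phrase",
    "\\": "non-segmentable",
}

def segment(gloss_word: str) -> list[tuple[str, str]]:
    """Return (boundary label, piece) pairs; the first piece is labelled 'start'."""
    # The capturing group makes re.split keep the separators in the result.
    parts = re.split(r"([-=⹀~._\\])", gloss_word)
    result = [("start", parts[0])]
    for separator, piece in zip(parts[1::2], parts[2::2]):
        result.append((BOUNDARY[separator], piece))
    return result

fathers = segment("father\\PL-DAT.PL")
```

A fuller treatment would also need to recognise ⟨…⟩ around infixes, which do not split a word into consecutive pieces in the same way.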
A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules.[3]
Interlinear gloss resources
Efforts have been undertaken to digitize IGT for hundreds of the world's languages.[5]
Online Database of Interlinear Text
History[edit]Interlinear text in Toussaint–Langenscheidt Spanisch, a Spanish-language textbook for German speakers, 1910
Interlinear glosses have been used for a variety of purposes over a long period of time. One common usage has been to annotate bilingual textbooks for language education. This sort of interlinearization serves to help make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language.
Such annotations have occasionally been expressed not through interlinear layout, but rather through enumeration of words in the object and meta language. One such example is Wilhelm von Humboldt's annotation of Classical Nahuatl:[1]
1
ni-
1
ich
2
c-
3
mache
3
chihui
2
es
4
-lia
4
für
5
in
5
der
6
no-
6
mein
7
piltzin
7
Sohn
8
ce
8
ein
9
calli
9
Haus
This "inline" style allows examples to be included within the flow of text, and for the word order of the target language to be written in an order which approximates the target language syntax. (In the gloss here, mache es is reordered from the corresponding source order to approximate German syntax more naturally.) Even so, this approach requires the readers to "re-align" the correspondences between source and target forms.
More modern 19th- and 20th-century approaches took to glossing vertically, aligning the same sort of word-by-word content in such a way that the metalanguage terms were placed vertically below the source language terms. In this style, the given example might be rendered thus (here English gloss):
ni-
I
c-
it
chihui
make
-lia
for
in
to-the
no-
my
piltzin
son
ce
a
calli
house
"I made my son a house."
Here word ordering is determined by the syntax of the object language.
Finally, modern linguists have adopted the practice of using abbreviated grammatical category labels. A 2008 publication which repeats this example labels it as follows:[2]
ni-c-chihui-lia
1sg.subj-3sg.obj-mach-appl
in
det
no-piltzin
1sg.poss-Sohn
ce
ein
calli
Haus
This approach is denser and also requires effort to read, but it is less reliant on the grammatical structure of the metalanguage for expressing the semantics of the target forms.
In computing, special text markers are provided in the Specials Unicode block to indicate the start and end of interlinear glosses.
Structure[edit]
Though there is no formal specification for the IGT format, the Leipzig Glossing Rules[3] are a set of guidelines that aim to standardize the format as much as possible.
An interlinear text for linguistics will commonly consist of some or all of the following, usually in this order, from top to bottom:
The original orthography (typically in italic or bold italic),a conventional transliteration into the Latin alphabet,a phonetic transcription,a morphophonemic transliteration,a word-by-word or morpheme-by-morpheme gloss, where morphemes within a word are separated by hyphens or other punctuation,
and finally
a free translation, which may be placed in a separate paragraph or on the facing page if the structures of the languages are too different for it to follow the text line by line.
As an example, the following Taiwanese Minnan clause has been transcribed with five lines of text:
1. the standard pe̍h-ōe-jī transliteration,2. a gloss using tone numbers for the surface tones,3. a gloss showing the underlying tones in citation form (before undergoing tone sandhi),4. a morpheme-by-morpheme gloss in English, and5. an English translation:[4]
(1.)
(2.)
(3.)
(4.)
goá
goa1
goa2
I
iáu-boē
iau1-boe3
iau2-boe7
not-yet
koat-tēng
koat2-teng3
koat4-teng7
decide
tang-sî
tang7-si5
tang1-si5
when
boeh
boeh2
boeh4
want
tńg-khì
tng1-khi3.
tng2-khi3.
return.
(5.) "I have not yet decided when I shall return."
Word-by-word alignment. According to the Leipzig Glossing Rules, it is standard to left-align the words in the object language with the corresponding words in the metalanguage; this alignment can be seen between lines (1-3) and line (4).
Morpheme-by-morpheme correspondence. At the sub-word level, segmentable morphemes are separated by hyphens, both in the example and in the gloss. There should be the same number of hyphens in the example and in the gloss, as shown in the following example:
Gila
now
abur-u-n
they-obl-gen
ferma
farm
hamišaluǧ
forever
güǧüna
behind
amuqʼ-da-č
stay-fut-neg
'Now their farm will not stay behind forever.'
Grammatical category labels. In amuqʼ-da-č, the stem (amuq) is translated into the corresponding English lexeme (stay) while the inflectional affixes (da) and (č) are inflectional affixes representing future tense and negation. These inflectional affixes are glossed as FUT and NEG; a list of standard abbreviations for grammatical categories that are widely used in linguistics can be found in the Leipzig Glossing Rules.
One-to-many correspondences. When a single object-language element corresponds to several metalanguage elements, they are separated by periods.[3] E.g.,
çık-mak
come.out-inf
'to come out'
Non-overt elements. if the morpheme-by-morpheme gloss (middle line) contains an element that does not correspond to an overt element in the example, a standard strategy is to include an overt "ø" in the object-language text,[3] which is separated by a hyphen like an overt element would be:
puer-ø
boy-nom
'boy'
Reduplication is treated similarly to affixation but with a tilde (instead of the standard hyphen) that connects the copied element to the stem:[3]
bi~bili
ipfv~buy
'is buying'
Punctuation[edit]
In interlinear morphological glosses, various forms of punctuation separate the glosses. Typically, the words are aligned with their glosses; within words, a hyphen is used when a boundary is marked in both the text and its gloss, a period when a boundary appears in only one. That is, there should be the same number of words separated with spaces in the text and its gloss, as well as the same number of hyphenated morphemes within a word and its gloss. This is the basic system, and can be applied universally. For example:
Odadan hızlı çıktım.
oda-dan
room-abl
room-from
hız-lı
speed-com
speed-with
çık-tı-m
go.out-pfv-1sg
go_out-perfective-I
Turkish
'I left the room quickly.'
An underscore may be used instead of a period, as in go_out-PFV, when a single word in the source language happens to correspond to a phrase in the glossing language, though a period would still be used for other situations, such as Greek oikíais house.FEM.PL.DAT 'to the houses'.
However, sometimes finer distinctions may be made. For example, clitics may be separated with a double hyphen (or, for ease of typing, an equal sign) rather than a hyphen. A French example:
Je t'aime.
je⹀te⹀aime
I⹀you⹀love
(French)
'I love you.'
Affixes which cause discontinuity (infixes, circumfixes, transfixes, etc.) may be set off by angle brackets, and reduplication with tildes, rather than with hyphens:
sulat, susulat, sumulat, sumusulat (verbal declensions) (Tagalog)
sulat
write
su~sulat
contemplative mood~write
s⟨um⟩ulat
⟨agent trigger.past⟩write
s⟨um⟩u~sulat
⟨agent trigger⟩contemplative~write
(See affix for other examples.)
Morphemes which cannot be easily separated out, such as umlaut, may be marked with a backslash rather than a period:
unser-n
our-dat.pl
Väter-n
father\pl-dat.pl
History[edit]Interlinear text in Toussaint–Langenscheidt Spanisch, a Spanish-language textbook for German speakers, 1910
Interlinear glosses have been used for a variety of purposes over a long period of time. One common usage has been to annotate bilingual textbooks for language education. This sort of interlinearization serves to help make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language.
Such annotations have occasionally been expressed not through interlinear layout, but rather through enumeration of words in the object and meta language. One such example is Wilhelm von Humboldt's annotation of Classical Nahuatl:[1]
1
ni-
1
ich
2
c-
3
mache
3
chihui
2
es
4
-lia
4
für
5
in
5
der
6
no-
6
mein
7
piltzin
7
Sohn
8
ce
8
ein
9
calli
9
Haus
This "inline" style allows examples to be included within the flow of text, and for the word order of the target language to be written in an order which approximates the target language syntax. (In the gloss here, mache es is reordered from the corresponding source order to approximate German syntax more naturally.) Even so, this approach requires the readers to "re-align" the correspondences between source and target forms.
More modern 19th- and 20th-century approaches took to glossing vertically, aligning the same sort of word-by-word content in such a way that the metalanguage terms were placed vertically below the source language terms. In this style, the given example might be rendered thus (here English gloss):
ni-
I
c-
it
chihui
make
-lia
for
in
to-the
no-
my
piltzin
son
ce
a
calli
house
"I made my son a house."
Here word ordering is determined by the syntax of the object language.
Finally, modern linguists have adopted the practice of using abbreviated grammatical category labels. A 2008 publication which repeats this example labels it as follows:[2]
ni-c-chihui-lia
1sg.subj-3sg.obj-mach-appl
in
det
no-piltzin
1sg.poss-Sohn
ce
ein
calli
Haus
This approach is denser and also requires effort to read, but it is less reliant on the grammatical structure of the metalanguage for expressing the semantics of the target forms.
In computing, special text markers are provided in the Specials Unicode block to indicate the start and end of interlinear glosses.
Structure[edit]
Though there is no formal specification for the IGT format, the Leipzig Glossing Rules[3] are a set of guidelines that aim to standardize the format as much as possible.
An interlinear text for linguistics will commonly consist of some or all of the following, usually in this order, from top to bottom:
The original orthography (typically in italic or bold italic),a conventional transliteration into the Latin alphabet,a phonetic transcription,a morphophonemic transliteration,a word-by-word or morpheme-by-morpheme gloss, where morphemes within a word are separated by hyphens or other punctuation,
and finally
a free translation, which may be placed in a separate paragraph or on the facing page if the structures of the languages are too different for it to follow the text line by line.
As an example, the following Taiwanese Minnan clause has been transcribed with five lines of text:
1. the standard pe̍h-ōe-jī transliteration,2. a gloss using tone numbers for the surface tones,3. a gloss showing the underlying tones in citation form (before undergoing tone sandhi),4. a morpheme-by-morpheme gloss in English, and5. an English translation:[4]
(1.)
(2.)
(3.)
(4.)
goá
goa1
goa2
I
iáu-boē
iau1-boe3
iau2-boe7
not-yet
koat-tēng
koat2-teng3
koat4-teng7
decide
tang-sî
tang7-si5
tang1-si5
when
boeh
boeh2
boeh4
want
tńg-khì
tng1-khi3.
tng2-khi3.
return.
(5.) "I have not yet decided when I shall return."
Word-by-word alignment. According to the Leipzig Glossing Rules, it is standard to left-align the words in the object language with the corresponding words in the metalanguage; this alignment can be seen between lines (1-3) and line (4).
Morpheme-by-morpheme correspondence. At the sub-word level, segmentable morphemes are separated by hyphens, both in the example and in the gloss. There should be the same number of hyphens in the example and in the gloss, as shown in the following example:
Gila
now
abur-u-n
they-obl-gen
ferma
farm
hamišaluǧ
forever
güǧüna
behind
amuqʼ-da-č
stay-fut-neg
'Now their farm will not stay behind forever.'
Grammatical category labels. In amuqʼ-da-č, the stem (amuq) is translated into the corresponding English lexeme (stay) while the inflectional affixes (da) and (č) are inflectional affixes representing future tense and negation. These inflectional affixes are glossed as FUT and NEG; a list of standard abbreviations for grammatical categories that are widely used in linguistics can be found in the Leipzig Glossing Rules.
One-to-many correspondences. When a single object-language element corresponds to several metalanguage elements, they are separated by periods.[3] E.g.,
çık-mak
come.out-inf
'to come out'
Non-overt elements. if the morpheme-by-morpheme gloss (middle line) contains an element that does not correspond to an overt element in the example, a standard strategy is to include an overt "ø" in the object-language text,[3] which is separated by a hyphen like an overt element would be:
puer-ø
boy-nom
'boy'
Reduplication is treated similarly to affixation but with a tilde (instead of the standard hyphen) that connects the copied element to the stem:[3]
bi~bili
ipfv~buy
'is buying'
Punctuation[edit]
In interlinear morphological glosses, various forms of punctuation separate the glosses. Typically, the words are aligned with their glosses; within words, a hyphen is used when a boundary is marked in both the text and its gloss, a period when a boundary appears in only one. That is, there should be the same number of words separated with spaces in the text and its gloss, as well as the same number of hyphenated morphemes within a word and its gloss. This is the basic system, and can be applied universally. For example:
Odadan hızlı çıktım.
oda-dan
room-abl
room-from
hız-lı
speed-com
speed-with
çık-tı-m
go.out-pfv-1sg
go_out-perfective-I
Turkish
'I left the room quickly.'
An underscore may be used instead of a period, as in go_out-PFV, when a single word in the source language happens to correspond to a phrase in the glossing language, though a period would still be used for other situations, such as Greek oikíais house.FEM.PL.DAT 'to the houses'.
However, sometimes finer distinctions may be made. For example, clitics may be separated with a double hyphen (or, for ease of typing, an equal sign) rather than a hyphen. A French example:
Je t'aime.
je⹀te⹀aime
I⹀you⹀love
(French)
'I love you.'
Affixes which cause discontinuity (infixes, circumfixes, transfixes, etc.) may be set off by angle brackets, and reduplication with tildes, rather than with hyphens:
sulat, susulat, sumulat, sumusulat (verbal declensions) (Tagalog)
sulat
write
su~sulat
contemplative mood~write
s⟨um⟩ulat
⟨agent trigger.past⟩write
s⟨um⟩u~sulat
⟨agent trigger⟩contemplative~write
(See affix for other examples.)
Morphemes which cannot be easily separated out, such as umlaut, may be marked with a backslash rather than a period:
unser-n
our-dat.pl
Väter-n
father\pl-dat.pl
(German)
'to our fathers' (the singular of Väter 'fathers' is Vater)
A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules.[3]
Interlinear gloss resources[edit]
Efforts have been undertaken to digitize IGT for hundreds of the world's languages.[5]
Online Database of Interlinear Text
(German)
'to our fathers' (the singular of Väter 'fathers' is Vater)
A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules.[3]
Interlinear gloss resources[edit]
Efforts have been undertaken to digitize IGT for hundreds of the world's languages.[5]
Online Database of Interlinear Text
History[edit]Interlinear text in Toussaint–Langenscheidt Spanisch, a Spanish-language textbook for German speakers, 1910
Interlinear glosses have been used for a variety of purposes over a long period of time. One common usage has been to annotate bilingual textbooks for language education. This sort of interlinearization serves to help make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language.
Such annotations have occasionally been expressed not through interlinear layout, but rather through enumeration of words in the object and meta language. One such example is Wilhelm von Humboldt's annotation of Classical Nahuatl:[1]
1
ni-
1
ich
2
c-
3
mache
3
chihui
2
es
4
-lia
4
für
5
in
5
der
6
no-
6
mein
7
piltzin
7
Sohn
8
ce
8
ein
9
calli
9
Haus
This "inline" style allows examples to be included within the flow of text, and for the word order of the target language to be written in an order which approximates the target language syntax. (In the gloss here, mache es is reordered from the corresponding source order to approximate German syntax more naturally.) Even so, this approach requires the readers to "re-align" the correspondences between source and target forms.
More modern 19th- and 20th-century approaches took to glossing vertically, aligning the same sort of word-by-word content in such a way that the metalanguage terms were placed vertically below the source language terms. In this style, the given example might be rendered thus (here English gloss):
ni-
I
c-
it
chihui
make
-lia
for
in
to-the
no-
my
piltzin
son
ce
a
calli
house
"I made my son a house."
Here word ordering is determined by the syntax of the object language.
Finally, modern linguists have adopted the practice of using abbreviated grammatical category labels. A 2008 publication which repeats this example labels it as follows:[2]
ni-c-chihui-lia
1sg.subj-3sg.obj-mach-appl
in
det
no-piltzin
1sg.poss-Sohn
ce
ein
calli
Haus
This approach is denser and also requires effort to read, but it is less reliant on the grammatical structure of the metalanguage for expressing the semantics of the target forms.
In computing, special text markers are provided in the Specials Unicode block to indicate the start and end of interlinear glosses.
Structure[edit]
Though there is no formal specification for the IGT format, the Leipzig Glossing Rules[3] are a set of guidelines that aim to standardize the format as much as possible.
An interlinear text for linguistics will commonly consist of some or all of the following, usually in this order, from top to bottom:
The original orthography (typically in italic or bold italic),a conventional transliteration into the Latin alphabet,a phonetic transcription,a morphophonemic transliteration,a word-by-word or morpheme-by-morpheme gloss, where morphemes within a word are separated by hyphens or other punctuation,
and finally
a free translation, which may be placed in a separate paragraph or on the facing page if the structures of the languages are too different for it to follow the text line by line.
As an example, the following Taiwanese Minnan clause has been transcribed with five lines of text:
1. the standard pe̍h-ōe-jī transliteration,2. a gloss using tone numbers for the surface tones,3. a gloss showing the underlying tones in citation form (before undergoing tone sandhi),4. a morpheme-by-morpheme gloss in English, and5. an English translation:[4]
(1.)
(2.)
(3.)
(4.)
goá
goa1
goa2
I
iáu-boē
iau1-boe3
iau2-boe7
not-yet
koat-tēng
koat2-teng3
koat4-teng7
decide
tang-sî
tang7-si5
tang1-si5
when
boeh
boeh2
boeh4
want
tńg-khì
tng1-khi3.
tng2-khi3.
return.
(5.) "I have not yet decided when I shall return."
Word-by-word alignment. According to the Leipzig Glossing Rules, it is standard to left-align the words in the object language with the corresponding words in the metalanguage; this alignment can be seen between lines (1-3) and line (4).
Morpheme-by-morpheme correspondence. At the sub-word level, segmentable morphemes are separated by hyphens, both in the example and in the gloss. There should be the same number of hyphens in the example and in the gloss, as shown in the following example:
Gila
now
abur-u-n
they-obl-gen
ferma
farm
hamišaluǧ
forever
güǧüna
behind
amuqʼ-da-č
stay-fut-neg
'Now their farm will not stay behind forever.'
Grammatical category labels. In amuqʼ-da-č, the stem (amuq) is translated into the corresponding English lexeme (stay) while the inflectional affixes (da) and (č) are inflectional affixes representing future tense and negation. These inflectional affixes are glossed as FUT and NEG; a list of standard abbreviations for grammatical categories that are widely used in linguistics can be found in the Leipzig Glossing Rules.
One-to-many correspondences. When a single object-language element corresponds to several metalanguage elements, they are separated by periods.[3] E.g.,
çık-mak
come.out-inf
'to come out'
Non-overt elements. if the morpheme-by-morpheme gloss (middle line) contains an element that does not correspond to an overt element in the example, a standard strategy is to include an overt "ø" in the object-language text,[3] which is separated by a hyphen like an overt element would be:
puer-ø
boy-nom
'boy'
Reduplication is treated similarly to affixation but with a tilde (instead of the standard hyphen) that connects the copied element to the stem:[3]
bi~bili
ipfv~buy
'is buying'
Punctuation[edit]
In interlinear morphological glosses, various forms of punctuation separate the glosses. Typically, the words are aligned with their glosses; within words, a hyphen is used when a boundary is marked in both the text and its gloss, a period when a boundary appears in only one. That is, there should be the same number of words separated with spaces in the text and its gloss, as well as the same number of hyphenated morphemes within a word and its gloss. This is the basic system, and can be applied universally. For example:
Odadan hızlı çıktım.
oda-dan
room-abl
room-from
hız-lı
speed-com
speed-with
çık-tı-m
go.out-pfv-1sg
go_out-perfective-I
Turkish
'I left the room quickly.'
An underscore may be used instead of a period, as in go_out-PFV, when a single word in the source language happens to correspond to a phrase in the glossing language, though a period would still be used for other situations, such as Greek oikíais house.FEM.PL.DAT 'to the houses'.
However, sometimes finer distinctions may be made. For example, clitics may be separated with a double hyphen (or, for ease of typing, an equal sign) rather than a hyphen. A French example:
Je t'aime.
je⹀te⹀aime
I⹀you⹀love
(French)
'I love you.'
Affixes which cause discontinuity (infixes, circumfixes, transfixes, etc.) may be set off by angle brackets, and reduplication with tildes, rather than with hyphens:
sulat, susulat, sumulat, sumusulat (verbal declensions) (Tagalog)
sulat
write
su~sulat
contemplative mood~write
s⟨um⟩ulat
⟨agent trigger.past⟩write
s⟨um⟩u~sulat
⟨agent trigger⟩contemplative~write
(See affix for other examples.)
Morphemes which cannot be easily separated out, such as umlaut, may be marked with a backslash rather than a period:
unser-n
our-dat.pl
Väter-n
father\pl-dat.pl
(German)
'to our fathers' (the singular of Väter 'fathers' is Vater)
A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules.[3]
Interlinear gloss resources[edit]
Efforts have been undertaken to digitize IGT for hundreds of the world's languages.[5]
Online Database of Interlinear Text
SKOIVSICWMCDNVJXNVSNCOLCMSCM DKCSSC
History[edit]Interlinear text in Toussaint–Langenscheidt Spanisch, a Spanish-language textbook for German speakers, 1910
Interlinear glosses have been used for a variety of purposes over a long period of time. One common usage has been to annotate bilingual textbooks for language education. This sort of interlinearization serves to help make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language.
Such annotations have occasionally been expressed not through interlinear layout, but rather through enumeration of words in the object and meta language. One such example is Wilhelm von Humboldt's annotation of Classical Nahuatl:[1]
1
ni-
1
ich
2
c-
3
mache
3
chihui
2
es
4
-lia
4
für
5
in
5
der
6
no-
6
mein
7
piltzin
7
Sohn
8
ce
8
ein
9
calli
9
Haus
This "inline" style allows examples to be included within the flow of text, and for the word order of the target language to be written in an order which approximates the target language syntax. (In the gloss here, mache es is reordered from the corresponding source order to approximate German syntax more naturally.) Even so, this approach requires the readers to "re-align" the correspondences between source and target forms.
More modern 19th- and 20th-century approaches took to glossing vertically, aligning the same sort of word-by-word content in such a way that the metalanguage terms were placed vertically below the source language terms. In this style, the given example might be rendered thus (here English gloss):
ni-
I
c-
it
chihui
make
-lia
for
in
to-the
no-
my
piltzin
son
ce
a
calli
house
"I made my son a house."
Here word ordering is determined by the syntax of the object language.
Finally, modern linguists have adopted the practice of using abbreviated grammatical category labels. A 2008 publication which repeats this example labels it as follows:[2]
ni-c-chihui-lia              in    no-piltzin      ce    calli
1SG.SUBJ-3SG.OBJ-mach-APPL   DET   1SG.POSS-Sohn   ein   Haus
This approach is denser and also requires effort to read, but it is less reliant on the grammatical structure of the metalanguage for expressing the semantics of the target forms.
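In computing, the Specials Unicode block provides three interlinear annotation characters that mark the start of annotated text (U+FFF9, anchor), the boundary between the text and its annotation (U+FFFA, separator), and the end of the annotation (U+FFFB, terminator).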
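As a minimal illustration of these markers, the Python sketch below wraps a word and its gloss in the interlinear annotation characters and recovers them again. The helper names annotate and parse_annotation are hypothetical and chosen only for this example; how (or whether) software displays such annotations is outside the scope of the sketch.

```python
import unicodedata
from typing import Tuple

# Interlinear annotation characters from the Unicode Specials block.
ANCHOR = "\uFFF9"      # precedes the annotated (base) text
SEPARATOR = "\uFFFA"   # separates the base text from its annotation
TERMINATOR = "\uFFFB"  # ends the annotation


def annotate(base: str, gloss: str) -> str:
    """Wrap a base form and its gloss in annotation characters (hypothetical helper)."""
    return f"{ANCHOR}{base}{SEPARATOR}{gloss}{TERMINATOR}"


def parse_annotation(text: str) -> Tuple[str, str]:
    """Recover (base, gloss) from a string produced by annotate()."""
    if not (text.startswith(ANCHOR) and text.endswith(TERMINATOR)):
        raise ValueError("text is not wrapped in annotation characters")
    base, _, gloss = text[1:-1].partition(SEPARATOR)
    return base, gloss


# Print the code point and official Unicode name of each marker.
for ch in (ANCHOR, SEPARATOR, TERMINATOR):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(parse_annotation(annotate("calli", "house")))
```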
Structure
Though there is no formal specification for the format of interlinear glossed text (IGT), the Leipzig Glossing Rules[3] are a set of guidelines that aim to standardize the format as much as possible.
An interlinear text for linguistics will commonly consist of some or all of the following, usually in this order, from top to bottom:
The original orthography (typically in italic or bold italic),
a conventional transliteration into the Latin alphabet,
a phonetic transcription,
a morphophonemic transliteration,
a word-by-word or morpheme-by-morpheme gloss, where morphemes within a word are separated by hyphens or other punctuation,
and finally a free translation, which may be placed in a separate paragraph or on the facing page if the structures of the languages are too different for it to follow the text line by line.
As an example, the following Taiwanese Minnan clause has been transcribed with five lines of text:
1. the standard pe̍h-ōe-jī transliteration,
2. a gloss using tone numbers for the surface tones,
3. a gloss showing the underlying tones in citation form (before undergoing tone sandhi),
4. a morpheme-by-morpheme gloss in English, and
5. an English translation:[4]
(1.)  goá   iáu-boē    koat-tēng    tang-sî    boeh   tńg-khì
(2.)  goa1  iau1-boe3  koat2-teng3  tang7-si5  boeh2  tng1-khi3.
(3.)  goa2  iau2-boe7  koat4-teng7  tang1-si5  boeh4  tng2-khi3.
(4.)  I     not-yet    decide       when       want   return.
(5.) "I have not yet decided when I shall return."
Word-by-word alignment. According to the Leipzig Glossing Rules, it is standard to left-align the words in the object language with the corresponding words in the metalanguage; this alignment can be seen between lines (1-3) and line (4).
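The left-alignment described above can be sketched in a few lines of Python. The function align_lines is a hypothetical helper written for this illustration, not part of the Leipzig Glossing Rules. It pads each column to the width of its longest entry; because it measures width with len(), which counts code points, text that uses combining diacritics may need a display-width library for exact alignment.

```python
from typing import List


def align_lines(lines: List[List[str]]) -> List[str]:
    """Left-align the words of several parallel lines column by column."""
    if len({len(line) for line in lines}) != 1:
        raise ValueError("every line must contain the same number of words")
    # Width of each column is the length of its longest entry.
    widths = [max(len(word) for word in column) for column in zip(*lines)]
    return [
        "  ".join(word.ljust(width) for word, width in zip(line, widths)).rstrip()
        for line in lines
    ]


source = "goá iáu-boē koat-tēng tang-sî boeh tńg-khì".split()
gloss = "I not-yet decide when want return".split()
for row in align_lines([source, gloss]):
    print(row)
```

Raising an error when the lines differ in word count is a design choice of this sketch: it surfaces misaligned glosses early rather than silently truncating them.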
Morpheme-by-morpheme correspondence. At the sub-word level, segmentable morphemes are separated by hyphens, both in the example and in the gloss. There should be the same number of hyphens in the example and in the gloss, as shown in the following example:
Gila   abur-u-n       ferma   hamišaluǧ   güǧüna   amuqʼ-da-č
now    they-OBL-GEN   farm    forever     behind   stay-FUT-NEG
'Now their farm will not stay behind forever.'
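The requirement that the example and the gloss contain the same number of hyphens can be checked mechanically. The following Python sketch, with the hypothetical function name hyphens_match, compares the two lines word by word; it assumes words are separated by whitespace and that only plain hyphens mark morpheme boundaries.

```python
def hyphens_match(source_line: str, gloss_line: str) -> bool:
    """Return True if both lines have the same words-per-line and hyphens-per-word counts."""
    source_words = source_line.split()
    gloss_words = gloss_line.split()
    if len(source_words) != len(gloss_words):
        return False
    return all(
        s.count("-") == g.count("-")
        for s, g in zip(source_words, gloss_words)
    )


print(hyphens_match(
    "Gila abur-u-n ferma hamišaluǧ güǧüna amuqʼ-da-č",
    "now they-OBL-GEN farm forever behind stay-FUT-NEG",
))
```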
Grammatical category labels. In amuqʼ-da-č, the stem (amuqʼ) is translated into the corresponding English lexeme (stay), while the inflectional affixes (da) and (č) mark future tense and negation. These affixes are glossed as FUT and NEG; a list of standard abbreviations for grammatical categories that are widely used in linguistics can be found in the Leipzig Glossing Rules.
One-to-many correspondences. When a single object-language element corresponds to several metalanguage elements, they are separated by periods.[3] E.g.,
çık-mak
come.out-INF
'to come out'
Non-overt elements. If the morpheme-by-morpheme gloss (middle line) contains an element that does not correspond to an overt element in the example, a standard strategy is to include an overt "ø" in the object-language text,[3] separated by a hyphen just as an overt element would be:
puer-ø
boy-NOM
'boy'
Reduplication is treated similarly to affixation but with a tilde (instead of the standard hyphen) that connects the copied element to the stem:[3]
bi~bili
IPFV~buy
'is buying'
Punctuation
In interlinear morphological glosses, various forms of punctuation separate the glosses. Typically, the words are aligned with their glosses; within words, a hyphen is used when a boundary is marked in both the text and its gloss, a period when a boundary appears in only one. That is, there should be the same number of words separated with spaces in the text and its gloss, as well as the same number of hyphenated morphemes within a word and its gloss. This is the basic system, and can be applied universally. For example:
Odadan hızlı çıktım. (Turkish)
oda-dan     hız-lı       çık-tı-m
room-ABL    speed-COM    go.out-PFV-1SG
room-from   speed-with   go_out-perfective-I
'I left the room quickly.'
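As a rough sketch of this basic system, the Python function below pairs each morpheme with its gloss by splitting words on spaces and morphemes on hyphens, then splits each gloss on periods to expose one-to-many correspondences. The name pair_morphemes and the shape of its return value are assumptions made for this illustration only.

```python
from typing import List, Tuple

Pair = Tuple[str, List[str]]


def pair_morphemes(source_line: str, gloss_line: str) -> List[List[Pair]]:
    """Pair every source morpheme with the list of gloss elements it corresponds to."""
    source_words = source_line.split()
    gloss_words = gloss_line.split()
    if len(source_words) != len(gloss_words):
        raise ValueError("text and gloss differ in number of words")
    paired = []
    for s_word, g_word in zip(source_words, gloss_words):
        morphemes = s_word.split("-")
        glosses = g_word.split("-")
        if len(morphemes) != len(glosses):
            raise ValueError(f"hyphen mismatch in {s_word!r} / {g_word!r}")
        # A period inside a gloss marks a boundary present only in the gloss.
        paired.append([(m, g.split(".")) for m, g in zip(morphemes, glosses)])
    return paired


for word in pair_morphemes("oda-dan hız-lı çık-tı-m",
                           "room-ABL speed-COM go.out-PFV-1SG"):
    print(word)
```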
An underscore may be used instead of a period, as in go_out-PFV, when a single word in the source language happens to correspond to a phrase in the glossing language, though a period would still be used for other situations, such as Greek oikíais house.FEM.PL.DAT 'to the houses'.
However, sometimes finer distinctions may be made. For example, clitics may be separated with a double hyphen (or, for ease of typing, an equal sign) rather than a hyphen. A French example:
Je t'aime.
je⹀te⹀aime
I⹀you⹀love
(French)
'I love you.'
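A parser that wants to keep these finer distinctions has to remember which boundary mark preceded each segment. The Python sketch below is one possible approach, not an established tool: the function segment and the labels in BOUNDARY_NAMES are hypothetical, and the equal sign is treated as the typed equivalent of the double hyphen.

```python
import re
from typing import List, Tuple

# Boundary marks and the (illustrative) label assigned to each.
BOUNDARY_NAMES = {"-": "affix", "=": "clitic", "⹀": "clitic", "~": "reduplication"}
BOUNDARY = re.compile("([-=⹀~])")


def segment(word: str) -> List[Tuple[str, str]]:
    """Return (segment, boundary_before) pairs; the first segment is labelled 'word'."""
    # With a capturing group, re.split alternates segments and boundary marks.
    pieces = BOUNDARY.split(word)
    result = [(pieces[0], "word")]
    for mark, piece in zip(pieces[1::2], pieces[2::2]):
        result.append((piece, BOUNDARY_NAMES[mark]))
    return result


print(segment("je=te=aime"))
print(segment("çık-tı-m"))
```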
Affixes which cause discontinuity (infixes, circumfixes, transfixes, etc.) may be set off by angle brackets, and reduplication with tildes, rather than with hyphens:
sulat, susulat, sumulat, sumusulat (verbal declensions) (Tagalog)
sulat   su~sulat                   s⟨um⟩ulat                   s⟨um⟩u~sulat
write   contemplative mood~write   ⟨agent trigger.past⟩write   ⟨agent trigger⟩contemplative~write
(See affix for other examples.)
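To show how the angle-bracket and tilde notation can be read back mechanically, the following Python sketch separates infixes and reduplicated material from the stem and reconstructs the surface form. The function analyze and the keys of the dictionary it returns are assumptions for this example; it handles only the two marks discussed here.

```python
import re
from typing import Dict, List, Union

INFIX = re.compile(r"⟨([^⟩]*)⟩")


def analyze(form: str) -> Dict[str, Union[str, List[str]]]:
    """Split a segmented form into infixes, reduplicants, stem, and surface form."""
    infixes = INFIX.findall(form)
    without_infixes = INFIX.sub("", form)
    # Everything before the last tilde-separated piece is reduplicated material.
    *reduplicants, stem = without_infixes.split("~")
    surface = form.replace("⟨", "").replace("⟩", "").replace("~", "")
    return {
        "infixes": infixes,
        "reduplicants": reduplicants,
        "stem": stem,
        "surface": surface,
    }


for form in ("sulat", "su~sulat", "s⟨um⟩ulat", "s⟨um⟩u~sulat"):
    print(analyze(form))
```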
Morphemes which cannot be easily separated out, such as umlaut, may be marked with a backslash rather than a period:
unser-n      Väter-n
our-DAT.PL   father\PL-DAT.PL
(German)
'to our fathers' (the singular of Väter 'fathers' is Vater)
A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules.[3]
Interlinear gloss resources
Efforts have been undertaken to digitize IGT for hundreds of the world's languages.[5]
Online Database of Interlinear Text