...
Annotation Guidelines
for English-Dutch Machine Translation Quality Assessment
version 1.3.3
LT3 Technical Report - LT3 15-03
Arda Tezcan, Lieve Macken, Joke Daems & Laura Van Brussel
arda.tezcan@ugent.be, lieve.macken@ugent.be, joke.daems@ugent.be, laura.vanbrussel@ugent.be
LT3- Language and Translation Technology Team
Department of Translation, Interpreting and Communication
Ghent University
URL: http://www.lt3.ugent.be
August, 2015
Assessing translation quality is a complex task, and the appropriate approach depends on the goal of the assessment. We propose a categorization of the most typical translation errors, intended specifically for use in the machine translation (MT) context.
Categorization
The categories are divided into two main groups: Accuracy and Fluency. Accuracy is concerned with the relationship between source text and target text, whereas Fluency is concerned with the construction of the target text and language. Accuracy errors only become visible when source and target text are compared, whereas fluency errors are already visible in the target text alone. If we can detect an error by looking at the target text only, we annotate it as a fluency error. If we have to look at the source text to detect an error, we annotate it as an accuracy error. Accuracy and fluency errors can therefore be annotated on the same text, since one type of error can occur together with the other.
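The decision rule above can be summarised in a few lines of Python. This is only an illustrative sketch: the helper `error_group` and the example notes are hypothetical and are not part of the annotation tool.

```python
def error_group(visible_in_target_alone: bool) -> str:
    """Return the main error group for one detected problem.

    Hypothetical helper: the boolean is the annotator's own judgment of
    whether the problem can be spotted without reading the source text.
    """
    return "Fluency" if visible_in_target_alone else "Accuracy"


# The same piece of text may receive both kinds of annotation,
# one annotation per problem that was found.
problems = [
    {"note": "sentence is ungrammatical", "visible_in_target_alone": True},
    {"note": "meaning differs from source", "visible_in_target_alone": False},
]
groups = [error_group(p["visible_in_target_alone"]) for p in problems]
print(groups)
```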
The technical report 2.0
This technical report contains a possible classification of translation problems for the translation of texts from English to Dutch and guidelines on how to annotate these problems. Though tuned to suit the needs of the Dutch and English languages, the categorization can be customized to suit the needs of other language pairs.
These guidelines detail the annotation process with the brat rapid annotation tool.
Hover over the word 'brat' at the top right hand corner to be able to select 'log in' and use your username and password to log in.
You are now ready to start annotating. Just double-click a word or click and drag to select smaller/larger pieces of text and the tool will give you an overview of the possible categories.
The categories are listed below 'entity type'. After selecting the appropriate (sub)category, select the correct (sub)subcategory from the drop-down menu below 'entity attributes'. Make sure to select a (sub)subcategory and an attribute when possible!
Just click 'ok' when you're done or 'cancel' when you've selected a piece of text that you didn't want to select. To change an annotation, double-click the label above the word. You can change the category, subcategory and notes, you can decide to delete the annotation or you can move the annotation. To move an annotation, first select 'move' and then select the text span where you want the annotation to move to.
! Be careful when changing the category of an annotation: sometimes the tool remembers the first chosen subcategory alongside the new subcategory (even when the first subcategory belongs to a different main category than the second). If this happens, simply delete your annotation and make a new one with the correct subcategory.
You can select a word or span more than once, so it is possible to assign different problem categories to the same word.
Linking spans
When annotating accuracy errors (see below) you will need to link words from the source sentence to the target sentence. You can think of linking as the process of aligning words in the source sentence with the corresponding translation in the target sentence. You link two annotations of the same category by connecting them with an arrow: click the first annotation (in the source sentence) and drag your mouse pointer to the second annotation (in the target sentence). An arrow labelled with the error category will appear. The guidelines contain information on when you are allowed or required to insert a link between annotations.
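For readers who want to inspect the result of linking outside the tool: brat stores annotations in a standoff file in which text-bound annotations are `T` lines with character offsets and links are `R` lines. The sketch below builds such lines for one linked pair. It assumes that source and target sentence sit in one document separated by a newline, and the label `Mistranslation_MWE` is a hypothetical name, not necessarily the one configured in your project.

```python
SOURCE = "After this last exercise, we can call it a day."
TARGET = "Na deze laatste oefening kunnen we het een dag noemen."
DOC = SOURCE + "\n" + TARGET  # assumption: both sentences share one document

LABEL = "Mistranslation_MWE"  # hypothetical category label


def text_bound(ann_id: str, phrase: str) -> str:
    """Build a 'T' line (id, label, start, end, covered text) for a phrase."""
    start = DOC.index(phrase)  # first occurrence in the document
    return f"{ann_id}\t{LABEL} {start} {start + len(phrase)}\t{phrase}"


def relation(rel_id: str, source_id: str, target_id: str) -> str:
    """Build an 'R' line pointing from the source to the target annotation."""
    return f"{rel_id}\t{LABEL} Arg1:{source_id} Arg2:{target_id}"


lines = [
    text_bound("T1", "call it a day"),        # annotation in the source
    text_bound("T2", "het een dag noemen"),   # annotation in the target
    relation("R1", "T1", "T2"),               # arrow from source to target
]
print("\n".join(lines))
```

The direction of the relation (Arg1 is the source annotation, Arg2 the target annotation) mirrors the click-and-drag order described above.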
Adding fragments to your annotations
When annotating words which belong to the same specific MT error and which are not adjacent, you can annotate them without including the words in between. You can do this by selecting your first annotation (the first part of your annotation) and clicking on "Add frag.". This will allow you to annotate a second span of words in a different part of the sentence. When you make your second annotation, the two annotations will be linked to each other with a dashed line.
General annotation rules
If you are not sure whether or not something is an error, consult external sources. You are perfectly allowed to use a dictionary or a search engine to look things up. Some useful sites that you could consult are:
http://www.vandale.be/
http://taaladvies.net/
http://woordenlijst.org
http://www.vrt.be/taal/
If you would like to search for a word in one of these resources, you can also do this by double-clicking the corresponding word in brat and clicking the link for the relevant resource in the "Search" section of the annotation window. You can refer to external sources in the 'notes' section to support your decisions. You are also allowed to look back at previous texts to check how you annotated the same problem in a different translation.
Step 1: Annotating Accuracy Errors
The texts that you are about to annotate are the results of a machine translation task from English to Dutch. To be able to judge the quality of the translations, they will be marked for two important error types: Accuracy and Fluency errors. Accuracy errors are concerned with the relationship between source text and target text, whereas fluency errors are concerned with the target text and language. The goal of the current assignment is to annotate translations for accuracy errors. Fluency errors will be dealt with separately.
Accuracy errors can be described as errors which lead to a target text that does not reflect the same information as the source text. This means that all misinterpretations, contradictions, meaning shifts, additions or deletions are potential errors.
Please remember that problems regarding only the conventions of the Dutch language (where meaning transfer from source to target has been successful) are not the focus of accuracy and should be handled as fluency.
A detailed explanation of each category can be found below the overview. The information consists of the category name in the brat-tool (the color is always blue for accuracy annotations), followed by a definition, important remarks, guidelines for annotation and examples. The words that should be annotated are highlighted and the information after the arrow sign is an example of a possible annotation note.
Accuracy Errors (overview table)
Definition: Target text is not present in the source.
Annotation: Annotate the added target text which is not present in the source. You do not need to annotate any source text and therefore no linking is required either (since the text is not present in the source). However, if you cannot separately annotate the target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. In this case, also annotate the source words that are covered by the target annotation and link the source annotation to the target annotation with an arrow.
e.g.:
EN: ... if you wanted stylish car wheels ...
NL: als je een stijvolle autowielen wilde ...
-> no mention of "een" in the source text.
e.g.:
EN: ... offered as part of an optional equipment package.
NL: aangeboden als onderdeel van een optioneel pakket aangeboden.
-> 'offered' has already been translated as 'aangeboden' in the correct location. The second 'aangeboden' is an addition.
e.g.:
EN: Each quarter the managers scrutinise the composition of the portfolio.
NL: Elk kwartaal worden de managers onderzoekt de samenstelling van de portefeuille.
-> a corresponding source text cannot be found for 'worden'.
e.g.:
EN: the postal services should provide the sector with new opportunities.
NL: de postdiensten van de sector van nieuwe mogelijkheden moet voorzien
-> a corresponding source text cannot be found for 'van'.
Definition: Source content cannot be found in target.
Annotation: Annotate the source text that is omitted in target text.
e.g.:
EN: When I started Interbrew was ranked 17th in the world.
NL: Toen ik begon stond 17e in the wereld.
-> 'Interbrew' is omitted in translation.
e.g.:
EN: infectious diseases
NL: ziektes
-> 'infectious' is omitted in translation.
Definition: The source text is not translated (but was copied to target) when it should have been translated into Dutch.
Annotation: Annotate the source text that is copied to the target. If you cannot separately annotate the source or target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. Annotate the copy in target text. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: a multiblanking press
NL: een multiblanking drukpers
-> 'multiblanking' is not translated into Dutch, when it should be.
e.g.:
EN: Arachnophobia is extreme or irrational fear of spiders.
NL: Arachnophobia is extreme of irrationele angst voor spinnen.
-> 'Arachnophobia' should have been translated as 'Arachnofobie' but instead the English word is used in the target sentence.
e.g.:
EN: broadcast solutions
NL: broadcast-oplossingen
-> 'broadcast' should have been translated into Dutch.
Definition: The source content is unnecessarily translated into target language when it should have been left untranslated.
Annotation: Annotate the source text that is translated but should not have been translated. If you cannot separately annotate the source or target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. Annotate the translation in target text. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: President George W. Bush
NL: President George W. Struik
-> Being a name, 'Bush' should not be translated. The name as a whole should be annotated even though only a part of it has been unnecessarily translated.
e.g.:
EN: The World's Local Brewer©
NL: Local Brewer van de Wereld©
-> A brand name should not be translated. Please annotate whole words and include any special characters which are a part of the word in your annotation (such as the copyright symbol "©").
Definition: Source content has been translated (when it should be translated) but the translation is incorrect.
5.1. Multi-Word Expression (MWE)
Definition: The translation is incorrect (and often too literal) because the English sentence contained a multi-word expression such as an idiom, a proverb, a collocation, a compound or a phrasal verb. Idiomatic expressions and proverbs are indicated with a paragraph sign (¶) in Van Dale; the idiom "call it a day" is one example.
Annotation: Annotate the source multi-word expression that is incorrectly translated. Annotate the corresponding translation in target text. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: After this last exercise, we can call it a day.
NL: Na deze laatste oefening kunnen we het een dag noemen.
-> 'Call it a day' means to declare the end of a task and should not be literally translated. Please annotate the idioms as a whole to show the error clearly.
e.g.:
EN: To put it bluntly, time is short
NL: Om het botweg te zeggen, de tijd is kort
-> 'time is short' is an idiom which should not be translated literally.
e.g.:
EN: A word of caution!
NL: Een woord van voorzichtigheid!
-> "Voorzichtig!"
Definition: The translation represents an incorrect lexical category (Part-of-Speech) of the corresponding source text.
Annotation: Annotate the source text that is translated with wrong part of speech. If you cannot separately annotate the source or target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. Annotate the translation in target text. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: SOLFA endevours to improve health
NL: SOLFA inspanningen om de gezondheid te verbeteren
-> 'endevours' is used as a verb, not a noun.
e.g.:
EN: We will only use projectors.
NL: We zullen alleen projectoren gebruik.
-> 'use' is translated to 'gebruik' as a noun and not as a verb.
e.g.:
EN: Company1 develops life critical applications.
NL: Company1 ontwikkelt het leven applicaties.
-> 'life' is translated into 'het leven' as a noun and not as an adjective (additionally, there is a "missing" translation for the word 'critical').
e.g.:
EN: measures only 4cm by 4cm.
NL: maatregelen slechts 4cm van 4cm.
-> 'measures' is translated into 'maatregelen' as a noun and not as a verb.
e.g.:
EN: Before, when the animals could talk...
NL: Voordat, toen de dieren konden spreken...
-> 'Before' is used as an adverb here, not as a conjunction.
5.3. Word Sense Disambiguation (Sense)
Definition: The target content refers to a different (and a wrong) sense of the source content.
Annotation: Annotate the source text that is translated with the wrong sense. If you cannot separately annotate the source or target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. Annotate the translation in target text. Link annotation in the source text to the annotation in the target text with an arrow.
5.3.1 Function Word
Determiners, prepositions, auxiliaries, conjunctions, pronouns.
e.g.:
EN: it is on the table.
NL: het is aan de tafel.
-> Should be "op de tafel".
5.3.2 Content Word
Nouns, verbs, adjectives, adverbs.
e.g.:
EN: Climate change aggravates the threats of infectious diseases by further stressing habitats
NL: Klimaatverandering verhoogt het risico op besmettelijke ziektes door meer de nadruk te leggen op habitats
-> 'stress' can mean 'nadruk leggen op', but here 'onder druk zetten' is meant.
e.g.:
EN: a problematic child delivery
NL: een problematische kind aflevering
-> 'child' and 'delivery' could be translated as 'kind' and 'aflevering' in general. However, in this context this is not a correct interpretation.
e.g.:
EN: neighbours made fires in the garden.
NL: buren maakte branden in de tuin.
-> 'fires' can be translated as 'branden' in the right context. However, in this context it should be translated as 'vuur'. Please note that other potential errors are not highlighted in this example and should be annotated in the actual text.
Definition: The translation is incorrect due to the partial translation of a Dutch separable verb.
Annotation: Annotate the source text that is mistranslated. If you cannot separately annotate the source or target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. Annotate the translation in target text. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: He returned home with a dog.
NL: hij terug naar huis met een hond.
-> Translation of 'returned' is partial as 'terug' instead of 'terugkeren'.
-> Please note that this example also leads to a fluency error of type 'Grammar -> Missing Words'. Please see annotation guidelines for 'Grammar -> Missing Words' error for details.
e.g.:
EN: We take 400 000 tonnes of steel annually
NL: We nemen 400 000 ton staal per jaar
-> Translation of 'take' is partial as 'nemen' instead of 'nemen ... af (afnemen)'
-> Please note that this example also leads to a fluency error of type 'Grammar -> Missing Words'. Please see annotation guidelines for 'Grammar -> Missing Words' error for details.
e.g.:
EN: to continue to attract
NL: te blijven trekken
-> Translation of 'attract' is partial as 'trekken' instead of 'aantrekken'
-> Please note that this example also leads to a fluency error of type 'Lexicon -> Lexical Choice'. Please see annotation guidelines for 'Lexicon -> Lexical Choice' error for details.
Definition: The meaning of one or more translated words is not related in any way to the meaning of the source word(s) and does not make any sense in the context.
Annotation: Annotate the unrelated word(s) in the target text and their corresponding word(s) in the source. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: Their legs had swollen on the long march south
NL: Hun benen waren op de lange march zuidelijk gezuiverd
e.g.:
EN: all other attempts to curb his eating had failed and brain surgery was the last resort
NL: alle andere pogingen om zijn eetlust te bestrijden waren mislukt en de herschikking was de laatste uitweg
-> Please note that this example also leads to a fluency error of type 'Lexicon -> Lexical Choice'. Please see annotation guidelines for 'Lexicon -> Lexical Choice' error for details.
Definition: The translation is incorrect but the problem cannot be captured with any of the subcategories above.
Annotation: Annotate the source text that is mistranslated. If you cannot separately annotate the source or target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. Annotate the translation in target text. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: I brought the books back to the library
NL: Ik bracht het boek terug naar de bibliotheek
-> Root 'boek' is correct but singular 'boek' cannot be a translation of 'books', which is plural.
e.g.:
EN: We receive a lot of requests
NL: We kregen veel verzoeken
-> Root of the translation is correct but it carries a past tense whereas 'receive' in source is in present tense.
e.g.:
EN: control of the projector
NL: bediening van de projectie
-> Root of the translation is correct but it carries a different meaning than the source. The translation should be 'projector' or 'projectietoestel' to refer to 'the object which projects'.
e.g.:
EN: This has put extra pressure
NL: Dit heeft extra drup zetten
-> Root of the translation is correct but it is in infinitive form whereas it should be the past participle 'gezet'.
NL: hij stuurde haar brieven
-> 'letters (brieven)' is a hyponym of 'messages' and there was no mention of 'letters' in the text. 'brieven' cannot be a translation of 'messages'.
e.g.:
EN: they have reached an agreement.
NL: ze hebben een overleg bereikt.
-> 'overleg' cannot be a translation of 'agreement'.
e.g.:
EN: cement floors and corrugated metal roofs
NL: betonvloeren en golfkarton metalen daken
-> corrugated: (of a material or surface) shaped into a series of parallel ridges and grooves so as to give added rigidity and strength.
-> It has nothing to do with 'golfkarton'.
e.g.:
EN: He paid her 500 euros.
NL: Hij betaalde haar 50 euro.
Definition: Mechanical transfer errors which are not related to content transfer.
Annotation: Annotate the source text or punctuation which contains the wrong mechanical transfer (depending on the type of the error). If you cannot separately annotate the source or target word(s) due to being a part of a compound or a specific phrase, select multiple words or the phrase. Annotate the translation in target text. Link annotation in the source text to the annotation in the target text with an arrow.
Definition: Errors related to the transfer of capitalization rules from source to target.
Annotation: Annotate the source text which is subject to the wrong capitalization transfer. Annotate the corresponding text in target. Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: BBL Publication
NL: bbl-publicatie.
-> The capitalization of the source text is not transferred to the target text. Please note that the distinction here from "Orthography > Capitalization" is that we need to see the source text to detect the capitalization error. This error is not visible at the target text level alone (errors that are visible there are annotated as fluency errors).
Definition: Errors related to the transfer of punctuation from source to target.
Annotation: Annotate the source punctuation which is subject to the wrong mechanical transfer. Annotate the corresponding text in target (if the punctuation is not missing in target). Link annotation in the source text to the annotation in the target text with an arrow.
e.g.:
EN: He paid her "500" euros.
NL: Hij betaalde haar "500 euro".
-> Quotation mark is transferred to a wrong location in the target sentence.
Definition: Errors related to the transfer of other mechanical aspects.
Annotation: Annotate the source text which is subject to the wrong mechanical transfer. Annotate the corresponding text in target (if available). Link annotation in the source text to the annotation in the target text with an arrow.
Definition: The translation does not match the predefined bilingual terminology requirements.
Annotation: Select the words in the source text that contain the error. Select the corresponding translation in the target text and 'link' the source annotation to the target annotation with an arrow.
Definition: Errors that are present in the source segment.
Annotation: Select the words in the source text that contain the error. Select the corresponding translation in the target text and 'link' the source annotation to the target annotation with an arrow.
Be careful! Marking source errors and the corresponding translation does not mean that other observed errors should be skipped. Source errors often lead to errors in the target word as well, which should be annotated separately.
e.g.:
EN: ... the acquisition s route
NL: ... de overname s route
-> Spelling error for the source word 'acquisitions'. This source error also causes an 'Orthography > Spelling' error of the target word, which should be annotated separately.
e.g.:
EN: This man&oelig;uvre illustrates ...
NL: ... de man&oelig;uvre illustreert ...
-> HTML entity in source instead of the corresponding character it represents. This source error also causes a 'Lexicon > Non-Existing or Foreign Word' error, which should be annotated separately.
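Source errors of the HTML-entity kind can be spotted mechanically before annotation starts. The following is a minimal sketch in Python, assuming the raw source segment is available as a string; the helper name `find_html_entities` and the sample string are assumptions for illustration only.

```python
import html
import re

# Matches named (&oelig;), decimal (&#339;) and hexadecimal (&#x153;) references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_html_entities(segment: str) -> list[str]:
    """Return the character references left unresolved in a segment."""
    return ENTITY_RE.findall(segment)


source = "This man&oelig;uvre illustrates ..."  # assumed form of the raw source
print(find_html_entities(source))  # candidate spans for a source error
print(html.unescape(source))       # what the segment was meant to look like
```

A non-empty result only points the annotator to a candidate span; the source error and any resulting target errors still have to be annotated by hand.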
9. Other Accuracy errors (ACC)
Definition: Other errors regarding the relationship between source and target text, which do not belong to any of the accuracy error categories above.
Annotation: Select the words in the source text that contain the error. Select the corresponding translation in the target text and 'link' the source annotation to the target annotation with an arrow.
Step 2: Annotating Fluency Errors
The goal of the current assignment is to annotate fluency errors. Accuracy errors have been dealt with in the previous section.
Fluency errors can be described as errors which relate to the construction of the target language. A good translation should read as a native Dutch text. This includes respecting the conventions of the language (grammar, lexicon, orthography).
Please remember that errors of translation and meaning transfer (accuracy) are not the focus of fluency error annotations and should be handled as accuracy errors.
Following is an overview of all the (sub)categories for fluency errors and guidelines on how to annotate these issues within the brat-tool.
A detailed explanation of each category can be found below the overview. The information consists of the category name in the brat-tool (the color is always red for fluency annotations), followed by a definition, important remarks, guidelines for annotation and examples. The words that should be annotated are highlighted and the information after the arrow sign is an example of a possible annotation note.
Fluency Errors (overview table)
Definition: Errors regarding the grammatical rules of the Dutch language.
Definition: The syntax of a multi-word expression is wrong even though the individual word choices are correct. The text needs a combination of corrections such as reordering and adding and/or removing function words. Please keep in mind that the same text can also include other problems such as "Lexicon" or "Orthography" errors, but these should be annotated independently of the grammar errors.
e.g.:
NL: Company1 bouwde een 600 meter spoorlijn
-> should be rephrased as "spoorlijn van 600 meter"
NL: de Company1 Research Industry Gent onderzoeksteam moest eerst onderzoeken ...
-> should be rephrased as "het onderzoeksteam van Company1 Research Industry Gent"
NL: Company1 personeelsleden bezochten ook fabrikanten en gebruikers van transformatoren
-> should be rephrased as "personeelsleden van Company1"
NL: om hogere capaciteit generatoren op hogere torens te installeren
-> should be rephrased as "generatoren met hogere capaciteit"
Definition: One or more words are used in an incorrect form.
Definition: Two or more words do not agree with respect to gender, number, person.
Annotation: Select the words in target sentence that do not agree. If these words are not adjacent, select adjacent ones first and use "add frag." to connect to the other words that cause the agreement error.
e.g.:
NL: de hoofd
-> wrong article
NL: een slimme meisje
-> een 'slim meisje'
NL: onze planeet...ondergaan
-> 'planeet' is singular, verb should be singular as well
NL: Ze heeft een contact verbod aangevraagd die hem verplicht om...
-> should be 'contactverbod' --- 'dat'. The words that disagree are annotated, within the same annotation, as two fragments.
-> You can annotate 'contact verbod' also as a "spelling > compound" error separately.
NL: ik wordt
-> should be ik 'word'.
NL: De weinig plaatsen zijn vaak obruikbaar.
-> should be 'weinige plaatsen'
NL: Een van de belangrijkere bendeleiders waren net gearresteerd.
-> should be 'bendeleiders was'
NL: boeiende Schots accent
-> should be 'boeiend' --- 'accent'. The words that disagree are annotated, within the same annotation, as two fragments.
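An annotation made with "add frag." is stored by brat as one text-bound annotation with several offset pairs separated by semicolons. The sketch below shows how the two fragments of the 'contact verbod' --- 'die' example could be represented. The helper `discontinuous_line` and the label `Agreement` are hypothetical names used for illustration.

```python
def discontinuous_line(ann_id: str, label: str, text: str, fragments: list[str]) -> str:
    """Build one brat 'T' line whose span consists of several fragments."""
    spans = []
    cursor = 0
    for fragment in fragments:
        start = text.index(fragment, cursor)  # search left to right
        end = start + len(fragment)
        spans.append(f"{start} {end}")
        cursor = end
    # Offset pairs are joined with ';', the covered text with single spaces.
    return f"{ann_id}\t{label} {';'.join(spans)}\t{' '.join(fragments)}"


target = "Ze heeft een contact verbod aangevraagd die hem verplicht om..."
line = discontinuous_line("T1", "Agreement", target, ["contact verbod", "die"])
print(line)
```

The words between the two fragments ('aangevraagd') stay outside the annotation, which is exactly what the dashed line in the tool expresses.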
Definition: Other word form errors, which are not agreement errors.
Annotation: Select the words in target sentence that cause word form error. If these words are not adjacent, select adjacent ones first and use "add frag." to connect to the other words that cause this error.
e.g.:
NL: We kunnen het staal dwars gesneden
-> 'snijden'
NL: In 2006, milieuvriendelijke energie gedekt ...
-> 'dekte' in this sentence.
NL: Bekaert uitbreidingsprogramma in Karawang
-> 'Bekaerts'
e.g.:
NL: zodat vorderingen van schijnbare genezing tezien in een geschikte context
-> verb form, should be 'gezien worden'
NL: Ontgind
-> verb form, should be 'ontgonnen'
Definition: Wrong word order.
Be careful! If the word order is grammatically correct, but another word order would be better, do not annotate it as a word order error!
Annotation: Annotate the words in the target sentence where a word order error can be seen. Any word that needs reordering should be within your annotation. If you would like to annotate words that are not adjacent but belong to the same word order error, annotate the first set of adjacent words and select "add frag." to add the other words to your annotation. Indicate the correct order of words in the 'notes' section.
There are different types of word order errors: some words might need to switch places, some might need to move to a different location in the sentence, and other errors are more complex. Make sure you annotate all relevant words that belong to the same error and please remember to include the text with the correct word order in your notes.
e.g.:
NL: Ten derde klimaatverandering kan de temperatuur verhogen
-> 'kan' and 'klimaatverandering' need to switch places.
-> the two words are annotated as two fragments of the same annotation
NL: maar om echt te helpen een bepaalde groep mensen
-> maar om 'een bepaalde groep mensen echt te helpen'. The two word sections are annotated as two fragments of the same annotation, being word sets that need to switch places.
NL: omdat het merk is minder bekend
-> 'minder bekend is'. The two sections are annotated as two fragments of the same annotation, being word sets that need to switch places.
1.3.1 Repetitions
Definition: One or more words are unnecessarily repeated in the target sentence.
Annotation: Select the words that are repeated in the target text.
e.g.:
NL: Op de oevers hebben de orchideeën op de oevers weer een verschijning gemaakt.
NL: Het 55 vierkante meter LED scherm ligt boven het centraal station van Glasgow, op de hoek van Union Street . Het scherm heeft een oppervlakte van 55 meter en is direct zichtbaar op Renfield Street.
1.3.2 Other
Definition: One or more extra words make the target grammatically incorrect.
Be careful! An 'extra word' error (fluency) can be caused by an 'addition' error (accuracy). If this is the case, annotate both types of errors. Remember that not every 'extra word' error (fluency) implies an 'addition' error (accuracy), or the other way around.
Annotation: Select the words in target text which should not be present.
e.g.:
NL: Ze heeft het aan de politie te verteld.
NL: de de hond
-> 'de' will not be marked as 'addition' (accuracy) (unless it is repeated in the source sentence too)
Definition: One or more missing words make the target grammatically incorrect.
Be careful: Span. Only select 'missing words' if a whole structure (article, constituent, preposition ...) is missing.
Be careful (2): Omissions. A 'missing word' error (fluency) can be caused by an 'omission' error (accuracy). If this is the case, annotate both types of errors. Remember that not every 'missing word' error (fluency) implies an 'omission' error (accuracy), or the other way around.
Annotation: Select (only) the preceding word for the correct location of the missing word. As an exception, if the missing word should appear at the beginning of the sentence, select the first word of the sentence and add 'first word' to your comments to indicate that this is an exception. Please remember to include the missing word and the correct target text for your annotation in your notes as well.
Be careful (3): Links. If a part of a separable verb is missing (normally, you annotated this word as Accuracy > Mistranslation > Partial), create two fluency annotations (Fluency > Grammar Syntax > Missing > Function or Content word) and link them. First create an annotation for the word which is also annotated as Accuracy > Mistranslation > Partial. Next create an annotation on the word preceding the location of the missing word. Then draw a link from the first annotation to the second annotation. If the missing word is independent from other words in the sentence, annotate only the location of the missing word.
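The rule for choosing which word to select for a missing word (the preceding word, or the first word of the sentence as an exception) can be expressed as a small function. This is a minimal sketch; the helper `missing_word_anchor` and the exact note wording are assumptions, not a prescribed format.

```python
def missing_word_anchor(tokens: list[str], insert_at: int, missing_word: str) -> tuple[str, str]:
    """Pick the token to select and build a note for a missing word.

    tokens:       the target sentence as it stands (with the word missing)
    insert_at:    index the missing word would take in the corrected sentence
    missing_word: the word that should be added
    """
    if not tokens or not 0 <= insert_at <= len(tokens):
        raise ValueError("insert_at must point into a non-empty sentence")
    if insert_at == 0:
        # Exception: nothing precedes the gap, so select the first word.
        return tokens[0], f"first word; missing: {missing_word}"
    # Default: select only the word right before the gap.
    return tokens[insert_at - 1], f"missing: {missing_word}"


print(missing_word_anchor("VN-milieuprogramma waarschuwt".split(), 0, "het"))
print(missing_word_anchor("Hij begon het verzenden van berichten".split(), 2, "met"))
```

In the tool itself you still add the corrected target text to the notes by hand; the function only illustrates which word receives the annotation.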
1.4.1 Function Word
Determiners, prepositions, auxiliaries, conjunctions, pronouns
e.g.:
NL: VN-milieuprogramma waarschuwt
-> missing article, should be 'het VN-milieuprogramma waarschuwt'. Since the missing word is the first word in the sentence, we mark the first word in the sentence and add the comment 'first word'
NL: Hij begon het verzenden van berichten
-> hij begon 'met' het verzenden
NL: Ze abonneerde zich het nieuwe tijdschrift
-> ... zich 'op' het nieuwe tijdschrift
e.g.:
EN: We take 400 000 tonnes of steel annually
NL: We nemen 400 000 ton staal per jaar
-> Translation of 'take' is partial as 'nemen' instead of 'nemen ... af (afnemen)'. Annotate 'nemen' first as the verb which is missing a word. Annotate the location of the missing word afterwards as 'jaar' (preceding word for the correct location) and link the verb to the location with a relationship. The arrow of the relationship should point to the location.
1.4.2 Content Word
Nouns, verbs, adjectives, adverbs.
e.g.:
NL: Door middel van Update, hebben we altijd blijven u op de hoogte
-> missing verb: "houden"
Definition: Other grammar errors which do not belong to any of the subcategories above.
Annotation: Select the words in the target sentence where a grammar error is identified.
Definition: Errors regarding the use of the lexicon in the Dutch language.
2.1. Non-existing or Foreign Word
Definition: The word(s) is not a part of the Dutch lexicon or is a foreign word. This error often occurs when the source word(s) is not translated into Dutch. However, the MT system can also generate a word that does not belong to either the source or the target language.
Annotation: Select the whole word in target sentence.
e.g.:
NL: anamorphic lens -> "anamorphic" does not belong to the Dutch lexicon.
e.g.:
NL: Hij trachtte de pil door te sliken ... -> "sliken" does not belong to the Dutch lexicon. It should probably be "slikken".
NL: Post oorlog Singapore... -> Even though the individual words exist in the Dutch lexicon as 'post' and 'oorlog', 'post oorlog' as a multi-word expression does not. "Naoorlogse"? We should check the source to annotate potential "Mistranslation" errors independently.
Definition: The word(s) is a part of the Dutch lexicon but another word(s) should be used for generating a correct Dutch sentence.
Annotation: Select the whole word in the target sentence.
Determiners, prepositions, auxiliaries, conjunctions, pronouns.
e.g.:
NL: ... op de Japanse markt -> "op" is a wrong lexical choice in this case.
e.g.:
NL: slechts 4 cm van 4 cm -> "van" is a wrong lexical choice in this case. "op" should be used here.
NL: In het begine waren we als immigranten. -> "als" is a wrong lexical choice here. "zoals" should be used instead.
NL: Inmiddels hebben een aantal fabrikanten al overgestapt als immigranten. -> "hebben" is a wrong lexical choice here. "zijn" should be used instead.
Nouns, verbs, adjectives, adverbs.
e.g.:
NL: profielen voor de gele goederen industrie -> "grondverzetmachines"
NL: het nieuws was in de druk -> "pers"? "drukpers"? "druk" is a wrong lexical choice.
NL: de gondel huisvesting van de generator -> "huisvesting" is a wrong lexical choice in this context.
NL: een snijkant show -> We don't use "snijkant" to refer to a show. "Allermodernste"? We should check the source to annotate potential "Mistranslation" errors separately.
Definition: Errors regarding the conventions for writing the Dutch language.
Be careful! If there is more than one type of orthography error in one word, select the word twice: once for each type. For example, when a compound is split up and spelled with a capital letter, you select the word once for 'capitalization' and once for 'compound'.
Definition: Errors related to spelling.
Annotation: Select the whole word in the target sentence.
Definition: Errors related to spelling of compounds.
Be careful! When a punctuation mark is causing a compound error, annotate it as 'Spelling & Capitalization -> compound' and not as 'Punctuation'.
e.g.:
NL: anti-semitische -> antisemitische
NL: groei - en leerlijnen -> no space between first part of a compound and hyphen.
NL: technologie en media bedrijven
-> 'technologie- en mediabedrijven'.
NL: brood-rooster -> Should be 'broodrooster'.
Definition: Errors related to the use of diacritics.
Annotation: Select the entire word which contains the diacritic issue(s).
e.g.:
NL: de voetbalwedstrijd tussen Zuid-Korea en Belgie -> België
Definition: Other spelling error(s) which do not belong to any of the subcategories above.
Annotation: Select the entire word which contains spelling error(s).
-> There shouldn't be a space between "Artois" and the 'registered trademark' symbol. Please annotate the smallest number of words which show the error clearly. Include 'Stella' (the whole name of the brand) and the 'registered trademark' symbol in your annotation for this example.
NL: Sedrin Prime Beer ®, Sedrin Pure Draft ® en Sedrin Ice Beer ® -> There shouldn't be a space between the brand names and the 'registered trademark' symbols. Please annotate each error separately and not all three errors as a single annotation.
NL: BBL / ING Renta Fund -> There shouldn't be spaces around the 'slash' character. Please annotate the slash together with the words affected by this issue as a single span.
Definition: Errors related to capitalization in the Dutch language. Transfer of capitalization rules from the source text is handled separately.
Annotation: Select the whole word in the target sentence.
Be careful! Orthography errors are visible on the target language level, without the need for checking the source language. If a capitalization error is caused by wrong transfer of capitalization rules from the source text, then this should be annotated as "Transfer > Mechanical > Capitalization".
e.g.:
NL: afrika -> 'Afrika'
e.g.:
NL: de Joodse bars -> 'joodse'
Definition: Errors related to punctuation.
Annotation: Select the punctuation mark or symbol which is used unnecessarily or is placed incorrectly in the Dutch language. If a punctuation mark is missing, select the word preceding the correct location of the missing punctuation mark. As an exception, if the missing punctuation mark needs to appear at the beginning of the sentence, select the first word of the sentence and add 'first word' in your comments to indicate that this is an exception. Remember to add your comments in the 'notes' section to provide the correct text.
Be careful! Orthography errors are visible on the target language level, without the need for checking the source language. If a punctuation error is caused by wrong transfer, then this should be annotated as "Transfer > Mechanical > Punctuation".
e.g.:
NL: Hij zei: Ik heb het niet gedaan." -> missing quotation mark pair.
NL: Verenigde Staten (VS -> missing bracket pair.
NL: Afrikaanse vrouwen, willen ... -> Unnecessary (and wrong) use of comma.
NL: ... zegt hij -> A 'dot' is missing at the end of the sentence. We mark the word preceding the correct location of the punctuation mark.
NL: 'Wonder'middeltjes -> quotation mark should be at the end of the word.
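The rule for choosing which word to select when a punctuation mark is missing can be written down as a small function. This is an illustrative sketch under stated assumptions: the sentence is assumed to be already split into words, and the function name and the 'insert_at' parameter are hypothetical, introduced only for this example.

```python
def anchor_for_missing(tokens, insert_at):
    """Return (index of the word to select, comment) for a missing item.

    insert_at is the position at which the missing item should appear:
    0 means before the first word, len(tokens) means after the last word.
    """
    if not tokens or not 0 <= insert_at <= len(tokens):
        raise ValueError("insert_at must lie within the sentence")
    if insert_at == 0:
        # Exception: nothing precedes the location, so select the first
        # word and flag the exception in the comments.
        return 0, "first word"
    # Default: select the word preceding the correct location.
    return insert_at - 1, ""


# Missing full stop after '... zegt hij': select 'hij'.
print(anchor_for_missing(["zegt", "hij"], 2))
# Missing item before the first word: select the first word, comment 'first word'.
print(anchor_for_missing(["VN-milieuprogramma", "waarschuwt"], 0))
```

The same convention is used for missing words in the Grammar section, which is why the second call reuses the 'VN-milieuprogramma waarschuwt' example.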
Definition: Other orthography errors which do not fall under any of the subcategories above.
Definition: A combination of errors which makes it difficult to annotate fluency errors separately.
Be careful! Please first try to annotate specific errors if they can be identified. If it becomes difficult to identify specific errors because there is a chain of errors that affect each other, because the structure can be corrected in many different ways, or because the structure should be completely rephrased, then select 'multiple errors'.
Annotation: Select the whole span of words that contains multiple errors.
e.g.:
NL: zodat ze kunnen het opzetten van kleine bedrijven
-> rephrase as "zodat ze kleine bedrijven kunnen opzetten"
-> Transfer errors for the same section should still be annotated separately. For example: 'set-up' > 'het opzetten' as 'Misinterpretation > POS' error.
NL: Sneeuwval vertraagt het drukke verkeer op de snelweg tussen Bratislava en Brno, maar we nog steeds lukt ... -> rephrase as "het lukt ons nog steeds"
NL: Daarom werken we aan een betere dienstverlening door het versterken van onze supply chain. -> rephrase as "onze productieketen te versterken"
Definition: Other fluency errors which do not belong to any of the above fluency error categories.