Differential effect of correct name translation on human and automated judgments of translation acceptability: a pilot study
This study proffers two important findings: (1) automated machine translation (MT) evaluation is insensitive to the cognitive gravitas of proper names, contributing to its weak modeling of human judgments of higher-quality MT output, and (2) there is a "new" methodology that produces superior measurement of translation acceptability. Twenty Arabic sentences, with an average name density of 3.7 names in 22 words, were translated into English with a research-grade MT system to produce a 20-output-sentence Control Stimulus Set. Manual correction of 25% of the name translations resulted in an Enhanced Stimulus Set. A Magnitude Estimation (ME) methodology task had each of two teams of five subjects judge Control and Enhanced Sets against human reference translations. As is customary in ME studies, subjects made direct numerical estimations of the magnitude of the stimuli, in this case the degree to which sentences in the Sets conveyed the meaning in the reference sentences. Average estimates for Control and Enhanced Sets were 4.57 and 6.16, respectively, a 34.8% difference. Automated evaluation with the Metric for Evaluation of Translation with Explicit word ORdering (METEOR) produced scores of .446 and .546, a 22% difference. ME detected a differential effect, a finding which suggests that weighting proper name rendering in automated evaluation systems may improve correlations with human judgments on higher-quality output.
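The percentage differences quoted in the abstract can be checked with a short calculation. The sketch below assumes each percentage is the change from the Control score to the Enhanced score, expressed relative to the Control score; the abstract does not state the formula, and the function and variable names are illustrative only.

```python
def relative_increase(control: float, enhanced: float) -> float:
    """Percent change from control to enhanced, relative to control.

    Assumption: this is how the abstract's percentages were derived;
    the source does not give the formula explicitly.
    """
    return (enhanced - control) / control * 100.0


# Mean scores as reported in the abstract.
me_control, me_enhanced = 4.57, 6.16            # Magnitude Estimation averages
meteor_control, meteor_enhanced = 0.446, 0.546  # METEOR scores

me_gain = relative_increase(me_control, me_enhanced)
meteor_gain = relative_increase(meteor_control, meteor_enhanced)

print(f"ME:     {me_gain:.1f}%")      # matches the reported 34.8%
print(f"METEOR: {meteor_gain:.1f}%")  # 22.4%, reported as 22%
```

Under this assumption both reported figures are reproduced, which suggests that the human ME judgments registered a larger relative gain from the corrected names than METEOR did.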