Translation Memory

One form of software assistance for translating texts is known as "translation memory". This is a means of storing a bilingual corpus so that, when a new text is to be translated and some of its phrases have been translated before, the previous translations can be drawn upon for inspiration. Various free and proprietary software packages are available for this. I have, however, produced a rudimentary translation memory program for English-to-Cornish translation, using the Natural Language Toolkit (NLTK) for Python.

This works by finding the bigrams and trigrams (sequences of two and three consecutive words) in the input English sentence and comparing them with those in the English sentences of a bilingual corpus. At present the corpus consists of the example sentences in Wella Brown's Cornish textbook Skeul an Yeth 1, which has been made available online by Kesva an Taves Kernewek.
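As a rough illustration of the matching idea described above, here is a minimal sketch in plain Python. It is not the program's actual code: the function names (tokenize, ngrams, find_matches) are made up for this example, the tokenizer is a simple regular expression rather than NLTK's, and the corpus entries are placeholders rather than sentences from Skeul an Yeth 1.

```python
import re

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

# Hypothetical stand-in corpus of (English, Cornish) pairs.
# The Cornish side is a placeholder, not real textbook data.
CORPUS = [
    ("The cat is in the house", "<Cornish translation 1>"),
    ("The dog is in the garden", "<Cornish translation 2>"),
    ("She saw a cat yesterday", "<Cornish translation 3>"),
]

def find_matches(query, corpus, n):
    """Return corpus pairs whose English side shares an n-gram with the query."""
    query_ngrams = ngrams(tokenize(query), n)
    matches = []
    for english, cornish in corpus:
        shared = query_ngrams & ngrams(tokenize(english), n)
        if shared:
            matches.append((english, cornish, sorted(shared)))
    return matches

if __name__ == "__main__":
    for english, cornish, shared in find_matches("My cat is in the kitchen", CORPUS, 3):
        print(english, "|", cornish, "|", shared)
```

Calling find_matches with n=2 instead of n=3 would look for shared bigrams in the same way.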

Example usage

There is an option to look for all bigrams and trigrams, or to include only those containing at least one word that is not in the NLTK 'stopwords' list of common English words. If the 'all trigrams and bigrams' option is used, a large number of trivial bigrams such as 'in the' may be matched. Sentences with trigram matches are listed first, followed by those with bigram matches.
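The stopword option and the ordering of results might be sketched as follows. This is an illustration under stated assumptions rather than the program itself: the small STOPWORDS set is a hand-picked stand-in for the NLTK list (which can be loaded with nltk.corpus.stopwords.words('english') once the stopwords data has been downloaded), and the names is_informative, search and skip_trivial are hypothetical.

```python
import re

# Tiny stand-in for the NLTK English stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "in", "on", "is", "of", "to", "my"}

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_informative(ngram, stopwords=STOPWORDS):
    """True if at least one word of the n-gram is not a stopword."""
    return any(word not in stopwords for word in ngram)

def search(query, corpus, skip_trivial=True):
    """List trigram matches first, then bigram matches.

    With skip_trivial=True, n-grams made up entirely of stopwords
    (such as 'in the') are ignored.
    """
    tokens = tokenize(query)
    results = []
    for n in (3, 2):  # trigrams before bigrams
        wanted = ngrams(tokens, n)
        if skip_trivial:
            wanted = {g for g in wanted if is_informative(g)}
        for english, cornish in corpus:
            shared = wanted & ngrams(tokenize(english), n)
            if shared:
                results.append((n, english, cornish, sorted(shared)))
    return results

if __name__ == "__main__":
    # Placeholder corpus; the Cornish side is not real data.
    corpus = [
        ("He sat in the garden", "<Cornish translation 1>"),
        ("The black cat slept", "<Cornish translation 2>"),
    ]
    for row in search("A black cat in the house", corpus, skip_trivial=False):
        print(row)
```

With skip_trivial=False the query above also matches the first placeholder sentence through the bigram 'in the'; with the default setting only the 'black cat' match remains.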

The translation memory program is discussed a little further on my blog: