I have developed the translation memory software a little further as part of my
TaklowKernewek tools.
It now has a GUI:
![](https://3.bp.blogspot.com/-MOZ2IV5jvXQ/V7d7Lr8-ZmI/AAAAAAAA29E/qCJVsPwChfAq_T0-Ehaa97K28tG5rXhdwCLcB/s640/kovtreylyans_snowdonbelarus.png)
Using only bigrams and trigrams from the corpus that contain at least one non-stopword (based on the NLTK stopwords corpus).
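
As a rough illustration of that filter (not the actual TaklowKernewek code), here is a minimal sketch. The names `ngrams` and `content_ngrams` are made up for this example, and the small `STOPWORDS` set is only a stand-in for the NLTK stopwords corpus so that the snippet runs without any downloads.

```python
# Stand-in stopword set (an assumption for this sketch);
# the real tool is based on the NLTK stopwords corpus.
STOPWORDS = {"is", "the", "a", "an", "of", "to", "in", "and", "on"}


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def content_ngrams(tokens, n, stopwords=STOPWORDS):
    """Keep only the n-grams with at least one token that is not a stopword."""
    return [g for g in ngrams(tokens, n) if any(t not in stopwords for t in g)]


tokens = "snowdon is the highest mountain in wales".split()
bigrams = content_ngrams(tokens, 2)
trigrams = content_ngrams(tokens, 3)
```

With this filter, a bigram such as `('is', 'the')` is dropped because both of its words are stopwords, while `('snowdon', 'is')` is kept.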
![](https://4.bp.blogspot.com/-6QSD8RPdgnI/V7d7MIjcQ0I/AAAAAAAA29I/2lv6Xk8UCtIj1InidXGzNZK2zbvyUnLLACLcB/s640/kovtreylyans_snowdonbelarus2.png)
Showing all bigrams and trigrams outputs a long list of sentences containing ('is', 'the').
![](https://4.bp.blogspot.com/-l7CW4GLgtOw/V7eAPvJqAhI/AAAAAAAA29Y/YGJa0MekQ741Caj9HGeox9BZbkZOUj2VQCLcB/s640/kovtreylyans_traintruro.png)
Sentences in the corpus that contain multiple trigrams in common with the input are ranked highest, and similarly with bigrams.
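
One way such a ranking could work is sketched below. This is an assumption about the approach rather than the tool's real code: it sorts by the number of shared trigrams first and uses shared bigrams as a tie-breaker, and the names `ngram_set` and `rank_sentences` and the toy corpus are invented for the example.

```python
def ngram_set(sentence, n):
    """Return the set of word n-grams in a lowercased, whitespace-split sentence."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def rank_sentences(query, corpus):
    """Rank corpus sentences by shared trigram count, then shared bigram count.

    Sentences sharing no bigram or trigram with the query are left out.
    """
    query_tri = ngram_set(query, 3)
    query_bi = ngram_set(query, 2)
    scored = []
    for sentence in corpus:
        tri = len(query_tri & ngram_set(sentence, 3))
        bi = len(query_bi & ngram_set(sentence, 2))
        if tri or bi:
            scored.append((tri, bi, sentence))
    # Highest trigram overlap first; bigram overlap breaks ties.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return scored


# Toy corpus, invented for illustration only.
corpus = [
    "the train to london is full",
    "a train leaves at noon",
    "the train to truro is late",
]
ranked = rank_sentences("the train to truro", corpus)
```

Here the Truro sentence comes first because it shares two trigrams with the input, the London sentence follows with one, and the remaining sentence is dropped since it shares no bigram or trigram.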
After improvement to the text wrapping of the output sentences to split longer lines: