NLP Kernewek
For some text to feed into these programs, I recommend the Howlsedhes Services website, which offers the historical Cornish-language texts as plain-text files for download.
I have a version of Gwreans An Bys (The Creation of the World, 1611) from which I have stripped most of the extraneous characters, such as line numbers and comments.
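The exact clean-up steps are not described here, but a minimal sketch of this kind of preprocessing might look like the following. It assumes, purely for illustration, that line numbers sit at the start of a line and that comments are enclosed in square brackets. The real layout of the downloaded files may differ, so both patterns are hypothetical and would need adjusting.

```python
import re

# Hypothetical patterns: the actual layout of the source files may differ.
LINE_NUMBER = re.compile(r"^\s*\d+\s*")      # digits at the very start of a line
BRACKET_COMMENT = re.compile(r"\[[^\]]*\]")  # editorial notes in [square brackets]

def clean_text(raw: str) -> str:
    """Strip leading line numbers and bracketed comments, then drop empty lines."""
    cleaned = []
    for line in raw.splitlines():
        line = LINE_NUMBER.sub("", line)
        line = BRACKET_COMMENT.sub("", line)
        line = " ".join(line.split())  # collapse whitespace left behind by removals
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)

if __name__ == "__main__":
    # A made-up sample, not taken from the real file.
    print(clean_text("12  Yesu Krist [note] pub eur\n13\n14 dhe vos"))
```

Note that the line-number pattern would also remove digits that legitimately begin a line of text, which is one reason an automatic pass like this typically leaves some manual checking to do.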
The Python Natural Language Toolkit (NLTK) provides a number of methods for corpus analysis, including frequency distributions, conditional frequency distributions and lists of collocations found within a text. I have created a script, cornish_corpus.py, which runs a few of these analyses on the Cornish texts above, along with two samples of revived Cornish: the Solempnyta short story by Benjamin Bruch and some Lord of the Rings chapters translated by Jerry Jefferies. It is available in my Bitbucket repository.
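The script itself is in the repository and is not reproduced here. As a rough, hypothetical illustration of how figures like "number of words", "number of different words", the word-length distribution and the top 50 words can be computed, here is a standard-library-only sketch. The tokenizer (lower-casing and treating any run of letters as a word) is an assumption, not necessarily what cornish_corpus.py does.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumption: lower-case the text and treat any run of letters as a word.
    # Apostrophes therefore split a form such as "y'n" into "y" and "n".
    return re.findall(r"[^\W\d_]+", text.lower())

def corpus_stats(text: str, top: int = 50) -> dict:
    words = tokenize(text)
    freq = Counter(words)                     # word -> number of occurrences
    lengths = Counter(len(w) for w in words)  # word length -> number of words
    return {
        "number of words": len(words),
        "number of different words": len(freq),
        # (length, count) pairs, most frequent length first
        "lengths": lengths.most_common(),
        "top words": [w for w, _ in freq.most_common(top)],
        "top words of 4+ letters": [w for w, _ in freq.most_common() if len(w) >= 4][:top],
    }

if __name__ == "__main__":
    stats = corpus_stats("Yesu Krist, pur wir! Pur wir yw Yesu.")
    for key, value in stats.items():
        print(f"{key} = {value}")
```

Because nltk.FreqDist is a subclass of collections.Counter, the same most_common() calls work unchanged on an NLTK frequency distribution. A tokenizer that splits on apostrophes would also explain single letters such as 'n' and 's' turning up among the most frequent "words".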
Below is a selection of the output of the cornish_corpus.py script. The collocations and the most frequent words often reflect the characters and the theme of each text.
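NLTK's collocation finder uses its own association measure and filters, so the following is only a simplified stand-in to show the idea. It ranks adjacent word pairs by pointwise mutual information (PMI), which rewards pairs that occur together more often than their individual frequencies would predict. Its ranking will not match the output shown here exactly, and the min_freq threshold is an assumed value.

```python
import math
import re
from collections import Counter

def collocations(text: str, n: int = 20, min_freq: int = 2) -> list[tuple[str, str]]:
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Simplified stand-in, not NLTK's method: only pairs seen at least
    `min_freq` times are scored, so one-off pairs cannot dominate.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    total = len(words)

    def pmi(pair: tuple[str, str]) -> float:
        w1, w2 = pair
        p_pair = bigrams[pair] / (total - 1)  # share of all adjacent pairs
        p1, p2 = unigrams[w1] / total, unigrams[w2] / total
        return math.log2(p_pair / (p1 * p2))

    candidates = [bg for bg, count in bigrams.items() if count >= min_freq]
    return sorted(candidates, key=pmi, reverse=True)[:n]

if __name__ == "__main__":
    print(collocations("pur wir a dhe pur wir b dhe pur wir c"))
```

The frequency threshold matters for a small corpus. Without it, any pair of words that each occur once and happen to sit next to each other gets the highest possible PMI score.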
Text: Improved version Bewnans Meryasek KK version from Stokes...
Collocations:
pur wir; dhy hwi; Comes venetensis; Yesu Arloedh; heb falladow;
Tertius tortor; Secundus tortor; Primus tortor; pub eur; Episcopus
Kernow; Yesu Krist; wosa hemma; Rag kerensa; heb ahwer; kuv kolonn;
deun alemma; pur dhiogel; pub termyn; heb namm; heb wow
number of words = 26815
number of different words = 4664
Lengths of words in descending order of frequency: [(3, 5094), (2, 4813), (4, 3857), (5, 3270), (1, 3078), (6, 2636), (7, 1697), (8, 1180), (9, 612), (10, 385), (11, 115), (12, 57), (13, 18), (14, 2), (18, 1)]
Top 50 words: ['a', 'y', 'n', 'dhe', 'ha', 'yn', 'an', 'ow', 'my', 'yw', 'c', 'ny', 'na', 're', 's', 'dha', 'omma', 'pur', 'ni', 'm', 'rag', 'meryasek', 'ma', 'sur', 'krist', 'yesu', 'bys', 'th', 'hwi', 'mar', 'heb', 'arloedh', 'oll', 'ev', 'vynn', 'gans', 'yma', 'dyw', 'vydh', 'lemmyn', 'vy', 'maria', 'den', 'ty', 'wir', 'dell', 'eus', 'meriadocus', 'dhymm', 'sertan']
Top 50 words of 4 or more letters: ['omma', 'meryasek', 'krist', 'yesu', 'arloedh', 'vynn', 'gans', 'vydh', 'lemmyn', 'maria', 'dell', 'meriadocus', 'dhymm', 'sertan', 'meur', 'dhymmo', 'dhyn', 'dhis', 'finit', 'episcopus', 'agas', 'comes', 'primus', 'secundus', 'nyns', 'yredi', 'orth', 'henna', 'prest', 'syrr', 'agan', 'devri', 'tortor', 'dhywgh', 'nevra', 'gweres', 'alemma', 'hanow', 'bydh', 'bynytha', 'deun', 'dhodho', 'epskop', 'hemma', 'lies', 'descendit', 'dhiso', 'lowena', 'mones', 'aredy']
Text: Osta karer Arloedh An Bysowyer Wel ottomma dha...
Collocations:
Yth esa; yth esa; dhe vos; dhe ves; haval orth; Unn Bysow; medh
Gandalf; Bag End; Nyns eus; dro dhe; leveris Gandalf; wovynnas Frodo;
neb kas; pup prys; medh Frodo; fatell wrug; dann gel; dell dybav;
Parkow Gladen; res dhis
number of words = 11147
number of different words = 1966
Lengths of words in descending order of frequency: [(2, 2309), (3, 2004), (1, 1526), (4, 1442), (5, 1342), (6, 976), (7, 772), (8, 395), (9, 185), (10, 129), (11, 36), (12, 10), (13, 9), (17, 5), (15, 3), (18, 2), (14, 1), (16, 1)]
Top 50 words: ['a', 'an', 'ev', 'y', 'yn', 'ha', 'n', 'dhe', 'hag', 'mes', 'o', 'ow', 'na', 'yw', 'ny', 'frodo', 'bysow', 'esa', 'vy', 'yth', 're', 'my', 'nyns', 'gans', 'wrug', 'dell', 'bos', 'rag', 'i', 'oll', 'gandalf', 'vos', 'bylbo', 'orth', 'po', 'mar', 'termyn', 'henna', 'dre', 'leveris', 'meur', 'dhodho', 'medh', 'aga', 'es', 'pan', 'pur', 'dres', 'ta', 'yma']
Top 50 words of 4 or more letters: ['frodo', 'bysow', 'nyns', 'gans', 'wrug', 'dell', 'gandalf', 'bylbo', 'orth', 'termyn', 'henna', 'leveris', 'meur', 'dhodho', 'medh', 'dres', 'arta', 'kever', 'nerth', 'dhymm', 'diworth', 'golum', 'shayr', 'tewl', 'haval', 'hobytow', 'hwir', 'nebes', 'wosa', 'henn', 'honan', 'lemmyn', 'yndella', 'arall', 'kyns', 'vydh', 'hwath', 'ganso', 'klywes', 'pyth', 'woer', 'drefenn', 'elfow', 'leverel', 'owth', 'ytho', 'dhis', 'nans', 'nevra', 'orto']
Text: THE TREGEAR HOMILIES KK Version made from Christopher...
Collocations:
Folio Homily; keth sam; dhe vos; kepar dell; Spyrys Sans; mab den;
agan Savyour; Homily JHESUS; katholik eglos; pub eur; mar veur; heb
diwedh; vab den; Yesu Krist; dre reson; fatell wrug; agan honan;
Savyour Yesu; Katholik Eglos; res dhyn
number of words = 40897
number of different words = 5246
Lengths of words in descending order of frequency: [(2, 8508), (3, 7334), (1, 5121), (4, 5001), (5, 4461), (6, 3516), (7, 2555), (8, 2112), (9, 1009), (10, 637), (11, 317), (12, 155), (13, 99), (14, 43), (15, 17), (16, 5), (17, 3), (19, 2), (18, 1), (20, 1)]
Top 50 words: ['a', 'ha', 'an', 'n', 'dhe', 'y', 'yn', 'yw', 'ow', 'ni', 'ev', 'ma', 'na', 'rag', 'krist', 'agan', 'wrug', 's', 'oll', 'dre', 'yma', 'eglos', 'dyw', 'gans', 'hag', 'bonner', 'fatell', 'henna', 'et', 'kepar', 'den', 'leverel', 'vos', 'aga', 'yth', 'mar', 'keth', 're', 'honan', 'dell', 'bos', 'i', 'in', 'vydh', 'folio', 'ny', 'o', 'de', 'homily', 'nyns']
Top 50 words of 4 or more letters: ['krist', 'agan', 'wrug', 'eglos', 'gans', 'bonner', 'fatell', 'henna', 'kepar', 'leverel', 'keth', 'honan', 'dell', 'vydh', 'folio', 'homily', 'nyns', 'dhyn', 'dhyw', 'ynwedh', 'korf', 'savyour', 'rakhenna', 'hemma', 'henn', 'dhiworth', 'katholik', 'onan', 'geryow', 'pyth', 'hwath', 'arloedh', 'peder', 'chaptra', 'gwrys', 'omma', 'yndella', 'skryptor', 'lemmyn', 'bobel', 'sans', 'arall', 'dhodho', 'goes', 'leveris', 'lies', 'spyrys', 'agas', 'powl', 'termyn']
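The three texts differ a lot in size (26815, 11147 and 40897 words), so raw counts are hard to compare across them. A conditional frequency distribution keyed by text, with counts turned into proportions, makes the comparison fairer. The sketch below builds one with the standard library, using word length as the sample; NLTK's ConditionalFreqDist serves the same purpose. The text names and the tokenizer are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

def conditional_freq(texts: dict[str, str]) -> dict[str, Counter]:
    """Map each condition (here: a text name) to a Counter of word lengths."""
    cfd: dict[str, Counter] = defaultdict(Counter)
    for name, text in texts.items():
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            cfd[name][len(word)] += 1
    return dict(cfd)

def proportions(counter: Counter) -> dict[int, float]:
    """Convert raw counts to shares of the whole, so texts of different sizes compare."""
    total = sum(counter.values())
    return {length: count / total for length, count in sorted(counter.items())}

if __name__ == "__main__":
    cfd = conditional_freq({"sample A": "dhe vos ha", "sample B": "Frodo ha Gandalf"})
    for name, lengths in cfd.items():
        print(name, proportions(lengths))
```

Applied to the length counts reported above, for instance, two-letter words make up roughly 17.9% of the Bewnans Meryasek text, 20.7% of the Lord of the Rings chapters and 20.8% of the Tregear Homilies.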
The treuslytherennaGUI.py program now lets you customize its output through the GUI.