Doctor Angelicus 32

23 April 2017

I am continuing to check the words in my database of Church Latin texts.

As a reminder, this database consists of more than 300 "unit" texts of about 200K or 300K bytes each. The bulk of the database is the complete works of Thomas Aquinas.

The purpose is to use a program of my own which "recognizes" all the words in the database. "Recognize" means that the program is able to tell what kind of word it is: name of a place, name of a person, other names such as titles of works ... or an ordinary word of the Latin language.

For Latin words, the program must be able to tell whether the word is a noun, an adjective, a verb ... with full details: the case for the declension of nouns; the mood, tense and person for verbs... To shorten the work (I do all of this alone) I used probabilistic methods based on word endings, mainly those of verbs. For example, words ending in -ero/-erim, -eris, -erit, -erimus, -eritis, -erint are probably future perfect active indicative or perfect active subjunctive. Of course, in the dictionary being built, I mark each such entry as needing a further check.
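To make the idea concrete, here is a minimal sketch in Python of such an ending-based guess. It is not the actual program: the names ENDINGS and guess_verb_form and the shape of the returned entry are hypothetical, and only the endings quoted above are covered.

```python
# Hypothetical sketch of an ending-based guess; not the author's program.
# Only the endings quoted in the text are listed.
FUT_PERF = "future perfect active indicative"
PERF_SUBJ = "perfect active subjunctive"

# -ero belongs to the future perfect, -erim to the perfect subjunctive;
# the other endings are shared by both tenses.
ENDINGS = {
    "ero": [FUT_PERF],
    "erim": [PERF_SUBJ],
    "eris": [FUT_PERF, PERF_SUBJ],
    "erit": [FUT_PERF, PERF_SUBJ],
    "erimus": [FUT_PERF, PERF_SUBJ],
    "eritis": [FUT_PERF, PERF_SUBJ],
    "erint": [FUT_PERF, PERF_SUBJ],
}


def guess_verb_form(word):
    """Return a tentative dictionary entry for `word`, or None if no ending matches."""
    w = word.lower()
    # Try the longest endings first.
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if w.endswith(ending) and len(w) > len(ending):
            return {
                "word": word,
                "ending": "-" + ending,
                "guess": ENDINGS[ending],
                # The guess is only probable, so every entry is flagged.
                "needs_check": True,
            }
    return None
```

A word such as "numero" also ends in -ero without being a future perfect, which is exactly why every guessed entry carries the needs_check flag.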

Some time ago I began with about 50,000 words that the program could not recognize. Working through them in order of increasing length, I have now reached the following situation:

all words of up to 8 letters are recognized;
I am working on the words of 9 letters, which I have just divided into (a) 212 words that seem to have a small problem (misprint, error, Greek word) and (b) 2111 words that seem easy to check in dictionaries;
3303 words of 10 letters or more remain.
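Processing the unrecognized words by increasing length can be sketched as follows. This is only an illustration under assumptions: the function name group_by_length is hypothetical, and the word list in the usage example is made up for the demonstration.

```python
from collections import defaultdict


def group_by_length(words):
    """Group words by number of letters, shortest group first (hypothetical helper)."""
    buckets = defaultdict(list)
    for word in words:
        buckets[len(word)].append(word)
    # Sort by length so the groups can be worked through in increasing order.
    return dict(sorted(buckets.items()))


# Made-up sample; the real list would come from the database.
sample = ["amaverint", "dixerimus", "praedicaverit", "fecerint"]
for length, group in group_by_length(sample).items():
    print(length, len(group), group)
```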

Affable Calcaire

Posted by affablecalcaire at 08:34
