Calcaire au jour le jour
23 avril 2017

Doctor Angelicus 32

I am continuing to check the words in my database of Church Latin texts.

As a reminder, this database consists of more than 300 "unit" texts of about 200 to 300 KB each. The bulk of it is the complete works of Thomas Aquinas.

The goal is to use a program of my own that "recognizes" every word in the database. To "recognize" a word means that the program can tell what kind of word it is: a place name, a personal name, another kind of name such as the title of a work, or an ordinary Latin word.

For Latin words, the program must be able to tell whether the word is a noun, an adjective, a verb, and so on, with full grammatical detail: the case for declined nouns, and the mood, tense and person for verbs. To shorten the work (I am doing all of it alone), I have used probabilistic methods based on word endings, mainly for verbs. For example, words ending in -ero/-erim, -eris, -erit, -erimus, -eritis or -erint are probably future perfect active indicatives or perfect active subjunctives. Of course, in the dictionary being built, I mark such entries as needing a further check.
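
The ending-based guess can be sketched as follows. This is a minimal illustration, not my actual program: the rule table holds only the endings listed above, and the names `SUFFIX_RULES`, `Guess` and `guess_by_suffix` are hypothetical. Every match is returned with `needs_check=True`, mirroring the "further check" mark in the dictionary.

```python
# Hypothetical sketch of suffix-based guessing; the rule table lists only the
# endings mentioned in this post, not a complete rule set.
from dataclasses import dataclass
from typing import Optional

AMBIG = "future perfect active indicative or perfect active subjunctive"

SUFFIX_RULES = [
    ("ero", "1st sg. future perfect active indicative"),
    ("erim", "1st sg. perfect active subjunctive"),
    ("eris", "2nd sg. " + AMBIG),
    ("erit", "3rd sg. " + AMBIG),
    ("erimus", "1st pl. " + AMBIG),
    ("eritis", "2nd pl. " + AMBIG),
    ("erint", "3rd pl. " + AMBIG),
]

# Try longer endings first so the most specific match wins.
_RULES_LONGEST_FIRST = sorted(SUFFIX_RULES, key=lambda rule: len(rule[0]), reverse=True)


@dataclass
class Guess:
    word: str
    analysis: str
    needs_check: bool = True  # a suffix match is only probable, never certain


def guess_by_suffix(word: str) -> Optional[Guess]:
    """Return a provisional analysis, or None if no listed ending matches."""
    w = word.lower()
    for suffix, analysis in _RULES_LONGEST_FIRST:
        # Require at least one letter before the ending, so bare "ero" or "erit" is skipped.
        if w.endswith(suffix) and len(w) > len(suffix):
            return Guess(word, analysis)
    return None
```

For example, `guess_by_suffix("dixerint")` gives the 3rd plural analysis flagged for checking, while `guess_by_suffix("dominus")` gives `None`. The flag matters: a present passive such as "regeris" would also match -eris and be mis-guessed.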

I started some time ago with about 50,000 words that the program could not recognize, and I am treating them in order of increasing length. The current situation is as follows:

  • all words of up to 8 letters are recognized
  • I am working on the 9-letter words, which I have just divided into (a) 212 words that seem to have some small problem (misprint, error, Greek words) and (b) 2,111 words that seem easy to check in dictionaries
  • 3,303 words of 10 or more letters remain
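
Treating the unrecognized words by increasing length could be organized roughly like this. It is a minimal sketch assuming the unknown forms are held as a plain list of strings and that `len()` counts one character per letter; `bucket_by_length` and `remaining_from` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_by_length(words: Iterable[str]) -> Dict[int, List[str]]:
    """Group distinct unrecognized words by length, shortest lengths first."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for word in set(words):  # each form only needs to be checked once
        buckets[len(word)].append(word)
    # Within a bucket, sort on the reversed spelling so that words sharing
    # an ending (e.g. -erint) end up next to each other.
    return {
        length: sorted(buckets[length], key=lambda w: w[::-1])
        for length in sorted(buckets)
    }


def remaining_from(buckets: Dict[int, List[str]], min_length: int) -> int:
    """Count the words of length >= min_length, i.e. those not yet treated."""
    return sum(len(ws) for length, ws in buckets.items() if length >= min_length)
```

Sorting on the reversed spelling is a design choice that suits ending-based checking: inside the 9-letter bucket, for instance, all forms in -erint would come out together.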

Affable Calcaire