Application of Human Language Technologies

"Human Language Technologies" denotes a set of software algorithms, tools and resources for processing texts written in natural languages. These types of activities can be seen as an example of the knowledge digitalization about language; however, because of their specifics and great applicability, we have chosen to separate those activities from other types of digitalization.

Application of human language technologies in biological and biotechnological sciences involves:

Generally speaking, all of these activities are very sophisticated. In biotechnology, ontologies are used for annotating data such as sequences, genes and experiments. Annotated data can then be used in a number of ways: for linking databases, for building complex search systems, for knowledge transfer and so on. There are several well-developed ontologies and thesauri in the biological and agricultural domains, such as AGROVOC, Gene Ontology, EUROVOC, Plant Ontology and others.
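To illustrate how ontology annotations can support database linking and search, here is a minimal Python sketch. The term identifiers ("EX:0001" and so on), the two small databases and the function names are all hypothetical and chosen only for illustration; they are not taken from AGROVOC, Gene Ontology or any other real resource.

```python
from collections import defaultdict

# Hypothetical mini-ontology: each term ID maps to its parent term (None for the root).
PARENT = {
    "EX:0001": None,        # "biological process" (root)
    "EX:0002": "EX:0001",   # "response to stress"
    "EX:0003": "EX:0002",   # "response to drought"
}

# Hypothetical records from two separate databases, each annotated with term IDs.
GENE_DB = {"geneA": {"EX:0003"}, "geneB": {"EX:0001"}}
EXPERIMENT_DB = {"exp1": {"EX:0003"}, "exp2": {"EX:0002"}}


def ancestors(term):
    """Return the term together with all of its ancestors in the ontology."""
    found = set()
    while term is not None:
        found.add(term)
        term = PARENT[term]
    return found


def search(db, query_term):
    """Find records annotated with query_term or with any of its descendants."""
    return sorted(
        record
        for record, terms in db.items()
        if any(query_term in ancestors(t) for t in terms)
    )


def link(db_a, db_b):
    """Link each record in db_a to the records in db_b that share an annotation term."""
    index = defaultdict(set)
    for record, terms in db_b.items():
        for term in terms:
            index[term].add(record)
    return {
        record: sorted(set().union(*(index[t] for t in terms)))
        for record, terms in db_a.items()
    }


print(search(EXPERIMENT_DB, "EX:0002"))
print(link(GENE_DB, EXPERIMENT_DB))
```

Because the search follows the ontology hierarchy, a query for the broader term "response to stress" also returns the experiment annotated only with the narrower term "response to drought". Plain keyword matching would not do this.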

Text mining is one of the most recent developments in text processing. It includes techniques for extracting relevant, specific information and knowledge from large text corpora (scientific articles, encyclopedias and so on), and it is especially valuable for processing medical and biological literature.
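As a rough illustration of one simple text mining technique, the Python sketch below performs dictionary-based term spotting and counts how often pairs of terms occur in the same sentence. The vocabulary, the toy corpus and the function names are assumptions made for this example; real systems work on far larger corpora and usually rely on more robust linguistic processing.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary of terms of interest (in practice it could come from a thesaurus).
VOCABULARY = {"drought", "wheat", "yield", "gene expression"}

# Hypothetical toy corpus standing in for a large collection of articles.
CORPUS = [
    "Drought reduces wheat yield. Gene expression changes under drought.",
    "Wheat yield depends on many factors.",
]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence):
    """Return the vocabulary terms that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return {
        term
        for term in VOCABULARY
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def cooccurrences(corpus):
    """Count how often each pair of terms appears together in one sentence."""
    counts = Counter()
    for document in corpus:
        for sentence in split_sentences(document):
            for pair in combinations(sorted(find_terms(sentence)), 2):
                counts[pair] += 1
    return counts


for pair, count in cooccurrences(CORPUS).most_common():
    print(pair, count)
```

Pairs of terms that frequently appear together can point to relations worth examining more closely, which is one way such counts are used when exploring a body of literature.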

When it comes to Serbia and the Serbian language, the situation is quite different. In our country, only a few researchers are familiar with these technologies. Researchers from other scientific fields who could benefit from human language technologies are mostly unaware of the possibilities these techniques offer. Moreover, there are far fewer resources for Serbian than for languages such as English. The reasons are both economic (far fewer people speak Serbian than English) and language-specific (Serbian has a rich morphological system). Some of the resources developed for Serbian can be found on the Human Language Technologies Group website.