Automatic text classification for topic recognition and document analysis
Words take their meaning from context. Drawing on prior knowledge and their capacity for interpretation, people are very good at recognizing the semantic context of a text and interpreting the information it contains correctly. Finding relevant information often depends on exactly this context-sensitive interpretation of content, and this is where classic Enterprise Search and Information Retrieval solutions reach their limits. Text classification is a key technology for determining the topics and contexts of documents beyond the word level and making them usable. Topics are determined not from individual words but from automatically generated sets of words and multi-word terms.
In this way, keywords can be determined for documents, for example, and topic pages can be compiled automatically for publishers and libraries. A document can be assigned to the topic "foreign politics" even though the term "foreign politics" never occurs explicitly in its text. The assignment can instead be based on terms such as "State Department" or "embassies", or on the names of foreign affairs politicians.
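The idea of assigning a topic from a set of words and multi-word terms can be illustrated with a minimal sketch. It is not TopicFinder's implementation: the term sets in `TOPIC_TERMS`, the function names and the `min_hits` threshold are all hypothetical, and in a real system the term sets would be generated automatically rather than written by hand.

```python
import re

# Hypothetical, hand-written term sets (assumption for illustration only).
TOPIC_TERMS = {
    "foreign politics": {"state department", "embassy", "embassies", "ambassador"},
    "sports": {"championship", "goalkeeper", "world cup"},
}


def normalize(text):
    """Lowercase the text, collapse non-alphanumeric runs, pad with spaces."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "


def assign_topics(text, topic_terms=TOPIC_TERMS, min_hits=2):
    """Return topics with at least min_hits distinct matching terms, best first."""
    padded = normalize(text)
    scores = {}
    for topic, terms in topic_terms.items():
        # Padding with spaces makes multi-word terms match on word boundaries.
        hits = sum(1 for term in terms if f" {term} " in padded)
        if hits >= min_hits:
            scores[topic] = hits
    return sorted(scores, key=lambda topic: (-scores[topic], topic))


text = "The State Department recalled staff from two embassies after the ambassador was expelled."
print(assign_topics(text))
```

The sample sentence never contains the words "foreign politics", yet three of the topic's terms match, so the topic is assigned. Requiring more than one matching term is a simple guard against assignments based on a single incidental word.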
The volume of incoming mail and email that employees have to deal with is very large and constantly growing. TopicFinder can significantly reduce this time-consuming and costly work by automatically marking or filtering spam and by forwarding customer requests to the experts or departments best suited to the topic.
To fulfill this task, TopicFinder must be trained specifically for the customer's information needs. Its use is therefore divided into a training phase and a productive phase, and an administrative web application is available for training and evaluation. Training is multi-threaded, so that it scales to large data volumes and complex taxonomies.
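The split into a training phase and a productive phase can be sketched with a small classifier. The text does not say which learning method TopicFinder uses; the example below swaps in a multinomial Naive Bayes model with add-one smoothing purely for illustration, and the class name, the topic labels and the training documents are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTopicModel:
    """Illustrative multinomial Naive Bayes model (not TopicFinder's method)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> token frequencies
        self.doc_counts = Counter()              # topic -> number of documents
        self.vocabulary = set()

    def train(self, labeled_docs):
        """Training phase: collect token statistics from (text, topic) pairs."""
        for text, topic in labeled_docs:
            doc_tokens = tokenize(text)
            self.word_counts[topic].update(doc_tokens)
            self.doc_counts[topic] += 1
            self.vocabulary.update(doc_tokens)

    def classify(self, text):
        """Productive phase: return the most probable topic for new text."""
        if not self.doc_counts:
            raise ValueError("model has not been trained")
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        known = [t for t in tokenize(text) if t in self.vocabulary]
        best_topic, best_score = None, -math.inf
        for topic in sorted(self.doc_counts):
            counts = self.word_counts[topic]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[topic] / total_docs)
            for token in known:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Hypothetical training data for routing incoming mail.
TRAINING_DOCS = [
    ("win cheap pills now limited offer", "spam"),
    ("claim your free prize offer now", "spam"),
    ("my invoice shows the wrong amount", "billing"),
    ("please send a corrected invoice for my order", "billing"),
    ("the device will not start after the update", "support"),
    ("error message after installing the update", "support"),
]

model = NaiveBayesTopicModel()
model.train(TRAINING_DOCS)
print(model.classify("there is an error on the device after the update"))
```

Once trained, the same model object is used unchanged in the productive phase: each incoming message is classified, and the resulting label can drive spam filtering or forwarding to a department.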
TopicFinder automatically assigns documents to topics based on single-level or multi-level hierarchical taxonomies. For example, messages from news tickers can be forwarded to the sports, politics or economics departments depending on their content, or news articles can be indexed on the basis of a topical hierarchy.
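Assignment within a multi-level taxonomy can be pictured as a top-down walk: pick the best matching topic on the first level, then repeat among its subtopics. The sketch below assumes a hand-written taxonomy with cue terms per node; the `TAXONOMY` structure, its terms and the `classify_path` function are hypothetical and only illustrate the principle.

```python
import re

# Hypothetical taxonomy: each node has cue terms and optional child topics.
TAXONOMY = {
    "sports": {
        "terms": {"match", "team", "season", "coach"},
        "children": {
            "football": {"terms": {"goal", "striker", "penalty"}, "children": {}},
            "tennis": {"terms": {"serve", "set", "racket"}, "children": {}},
        },
    },
    "politics": {
        "terms": {"election", "parliament", "minister", "vote"},
        "children": {},
    },
    "economics": {
        "terms": {"inflation", "market", "shares", "bank"},
        "children": {},
    },
}


def classify_path(text, nodes=TAXONOMY):
    """Walk the taxonomy top-down and return the path of matched topics."""
    doc_tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    path = []
    while nodes:
        # Score every topic on this level by the number of matching cue terms.
        scored = sorted(
            ((len(doc_tokens & node["terms"]), name) for name, node in nodes.items()),
            key=lambda pair: (-pair[0], pair[1]),
        )
        hits, best = scored[0]
        if hits == 0:
            break  # nothing matches on this level, stop at the current depth
        path.append(best)
        nodes = nodes[best]["children"]
    return path


print(classify_path("The striker scored a late goal and the team won the match"))
# -> ['sports', 'football']
```

A single-level taxonomy is simply the special case in which no node has children. A message that matches a top-level topic but none of its subtopics is assigned to the top-level topic only.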
For a weekly newsletter, technical articles can be filtered on the basis of a predefined newsletter profile and ordered by relevance. TopicFinder analyzes all articles, sorts them according to significance and groups similar articles or even duplicates together in order to simplify further processing and keep the flood of information to a minimum.
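The newsletter workflow of filtering against a profile, ordering by relevance and grouping similar articles can be sketched as follows. The relevance score (number of profile terms found) and the Jaccard similarity on token sets are stand-ins chosen for brevity, not the measures TopicFinder actually uses, and the function name and both thresholds are assumptions.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_newsletter(articles, profile_terms, min_relevance=1, dup_threshold=0.8):
    """Filter by profile, order by relevance, and group near-duplicates."""
    token_sets = [tokens(article) for article in articles]
    # Relevance: how many profile terms occur in the article.
    scored = [(len(ts & profile_terms), i) for i, ts in enumerate(token_sets)]
    ranked = sorted(
        (pair for pair in scored if pair[0] >= min_relevance),
        key=lambda pair: (-pair[0], pair[1]),
    )
    groups = []
    for _, i in ranked:
        for group in groups:
            # Compare against the first (most relevant) article of each group.
            if jaccard(token_sets[i], token_sets[group[0]]) >= dup_threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return [[articles[i] for i in group] for group in groups]


# Hypothetical articles and newsletter profile.
ARTICLES = [
    "New database index speeds up query performance",
    "Company picnic scheduled for Friday",
    "New database index speeds up query performance today",
    "Query cache tuning guide",
]
PROFILE = {"database", "index", "query", "cache"}

for group in build_newsletter(ARTICLES, PROFILE):
    print(group)
```

In this example the picnic announcement is dropped because it matches no profile term, the two nearly identical database articles end up in one group, and the cache guide follows as a separate, less relevant entry.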