Author

Igor Ukrainczyk

Bio: Igor Ukrainczyk is an academic researcher. The author has contributed to research in the topic Feature (computer vision), has an h-index of 1, and has co-authored 2 publications receiving 346 citations.

Papers
Patent
25 May 2001
TL;DR: In this article, a method for automatically classifying text into categories is provided: a plurality of tokens or features is associated, manually or automatically, with each category, and a weight is coupled to each feature, where the weight indicates the degree of association between the feature and the category.
Abstract: A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.

346 citations
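The scoring scheme described in the abstract above is simple enough to sketch directly. Below is a minimal illustration, assuming a whitespace tokenizer and hand-picked feature weights; the names CATEGORY_WEIGHTS, THRESHOLD, and classify, and all numeric values, are illustrative rather than taken from the patent.

```python
from collections import Counter

# Illustrative feature weights per category; a real deployment would assign
# these manually or learn them automatically, as the patent allows.
CATEGORY_WEIGHTS = {
    "sports":  {"game": 2.0, "score": 1.5, "team": 1.8},
    "finance": {"stock": 2.2, "market": 1.7, "score": 0.3},
}

THRESHOLD = 3.0  # assumed predetermined threshold


def classify(document: str) -> list[tuple[str, float]]:
    """Return categories whose score exceeds the threshold, best first."""
    # Parse the document into unique tokens with associated counts.
    counts = Counter(document.lower().split())

    # Category score = sum over features of (count in document) x (feature weight).
    scores = {
        category: sum(counts[feat] * weight for feat, weight in weights.items())
        for category, weights in CATEGORY_WEIGHTS.items()
    }

    # Sort the category scores and keep those above the predetermined threshold.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(cat, s) for cat, s in ranked if s > THRESHOLD]


print(classify("The team won the game with a record score"))
# -> [('sports', 5.3)]
```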

Patent
25 May 2001
TL;DR: In this article, a procedure for automatically classifying text into categories is presented, in which each category is associated with a set of features, each feature is assigned a weighting coefficient indicating its degree of association with the category, and per-category scores are computed from the feature counts and weights.
Abstract: The present invention concerns a method for automatically classifying text into categories. To this end, a plurality of lexical tokens or features is associated, manually or automatically, with each category, and each feature is then assigned a weighting coefficient characterizing the degree of association between the feature and the category. A document is then broken down into a plurality of unique tokens, each with an associated count indicating the number of occurrences of the feature in the document. A per-category score is then computed for each document, representing the sum of the products of each feature count in the document multiplied by the corresponding feature weight in the category. Finally, the per-category scores are sorted by perspective, and the document is classified into a particular category provided that the category score exceeds a defined threshold.

Cited by
Proceedings ArticleDOI
23 Oct 2006
TL;DR: New optimization and estimation techniques are developed to address two fundamental problems in machine learning; they serve as the basis for the Automatic Linguistic Indexing of Pictures - Real Time (ALIPR) system for fully automatic, high-speed annotation of online pictures.
Abstract: Automated annotation of digital pictures has been a highly challenging problem for computer scientists since the invention of computers. The capability of annotating pictures by computers can lead to breakthroughs in a wide range of applications including Web image search, online picture-sharing communities, and scientific experiments. In our work, by advancing statistical modeling and optimization techniques, we can train computers about hundreds of semantic concepts using example pictures from each concept. The ALIPR (Automatic Linguistic Indexing of Pictures - Real Time) system of fully automatic and high-speed annotation for online pictures has been constructed. Thousands of pictures from an Internet photo-sharing site, unrelated to the source of those pictures used in the training process, have been tested. The experimental results show that a single computer processor can suggest annotation terms in real time and with good accuracy.

504 citations
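As a very rough illustration of the train-then-annotate flow described above (and not ALIPR's actual statistical models), the sketch below fits one mean feature vector per concept and suggests the concepts closest to a new image's features; every name and number here is an assumption.

```python
import numpy as np

# Toy "statistical models": one mean feature vector per semantic concept,
# estimated from example pictures of that concept. ALIPR's real models are
# far richer; this only illustrates training on examples, then annotating.
def train_concept_models(examples: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """examples maps a concept name to an (n_images, n_features) array."""
    return {concept: feats.mean(axis=0) for concept, feats in examples.items()}


def annotate(image_features: np.ndarray,
             models: dict[str, np.ndarray],
             top_k: int = 3) -> list[str]:
    """Suggest the top_k concepts whose model is closest to the image."""
    distances = {c: float(np.linalg.norm(image_features - mu))
                 for c, mu in models.items()}
    return sorted(distances, key=distances.get)[:top_k]


rng = np.random.default_rng(0)
examples = {"beach": rng.normal(0.0, 1.0, (20, 8)),
            "forest": rng.normal(3.0, 1.0, (20, 8))}
models = train_concept_models(examples)
print(annotate(rng.normal(3.0, 1.0, 8), models, top_k=1))  # likely ['forest']
```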

Patent
16 Oct 2008
TL;DR: In this paper, the authors present an entity recognition and disambiguation system (ERDS) that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text.
Abstract: Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information.

451 citations
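A bare-bones sketch of the recognize-and-disambiguate flow outlined in the abstract, assuming a toy in-memory knowledge repository; the dictionary lookup and the context-overlap score stand in for the ERDS linguistic, knowledge, and disambiguation engines and are not the patented algorithms.

```python
# Toy knowledge repository: surface form -> candidate entities with context terms.
KNOWLEDGE_BASE = {
    "jaguar": [
        {"id": "Jaguar_(animal)", "context": {"cat", "wildlife", "jungle", "prey"}},
        {"id": "Jaguar_Cars",     "context": {"car", "engine", "luxury", "dealer"}},
    ],
}


def disambiguate(mention: str, text: str) -> str | None:
    """Pick the candidate entity whose context terms best overlap the text."""
    candidates = KNOWLEDGE_BASE.get(mention.lower(), [])
    if not candidates:
        return None
    words = set(text.lower().split())
    # Score each candidate by how many of its context terms occur in the text.
    best = max(candidates, key=lambda c: len(c["context"] & words))
    return best["id"]


print(disambiguate("Jaguar", "the jaguar stalked its prey through the jungle"))
# -> Jaguar_(animal)
```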

Patent
16 Oct 2007
TL;DR: In this paper, a cooperative conversational voice user interface is presented, which builds upon short-term and long-term shared knowledge to generate one or more explicit and/or implicit hypotheses about an intent of a user utterance.
Abstract: A cooperative conversational voice user interface is provided. The cooperative conversational voice user interface may build upon short-term and long-term shared knowledge to generate one or more explicit and/or implicit hypotheses about an intent of a user utterance. The hypotheses may be ranked based on varying degrees of certainty, and an adaptive response may be generated for the user. Responses may be worded based on the degrees of certainty and to frame an appropriate domain for a subsequent utterance. In one implementation, misrecognitions may be tolerated, and conversational course may be corrected based on subsequent utterances and/or responses.

413 citations
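A small sketch of the hypothesize, rank, and respond loop described above; the certainty thresholds and response wordings are invented for illustration and stand in for the patent's use of short- and long-term shared knowledge.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    intent: str
    certainty: float  # 0.0 .. 1.0, derived from shared knowledge in a real system


def respond(hypotheses: list[Hypothesis]) -> str:
    """Rank hypotheses by certainty and word the response accordingly."""
    best = max(hypotheses, key=lambda h: h.certainty)
    if best.certainty > 0.8:
        # High certainty: act directly and frame the domain for the next utterance.
        return f"Okay, doing '{best.intent}'. Anything else?"
    if best.certainty > 0.5:
        # Moderate certainty: confirm before acting, tolerating misrecognition.
        return f"Did you want me to '{best.intent}'?"
    # Low certainty: ask an open question so the next utterance can correct course.
    return "Sorry, what would you like me to do?"


print(respond([Hypothesis("play jazz", 0.9), Hypothesis("call James", 0.4)]))
# -> Okay, doing 'play jazz'. Anything else?
```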

Patent
05 Oct 2007
TL;DR: In this article, a method for searching time series data over a network is proposed, which involves employing a computing device to gather at least one stream of time series data from an information processing environment and arrange that data for searching.
Abstract: A method for searching time series data over a network. The method comprises employing a computing device to gather at least one stream of time series data from an information processing environment and arrange the at least one stream of time series data for searching, wherein the computing device performs actions, including: aggregating data in the at least one time series data stream based on a domain that corresponds to the at least one time series data stream; time stamping the at least one time series data stream to generate at least one time stamped event that includes at least a portion of the aggregated data; segmenting each time stamped event into a plurality of segments; time indexing the time stamped events to create time bucketed indices based on the time stamps and segments; and employing the computing device to search the time bucketed indices based on a received time series data search request.

347 citations
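The claim above enumerates a concrete pipeline: aggregate by domain, time-stamp, segment, index into time buckets, then search the buckets. The sketch below mimics that pipeline with hourly buckets and whitespace segmentation as assumed simplifications; ingest and search are illustrative names, not from the patent.

```python
from collections import defaultdict
from datetime import datetime

# Time-bucketed index: (domain, hour bucket) -> segment -> list of events.
index: dict[tuple[str, str], dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))


def ingest(domain: str, timestamp: datetime, raw_event: str) -> None:
    """Time-stamp an event, segment it, and add it to the hourly bucket index."""
    bucket = timestamp.strftime("%Y-%m-%d %H:00")   # hourly time bucket
    for segment in raw_event.lower().split():        # simple segmentation
        index[(domain, bucket)][segment].append(raw_event)


def search(domain: str, bucket: str, term: str) -> list[str]:
    """Search only the requested time bucket for events containing the term."""
    return index[(domain, bucket)].get(term.lower(), [])


ingest("webserver", datetime(2024, 5, 1, 14, 3), "GET /index.html 200")
ingest("webserver", datetime(2024, 5, 1, 14, 7), "GET /login 500")
print(search("webserver", "2024-05-01 14:00", "500"))
# -> ['GET /login 500']
```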

Patent
22 Feb 2010
TL;DR: In this article, a system and method for processing multi-modal device interactions in a natural language voice services environment is presented, in which context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent.
Abstract: A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.

321 citations
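A schematic sketch of combining context from a non-voice interaction with a natural language utterance to determine intent and route a request; the keyword matching, device names, and dictionary layout are illustrative assumptions, not the patented method.

```python
def determine_intent(non_voice_context: dict, utterance: str) -> dict:
    """Combine the two context sources into a single intent (toy heuristic)."""
    intent = {"device": non_voice_context.get("device"),
              "target": non_voice_context.get("selection"),
              "action": None}
    text = utterance.lower()
    if "play" in text:
        intent["action"] = "play"
    elif "delete" in text or "remove" in text:
        intent["action"] = "delete"
    return intent


def route(intent: dict) -> str:
    """Route the request to the device the combined context points at."""
    return f"sending '{intent['action']}' for '{intent['target']}' to {intent['device']}"


# User taps a song on a tablet (the non-voice interaction), then speaks.
tap = {"device": "tablet", "selection": "Kind of Blue"}
print(route(determine_intent(tap, "play that")))
# -> sending 'play' for 'Kind of Blue' to tablet
```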