Patent

Efficient duplicate detection for machine learning data sets

TL;DR
A machine learning service determines that an analysis is to be performed to detect whether at least a portion of the contents of one or more observation records of a first data set is duplicated in a second set of observation records.
Abstract
At a machine learning service, a determination is made that an analysis is to be performed to detect whether at least a portion of the contents of one or more observation records of a first data set is duplicated in a second set of observation records. A duplication metric is obtained that indicates a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set. In response to determining that the duplication metric meets a threshold criterion, one or more responsive actions are initiated, such as transmitting a notification to a client of the service.
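
The abstract does not say how the duplication metric is computed. The sketch below assumes one plausible approach, chosen here for illustration and not taken from the patent: build a Bloom filter over the first data set, then test each record of the second set against it. A Bloom filter never misses a true duplicate but can report false positives, so a flagged record is only probably a duplicate, which fits the abstract's wording about a non-zero probability. All names (`BloomFilter`, `record_key`, `duplication_metric`, `check_duplicates`, `key_fields`) and the defaults (`fp_rate=0.01`, `threshold=0.05`) are hypothetical; `key_fields` stands in for "a portion of the contents", and printing a message stands in for notifying a client.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Callable, Iterator, Mapping, Sequence

# Hypothetical record type: one observation record as a field-name -> value mapping.
Record = Mapping[str, object]


@dataclass
class BloomFilter:
    """Set-membership filter: no false negatives, tunable false-positive rate."""
    num_bits: int
    num_hashes: int
    bits: bytearray = field(init=False)

    def __post_init__(self) -> None:
        self.bits = bytearray((self.num_bits + 7) // 8)

    @classmethod
    def for_capacity(cls, capacity: int, fp_rate: float) -> "BloomFilter":
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hashes.
        n = max(capacity, 1)
        m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
        k = max(1, round(m / n * math.log(2)))
        return cls(m, k)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Double hashing: derive k bit indices from two 64-bit values of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def record_key(record: Record, key_fields: Sequence[str]) -> bytes:
    # Only the selected fields are compared, so a portion of each record can be checked.
    return "\x1f".join(f"{name}={record.get(name)!r}" for name in key_fields).encode("utf-8")


def duplication_metric(first: Sequence[Record], second: Sequence[Record],
                       key_fields: Sequence[str], fp_rate: float = 0.01) -> float:
    """Fraction of `second` records flagged as probable duplicates of `first` records."""
    if not second:
        return 0.0
    bloom = BloomFilter.for_capacity(len(first), fp_rate)
    for rec in first:
        bloom.add(record_key(rec, key_fields))
    hits = sum(bloom.might_contain(record_key(rec, key_fields)) for rec in second)
    return hits / len(second)


def check_duplicates(first: Sequence[Record], second: Sequence[Record],
                     key_fields: Sequence[str], threshold: float = 0.05,
                     notify: Callable[[str], None] = print) -> bool:
    """Run the analysis; if the metric meets the threshold, take a responsive action."""
    metric = duplication_metric(first, second, key_fields)
    if metric >= threshold:
        notify(f"about {metric:.1%} of the second set may duplicate the first set")
        return True
    return False


if __name__ == "__main__":
    train = [{"x": i, "label": i % 2} for i in range(1000)]
    test = [{"x": i, "label": i % 2} for i in range(990, 1090)]
    # 10 of the 100 test records repeat a training value of x, so a notice is printed.
    check_duplicates(train, test, key_fields=["x"])
```

Because false positives occur at roughly `fp_rate`, the metric can slightly overestimate the true overlap. Replacing the Bloom filter with an exact set of record hashes removes that error, at the cost of memory proportional to the size of the first data set.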
