Author

Kit Pang Szeto

Bio: Kit Pang Szeto is an academic researcher from Salesforce.com. The author has contributed to research in the topics of cloud computing and software deployment, has an h-index of 3, and has co-authored 7 publications receiving 105 citations.

Papers
Proceedings ArticleDOI
27 Oct 2013
TL;DR: Presents PredictionIO, an open source machine learning server with a step-by-step graphical user interface that lets developers evaluate, compare and deploy scalable learning algorithms, tune hyperparameters manually or automatically, and evaluate model training status.
Abstract: One of the biggest challenges for software developers to build real-world predictive applications with machine learning is the steep learning curve of data processing frameworks, learning algorithms and scalable system infrastructure. We present PredictionIO, an open source machine learning server that comes with a step-by-step graphical user interface for developers to (i) evaluate, compare and deploy scalable learning algorithms, (ii) tune hyperparameters of algorithms manually or automatically and (iii) evaluate model training status. The system also comes with an Application Programming Interface (API) to communicate with software applications for data collection and prediction retrieval. The whole infrastructure of PredictionIO is horizontally scalable with a distributed computing component based on Hadoop. The demonstration shows a live example and workflows of building real-world predictive applications with the graphical user interface of PredictionIO, from data collection, algorithm tuning and selection, model training and re-training to real-time prediction querying.

34 citations
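As a rough illustration of the workflow described in the abstract above, the sketch below sends a training event to an event server and queries a deployed engine over HTTP, following PredictionIO's documented default ports and JSON endpoints; the access key, event fields, and query shape are placeholder assumptions, not taken from the paper.

```python
# A minimal sketch (not the official SDK) of talking to a deployed PredictionIO
# engine over HTTP. Endpoint paths and ports follow PredictionIO's documented
# defaults (event server on 7070, engine server on 8000) but may differ per setup.
import requests

EVENT_SERVER = "http://localhost:7070/events.json"    # data collection API
ENGINE_SERVER = "http://localhost:8000/queries.json"  # prediction query API
ACCESS_KEY = "YOUR_APP_ACCESS_KEY"                     # placeholder

def record_event(entity_id, item_id):
    """Send a user-rates-item event to the event server for later training."""
    event = {
        "event": "rate",
        "entityType": "user",
        "entityId": entity_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": 5},
    }
    resp = requests.post(EVENT_SERVER, params={"accessKey": ACCESS_KEY}, json=event)
    resp.raise_for_status()
    return resp.json()

def query_prediction(user_id, num=4):
    """Ask the deployed engine variant for predictions for one user."""
    resp = requests.post(ENGINE_SERVER, json={"user": user_id, "num": num})
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    record_event("u1", "i42")
    print(query_prediction("u1"))
```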

Patent
18 Mar 2016
TL;DR: In this article, the authors present methods and systems of tracking the deployment of a predictive engine for machine learning, including steps to deploy an engine variant of the predictive engine based on an engine parameter set.
Abstract: Methods and systems of tracking the deployment of a predictive engine for machine learning are disclosed, including steps to deploy an engine variant of the predictive engine based on an engine parameter set, wherein the engine parameter set identifies at least one data source and at least one algorithm; receive one or more queries to the deployed engine variant from one or more end-user devices, and in response, generate predicted results; receive one or more actual results corresponding to the predicted results; associate the queries, the predicted results, and the actual results with a replay tag; and record them with the corresponding deployed engine variant. The present invention helps data scientists and developers develop, deploy, and debug machine learning systems.

31 citations
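The sketch below is a hypothetical, in-memory illustration of the tracking idea in the abstract above: predictions are recorded together with a replay tag and the engine variant that served them, and actual results arriving later are attached to the same tag. All class and method names are invented for illustration.

```python
# Illustrative sketch only (not the patented implementation): one way to associate
# queries, predicted results, and later-arriving actual results with a replay tag
# and the engine variant that served them. All names here are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
import uuid

@dataclass
class ReplayRecord:
    replay_tag: str                  # groups query/prediction/actual for replay
    engine_variant: str              # which deployed variant produced the prediction
    query: Dict[str, Any]
    predicted: Dict[str, Any]
    actual: Optional[Dict[str, Any]] = None
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

class ReplayLog:
    """In-memory stand-in for the tracking store described in the abstract."""
    def __init__(self):
        self._records: List[ReplayRecord] = []

    def record_prediction(self, engine_variant, query, predicted, replay_tag=None):
        tag = replay_tag or uuid.uuid4().hex
        self._records.append(ReplayRecord(tag, engine_variant, query, predicted))
        return tag

    def record_actual(self, replay_tag, actual):
        # Attach the observed outcome to every prediction sharing the tag, so the
        # engine variant can later be replayed and debugged against ground truth.
        for rec in self._records:
            if rec.replay_tag == replay_tag and rec.actual is None:
                rec.actual = actual

    def replay(self, engine_variant):
        return [r for r in self._records if r.engine_variant == engine_variant]
```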

Patent
19 Sep 2019
TL;DR: In this article, the authors describe a multi-tenant access control system for data access and processing, where the system may determine tenant-specific paths for retrieving the data objects from the data store, and initialize a number of virtual computing engines for accessing the data.
Abstract: Methods, systems, and devices for data access and processing are described. To set up secure environments for data processing (e.g., including machine learning), an access control system may first receive approval from an authorized user (e.g., an approver) granting access to data objects in a multi-tenant data store. The system may determine tenant-specific paths for retrieving the data objects from the data store, and may initialize a number of virtual computing engines for accessing the data. Each computing engine may be tenant-specific based on the path(s) used by that computing engine, and each may include an access role defining the data objects or data object types accessible by that computing engine. By accessing the requested data objects according to the tenant-specific path prefixes and access roles, the virtual computing engines may securely maintain separate environments for different tenants and may only allow user access to approved tenant data.

3 citations
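A minimal sketch of the access pattern described above, using invented names: each virtual computing engine is bound to a tenant-specific path prefix and an access role, so path resolution fails for any data object type outside that tenant's scope.

```python
# Hypothetical sketch, not the patented system: a tenant-pinned compute engine that
# only resolves storage paths under its own prefix and allowed object types.
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class AccessRole:
    allowed_types: FrozenSet[str]        # e.g. {"lead", "opportunity"}

@dataclass(frozen=True)
class TenantComputeEngine:
    tenant_id: str
    path_prefix: str                     # tenant-specific prefix in the shared store
    role: AccessRole

    def resolve_path(self, object_type: str, object_id: str) -> str:
        """Build a storage path, refusing anything outside this tenant's scope."""
        if object_type not in self.role.allowed_types:
            raise PermissionError(f"role does not allow reading {object_type!r}")
        return f"{self.path_prefix}/{object_type}/{object_id}"

# Usage: two engines for two tenants can never resolve paths into each other's data.
engine_a = TenantComputeEngine("tenantA", "s3://ml-store/tenantA",
                               AccessRole(frozenset({"lead"})))
print(engine_a.resolve_path("lead", "00Q123"))   # s3://ml-store/tenantA/lead/00Q123
```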

Patent
10 Nov 2017
TL;DR: In this article, the authors describe methods, systems, and devices for multi-tenant workflow processing on a cloud platform, in which a set of pre-defined batch processes (e.g., workflow templates) and tenant-specific configurations are used to instantiate and execute tenant-specific batch processes for each tenant of a user.
Abstract: Methods, systems, and devices for multi-tenant workflow processing are described. In some cases, a cloud platform may utilize a set of pre-defined batch processes (e.g., workflow templates) and tenant-specific configurations for instantiating and executing tenant-specific batch processes for each tenant of a user. As such, the cloud platform may utilize common data process workflows for each tenant, where a configuration specifies tenant-specific information for the common data process workflows. The workflow templates may include a set of job definitions (e.g., actions for a server to execute) and a schedule defining the frequency for running the templates for a specific project. The configurations may indicate a tenant to execute the workflow templates for, and may include tenant-specific information to override default template information. The cloud platform or a designated server or server cluster may instantiate and execute workflows based on one or more combinations of configurations and indicated workflow templates.

3 citations
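The sketch below illustrates the template-plus-configuration idea from the abstract above, assuming a hypothetical template and override format: a shared workflow template carries default job definitions and a schedule, and a tenant-specific configuration overrides only what it needs.

```python
# Rough, illustrative sketch of instantiating a tenant-specific workflow from a
# shared template plus per-tenant overrides. Field names are assumptions.
from copy import deepcopy

WORKFLOW_TEMPLATE = {
    "name": "nightly-feature-build",
    "schedule": "0 2 * * *",             # default: run daily at 02:00
    "jobs": [
        {"action": "extract", "source": "events"},
        {"action": "transform", "script": "build_features"},
        {"action": "load", "target": "feature_store"},
    ],
}

def instantiate_workflow(template, tenant_config):
    """Merge tenant-specific overrides into a copy of the shared template."""
    workflow = deepcopy(template)
    workflow["tenant_id"] = tenant_config["tenant_id"]
    # Shallow override: any top-level key in the config replaces the default.
    for key, value in tenant_config.get("overrides", {}).items():
        workflow[key] = value
    return workflow

tenant_config = {"tenant_id": "acme", "overrides": {"schedule": "0 4 * * *"}}
print(instantiate_workflow(WORKFLOW_TEMPLATE, tenant_config)["schedule"])  # 0 4 * * *
```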


Cited by
Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes an architecture for flexible and scalable machine learning as a service; as a case study, an electricity demand forecast is generated from real-world sensor and weather data by running different algorithms at the same time.
Abstract: The demand for knowledge extraction has been increasing. With the growing amount of data being generated by global data sources (e.g., social media and mobile apps) and the popularization of context-specific data (e.g., the Internet of Things), companies and researchers need to connect all these data and extract valuable information. Machine learning has been gaining much attention in data mining, leveraging the birth of new solutions. This paper proposes an architecture to create a flexible and scalable machine learning as a service. An open source solution was implemented and presented. As a case study, a forecast of electricity demand was generated using real-world sensor and weather data by running different algorithms at the same time.

281 citations
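A hedged sketch of the case-study idea above: two regression algorithms are trained at the same time on synthetic stand-ins for the sensor and weather features and their errors compared. It uses scikit-learn and is not the paper's implementation.

```python
# Illustrative only: run several algorithms concurrently on the same data and compare.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                          # synthetic stand-in features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)   # synthetic demand target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_and_score(name_model):
    name, model = name_model
    model.fit(X_tr, y_tr)
    return name, mean_absolute_error(y_te, model.predict(X_te))

candidates = [("linear", LinearRegression()),
              ("forest", RandomForestRegressor(n_estimators=100, random_state=0))]

with ThreadPoolExecutor() as pool:                     # train the algorithms concurrently
    for name, mae in pool.map(fit_and_score, candidates):
        print(f"{name}: MAE={mae:.3f}")
```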

Proceedings ArticleDOI
01 Nov 2017
TL;DR: While network researchers should approach MLaaS systems with caution, they can achieve results comparable to standalone classifiers if they have sufficient insight into key decisions like classifiers and feature selection.
Abstract: Machine learning classifiers are basic research tools used in numerous types of network analysis and modeling. To reduce the need for domain expertise and costs of running local ML classifiers, network researchers can instead rely on centralized Machine Learning as a Service (MLaaS) platforms. In this paper, we evaluate the effectiveness of MLaaS systems ranging from fully-automated, turnkey systems to fully-customizable systems, and find that with more user control comes greater risk. Good decisions produce even higher performance, and poor decisions result in harsher performance penalties. We also find that server side optimizations help fully-automated systems outperform default settings on competitors, but still lag far behind well-tuned MLaaS systems which compare favorably to standalone ML libraries. Finally, we find classifier choice is the dominating factor in determining model performance, and that users can approximate the performance of an optimal classifier choice by experimenting with a small subset of random classifiers. While network researchers should approach MLaaS systems with caution, they can achieve results comparable to standalone classifiers if they have sufficient insight into key decisions like classifiers and feature selection.

68 citations
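The sketch below illustrates the paper's closing observation that trying a small random subset of classifiers can approximate the optimal choice, using scikit-learn; the classifier pool and subset size are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch: pick the best of a small random subset of candidate classifiers.
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

pool = [LogisticRegression(max_iter=1000), SVC(), GaussianNB(),
        KNeighborsClassifier(), DecisionTreeClassifier(random_state=0),
        RandomForestClassifier(random_state=0)]

random.seed(0)
subset = random.sample(pool, k=3)          # "small subset of random classifiers"

best = max(subset, key=lambda clf: cross_val_score(clf, X, y, cv=5).mean())
print(type(best).__name__, cross_val_score(best, X, y, cv=5).mean())
```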

Journal ArticleDOI
TL;DR: Proposes a distributed architecture that provides machine learning practitioners with a set of tools and cloud services covering the whole machine learning development cycle, from model creation, training, validation and testing to serving models as a service, sharing and publication.
Abstract: In this paper we propose a distributed architecture to provide machine learning practitioners with a set of tools and cloud services that cover the whole machine learning development cycle: ranging from the models creation, training, validation and testing to the models serving as a service, sharing and publication. In such respect, the DEEP-Hybrid-DataCloud framework allows transparent access to existing e-Infrastructures, effectively exploiting distributed resources for the most compute-intensive tasks coming from the machine learning development cycle. Moreover, it provides scientists with a set of Cloud-oriented services to make their models publicly available, by adopting a serverless architecture and a DevOps approach, allowing an easy share, publish and deploy of the developed models.

53 citations
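As a minimal sketch of the "models serving as a service" step of the cycle described above, the example below exposes a trained scikit-learn model behind an HTTP endpoint with Flask; the endpoint name and payload shape are assumptions for illustration and do not describe the DEEP-Hybrid-DataCloud framework itself.

```python
# Minimal model-serving sketch (assumed endpoint and payload, not the framework's API).
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Train once at startup; a real deployment would load a validated, versioned model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify({"predictions": model.predict(features).tolist()})

if __name__ == "__main__":
    app.run(port=5000)
```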

Proceedings ArticleDOI
02 Feb 2018
TL;DR: This work proposes OpenRec, an open and modular Python framework that supports extensible and adaptable research in recommender systems and demonstrates that OpenRec provides adaptability, modularity and reusability while maintaining training efficiency and recommendation accuracy.
Abstract: With the increasing demand for deeper understanding of users' preferences, recommender systems have gone beyond simple user-item filtering and are increasingly sophisticated, comprised of multiple components for analyzing and fusing diverse information. Unfortunately, existing frameworks do not adequately support extensibility and adaptability and consequently pose significant challenges to rapid, iterative, and systematic experimentation. In this work, we propose OpenRec, an open and modular Python framework that supports extensible and adaptable research in recommender systems. Each recommender is modeled as a computational graph that consists of a structured ensemble of reusable modules connected through a set of well-defined interfaces. We present the architecture of OpenRec and demonstrate that OpenRec provides adaptability, modularity and reusability while maintaining training efficiency and recommendation accuracy. Our case study illustrates how OpenRec can support an efficient design process to prototype and benchmark alternative approaches with inter-changeable modules and enable development and evaluation of new algorithms.

52 citations
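The sketch below is not OpenRec's actual API; it is a stripped-down illustration of the design idea in the abstract above, where a recommender is a structured ensemble of reusable modules connected through a small, well-defined interface so individual modules can be swapped for experimentation.

```python
# Illustrative modular recommender; class names and interfaces are invented.
import numpy as np

class Module:
    """Common interface every module implements."""
    def forward(self, inputs: dict) -> dict:
        raise NotImplementedError

class EmbeddingModule(Module):
    def __init__(self, num_ids, dim, seed=0):
        self.table = np.random.default_rng(seed).normal(size=(num_ids, dim))
    def forward(self, inputs):
        return {"vec": self.table[inputs["id"]]}

class DotInteraction(Module):
    def forward(self, inputs):
        return {"score": float(np.dot(inputs["user_vec"], inputs["item_vec"]))}

class Recommender:
    """Structured ensemble of modules; swapping one module leaves the rest untouched."""
    def __init__(self, user_module, item_module, interaction):
        self.user_module = user_module
        self.item_module = item_module
        self.interaction = interaction
    def score(self, user_id, item_id):
        u = self.user_module.forward({"id": user_id})["vec"]
        i = self.item_module.forward({"id": item_id})["vec"]
        return self.interaction.forward({"user_vec": u, "item_vec": i})["score"]

rec = Recommender(EmbeddingModule(100, 16), EmbeddingModule(500, 16, seed=1), DotInteraction())
print(rec.score(user_id=7, item_id=42))
```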

Patent
05 Jun 2015
TL;DR: In this paper, a workflow execution engine can assign one or more computing environments from a candidate pool to execute the operator instances based on the interdependency graph of operator instances for a workflow run.
Abstract: Some embodiments include a method of machine learner workflow processing. For example, a workflow execution engine can receive an interdependency graph of operator instances for a workflow run. The operator instances can be associated with one or more operator types. The workflow execution engine can assign one or more computing environments from a candidate pool to execute the operator instances based on the interdependency graph. The workflow execution engine can generate a schedule plan of one or more execution requests associated with the operator instances. The workflow execution engine can distribute code packages associated with the operator instances to the assigned computing environments. The workflow execution engine can maintain a memoization repository to cache one or more outputs of the operator instances upon completion of the execution requests.

49 citations
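A hypothetical sketch of the execution loop described above, with invented names: operator instances run in dependency order over the interdependency graph, and a memoization dictionary caches operator outputs so repeated runs can reuse them.

```python
# Illustrative only, not the patented engine: dependency-ordered execution with memoization.
from graphlib import TopologicalSorter   # Python 3.9+

def run_workflow(graph, operators, memo):
    """graph: {op_name: set of dependency names}; operators: {op_name: callable};
    memo: dict acting as the memoization repository shared across workflow runs."""
    results = {}
    for name in TopologicalSorter(graph).static_order():     # dependencies come first
        dep_outputs = tuple(results[d] for d in sorted(graph.get(name, ())))
        key = (name, dep_outputs)
        if key not in memo:                                   # cache miss: execute the operator
            memo[key] = operators[name](*dep_outputs)
        results[name] = memo[key]
    return results

graph = {"load": set(), "clean": {"load"}, "train": {"clean"}}
operators = {"load": lambda: (3, 1, 2),
             "clean": lambda xs: tuple(sorted(xs)),
             "train": lambda xs: sum(xs) / len(xs)}
memo = {}
print(run_workflow(graph, operators, memo))   # a second run with the same memo reuses cached outputs
```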