
Showing papers on "Task (computing) published in 2008"


Proceedings ArticleDOI
08 Dec 2008
TL;DR: A new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
Abstract: MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is critical. Hadoop's performance is closely tied to its task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumptions do not always hold. An especially compelling setting where this occurs is a virtualized data center, such as Amazon's Elastic Compute Cloud (EC2). We show that Hadoop's scheduler can cause severe performance degradation in heterogeneous environments. We design a new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity. LATE can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
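As a rough illustration of the heuristic the abstract describes, LATE estimates each running task's time to completion from its observed progress rate and speculatively re-executes the slow task expected to finish furthest in the future. The sketch below is a minimal Python rendering of that idea; the task fields, the percentile cut-off, and the epsilon guards are illustrative assumptions, not Hadoop's actual scheduler code.

def progress_rate(task, now):
    # Fraction of the task completed per second since it started.
    return task.progress / max(now - task.start_time, 1e-9)

def estimated_time_to_end(task, now):
    # LATE's core estimate: remaining work divided by observed rate.
    return (1.0 - task.progress) / max(progress_rate(task, now), 1e-9)

def pick_speculative_task(running_tasks, now, slow_percentile=25):
    # Consider only tasks whose progress rate falls in the slowest quartile...
    rates = sorted(progress_rate(t, now) for t in running_tasks)
    threshold = rates[len(rates) * slow_percentile // 100]
    slow = [t for t in running_tasks if progress_rate(t, now) <= threshold]
    # ...and speculate on the one with the longest approximate time to end.
    return max(slow, key=lambda t: estimated_time_to_end(t, now), default=None)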

1,801 citations


Proceedings ArticleDOI
26 Oct 2008
TL;DR: This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure, and it paves the way for evaluating search engines in terms of user task completion.
Abstract: Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tasks at a variety of granularities, issuing multiple queries as they attempt to accomplish tasks. In this work we study real sessions manually labeled into hierarchical tasks, and show that timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%. We report on properties of this search task hierarchy, as seen in a random sample of user interactions from a major web search engine's log, annotated by human editors, learning that 17% of tasks are interleaved, and 20% are hierarchically organized. No previous work has analyzed or addressed automatic identification of interleaved and hierarchically organized search tasks. We propose and evaluate a method for the automated segmentation of users' query streams into hierarchical units. Our classifiers can improve on timeout segmentation, as well as other previously published approaches, bringing the accuracy up to 92% for identifying fine-grained task boundaries, and 89-97% for identifying pairs of queries from the same task when tasks are interleaved hierarchically. This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure. The ability to perform this kind of segmentation paves the way for evaluating search engines in terms of user task completion.
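The key step the abstract describes, deciding whether two queries belong to the same task, can be framed as pairwise classification. The sketch below shows one plausible way to set that up in Python; the Query record and the three features (lexical overlap, time gap, substring containment) are illustrative assumptions, not the paper's actual feature set.

from collections import namedtuple
from sklearn.linear_model import LogisticRegression

Query = namedtuple("Query", "text timestamp")

def pair_features(q1, q2):
    # Simple stand-in features: word overlap, elapsed time, containment.
    w1, w2 = set(q1.text.lower().split()), set(q2.text.lower().split())
    jaccard = len(w1 & w2) / len(w1 | w2) if (w1 | w2) else 0.0
    gap_seconds = abs(q2.timestamp - q1.timestamp)
    contained = float(q1.text in q2.text or q2.text in q1.text)
    return [jaccard, gap_seconds, contained]

def train_same_task_classifier(labeled_pairs):
    # labeled_pairs: iterable of (query_a, query_b, same_task_label).
    X = [pair_features(a, b) for a, b, _ in labeled_pairs]
    y = [label for _, _, label in labeled_pairs]
    return LogisticRegression().fit(X, y)

A timeout baseline corresponds to thresholding gap_seconds alone; the classifier generalizes it by weighting several signals at once.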

447 citations


Proceedings Article
08 Dec 2008
TL;DR: In this article, the authors assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors, resulting in a new convex optimization formulation for multi-task learning.
Abstract: In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset that our approach outperforms well-known convex methods for multi-task learning, as well as related non-convex methods dedicated to the same problem.
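To make the clustering prior concrete, one common way to write such a formulation (reconstructed here from the abstract's description, not copied from the paper) penalizes the spread of the task weight vectors overall, between clusters, and within clusters:

\[
\min_{W}\ \sum_{i=1}^{n} \ell(w_i; \mathcal{D}_i)
  \;+\; \varepsilon_M\, n \lVert \bar{w} \rVert^{2}
  \;+\; \varepsilon_B \sum_{c=1}^{k} n_c \lVert \bar{w}_c - \bar{w} \rVert^{2}
  \;+\; \varepsilon_W \sum_{c=1}^{k} \sum_{i \in c} \lVert w_i - \bar{w}_c \rVert^{2}
\]

where \(w_i\) is the weight vector of task \(i\), \(\bar{w}\) the mean over all tasks, \(\bar{w}_c\) the mean of cluster \(c\) of size \(n_c\), and \(\ell\) the per-task loss. Since the cluster assignment is unknown beforehand, the paper's convex formulation relaxes the discrete assignment variables into a spectral constraint set, which yields the new spectral norm mentioned above.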

413 citations


Posted Content
TL;DR: A new spectral norm is designed that encodes this a priori assumption that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors, resulting in a new convex optimization formulation for multi-task learning.
Abstract: In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset that our approach outperforms well-known convex methods for multi-task learning, as well as related non-convex methods dedicated to the same problem.

409 citations


Journal ArticleDOI
TL;DR: Differences were found across tasks and writers, with the reading-to-write task eliciting a more interactive process for some writers, and the writing-only tasks requiring more initial and less online planning.

182 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: A novel algorithm is introduced that transfers samples from the source tasks that are most similar to the target task, and it is empirically shown that, following the proposed approach, the transfer of samples is effective in reducing the learning complexity.
Abstract: The main objective of transfer in reinforcement learning is to reduce the complexity of learning the solution of a target task by effectively reusing the knowledge retained from solving a set of source tasks. In this paper, we introduce a novel algorithm that transfers samples (i.e., tuples 〈s, a, s', r〉) from source to target tasks. Under the assumption that tasks have similar transition models and reward functions, we propose a method to select samples from the source tasks that are most similar to the target task, and, then, to use them as input for batch reinforcement-learning algorithms. As a result, the number of samples an agent needs to collect from the target task to learn its solution is reduced. We empirically show that, following the proposed approach, the transfer of samples is effective in reducing the learning complexity, even when some source tasks are significantly different from the target task.
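A hedged sketch of the sample-transfer loop the abstract outlines: score each source sample by how compatible it is with the target task and pass the best-scoring ones, together with the target samples, to any batch RL algorithm. The target_model.likelihood scoring function below is an illustrative stand-in for the paper's similarity-based selection criterion.

def select_transferable_samples(source_samples, target_model, keep_fraction=0.5):
    # source_samples: (s, a, s_next, r) tuples collected in the source tasks.
    scored = [(target_model.likelihood(s, a, s_next, r), (s, a, s_next, r))
              for (s, a, s_next, r) in source_samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sample for _, sample in scored[:int(len(scored) * keep_fraction)]]

# Usage sketch: augment the target batch, then run any batch RL method, e.g.
# batch = target_samples + select_transferable_samples(source_samples, model)
# q_function = fitted_q_iteration(batch)   # hypothetical batch RL routine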

173 citations


Journal ArticleDOI
TL;DR: The problem presented in this paper adds sequence-dependent setup time considerations to the classical SALBP in the following way: whenever a task is assigned next to another at the same workstation, a setup time must be added to compute the global workstation time.

153 citations


Proceedings ArticleDOI
15 Nov 2008
TL;DR: This work proposes a new cut-off technique that, using information from the application collected at runtime, decides which tasks should be pruned to improve the performance of the application.
Abstract: In task parallel languages, an important factor for achieving good performance is the use of a cut-off technique to reduce the number of tasks created. Using a cut-off to avoid an excessive number of tasks helps the runtime system to reduce the total overhead associated with task creation, particularly if the tasks are fine grained. Unfortunately, the best cut-off technique is usually dependent on the application structure or even the input data of the application. We propose a new cut-off technique that, using information from the application collected at runtime, decides which tasks should be pruned to improve the performance of the application. This technique does not rely on the programmer to determine the cut-off technique that is best suited for the application. We have implemented this cut-off in the context of the new OpenMP tasking model. Our evaluation, with a variety of applications, shows that our adaptive cut-off is able to make good decisions and most of the time matches the optimal cut-off that could be set by hand by a programmer.
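As a rough sketch of what such a runtime decision can look like, the code below profiles the average task duration per recursion depth and stops creating tasks at depths where they become too fine-grained to amortize creation overhead. The threshold constants and the depth-based profiling granularity are illustrative assumptions, not the paper's exact policy.

import time
from collections import defaultdict

depth_stats = defaultdict(lambda: [0.0, 0])   # depth -> [total seconds, count]
CREATION_OVERHEAD = 50e-6                     # assumed cost of creating a task

def should_create_task(depth, warmup=10, factor=10):
    total, count = depth_stats[depth]
    if count < warmup:
        return True        # not enough profile data yet: keep spawning tasks
    # Prune (run inline) once tasks at this depth stop amortizing overhead.
    return (total / count) > factor * CREATION_OVERHEAD

def profiled_run(fn, depth, *args):
    start = time.perf_counter()
    result = fn(*args)
    stats = depth_stats[depth]
    stats[0] += time.perf_counter() - start
    stats[1] += 1
    return result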

93 citations


Patent
04 Nov 2008
TL;DR: In this paper, a method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described.
Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.

90 citations


DOI
01 Jan 2008
TL;DR: A method to analyze the duration and the cost of sequences of integration and test-diagnose-fix tasks and indicates that choosing a different test sequence can reduce the test duration by 30% to 70%.
Abstract: Complex manufacturing machines, like ASML wafer scanners, consist of thousands of components such as electronic boards, software, mechanical parts and optics. These components of multiple disciplines are assembled or integrated into modules. The modules are integrated into sub-systems forming the system, according to an integration plan. Components, as well as modules, sub-systems, and systems, can be tested, diagnosed and fixed, according to a test-diagnose-fix plan. An increase in the number of components results in an increase in the number of tasks in these plans. Moreover, the effort required to obtain a sequence that describes in which order the tasks should be executed also increases. The duration and the cost of a sequence depend on the quality of the system. In this project we introduce a method to analyze the duration and the cost of sequences of integration and test-diagnose-fix tasks. The method uses test-diagnose-fix models to analyze the performance of sequences. The basic elements in such a model are: a) test, diagnose and fix tasks with their costs and durations, b) fault states, c) the coverage of test tasks on fault states, and d) failure probabilities of fault states. These elements can be obtained for components, modules or sub-systems of multiple disciplines. Three case studies have been performed using this method. The outcome of the analysis indicates that choosing a different test sequence can reduce the test duration by 30% to 70%. In addition, three techniques have been developed to improve integration and test-diagnose-fix sequences:
- To reduce the execution time of test-diagnose-fix sequences, an algorithm has been developed to determine a new test task with optimal coverage w.r.t. the fault states. The algorithm selects the new test task based on the maximum information gain. A test sequence including the new test task improves the test duration of the test-diagnose-fix task, because faults can be detected earlier.
- To reduce the execution time of test-diagnose-fix sequences, an adapted hypergraph partitioning algorithm has been developed. The algorithm partitions a test-diagnose-fix task into smaller tasks which can be executed in parallel. The result of a case study is a reduction of the test duration by 30%, with a concomitant increase of 30% in the test cost.
- The impact of the choice of the system architecture on the execution time and planning effort of integration and test-diagnose-fix sequences is investigated.
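The "maximum information gain" selection step can be made concrete with a small sketch. Under a single-fault assumption, with a prior over fault states and a boolean coverage matrix (test t detects fault f), the next test is the one whose pass/fail outcome most reduces entropy over the fault states. All names below are illustrative.

import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def information_gain(prior, covers):
    # prior: single-fault probabilities (sums to 1); covers[f]: test detects f.
    p_fail = sum(p for p, c in zip(prior, covers) if c)
    if p_fail in (0.0, 1.0):
        return 0.0
    post_fail = [p / p_fail if c else 0.0 for p, c in zip(prior, covers)]
    post_pass = [0.0 if c else p / (1.0 - p_fail) for p, c in zip(prior, covers)]
    expected = p_fail * entropy(post_fail) + (1.0 - p_fail) * entropy(post_pass)
    return entropy(prior) - expected

def best_test(prior, coverage_rows):
    # coverage_rows[t] is the coverage vector of candidate test t.
    return max(range(len(coverage_rows)),
               key=lambda t: information_gain(prior, coverage_rows[t]))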

90 citations


Book ChapterDOI
12 May 2008
TL;DR: An extension is proposed to allow the runtime detection of dependencies between generated tasks, broadening the range of applications that can benefit from tasking and improving performance when load balancing or locality are critical issues.
Abstract: Tasking in OpenMP 3.0 has been conceived to handle the dynamic generation of unstructured parallelism. New directives have been added allowing the user to identify units of independent work (tasks) and to define points to wait for the completion of tasks (task barriers). In this paper we propose an extension to allow the runtime detection of dependencies between generated tasks, broadening the range of applications that can benefit from tasking and improving performance when load balancing or locality are critical issues. Furthermore, the paper describes our proof-of-concept implementation (SMP Superscalar) and shows preliminary performance results on an SGI Altix 4700.
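The runtime dependency detection the proposal describes can be sketched as classic data-flow tracking: each submitted task declares the addresses it reads and writes, and the runtime inserts edges for read-after-write, write-after-write, and write-after-read conflicts. The depends_on method and the address-level granularity below are assumptions for illustration.

last_writer = {}   # address -> last task that writes it
readers = {}       # address -> tasks that have read it since the last write

def submit(task, reads, writes):
    for addr in reads:
        if addr in last_writer:
            task.depends_on(last_writer[addr])        # read-after-write
        readers.setdefault(addr, []).append(task)
    for addr in writes:
        if addr in last_writer:
            task.depends_on(last_writer[addr])        # write-after-write
        for reader in readers.pop(addr, []):
            task.depends_on(reader)                   # write-after-read
        last_writer[addr] = task
    # A task becomes runnable once every task it depends on has completed.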

Journal ArticleDOI
TL;DR: This work provides a way in which working memory and episodic memory may be included in the reinforcement learning framework, and then simulates the successful acquisition and performance of six behavioral tasks that require working memory or episodic memory for correct performance.
Abstract: The mechanisms of goal-directed behavior have been studied using reinforcement learning theory, but these theoretical techniques have not often been used to address the role of memory systems in performing behavioral tasks. The present work addresses this shortcoming by providing a way in which working memory and episodic memory may be included in the reinforcement learning framework, then simulating the successful acquisition and performance of six behavioral tasks, drawn from or inspired by the rat experimental literature, that require working memory or episodic memory for correct performance. With no delay imposed during the tasks, simulations with working memory can solve all of the tasks at above the chance level. When a delay is imposed, simulations with both episodic memory and working memory can solve all of the tasks except a disambiguation of odor sequences task.

Patent
15 Jan 2008
TL;DR: A priority-based scheduling system for a server prioritizes multiple tasks that are defined using various constraints, which may include relationships between different tasks, performance parameters for each task, and completion constraints.
Abstract: A priority-based scheduling system for a server prioritizes multiple tasks that are defined using various constraints, which may include relationships defined between different tasks, performance parameters for each task, and completion constraints. The system may track actual performance of a task and update the performance parameters over time. Some embodiments may include a status monitoring agent that may detect that a monitored network parameter has changed, which may cause a scheduled task to be raised or lowered in priority. The system may be used to schedule and execute one time tasks as well as recurring tasks, and may execute those tasks during a rigid or flexible periodic time window. Many of the tasks may be pausable and resumable, and such tasks may be performed in increments over successive time windows.

Patent
22 Aug 2008
TL;DR: In this paper, a system and method to control the allocation of processor (or state machine) execution resources to individual tasks executing in computer systems is described, which is accomplished through workload metering shaping, and workload prioritization which gives preference to tasks based on configured priorities.
Abstract: A system and method to control the allocation of processor (or state machine) execution resources to individual tasks executing in computer systems is described. By controlling the allocation of execution resources, to all tasks, each task may be provided with throughput and response time guarantees. This control is accomplished through workload metering shaping which delays the execution of tasks that have used their workload allocation until sufficient time has passed to accumulate credit for execution (accumulate credit over time to perform their allocated work) and workload prioritization which gives preference to tasks based on configured priorities.
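A minimal sketch of the metering-and-shaping idea, assuming a simple linear credit model: each task accrues execution credit at its configured rate and is held back whenever the work it has consumed runs ahead of that credit. The class layout and numbers are illustrative, not the patent's mechanism.

import time

class MeteredTask:
    def __init__(self, rate, priority=0):
        self.rate = rate                  # allowed work units per second
        self.priority = priority
        self.consumed = 0.0               # work units already executed
        self.start = time.monotonic()

    def credit(self):
        # Credit accumulates linearly over elapsed time.
        return (time.monotonic() - self.start) * self.rate

    def eligible(self):
        # Shaping: delay the task while it is ahead of its allocation.
        return self.consumed < self.credit()

    def account(self, work_done):
        self.consumed += work_done

def pick_next(tasks):
    # Prioritization: among eligible tasks, prefer the highest priority.
    runnable = [t for t in tasks if t.eligible()]
    return max(runnable, key=lambda t: t.priority, default=None)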

Journal ArticleDOI
09 Sep 2008-PLOS ONE
TL;DR: Using this methodology, a probable functional arrangement of neural systems engaged during different timing behaviors is revealed, which shows a prominent segregation of explicit and implicit timing tasks, and a clear grouping between single and multiple interval paradigms.
Abstract: In the present study we determined the performance interrelations of ten different tasks that involved the processing of temporal intervals in the subsecond range, using multidimensional analyses. Twenty human subjects executed the following explicit timing tasks: interval categorization and discrimination (perceptual tasks), and single and multiple interval tapping (production tasks). In addition, the subjects performed a continuous circle-drawing task that has been considered an implicit timing paradigm, since time is an emergent property of the produced spatial trajectory. All tasks could be also classified as single or multiple interval paradigms. Auditory or visual markers were used to define the intervals. Performance variability, a measure that reflects the temporal and non-temporal processes for each task, was used to construct a dissimilarity matrix that quantifies the distances between pairs of tasks. Hierarchical clustering and multidimensional scaling were carried out on the dissimilarity matrix, and the results showed a prominent segregation of explicit and implicit timing tasks, and a clear grouping between single and multiple interval paradigms. In contrast, other variables such as the marker modality were not as crucial to explain the performance between tasks. Thus, using this methodology we revealed a probable functional arrangement of neural systems engaged during different timing behaviors.
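The analysis pipeline described above maps directly onto standard tooling: hierarchical clustering and multidimensional scaling applied to a task-by-task dissimilarity matrix. A brief sketch with a placeholder matrix (the real study derived it from performance variability across the ten tasks):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
D = rng.random((10, 10))                 # placeholder 10-task dissimilarities
D = (D + D.T) / 2.0                      # symmetrize
np.fill_diagonal(D, 0.0)                 # zero self-distance

Z = linkage(squareform(D), method="average")   # hierarchical clustering of tasks
coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(D)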

Proceedings ArticleDOI
01 Dec 2008
TL;DR: This paper presents a heuristic to optimize the number of machines that should be allocated to process tasks so that, for a given budget, the speedups are maximal, and evaluates the ratios between the number of allocated hosts, charged times, speedups, and processing times.
Abstract: The use of utility on-demand computing infrastructures, such as Amazon's Elastic Clouds [1], is a viable solution to speed up lengthy parallel computing problems for those without access to other cluster or grid infrastructures. With a suitable middleware, bag-of-tasks problems could be easily deployed over a pool of virtual computers created on such infrastructures. In bag-of-tasks problems, as there is no communication between tasks, the number of concurrent tasks is allowed to vary over time. In a utility computing infrastructure, if too many virtual computers are created, the speedups are high but may not be cost effective; if too few computers are created, the cost is low but speedups fall below expectations. Without previous knowledge of the processing time of each task, it is difficult to determine how many machines should be created. In this paper, we present a heuristic to optimize the number of machines that should be allocated to process tasks so that, for a given budget, the speedups are maximal. We have simulated the proposed heuristic against real and theoretical workloads and evaluated the ratios between the number of allocated hosts, charged times, speedups, and processing times. With the proposed heuristic, it is possible to obtain speedups in line with the number of allocated computers, while being charged approximately the same predefined budget.
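To illustrate the trade-off the abstract describes, here is a deliberately simple allocation sketch: sweep the machine count, predict the charge under per-machine-hour billing, and keep the largest speedup whose charge stays within budget. The perfectly divisible workload and the cost model are illustrative assumptions, not the paper's heuristic.

import math

def machines_for_budget(total_cpu_hours, price_per_hour, budget, max_machines=256):
    best = (1, 1.0)                                  # (machines, speedup)
    for n in range(1, max_machines + 1):
        wall_hours = math.ceil(total_cpu_hours / n)  # billed per started hour
        charge = n * wall_hours * price_per_hour
        if charge > budget:
            continue
        speedup = total_cpu_hours / wall_hours
        if speedup > best[1]:
            best = (n, speedup)
    return best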

Journal ArticleDOI
TL;DR: The (partial) probability generating functions of the number of customers present when the system is occupied with a U-task as well as when it acts as an M/M/∞ queue are derived and explicit expressions for the corresponding mean queue sizes are obtained.
Abstract: A system is operating as an M/M/∞ queue. However, when it becomes empty, it is assigned to perform another task, the duration U of which is random. Customers arriving while the system is unavailable for service (i.e., occupied with a U-task) become impatient: Each individual activates an “impatience timer” having random duration T such that if the system does not become available by the time the timer expires, the customer leaves the system never to return. When the system completes a U-task and there are waiting customers, each one is taken immediately into service. We analyze both multiple and single U-task scenarios and consider both exponentially and generally distributed task and impatience times. We derive the (partial) probability generating functions of the number of customers present when the system is occupied with a U-task as well as when it acts as an M/M/∞ queue and we obtain explicit expressions for the corresponding mean queue sizes. We further calculate the mean length of a busy period, the mean cycle time, and the quality of service measure: proportion of customers being served.
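For reference, a standard background fact about the underlying model (not a result of this paper): in the plain M/M/∞ regime with arrival rate \(\lambda\) and service rate \(\mu\), the stationary number of customers in the system is Poisson,

\[
P(N = n) \;=\; e^{-\rho}\,\frac{\rho^{n}}{n!}, \qquad \rho = \frac{\lambda}{\mu}, \qquad \mathbb{E}[N] = \frac{\lambda}{\mu},
\]

which is the baseline that the U-task vacations and customer impatience perturb in the analysis above.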

Journal ArticleDOI
TL;DR: This paper proposes an approach to automatically generate at run time a functional configuration of a network robot system to perform a given task in a given environment, and to dynamically change this configuration in response to failures, based on artificial intelligence planning techniques.

Patent
10 Oct 2008
TL;DR: In this paper, the authors propose a system that facilitates software development by providing a software factory based on an instance of a metamodel, which supports the definition of viewpoints with a viewpoint comprising one or more work product types.
Abstract: A system is described that facilitates software development by providing a software factory based on an instance of a metamodel. The metamodel supports the definition of one or more viewpoints with a viewpoint comprising one or more work product types, templates for one or more tasks supporting the creation and modification of instances of the viewpoints and work product types, and templates for workstreams comprising one or more tasks and relationships between them. The metamodel supports definition of relationship(s) among viewpoints and/or between viewpoint(s) and work product type(s), and operation(s) that can be performed across relationship(s). Additionally, the asset(s), if any, available to particular task(s) can be further defined, as supported by the metamodel.

Patent
29 Mar 2008
TL;DR: One embodiment of a programmable device embodying a program of executable instructions can be found in this paper, where multiple tasks or symbols are assigned to each of a number of motion groups.
Abstract: One embodiment is a programmable device embodying a program of executable instructions to perform steps including: assigning multiple tasks or symbols to each of a number of motion groups; segmenting motion data from sensor(s); matching the segments to motion groups; and composing and then selecting task(s) or symbol sequence(s) from the task(s) and/or symbol(s) assigned to the matched motion groups.

Patent
Hiroaki Yamaoka
15 May 2008
TL;DR: In this paper, the authors propose a method for reducing the aging of processor cores that have lower performance by determining performance levels for each of the processor cores and determining an allocation of the tasks to the processors.
Abstract: Systems and methods for improving the reliability of multiprocessors by reducing the aging of processor cores that have lower performance. One embodiment comprises a method implemented in a multiprocessor system having a plurality of processor cores. The method includes determining performance levels for each of the processor cores and determining an allocation of the tasks to the processor cores that substantially minimizes aging of a lowest-performing one of the operating processor cores. The allocation may be based on task priority, task weight, heat generated, or combinations of these factors. The method may also include identifying processor cores whose performance levels are below a threshold level and shutting down these processor cores. If the number of processor cores that are still active is less than a threshold number, the multiprocessor system may be shut down, or a warning may be provided to a user.

Patent
23 Oct 2008
TL;DR: In this paper, the effects of bias temperature instability (BTI) resulting from task execution are distributed among the processor cores in a more equal fashion than if tasks are scheduled according to a fixed order.
Abstract: A data processing device assigns tasks to processor cores in a more distributed fashion. In one embodiment, the data processing device can schedule tasks for execution amongst the processor cores in a pseudo-random fashion. In another embodiment, the processor core can schedule tasks for execution amongst the processor cores based on the relative amount of historical utilization of each processor core. In either case, the effects of bias temperature instability (BTI) resulting from task execution are distributed among the processor cores in a more equal fashion than if tasks are scheduled according to a fixed order. Accordingly, the useful lifetime of the processor unit can be extended.

Patent
13 Jun 2008
TL;DR: A communication and workflow management system and method for integrating a wide range of health care organization workflow management functions, generated by automated systems, manual and automated events associated with patients and staff interactions, through input-output devices such that requests and dispatch requests can be handled locally or over a widely distributed network, and can be tracked and escalated as required.
Abstract: A communication and workflow management system and method is provided for integrating a wide range of health care organization workflow management functions, generated by automated systems, manual and automated events associated with patients and staff interactions, through input-output devices such that requests and dispatch requests can be handled locally or over a widely distributed network, and can be tracked and escalated as required. The invention features a rules engine and database that identifies and defines resources, patients, tasks, and task handling. The invention uses extensive logic for the assignment of tasks and communication with resources that can execute tasks, tracking, completion of tasks, and escalation of tasks. The communication system can be integrated with staff and equipment tracking for automated closure of tasks.

Patent
11 Jan 2008
TL;DR: One or more functions are exposed by a mobile device to a host connected to the mobile device, and a function of the one or more functions is executed at the mobile device in response to a request from the host, wherein the function is associated with a host task.
Abstract: One or more functions are exposed by a mobile device to a host connected to the mobile device. A function of the one or more functions is executed at the mobile device in response to a request from the host, wherein the function is associated with a host task. The result of the function is returned to the host.

Proceedings ArticleDOI
16 Jul 2008
TL;DR: New schemes for efficient automatic task distribution between CPU and GPU are presented, and tests and results of implementing those schemes are included, with a test case and with a real-time system.
Abstract: The increase of computational power of programmable GPUs (graphics processing units) brings new concepts for using these devices for generic processing. Hence, with the use of the CPU and the GPU for data processing come new ideas that deal with the distribution of tasks between CPU and GPU, such as automatic distribution. The importance of the automatic distribution of tasks between CPU and GPU lies in three facts. First, automatic task distribution enables applications to use the best of both processors. Second, the developer does not have to decide which processor will do the work, allowing the automatic task distribution system to choose the best option for the moment. And third, sometimes the application can be slowed down by other processes if the CPU or GPU is already overloaded. Based on these facts, this paper presents new schemes for efficient automatic task distribution between CPU and GPU. This paper also includes tests and results of implementing those schemes with a test case and with a real-time system.
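One way to realize such automatic distribution, sketched under assumed interfaces (no real GPU API is used here): track an exponential moving average of each device's observed processing rate plus its queued work, and route every new task to the device with the earliest predicted completion.

class Device:
    def __init__(self, name):
        self.name = name
        self.rate = 1.0      # work units per second, updated online (EMA)
        self.queued = 0.0    # work units currently waiting on this device

    def predicted_finish(self, work):
        return (self.queued + work) / self.rate

    def record(self, work, seconds, alpha=0.2):
        # Update the rate estimate after a task completes.
        self.rate = (1.0 - alpha) * self.rate + alpha * (work / seconds)

def dispatch(task_work, cpu, gpu):
    target = min((cpu, gpu), key=lambda d: d.predicted_finish(task_work))
    target.queued += task_work
    return target

Because the rate estimates adapt at runtime, an overloaded device naturally receives fewer tasks, which matches the third motivation given in the abstract.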

Patent
Wenlong Li, Xiaofeng Tong, Aamer Jaleel
30 Jun 2008
TL;DR: In this paper, a task scheduler is used to assign tasks of a multithreaded application to one or more cores of the plurality of cores of a multi-core processor.
Abstract: Device, system, and method of executing multithreaded applications. Some embodiments include a task scheduler to receive application information related to one or more parameters of at least one multithreaded application to be executed by a multi-core processor including a plurality of cores and, based on the application information and based on architecture information related to an arrangement of the plurality of cores, to assign one or more tasks of the multithreaded application to one or more cores of the plurality of cores. Other embodiments are described and claimed.

Patent
30 May 2008
TL;DR: In this article, techniques for generating a distributed stream processing application are provided. The techniques include obtaining a declarative description of one or more data stream processing tasks, wherein the declareative description expresses at least one stream processing task, and generating one or multiple execution units from such a declareable description, where the execution units are deployable across multiple distributed computing nodes.
Abstract: Techniques for generating a distributed stream processing application are provided. The techniques include obtaining a declarative description of one or more data stream processing tasks, wherein the declarative description expresses at least one stream processing task, and generating one or more execution units from the declarative description of one or more data stream processing tasks, wherein the one or more execution units are deployable across one or more distributed computing nodes, and comprise a distributed data stream processing application.

Patent
26 Jun 2008
TL;DR: In this paper, an adaptive semi-synchronous parallel processing system and method for flow cytometry data analysis applications is presented, in which tasks are assigned to one or more processor queues according to an optimal execution strategy.
Abstract: There is provided an adaptive semi-synchronous parallel processing system and method, which may be adapted to various data analysis applications such as flow cytometry systems. By identifying the relationship and memory dependencies between tasks that are necessary to complete an analysis, it is possible to significantly reduce the analysis processing time by selectively executing tasks after careful assignment of tasks to one or more processor queues, where the queue assignment is based on an optimal execution strategy. Further strategies are disclosed to address optimal processing once a task undergoes computation by a computational element in a multiprocessor system. Also disclosed is a technique to perform fluorescence compensation to correct spectral overlap between different detectors in a flow cytometry system due to emission characteristics of various fluorescent dyes.

Patent
15 Jul 2008
TL;DR: In this article, the authors present a shared memory map for code handling in a heterogeneous processing environment, where one of the processors is programmed to perform a dedicated code-handling task, such as perform just-in-time compilation or interpretation of interpreted language instructions.
Abstract: Code handling, such as interpreting language instructions or performing “just-in-time” compilation, is performed using a heterogeneous processing environment that shares a common memory. In a heterogeneous processing environment that includes a plurality of processors, one of the processors is programmed to perform a dedicated code-handling task, such as perform just-in-time compilation or interpretation of interpreted language instructions, such as Java. The other processors request code handling processing that is performed by the dedicated processor. Speed is achieved using a shared memory map so that the dedicated processor can quickly retrieve data provided by one of the other processors.

Patent
26 Sep 2008
TL;DR: In this article, a method and a system for job scheduling in application servers is presented, where a common metadata of a job is deployed, the job being a deployable software component.
Abstract: A method and a system for job scheduling in application servers. Common metadata of a job is deployed, the job being a deployable software component. Additional metadata of the job is further deployed. A scheduler task based on the additional metadata of the job is created, wherein the task is associated with a starting condition. The scheduler task is started at an occurrence of the starting condition and, responsive to this, an execution of an instance of the job is invoked asynchronously.