Showing papers by "Sugang Xu" published in 2020


Journal Article
TL;DR: This work solves the optimization problem of joint progressive recovery to find the optimal sequence of network element and DC repairs with the objective to maximize cumulative weighted content reachability in the network, and proposes a scalable heuristic for scheduling the sequential repair of network nodes/links and DCs.
Abstract: Large-scale disasters affecting both network and datacenter (DC) infrastructures can cause severe disruptions in cloud-based services. During post-disaster recovery, repairs are usually carried out in stages in a progressive manner due to limited repair resource availability. The order in which network elements and DCs are repaired can significantly impact users’ reachability to important contents/services. We investigate joint progressive network and DC recovery in which network recovery and DC recovery are conducted in a coordinated manner such that users have access to the maximum possible amount of contents/services at each repair stage. We first solve the optimization problem of joint progressive recovery to find the optimal sequence of network element and DC repairs with the objective to maximize cumulative weighted content reachability in the network. We then propose a scalable heuristic for scheduling the sequential repair of network nodes/links and DCs. Our model assumes that, at each repair stage, one network node with adjacent links and one DC can be fully repaired; however, full recovery may not be guaranteed due to limited resource availability. Hence, we also propose a “resource-aware” approach (with two resource-allocation strategies, namely “selective allocation” and “adaptive allocation”), which considers both full and partial recovery of elements based on available resources at each stage. We show that, compared to a disjoint progressive recovery approach, in which network recovery and DC recovery plans are independent, our joint progressive recovery approach provides significantly higher per-stage content reachability in the network.

18 citations


Journal Article
TL;DR: This work proposes an approach for quick recreation of OPM and for achieving robust telemetry based on OpenConfig YANG that can tolerate low post-disaster bandwidth and can adapt the telemetry system following the changing conditions of the C/M-plane network.
Abstract: Optical performance monitoring (OPM) and the corresponding telemetry systems play an important role in modern optical transport networks based on software-defined networking (SDN). There have been extensive studies and standardization activities to build high-speed and high-accuracy OPM/telemetry systems that can ensure sufficient monitoring data for effective network control and management. However, current solutions for OPM/telemetry assume that control and management planes (C/M-plane) always provide sufficient bandwidth (BW) to deliver telemetry data. Unfortunately, in the event of several concurrent network failures (e.g., following a large-scale disaster), C/M-plane networks can become heavily degraded and/or unstable, and even experience isolation of some of their parts. Under such circumstances, the existing OPM systems would hardly function. To enhance resiliency and to ensure the quick recovery of OPM/telemetry in case of disaster, we propose an approach for quick recreation of OPM and for achieving robust telemetry based on OpenConfig YANG. Our proposal addresses three key problems: (1) how to quickly recreate the lost OPM capability, (2) how to address the mismatch between the high data rate of OPM and the low BW in the C/M-plane network, and (3) how to flexibly reconfigure the telemetry system to be adaptive to sudden BW changes in the C/M-plane network. We implement a testbed and experimentally demonstrate that our proposal can tolerate low post-disaster bandwidth and can adapt the telemetry system following the changing conditions of the C/M-plane network.

15 citations


Journal Article
TL;DR: This work derives necessary and sufficient conditions and develops what it believes to be a novel mathematical formulation to map a virtual network over a physical network such that content connectivity for the virtual network is ensured against multiple link failures in the physical network.
Abstract: Network connectivity, i.e., the reachability of any network node from all other nodes, is often considered as the default network survivability metric against failures. However, in the case of a large-scale disaster disconnecting multiple network components, network connectivity may not be achievable. On the other hand, with the shifting service paradigm towards the cloud in today’s networks, most services can still be provided as long as at least a content replica is available in all disconnected network partitions. As a result, the concept of content connectivity has been introduced as a new network survivability metric under a large-scale disaster. Content connectivity is defined as the reachability of content from every node in a network under a specific failure scenario. In this work, we investigate how to ensure content connectivity in optical metro networks. We derive necessary and sufficient conditions and develop what we believe to be a novel mathematical formulation to map a virtual network over a physical network such that content connectivity for the virtual network is ensured against multiple link failures in the physical network. In our numerical results, obtained under various network settings, we compare the performance of mapping with content connectivity and network connectivity and show that mapping with content connectivity can guarantee higher survivability, lower network bandwidth utilization, and significant improvement of service availability.
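The difference between the two survivability metrics defined in this abstract can be illustrated with a minimal sketch. It is not the paper's mapping formulation: the function names, the undirected-graph representation, and the toy topology are all assumptions made for illustration. It only checks, for a given failure scenario, whether every node can still reach every other node (network connectivity) or at least one content replica (content connectivity).

```python
from collections import deque


def build_adjacency(nodes, links, failed_links=()):
    """Build an undirected adjacency map, skipping failed links (hypothetical helper)."""
    failed = {frozenset(link) for link in failed_links}
    adj = {n: set() for n in nodes}
    for u, v in links:
        if frozenset((u, v)) not in failed:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def reachable(adj, start):
    """Return the set of nodes reachable from start using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_network_connected(adj):
    """Network connectivity: every node is reachable from every other node."""
    nodes = list(adj)
    return not nodes or len(reachable(adj, nodes[0])) == len(nodes)


def is_content_connected(adj, replica_nodes):
    """Content connectivity: every node can reach at least one content replica."""
    covered = set()
    for r in replica_nodes:
        # Ignore replica locations that are not part of the graph.
        if r in adj and r not in covered:
            covered |= reachable(adj, r)
    return covered == set(adj)


# Toy example (assumed topology): a chain A-B-C-D with the middle link cut.
adj = build_adjacency("ABCD", [("A", "B"), ("B", "C"), ("C", "D")],
                      failed_links=[("B", "C")])
print(is_network_connected(adj))              # network is split in two
print(is_content_connected(adj, ["A", "D"]))  # each partition holds a replica
print(is_content_connected(adj, ["A"]))       # partition {C, D} has no replica
```

In this toy case the failure partitions the network, so network connectivity is lost, yet content connectivity still holds as long as each partition contains a replica.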

13 citations


Proceedings Article
01 Dec 2020
TL;DR: Transfer learning across different lightpaths is performed for failure-cause identification using OSNR traces collected over NICT's Sendai optical-network testbed, and the results suggest that limited additional data on the target lightpath are enough to achieve satisfactory accuracy.
Abstract: We perform transfer learning across different lightpaths for failure-cause identification using OSNR traces collected over NICT's Sendai optical-network testbed. Results suggest that limited additional data on the target lightpath are enough to achieve satisfactory accuracy.

13 citations


Proceedings Article
08 Mar 2020
TL;DR: An SDN-based control for optical-multicast packet transmission is developed and multicast functionality is demonstrated by validating it using an application-layer network service for efficient content duplication in an Optical Packet/Circuit Integrated (OPCI) network.
Abstract: We develop an SDN-based control for optical-multicast packet transmission and experimentally demonstrate multicast functionality by validating it using an application-layer network service for efficient content duplication in an Optical Packet/Circuit Integrated (OPCI) network.

3 citations


Proceedings Article
25 Mar 2020
TL;DR: Various approaches for rapid post-disaster recovery in optical networks (including legacy optical networks) employing disaggregated subsystems, namely, the emergency first-aid unit (FAU) with open application programming interfaces and protocols are discussed.
Abstract: Novel open and disaggregated optical-networking technologies promise to enhance multi-vendor interoperability thanks to their open interfaces in both data-plane and control/management-plane (C/M-plane). From the viewpoint of disaster resilience in optical networks, such interoperability will significantly improve the flexibility in product selection with regard to replacing damaged subsystems with products of different vendors. In this paper, we discuss various approaches for rapid post-disaster recovery in optical networks (including legacy optical networks) employing disaggregated subsystems, namely, the emergency first-aid unit (FAU) with open application programming interfaces and protocols. We address the following problems (and introduce the solutions that we are currently investigating): (1) how to take advantage of the new disaggregated resources and surviving legacy optical resources to achieve early recovery, (2) how to achieve integrated control of FAUs and non-FAU legacy ROADMs, and (3) how to quickly recreate the lost optical performance monitoring (OPM) capability with FAUs and perform a robust telemetry under the restricted bandwidth in the degraded C/M-plane networks.

3 citations


Proceedings Article
08 Mar 2020
TL;DR: The functional-block-based model precisely describing the physical layer structures can act as a hardware abstraction layer for more abstracted models like OpenROADM.
Abstract: Automated mapping of real hardware composition onto a ROADM-based model is demonstrated. The functional-block-based model precisely describing the physical layer structures can act as a hardware abstraction layer for more abstracted models like OpenROADM.

1 citation


Proceedings Article
08 Mar 2020
TL;DR: Updating an OpenROADM node and subsequent re-routing were automated using a mathematical component-based model, triggered by the addition of node components.
Abstract: Updating an OpenROADM node and subsequent re-routing were automated using a mathematical component-based model, triggered by the addition of node components. This process required only five minutes on an orchestrated testbed using SINET5 and a field optical network.

1 citation


Book Chapter
16 Feb 2020
TL;DR: The evaluation results reveal that the proposal can significantly reduce the burden on recovery and the corresponding cost for carriers, resulting in fast and efficient disaster recovery.
Abstract: To achieve the fast recovery of optical transport networks following a disaster, we investigate a novel scheme to enable cooperation between carriers. Carriers can take advantage of their surviving or recovered optical resources to aid one another with emergency lightpath support to efficiently reduce the burden of recovery, which is heavy immediately after disasters. These lightpaths can be employed exclusively by the counterpart carriers to satisfy their highest priority traffic demands, such as safety confirmation and victim relief. In addition, we introduce an incentive for carriers to prompt cooperation. The carrier cooperation-planning problem is decomposed into eight tasks, and distributed to individual carriers and a third-party organization. During cooperation, the carriers’ confidential information can be strictly protected by employing a carrier optical network abstraction mechanism. The evaluation results reveal that our proposal can significantly reduce the burden on recovery and the corresponding cost for carriers, resulting in fast and efficient disaster recovery.