
Showing papers on "Encoding (memory)" published in 2021


Journal ArticleDOI
TL;DR: In this paper, a dual deep encoding network was proposed to encode videos and queries into powerful dense representations of their own, which can represent the rich content of both modalities in a coarse-to-fine fashion.
Abstract: This paper attacks the challenging problem of video retrieval by text. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described exclusively in the form of a natural-language sentence, with no visual example provided. Given videos as sequences of frames and queries as sequences of words, an effective sequence-to-sequence cross-modal matching is crucial. To that end, the two modalities need to be first encoded into real-valued vectors and then projected into a common space. In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. Our novelty is two-fold. First, different from prior art that resorts to a specific single-level encoder, the proposed network performs multi-level encoding that represents the rich content of both modalities in a coarse-to-fine fashion. Second, different from a conventional common space learning algorithm which is either concept based or latent space based, we introduce hybrid space learning which combines the high performance of the latent space and the good interpretability of the concept space. Dual encoding is conceptually simple, practically effective and end-to-end trained with hybrid space learning. Extensive experiments on four challenging video datasets show the viability of the new method.
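For intuition, here is a minimal sketch of the multi-level encoding idea in PyTorch: a coarse global mean pooling, a biGRU capturing temporal context, and a 1-D convolution over the GRU outputs for fine local patterns, concatenated and projected into a common space. All layer names and sizes are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of multi-level sequence encoding in the spirit of dual
# encoding (names and sizes are illustrative, not the authors' code).
import torch
import torch.nn as nn

class MultiLevelEncoder(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, out_dim=1024):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)
        # Level 1 (mean pool) + level 2 (biGRU pool) + level 3 (CNN pool)
        self.proj = nn.Linear(feat_dim + 2 * hidden + hidden, out_dim)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        lvl1 = x.mean(dim=1)                   # coarse: global average
        h, _ = self.gru(x)                     # (batch, time, 2*hidden)
        lvl2 = h.mean(dim=1)                   # temporal context
        c = torch.relu(self.conv(h.transpose(1, 2)))  # local patterns
        lvl3 = c.max(dim=2).values             # fine: max over time
        return self.proj(torch.cat([lvl1, lvl2, lvl3], dim=1))

video_vec = MultiLevelEncoder()(torch.randn(2, 30, 512))   # 30 frames
print(video_vec.shape)                                      # torch.Size([2, 1024])
```

The same encoder structure can be applied to word embeddings of the query, so both modalities end up as single dense vectors that a hybrid-space loss can compare.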

105 citations


Proceedings ArticleDOI
Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, Konrad Schindler
01 Jun 2021
TL;DR: In this paper, an overlap-attention block for early information exchange between the latent encodings of the two point clouds is introduced, which can predict which points are not only salient, but also lie in the overlap region between the point clouds.
Abstract: We introduce PREDATOR, a model for pairwise point-cloud registration with deep attention to the overlap region. Different from previous work, our model is specifically designed to handle (also) point-cloud pairs with low overlap. Its key novelty is an overlap-attention block for early information exchange between the latent encodings of the two point clouds. In this way the subsequent decoding of the latent representations into per-point features is conditioned on the respective other point cloud, and thus can predict which points are not only salient, but also lie in the overlap region between the two point clouds. The ability to focus on points that are relevant for matching greatly improves performance: PREDATOR raises the rate of successful registrations by more than 20% in the low-overlap scenario, and also sets a new state of the art for the 3DMatch benchmark with 89% registration recall. [Code release]
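A minimal sketch of the early information-exchange idea, assuming a standard multi-head cross-attention in PyTorch; PREDATOR's actual overlap-attention block differs in detail, and the overlap head below is a hypothetical stand-in.

```python
# Minimal sketch of cross-attention information exchange between the latent
# encodings of two point clouds (illustrative only; PREDATOR's actual block
# also conditions the decoder on these exchanged features).
import torch
import torch.nn as nn

d = 64
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

feats_a = torch.randn(1, 1024, d)   # latent features of point cloud A
feats_b = torch.randn(1, 1024, d)   # latent features of point cloud B

# Each cloud attends to the other, so later decoding is conditioned on
# the respective other point cloud.
a_cond, _ = attn(query=feats_a, key=feats_b, value=feats_b)
b_cond, _ = attn(query=feats_b, key=feats_a, value=feats_a)

overlap_head = nn.Linear(d, 1)       # per-point overlap / saliency logit
overlap_a = torch.sigmoid(overlap_head(a_cond)).squeeze(-1)  # (1, 1024)
```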

101 citations


Journal ArticleDOI
TL;DR: In this article, the feature dimensionality of large-scale IoT network traffic data is reduced using the encoding phase of a long short-term memory autoencoder (LAE); the deep BLSTM model trained on the reduced features demonstrates robustness against model underfitting and overfitting and achieves good generalisation ability in binary and multiclass classification scenarios.
Abstract: Deep learning (DL) is an efficient method for botnet attack detection. However, the volume of network traffic data and memory space required is usually large. It is, therefore, almost impossible to implement the DL method in memory-constrained Internet-of-Things (IoT) devices. In this article, we reduce the feature dimensionality of large-scale IoT network traffic data using the encoding phase of long short-term memory autoencoder (LAE). In order to classify network traffic samples correctly, we analyze the long-term inter-related changes in the low-dimensional feature set produced by LAE using deep bidirectional long short-term memory (BLSTM). Extensive experiments are performed with the BoT-IoT data set to validate the effectiveness of the proposed hybrid DL method. Results show that LAE significantly reduced the memory space required for large-scale network traffic data storage by 91.89%, and it outperformed state-of-the-art feature dimensionality reduction methods by 18.92–27.03%. Despite the significant reduction in feature size, the deep BLSTM model demonstrates robustness against model underfitting and overfitting. It also achieves good generalisation ability in binary and multiclass classification scenarios.
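A hedged sketch of the two-stage pipeline: an LSTM autoencoder whose encoder produces the low-dimensional feature set, followed by a bidirectional LSTM classifier. Feature counts and layer sizes are illustrative, not the paper's configuration.

```python
# Hedged sketch of the LAE -> BLSTM pipeline: an LSTM autoencoder is trained
# to reconstruct traffic features, then its encoder compresses each sample
# before a bidirectional LSTM classifier. Layer sizes are illustrative.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_feat=115, code=8):
        super().__init__()
        self.enc = nn.LSTM(n_feat, code, batch_first=True)
        self.dec = nn.LSTM(code, n_feat, batch_first=True)

    def forward(self, x):                    # x: (batch, time, n_feat)
        z, _ = self.enc(x)                   # low-dimensional code sequence
        recon, _ = self.dec(z)
        return recon, z

class BLSTMClassifier(nn.Module):
    def __init__(self, code=8, hidden=32, n_classes=5):
        super().__init__()
        self.blstm = nn.LSTM(code, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, z):
        h, _ = self.blstm(z)
        return self.fc(h[:, -1])             # classify from the last step

ae = LSTMAutoencoder()
x = torch.randn(4, 10, 115)
recon, z = ae(x)                             # train ae on nn.MSELoss()(recon, x)
logits = BLSTMClassifier()(z.detach())       # then classify the encoded features
```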

90 citations


Journal ArticleDOI
TL;DR: This paper designs a circuit that requires only one memristor crossbar for each unit in the LSTM cell, uses word2vector instead of one-hot encoding for the input data, and demonstrates the effectiveness of the proposed MLSTM system on the IMDB and SemEval datasets.
Abstract: This paper presents a complete solution for the hardware design of a memristor-based long short-term memory (MLSTM) network. Throughout the design process, we fully consider the external and internal structures of the long short-term memory (LSTM), both of which are efficiently implemented by memristor crossbars. In the specific design of the internal structure, the parameter sharing mechanism is used between the LSTM cells to minimize the hardware design scale. In particular, we designed a circuit that requires only one memristor crossbar for each unit in the LSTM cell. The activation function, including sigmoid and tanh (hyperbolic tangent function), involved in each unit is approximated by a piecewise function, which is designed with the corresponding hardware. To verify the effectiveness of the system we designed, we test it on the IMDB and SemEval datasets. Considering the huge impact of the dimensions of the input data on the scale of the hardware design, we use word2vector instead of one-hot encoding for the input data encoding. With the parameter sharing mechanism, the transformed vectors are input in different periods, so only 65 memristive crossbars are needed in the entire system to complete the sentiment analysis of the input text. The experimental results verify the effectiveness of our proposed MLSTM system.
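To illustrate the activation approximation, here is a one-segment piecewise-linear sigmoid of the kind used in hardware-friendly designs; the breakpoints are illustrative assumptions, not the exact segments from the paper (tanh can be derived as tanh(x) = 2*sigmoid(2x) - 1).

```python
# A simple piecewise-linear approximation of sigmoid, of the kind used to
# make activation functions hardware-friendly (breakpoints are illustrative).
import numpy as np

def sigmoid_pwl(x):
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[x <= -4] = 0.0                      # saturate low
    y[x >= 4] = 1.0                       # saturate high
    mid = (x > -4) & (x < 4)
    y[mid] = 0.5 + x[mid] / 8.0           # one linear segment through the origin
    return y

xs = np.linspace(-6, 6, 13)
err = np.abs(sigmoid_pwl(xs) - 1 / (1 + np.exp(-xs)))
print(err.max())   # coarse 1-segment fit; real designs use more segments
```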

86 citations


Proceedings ArticleDOI
09 Feb 2021
TL;DR: In this paper, SwiftNet compresses spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM), which adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations.
Abstract: In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% $\mathcal{J}\&\mathcal{F}$ and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed. We achieve this by elaborately compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations. Spatially, PAM selectively performs memory update and match on dynamic pixels while ignoring static ones, significantly reducing redundant computation wasted on segmentation-irrelevant pixels. To promote efficient reference encoding, a light-aggregation encoder deploying reversed sub-pixel is also introduced in SwiftNet. We hope SwiftNet can set a strong and efficient baseline for real-time VOS and facilitate its application in mobile vision. The source code of SwiftNet can be found at https://github.com/haochenheheda/SwiftNet.
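A hedged sketch of the pixel-adaptive update logic: skip frames with little change, and on triggered frames overwrite only the dynamic pixels. The L1 change measure and the thresholds are illustrative assumptions, not SwiftNet's implementation.

```python
# Hedged sketch of pixel-adaptive memory updating: trigger an update only on
# frames with noteworthy change, and only on the pixels that changed.
import torch

def maybe_update_memory(memory, frame_feat, prev_feat,
                        frame_thresh=0.2, pixel_thresh=0.1):
    change = (frame_feat - prev_feat).abs().mean(dim=0)   # (H, W) per-pixel
    if change.mean() < frame_thresh:
        return memory                       # temporally: skip static frames
    dynamic = change > pixel_thresh         # spatially: only dynamic pixels
    memory = memory.clone()
    memory[:, dynamic] = frame_feat[:, dynamic]
    return memory

C, H, W = 64, 30, 30
memory = torch.zeros(C, H, W)
prev, cur = torch.rand(C, H, W), torch.rand(C, H, W)
memory = maybe_update_memory(memory, cur, prev)
```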

75 citations


Proceedings ArticleDOI
20 Jun 2021
TL;DR: This work proposes a dynamic prototype unit (DPU) to encode normal dynamics as prototypes in real time, free from extra memory cost, and introduces meta-learning to the DPU to form a novel few-shot normalcy learner, the Meta-Prototype Unit (MPU).
Abstract: Frame reconstruction (of the current or future frame) based on an Auto-Encoder (AE) is a popular method for video anomaly detection. With models trained on normal data, the reconstruction errors of anomalous scenes are usually much larger than those of normal ones. Previous methods introduced a memory bank into the AE to encode diverse normal patterns across the training videos. However, they are memory-consuming and cannot cope with unseen new scenarios in the testing data. In this work, we propose a dynamic prototype unit (DPU) to encode the normal dynamics as prototypes in real time, free from extra memory cost. In addition, we introduce meta-learning to our DPU to form a novel few-shot normalcy learner, namely the Meta-Prototype Unit (MPU). It enables fast adaptation to new scenes with only a few update iterations. Extensive experiments are conducted on various benchmarks. The superior performance over the state of the art demonstrates the effectiveness of our method. Our code is available at https://github.com/ktr-hubrt/MPN/.

69 citations


Proceedings Article
29 Jul 2021
TL;DR: In this paper, image RPE (iRPE) is proposed, which considers directional relative distance modeling as well as the interactions between queries and relative position embeddings in the self-attention mechanism.
Abstract: Relative position encoding (RPE) is important for the transformer to capture the sequence ordering of input tokens. Its general efficacy has been proven in natural language processing. However, in computer vision, its efficacy is not well studied and even remains controversial, e.g., whether relative position encoding can work equally well as absolute position encoding. In order to clarify this, we first review existing relative position encoding methods and analyze their pros and cons when applied in vision transformers. We then propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE). Our methods consider directional relative distance modeling as well as the interactions between queries and relative position embeddings in the self-attention mechanism. The proposed iRPE methods are simple and lightweight. They can be easily plugged into transformer blocks. Experiments demonstrate that solely due to the proposed encoding methods, DeiT and DETR obtain up to 1.5% (top-1 Acc) and 1.3% (mAP) stable improvements over their original versions on ImageNet and COCO respectively, without tuning any extra hyperparameters such as learning rate and weight decay. Our ablation and analysis also yield interesting findings, some of which run counter to previous understanding. Code and models are open-sourced at this https URL.
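For intuition, a minimal sketch of a learnable 2-D relative position bias added to attention logits; the plain clipping used for bucketing below is simpler than iRPE's piecewise scheme, and all sizes are illustrative.

```python
# Hedged sketch of a learnable 2-D relative position bias added to attention
# logits, in the spirit of directional relative distance modeling.
import torch
import torch.nn as nn

H = W = 7                 # feature-map grid; tokens = H*W
max_d = 3                 # clip relative offsets to [-max_d, max_d]
table = nn.Parameter(torch.zeros((2 * max_d + 1) ** 2))  # one bias per (dy, dx)

ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)        # (HW, 2)
rel = coords[:, None, :] - coords[None, :, :]                    # (HW, HW, 2)
rel = rel.clamp(-max_d, max_d) + max_d                           # shift to >= 0
idx = rel[..., 0] * (2 * max_d + 1) + rel[..., 1]                # bucket index

q = k = torch.randn(H * W, 64)
logits = q @ k.t() / 64 ** 0.5 + table[idx]   # add relative bias to attention
attn = logits.softmax(dim=-1)
```

Because the bias depends only on the signed (dy, dx) offset, tokens at the same relative displacement share one learned parameter, keeping the scheme lightweight.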

61 citations


Journal ArticleDOI
TL;DR: This work provides a foundation in spiking signal encoding and gives an overview of different application-oriented implementations which utilise the schemes.
Abstract: Biologically inspired spiking neural networks are increasingly popular in the field of artificial intelligence due to their ability to solve complex problems while being power efficient. They do so by leveraging the timing of discrete spikes as the main information carrier. However, industrial applications are still lacking, partly because the question of how to encode incoming data into discrete spike events cannot be uniformly answered. In this paper, we summarise the signal encoding schemes presented in the literature and propose a uniform nomenclature to prevent the vague usage of ambiguous definitions. To that end, we survey both the theoretical foundations and the applications of the encoding schemes. This work provides a foundation in spiking signal encoding and gives an overview of different application-oriented implementations which utilise the schemes.
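Two encoding schemes that any such taxonomy covers, sketched in NumPy under illustrative parameters: rate coding (spike count proportional to intensity) and time-to-first-spike latency coding (stronger inputs fire earlier).

```python
# Rate coding vs. latency coding, two common spike encoding schemes.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.1, 0.5, 0.9])          # normalized input intensities
T = 100                                 # time steps

# Rate coding: Bernoulli spike train, firing probability proportional to x.
rate_spikes = rng.random((T, x.size)) < x          # (T, 3) boolean raster

# Latency coding: each input spikes exactly once; larger x fires earlier.
first_spike_t = np.round((1.0 - x) * (T - 1)).astype(int)
latency_spikes = np.zeros((T, x.size), dtype=bool)
latency_spikes[first_spike_t, np.arange(x.size)] = True

print(rate_spikes.sum(axis=0))          # ~[10, 50, 90] spikes
print(first_spike_t)                    # [89, 50, 10] -> earlier for larger x
```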

53 citations


Journal ArticleDOI
TL;DR: A deep neural network model with an encoder-decoder architecture that translates images of math formulas into their LaTeX markup sequences and shows state-of-the-art performance on both sequence-based and image-based evaluation metrics.
Abstract: In this paper, we propose a deep neural network model with an encoder–decoder architecture that translates images of math formulas into their LaTeX markup sequences. The encoder is a convolutional neural network that transforms images into a group of feature maps. To better capture the spatial relationships of math symbols, the feature maps are augmented with 2D positional encoding before being unfolded into a vector. The decoder is a stacked bidirectional long short-term memory model integrated with the soft attention mechanism, which works as a language model to translate the encoder output into a sequence of LaTeX tokens. The neural network is trained in two steps. The first step is token-level training using the maximum likelihood estimation as the objective function. At completion of the token-level training, the sequence-level training objective function is employed to optimize the overall model based on the policy gradient algorithm from reinforcement learning. Our design also overcomes the exposure bias problem by closing the feedback loop in the decoder during sequence-level training, i.e., feeding in the predicted token instead of the ground truth token at every time step. The model is trained and evaluated on the IM2LATEX-100K dataset and shows state-of-the-art performance on both sequence-based and image-based evaluation metrics.
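A minimal sketch of the 2-D positional-encoding step: half of the channels encode the row index and half the column index with the standard sinusoidal scheme. The split-channel construction is a common convention and an assumption here, not necessarily the paper's exact formulation.

```python
# 2-D sinusoidal positional encoding added to a CNN feature map.
import torch

def positional_encoding_1d(length, dim):
    pos = torch.arange(length, dtype=torch.float32)[:, None]
    i = torch.arange(0, dim, 2, dtype=torch.float32)[None, :]
    angles = pos / (10000 ** (i / dim))
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe                                   # (length, dim)

def positional_encoding_2d(h, w, dim):
    assert dim % 2 == 0
    pe_y = positional_encoding_1d(h, dim // 2)  # row component
    pe_x = positional_encoding_1d(w, dim // 2)  # column component
    return torch.cat([pe_y[:, None, :].expand(h, w, dim // 2),
                      pe_x[None, :, :].expand(h, w, dim // 2)], dim=-1)

feats = torch.randn(8, 16, 256)                 # CNN feature map (H, W, C)
feats = feats + positional_encoding_2d(8, 16, 256)
```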

47 citations


Journal ArticleDOI
TL;DR: In this article, a distributed representation for categorical features is proposed, in which each category is mapped to a distinct vector whose properties are learned while training a neural network.
Abstract: Many machine learning algorithms and almost all deep learning architectures are incapable of processing plain text in its raw form. Their input must be numerical in order to solve classification or regression problems, so it is necessary to encode categorical variables into numerical values using encoding techniques. Categorical features are common and often of high cardinality. One-hot encoding in such circumstances leads to very high-dimensional vector representations, raising memory and computability concerns for machine learning models. This paper proposes a deep-learned embedding technique for encoding categorical features on categorical datasets. Our technique is a distributed representation for categorical features in which each category is mapped to a distinct vector, and the properties of the vector are learned while training a neural network. First, we create a data vocabulary that includes only categorical data, and then we use word tokenization to make each categorical datum a single word. After that, feature learning is introduced to map all of the categorical data from the vocabulary to word vectors. Three different datasets provided by the University of California Irvine (UCI) are used for training. The experimental results show that the proposed deep-learned embedding technique for categorical data achieves a higher F1 score of 89%, compared with 71% for one-hot encoding, in the case of the long short-term memory (LSTM) model. Moreover, the deep-learned embedding technique uses less memory and generates fewer features than one-hot encoding.
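A minimal sketch of the core idea, assuming a PyTorch nn.Embedding table: each tokenized category becomes an index into a small trainable matrix, replacing a wide one-hot vector. Vocabulary and dimensions are illustrative.

```python
# Deep-learned embeddings for categorical features vs. one-hot encoding.
import torch
import torch.nn as nn

categories = ["udp", "tcp", "icmp", "http"]      # a categorical feature
vocab = {c: i for i, c in enumerate(categories)}  # word-style tokenization

emb = nn.Embedding(num_embeddings=len(vocab), embedding_dim=3)

batch = ["tcp", "http", "tcp"]
idx = torch.tensor([vocab[c] for c in batch])
dense = emb(idx)               # (3, 3) learned vectors vs. (3, 4) one-hot
print(dense.shape)

# The table's weights are trained with the rest of the network (e.g. an LSTM
# classifier), so similar categories can end up with similar vectors.
```

The memory saving grows with cardinality: a feature with 10,000 categories needs a 10,000-wide one-hot vector but only a 10,000 x d table with small d when embedded.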

47 citations


Proceedings ArticleDOI
11 Jun 2021
TL;DR: In this paper, HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, together with an efficient fine-grained search strategy that effectively explores the search space and finds optimal architectures given various tasks and computation resources.
Abstract: High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored in previous Neural Architecture Search (NAS) methods that focus on image classification. This work proposes a novel NAS method, called HR-NAS, which is able to find efficient and accurate networks for different tasks, by effectively encoding multiscale contextual information while maintaining high-resolution representations. In HR-NAS, we renovate the NAS search space as well as its searching strategy. To better encode multiscale image contexts in the search space of HR-NAS, we first carefully design a lightweight transformer, whose computational complexity can be dynamically changed with respect to different objective functions and computation budgets. To maintain high-resolution representations of the learned networks, HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, inspired by HRNet [73]. Last, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources. As shown in Fig. 1(a), HR-NAS is capable of achieving state-of-the-art trade-offs between performance and FLOPs for three dense prediction tasks and an image classification task, given only small computational budgets. For example, HR-NAS surpasses SqueezeNAS [63], which is specially designed for semantic segmentation, while improving efficiency by 45.9%. Code is available at https://github.com/dingmyu/HR-NAS.

Proceedings ArticleDOI
01 Feb 2021
TL;DR: In this paper, the authors propose a novel architecture, LookHD, which enables real-time hyperdimensional computing (HDC) learning on low-power edge devices by exploiting computation reuse to memorize the encoding module and simplify its computation to a single memory access.
Abstract: Today’s applications use machine learning algorithms to analyze the data collected from a swarm of devices on the Internet of Things (IoT). However, most existing learning algorithms are too complex to enable real-time learning on IoT devices with limited resources and computing power. Recently, hyperdimensional computing (HDC) was introduced as an alternative computing paradigm for enabling efficient and robust learning. HDC emulates cognitive tasks by representing values as patterns of neural activity in high-dimensional space. HDC first encodes all data points to high-dimensional vectors. It then efficiently performs the learning task using a well-defined set of operations. Existing HDC solutions have two main issues that hinder their deployment on low-power embedded devices: (i) the encoding module is costly, dominating 80% of the entire training time, and (ii) the HDC model size and the computation cost grow significantly with the number of classes in online inference. In this paper, we propose a novel architecture, LookHD, which enables real-time HDC learning on low-power edge devices. LookHD exploits computation reuse to memorize the encoding module and simplify its computation to a single memory access. LookHD also addresses inference scalability by exploiting HDC's governing mathematics to compress the trained model into a single hypervector. We present how the proposed architecture can be implemented on existing low-power architectures: an ARM processor and an FPGA design. We evaluate the efficiency of the proposed approach on a wide range of practical classification problems such as activity recognition, face recognition, and speech recognition. Our evaluations show that LookHD can achieve, on average, $28.3\times$ faster and $97.4\times$ more energy-efficient training as compared to the state-of-the-art HDC implemented on the FPGA. Similarly, in inference, LookHD is $2.2\times$ faster, $4.1\times$ more energy-efficient, and has a $6.3\times$ smaller model size than the same state-of-the-art algorithms.
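A hedged sketch of the two ideas, with an illustrative bind-and-bundle encoder (LookHD's actual encoding and hardware mapping differ): memoizing the encoder turns repeated inputs into a single dictionary lookup, and each class is compressed into one bundled hypervector.

```python
# Computation reuse (memoized encoding) and single-hypervector classes in HDC.
# The simple bind/bundle encoding and all names here are illustrative.
import numpy as np

D = 10_000
rng = np.random.default_rng(0)
id_hvs = rng.choice([-1, 1], size=(16, D))     # one random HV per feature slot

_enc_cache = {}                                 # computation reuse
def encode(sample):                             # sample: tuple of 16 ints in 0..255
    if sample in _enc_cache:
        return _enc_cache[sample]               # single "memory access"
    level = np.array(sample)[:, None] / 255.0   # scalar level per feature
    hv = np.sign((id_hvs * (2 * level - 1)).sum(axis=0))  # bind + bundle
    _enc_cache[sample] = hv
    return hv

# Single-hypervector model: one bundled prototype per class.
class_hv = {0: np.zeros(D), 1: np.zeros(D)}
for sample, label in [((10,) * 16, 0), ((200,) * 16, 1)]:
    class_hv[label] += encode(sample)

query = encode((190,) * 16)
pred = max(class_hv, key=lambda c: query @ class_hv[c])   # similarity match
```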

Journal ArticleDOI
TL;DR: In this article, the authors leverage an autoencoder to encode the input data and use three factors, the hidden representation, the reconstruction residual vector, and the reconstruction error, as its new representation.
Abstract: Weakly supervised anomaly detection aims at learning an anomaly detector from a limited amount of labeled data and abundant unlabeled data. Recent works build deep neural networks for anomaly detection by discriminatively mapping the normal samples and abnormal samples to different regions in the feature space or fitting different distributions. However, due to the limited number of annotated anomaly samples, directly training networks with the discriminative loss may not be sufficient. To overcome this issue, this article proposes a novel strategy to transform the input data into a more meaningful representation that could be used for anomaly detection. Specifically, we leverage an autoencoder to encode the input data and utilize three factors, hidden representation, reconstruction residual vector, and reconstruction error, as the new representation for the input data. This representation amounts to encoding a test sample by its projection on the training data manifold, its direction to its projection, and its distance to its projection. In addition to this encoding, we also propose a novel network architecture to seamlessly incorporate those three factors. From our extensive experiments, the benefits of the proposed strategy are clearly demonstrated by its superior performance over competing methods. Code is available at: https://github.com/yj-zhou/Feature_Encoding_with_AutoEncoders_for_Weakly-supervised_Anomaly_Detection.
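A minimal sketch of the three-factor encoding, with illustrative layer sizes: the hidden code (projection on the training manifold), the reconstruction residual (direction to the projection), and the reconstruction error (distance to it) are concatenated into the new representation.

```python
# Three-factor autoencoder representation for anomaly detection.
import torch
import torch.nn as nn

class ThreeFactorEncoder(nn.Module):
    def __init__(self, n_feat=32, code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_feat, code), nn.ReLU())
        self.dec = nn.Linear(code, n_feat)

    def forward(self, x):
        z = self.enc(x)                          # projection on data manifold
        x_hat = self.dec(z)
        residual = x - x_hat                     # direction to the projection
        err = residual.norm(dim=1, keepdim=True) # distance to the projection
        return torch.cat([z, residual, err], dim=1)

x = torch.randn(4, 32)
rep = ThreeFactorEncoder()(x)       # (4, 8 + 32 + 1) -> feed to the detector
```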

Journal ArticleDOI
TL;DR: In this paper, a hybrid model called STL-CNN-PE, which combines the seasonal-trend decomposition procedure based on loess (STL) and one-dimensional convolutional neural networks (CNN) with positional encoding (PE), was proposed to forecast significant wave height (SWH) efficiently and accurately.

Journal ArticleDOI
Qi Wang
TL;DR: This review discusses the cultural foundation of human memory and details a model of a culturally saturated mnemonic system in which cultural elements constitute and condition various processes of remembering, focusing on memory representation, perceptual encoding, memory function, memory reconstruction, memory expression, and memory socialization.
Abstract: Human memory, as a product of the mind and brain, is inherently private and personal. Yet, arising from the interaction between the organism and its ecology in the course of phylogeny and ontogeny, human memory is also profoundly collective and cultural. In this review, I discuss the cultural foundation of human memory. I start by briefly reflecting on the conception of memory against a historical and cultural background. I then detail a model of a culturally saturated mnemonic system in which cultural elements constitute and condition various processes of remembering, focusing on memory representation, perceptual encoding, memory function, memory reconstruction, memory expression, and memory socialization. Then I discuss research on working memory, episodic memory, and autobiographical memory as examples that further demonstrate how cultural elements shape the processes and consequences of remembering and lay the foundation for human memory. I conclude by outlining some important future directions in memory research.

Journal ArticleDOI
TL;DR: In this paper, a compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random access memory (SRAM) macro, named CAP-RAM, is presented for energy-efficient convolutional neural network (CNN) inference.
Abstract: A compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro, named CAP-RAM, is presented for energy-efficient convolutional neural network (CNN) inference. It leverages a novel charge-domain multiply-and-accumulate (MAC) mechanism and circuitry to achieve superior linearity under process variations compared to conventional IMC designs. The adopted semi-parallel architecture efficiently stores filters from multiple CNN layers by sharing eight standard 6T SRAM cells with one charge-domain MAC circuit. Moreover, up to six levels of bit-width of weights with two encoding schemes and eight levels of input activations are supported. A 7-bit charge-injection SAR (ciSAR) analog-to-digital converter (ADC) that dispenses with sample-and-hold (S&H) and input/reference buffers further improves the overall energy efficiency and throughput. A 65-nm prototype validates the excellent linearity and computing accuracy of CAP-RAM. A single $512\times 128$ macro stores a complete pruned and quantized CNN model to achieve 98.8% inference accuracy on the MNIST data set and 89.0% on the CIFAR-10 data set, with a 573.4-giga operations per second (GOPS) peak throughput and a 49.4-tera operations per second (TOPS)/W energy efficiency.

Journal ArticleDOI
TL;DR: The main novelty is that the proposed SCRL method takes the pairwise, intramodality, and nonpaired intermodality relationships into account simultaneously, thereby improving the semantic consistency of the learned representations for RS image-voice retrieval.
Abstract: With the development of earth observation technology, massive amounts of remote sensing (RS) images are acquired. To find useful information from these images, cross-modal RS image-voice retrieval provides a new insight. This article aims to study the task of RS image-voice retrieval so as to search effective information from massive amounts of RS data. Existing methods for RS image-voice retrieval rely primarily on the pairwise relationship to narrow the heterogeneous semantic gap between images and voices. However, apart from the pairwise relationship included in the data sets, the intramodality and nonpaired intermodality relationships should also be considered simultaneously since the semantic consistency among nonpaired representations plays an important role in the RS image-voice retrieval task. Inspired by this, a semantics-consistent representation learning (SCRL) method is proposed for RS image-voice retrieval. The main novelty is that the proposed method takes the pairwise, intramodality, and nonpaired intermodality relationships into account simultaneously, thereby improving the semantic consistency of the learned representations for the RS image-voice retrieval. The proposed SCRL method consists of two main steps: 1) semantics encoding and 2) SCRL. First, an image encoding network is adopted to extract high-level image features with a transfer learning strategy, and a voice encoding network with dilated convolution is devised to obtain high-level voice features. Second, a consistent representation space is constructed by modeling the three kinds of relationships to narrow the heterogeneous semantic gap and learn semantics-consistent representations across the two modalities. Extensive experimental results on three challenging RS image-voice data sets, including the Sydney, UCM, and RSICD image-voice data sets, show the effectiveness of the proposed method.

Posted Content
TL;DR: In this article, Axial Fusion Transformer UNet (AFTer-UNet) is proposed, which combines the advantages of convolutional layers' capability of extracting detailed features and transformers' strength in long-sequence modeling.
Abstract: Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation, under both 2D and 3D settings. Current 2D based methods either directly replace convolutional layers with pure transformers or consider a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider the attention encoding within one single slice and do not utilize the axial-axis information naturally provided by a 3D volume. In the 3D setting, convolution on volumetric data and transformers both consume large GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits performance. In this paper, we propose Axial Fusion Transformer UNet (AFTer-UNet), which takes advantage of both convolutional layers' capability of extracting detailed features and transformers' strength in long-sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than the previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.

Journal ArticleDOI
TL;DR: Using ultra-high-field functional magnetic resonance imaging with an item-based visual recall task, an in-depth comparison of encoding and recall along a spectrum of granularity suggests that visual recall is not merely a reactivation of encoding patterns: it displays a different representational structure and localization from encoding, despite some overlap.
Abstract: During memory recall and visual imagery, reinstatement is thought to occur as an echoing of the neural patterns during encoding. However, the precise information in these recall traces is relatively unknown, with previous work primarily investigating either broad distinctions or specific images, rarely bridging these levels of information. Using ultra-high-field (7T) functional magnetic resonance imaging with an item-based visual recall task, we conducted an in-depth comparison of encoding and recall along a spectrum of granularity, from coarse (scenes, objects) to mid (e.g., natural, manmade scenes) to fine (e.g., living room, cupcake) levels. In the scanner, participants viewed a trial-unique item, and after a distractor task, visually imagined the initial item. During encoding, we observed decodable information at all levels of granularity in category-selective visual cortex. In contrast, information during recall was primarily at the coarse level with fine-level information in some areas; there was no evidence of mid-level information. A closer look revealed segregation between voxels showing the strongest effects during encoding and those during recall, and peaks of encoding-recall similarity extended anterior to category-selective cortex. Collectively, these results suggest visual recall is not merely a reactivation of encoding patterns, displaying a different representational structure and localization from encoding, despite some overlap.

Journal ArticleDOI
TL;DR: This paper showed that prospective action plans do not emerge gradually during memory delays but are brought into memory early, in tandem with sensory encoding, which can make memories more effective and robust for serving ensuing behavior.
Abstract: Working memory serves as the buffer between past sensations and future behavior, making it vital to understand not only how we encode and retain sensory information in memory but also how we plan for its upcoming use. We ask when prospective action goals emerge alongside the encoding and retention of visual information in working memory. We show that prospective action plans do not emerge gradually during memory delays but are brought into memory early, in tandem with sensory encoding. This action encoding (i) precedes a second stage of action preparation that adapts to the time of expected memory utilization, (ii) occurs even ahead of an intervening motor task, and (iii) predicts visual memory-guided behavior several seconds later. By bringing prospective action plans into working memory at an early stage, the brain creates a dual (visual-motor) memory code that can make memories more effective and robust for serving ensuing behavior.

Journal ArticleDOI
TL;DR: In this paper, an efficient approach is provided to increase the key generation rate of MDI-QKD by adopting multiple degrees of freedom (DOFs) of single photons to generate keys.
Abstract: Measurement-device-independent quantum key distribution (MDI-QKD) provides a powerful approach to resist all attacks at the detection side. Besides unconditional security, a high key generation rate is also sought, but MDI-QKD has a relatively low key generation rate. In this paper, we provide an efficient approach to increase the key generation rate of MDI-QKD by adopting multiple degrees of freedom (DOFs) of single photons to generate keys. Compared with other high-dimension MDI-QKD protocols encoding in one DOF, our protocol is more flexible, for it generates keys in independent subsystems, and a detection failure or error in one DOF does not affect the information encoded in other DOFs. Based on these features, our MDI-QKD protocol may have potential application in the future quantum communication field.

Journal ArticleDOI
24 Feb 2021
TL;DR: In this article, the authors compared domain-wall encoding with one-hot encoding for three different problems at different sizes of both the problem and the variables, and concluded that domain-wall encoding yields superior performance against a variety of metrics.
Abstract: In this article, we experimentally test the performance of the recently proposed domain-wall encoding of discrete variables (Chancellor, 2019) on Ising-model flux-qubit quantum annealers. We compare this encoding with the traditional one-hot method and find that domain-wall encoding outperforms one-hot encoding for three different problems at different sizes of both the problem and the variables. From these results, we conclude that the domain-wall encoding yields superior performance against a variety of metrics; furthermore, we do not find a single metric by which one-hot performs better. We even find that a 2000Q quantum annealer with a drastically less connected hardware graph but using the domain-wall encoding can outperform the next-generation Advantage processor if that processor uses one-hot encoding.
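For concreteness, a small illustration of the two encodings of a discrete variable with D values: one-hot uses D bits with exactly one set, while domain-wall encoding uses D-1 bits and reads the value off the position of the 1-to-0 boundary. The bit convention below is one common choice and may differ from a given annealer implementation.

```python
# One-hot vs. domain-wall encoding of a discrete variable with D values.
import numpy as np

D = 5   # variable takes values 0..4

def one_hot(v, D):
    bits = np.zeros(D, dtype=int)
    bits[v] = 1
    return bits                      # D bits, exactly one set

def domain_wall(v, D):
    # D-1 bits; value = position of the 1 -> 0 boundary ("domain wall").
    return (np.arange(D - 1) < v).astype(int)

def decode_domain_wall(bits):
    return int(bits.sum())           # wall position = number of leading 1s

for v in range(D):
    dw = domain_wall(v, D)
    print(v, one_hot(v, D), dw, decode_domain_wall(dw))
# e.g. v=2 -> one-hot [0 0 1 0 0], domain wall [1 1 0 0]
```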

Journal ArticleDOI
14 Apr 2021
TL;DR: In this article, a dual-frequency biased coding (DFBC) method was proposed to tag targets in a SSVEP-based 48-character virtual speller, in which each target is encoded with a permutation sequence consisting of two permuted flickering periods that flash at different frequencies.
Abstract: How to encode as many targets as possible with a limited-frequency resource is a difficult problem in the practical use of a steady-state visual evoked potential (SSVEP) based brain-computer interface (BCI) speller. To solve this problem, this study developed a novel method called dual-frequency biased coding (DFBC) to tag targets in a SSVEP-based 48-character virtual speller, in which each target is encoded with a permutation sequence consisting of two permuted flickering periods that flash at different frequencies. The proposed paradigm was validated by 11 participants in an offline experiment and 7 participants in an online experiment. Three occipital channels (O1, Oz, and O2) were used to obtain the SSVEP signals for identifying the targets. Based on the coding characteristics of the DFBC method, the proposed approach has the ability of self-correction and thus achieves an accuracy of 76.6% and 79.3% for offline and online experiments, respectively, which outperforms the traditional multiple frequencies sequential coding (MFSC) method. This study demonstrates that DFBC is an efficient method for coding a high number of SSVEP targets with a small number of available frequencies.

Proceedings ArticleDOI
24 Aug 2021
TL;DR: In this paper, a new method is proposed that uses a specialized attention network and contextualized word representations to tackle the task of multi-modal video summarization, which consists of a contextualized video summary controller, multidirectional attention mechanisms, an interactive attention network, and a video summary generator.
Abstract: Traditional video summarization methods generate fixed video representations regardless of user interest. Therefore such methods limit users' expectations in content search and exploration scenarios. Multi-modal video summarization is one of the methods utilized to address this problem. When multi-modal video summarization is used to help video exploration, a text-based query is considered one of the main drivers of video summary generation, as it is user-defined. Thus, encoding both the text-based query and the video effectively is important for the task of multi-modal video summarization. In this work, a new method is proposed that uses a specialized attention network and contextualized word representations to tackle this task. The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator. Based on the evaluation of the existing multi-modal video summarization benchmark, experimental results show that the proposed model is effective, with a +5.88% increase in accuracy and a +4.06% increase in F1-score compared with the state-of-the-art method. https://github.com/Jhhuangkay/GPT2MVS-Generative-Pre-trained-Transformer-2-for-Multi-modal-Video-Summarization.

Journal ArticleDOI
TL;DR: A strategy to load continuous data without post-selection, with computational cost O(Mn), is proposed; it is based on the probabilistic quantum memory, a strategy to load binary data in quantum devices, and on the FF-QRAM using standard quantum gates, and is suitable for noisy intermediate-scale quantum computers.
Abstract: Loading data in a quantum device is required in several quantum computing applications. Without an efficient loading procedure, the cost to initialize the algorithms can dominate the overall computational cost. A circuit-based quantum random access memory named FF-QRAM can load $M$ $n$-bit patterns with computational cost $O(CMn)$ to load continuous data, where $C$ depends on the data distribution. In this article, we propose a strategy to load continuous data without post-selection with computational cost $O(Mn)$. The proposed method is based on the probabilistic quantum memory, a strategy to load binary data in quantum devices, and the FF-QRAM using standard quantum gates, and is suitable for noisy intermediate-scale quantum computers.

Journal ArticleDOI
Hongyu An, Qiyuan An, Yang Yi
01 Aug 2021
TL;DR: Simulation results demonstrate that the proposed associative memory learning method and the corresponding circuit implementations successfully associate the pronunciation and image of digits together, which mimics a human-like associative memory learning behavior.
Abstract: Associative memory is a widespread self-learning method in living organisms, which enables the nervous system to remember the relationship between two concurrent events. The significance of rebuilding associative memory at the behavioral level is not only to reveal a way of designing a brain-like self-learning neuromorphic system but also to explore a method of comprehending the learning mechanism of a nervous system. In this paper, associative memory learning at the behavioral level is realized that successfully associates concurrent visual and auditory information (the pronunciation and image of digits). The task is achieved by associating large-scale artificial neural networks (ANNs) together instead of relating multiple analog signals. In this way, the information carried and preprocessed by these ANNs can be associated. Neurons, named signal intensity encoding neurons (SIENs), have been designed to encode the output data of the ANNs into the magnitude and frequency of analog spiking signals. Then, the spiking signals are correlated together with an associative neural network, implemented with a three-dimensional (3-D) memristor array. Furthermore, the selector devices in traditional memristor cells, which limit the design area, are avoided by our novel memristor weight-updating scheme. With the novel SIENs, the 3-D memristive synapse, and the proposed memristor weight-updating scheme, the simulation results demonstrate that our proposed associative memory learning method and the corresponding circuit implementations successfully associate the pronunciation and image of digits together, mimicking a human-like associative memory learning behavior.

Journal ArticleDOI
TL;DR: The authors proposed explicit and implicit text compression approaches to enhance the Transformer encoding and evaluated models using this approach on several typical downstream tasks that rely on the encoding heavily, and concluded that text compression helps the encoders to learn better language representations.
Abstract: Text encoding is one of the most important steps in Natural Language Processing (NLP). It has been done well by the self-attention mechanism in the current state-of-the-art Transformer encoder, which has brought about significant improvements in the performance of many NLP tasks. Though the Transformer encoder may effectively capture general information in its resulting representations, the backbone information, meaning the gist of the input text, is not specifically focused on. In this paper, we propose explicit and implicit text compression approaches to enhance the Transformer encoding and evaluate models using this approach on several typical downstream tasks that rely on the encoding heavily. Our explicit text compression approaches use dedicated models to compress text, while our implicit text compression approach simply adds an additional module to the main model to handle text compression. We propose three ways of integration, namely backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the backbone information into Transformer-based models for various downstream tasks. Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results in comparison to strong baselines. We therefore conclude that, compared with the baseline models, text compression helps the encoders learn better language representations.
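A hedged sketch of source-side fusion, assuming a generic gated cross-attention mixer (the paper's integration modules are not reproduced here): the encoder states attend to the compressed backbone representation, and the two are blended with a learned gate.

```python
# Generic gated fusion of encoder states with a compressed "backbone"
# representation (illustrative; not the paper's exact integration module).
import torch
import torch.nn as nn

d = 256
fuse_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
gate = nn.Linear(2 * d, d)

enc_out = torch.randn(2, 40, d)      # Transformer encoder states (full text)
backbone = torch.randn(2, 8, d)      # encoded compressed text (the "gist")

ctx, _ = fuse_attn(query=enc_out, key=backbone, value=backbone)
g = torch.sigmoid(gate(torch.cat([enc_out, ctx], dim=-1)))
fused = g * enc_out + (1 - g) * ctx  # backbone-aware source representation
```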

Journal ArticleDOI
01 Mar 2021
TL;DR: The Parameter Constrained Spectral Encoder and Decoder (PCSED), a neural network-based framework, is presented for the design of BEST filters in spectroscopic instruments; its generalizability is validated in designing metasurface- and interference-thin-film-based BEST filters.
Abstract: Computational spectroscopic instruments with Broadband Encoding Stochastic (BEST) filters allow the reconstruction of the spectrum at high precision with only a few filters. However, conventional approaches to designing BEST filters are often heuristic and may fail to fully explore their encoding potential. The Parameter Constrained Spectral Encoder and Decoder (PCSED), a neural network-based framework, is presented for the design of BEST filters in spectroscopic instruments. By incorporating the target spectral response definition and the optical design procedures comprehensively, PCSED links the mathematical optimum and the practical limits confined by available fabrication techniques. Benefiting from this, BEST-filter-based spectral cameras present a higher reconstruction accuracy, with up to a 30-fold enhancement, and a better tolerance of fabrication errors. The generalizability of PCSED is validated in designing metasurface- and interference-thin-film-based BEST filters.

Journal ArticleDOI
TL;DR: This work presents WorkMATe, a neural network architecture that models cognitive control over working memory content and learns the appropriate control operations needed to solve complex working memory tasks and provides a new solution for the neural implementation of flexible memory control.
Abstract: Working memory is essential: it serves to guide intelligent behavior of humans and nonhuman primates when task-relevant stimuli are no longer present to the senses. Moreover, complex tasks often require that multiple working memory representations can be flexibly and independently maintained, prioritized, and updated according to changing task demands. Thus far, neural network models of working memory have been unable to offer an integrative account of how such control mechanisms can be acquired in a biologically plausible manner. Here, we present WorkMATe, a neural network architecture that models cognitive control over working memory content and learns the appropriate control operations needed to solve complex working memory tasks. Key components of the model include a gated memory circuit that is controlled by internal actions, encoding sensory information through untrained connections, and a neural circuit that matches sensory inputs to memory content. The network is trained by means of a biologically plausible reinforcement learning rule that relies on attentional feedback and reward prediction errors to guide synaptic updates. We demonstrate that the model successfully acquires policies to solve classical working memory tasks, such as delayed recognition and delayed pro-saccade/anti-saccade tasks. In addition, the model solves much more complex tasks, including the hierarchical 12-AX task and the ABAB ordered recognition task, both of which demand that an agent independently store and update multiple items in memory. Furthermore, the control strategies that the model acquires for these tasks subsequently generalize to new task contexts with novel stimuli, thus bringing symbolic production rule qualities to a neural network architecture. As such, WorkMATe provides a new solution for the neural implementation of flexible memory control.

Journal ArticleDOI
TL;DR: This comprehensive investigation of a variety of word properties demonstrates that semantic and function-related psycholinguistic properties play an important role in verbal memory processes.
Abstract: What makes some words more memorable than others? Words can vary in many dimensions, and a variety of lexical, semantic, and affective properties have previously been associated with variability in recall performance. Free recall data were used from 147 participants across 20 experimental sessions from the Penn Electrophysiology of Encoding and Retrieval Study (PEERS) data set, across 1,638 words. Here, I consider how well 20 different word properties—across lexical, semantic, and affective dimensions—relate to free recall. Semantic dimensions, particularly animacy (better memory for living), usefulness (with respect to survival; better memory for useful), and size (better memory for larger) demonstrated the strongest relationships with recall probability. These key results were then examined and replicated in the free recall data from Lau, Goh, and Yap (Quarterly Journal of Experimental Psychology, 71, 2207–2222, 2018), which had 532 words and 116 participants. This comprehensive investigation of a variety of word properties demonstrates that semantic and function-related psycholinguistic properties play an important role in verbal memory processes.