Showing papers on "Encoding (memory)" published in 2022


Journal Article
TL;DR: DSwarm-Net employs deep learning and a swarm-intelligence-based metaheuristic for HAR, using 3D skeleton data for action classification. It extracts four types of features from the skeletal data, namely Distance, Distance Velocity, Angle, and Angle Velocity, which capture complementary information from the skeleton joints and are encoded into images.
Abstract: Human Action Recognition (HAR) is a popular area of research in computer vision due to its wide range of applications, such as surveillance, health care, and gaming. Action recognition based on 3D skeleton data allows simple, cost-efficient models to be built, making it a widely used method. In this work, we propose DSwarm-Net, a framework that employs deep learning and a swarm-intelligence-based metaheuristic for HAR that uses 3D skeleton data for action classification. We extract four types of features from the skeletal data, namely Distance, Distance Velocity, Angle, and Angle Velocity, which capture complementary information from the skeleton joints and are encoded into images. Encoding the skeleton features into images is an alternative to the traditional video-processing approach and makes the classification task less complex. The Distance and Distance Velocity images are stacked depth-wise and fed into a Convolutional Neural Network that is a modified version of Inception-ResNet. Similarly, the Angle and Angle Velocity images are stacked depth-wise and fed into the same network. After training these models, deep features are extracted from the penultimate layer of the networks, and the resulting feature representation is optimized by a nature-inspired metaheuristic, the Ant Lion Optimizer, to eliminate non-informative or misleading features and to reduce the dimensionality of the feature set. DSwarm-Net has been evaluated on three publicly available HAR datasets, namely UTD-MHAD, HDM05, and NTU RGB+D 60, achieving competitive results and thus confirming the superiority of the proposed model over state-of-the-art models.

52 citations
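The DSwarm-Net abstract describes turning per-frame joint geometry into images. Below is a minimal sketch of the Distance and Distance Velocity encodings. It makes several assumptions: each frame is a list of (x, y, z) joint coordinates, one image row corresponds to one frame, and values are min-max scaled to 8-bit gray levels. The paper's exact joint ordering, normalization, and image size are not given in the abstract, and the Angle features are omitted.

```python
import math
from itertools import combinations


def distance_features(frames):
    """One row per frame: Euclidean distance for every unordered joint pair."""
    return [[math.dist(a, b) for a, b in combinations(joints, 2)] for joints in frames]


def velocity(rows):
    """Frame-to-frame difference of a feature matrix (result has one fewer row)."""
    return [[cur - prev for prev, cur in zip(r0, r1)] for r0, r1 in zip(rows, rows[1:])]


def to_grayscale(rows):
    """Min-max normalize a feature matrix to integer pixel values in [0, 255]."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[round(255 * (v - lo) / span) for v in row] for row in rows]


# Three frames of a toy three-joint skeleton whose second joint moves along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
dist = distance_features(frames)
distance_image = to_grayscale(dist)            # 3 frames x 3 joint pairs
velocity_image = to_grayscale(velocity(dist))  # 2 frame differences x 3 joint pairs
```

The velocity image has one fewer row than the distance image. Some resizing or padding would therefore be needed before the two are stacked depth-wise as the abstract describes.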


Proceedings Article
01 Jan 2022-Findings
TL;DR: This paper proposes a multi-task method that incorporates multi-field information into BERT, improving its news encoding capability. It also modifies the gradients of auxiliary tasks based on their conflicts with the main task's gradient, which further boosts model performance.
Abstract: Existing news recommendation methods usually learn news representations solely based on news titles. To sufficiently utilize other fields of news information such as category and entities, some methods treat each field as an additional feature and combine different feature vectors with attentive pooling. With the adoption of large pre-trained models like BERT in news recommendation, the above way to incorporate multi-field information may encounter challenges: the shallow feature encoding to compress the category and entity information is not compatible with the deep BERT encoding. In this paper, we propose a multi-task method to incorporate the multi-field information into BERT, which improves its news encoding capability. Besides, we modify the gradients of auxiliary tasks based on their gradient conflicts with the main task, which further boosts the model performance. Extensive experiments on the MIND news recommendation benchmark show the effectiveness of our approach.

44 citations
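The abstract does not spell out how the auxiliary gradients are modified. One well-known rule for this kind of conflict is the PCGrad-style projection, shown here as an assumption rather than as the paper's method. When an auxiliary gradient has a negative dot product with the main-task gradient, its component along the main-task gradient is removed. Gradients are plain lists of floats for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def deconflict(g_aux, g_main):
    """Project an auxiliary-task gradient so it no longer opposes the main-task gradient.

    If g_aux . g_main < 0, subtract the component of g_aux along g_main;
    otherwise return g_aux unchanged.
    """
    d = dot(g_aux, g_main)
    if d >= 0:  # no conflict (this also covers an all-zero main gradient)
        return list(g_aux)
    scale = d / dot(g_main, g_main)
    return [a - scale * m for a, m in zip(g_aux, g_main)]


g_main = [1.0, 0.0]
g_aux = [-1.0, 1.0]  # conflicts with g_main: the dot product is -1
adjusted = deconflict(g_aux, g_main)
print(adjusted)      # [0.0, 1.0]
```

In a multi-task training loop, the adjusted auxiliary gradients would typically be summed with the main-task gradient before the optimizer step.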


Journal Article
TL;DR: In this article, the authors identify neurons in the human brain whose responses to cognitive boundaries predict memory encoding success and mark timepoints that are reinstated during retrieval. Boundary-induced neural state changes mirror a fundamental behavioral tradeoff between content and time memory.
Abstract: While experience is continuous, memories are organized as discrete events. Cognitive boundaries are thought to segment experience and structure memory, but how this process is implemented remains unclear. We recorded the activity of single neurons in the human medial temporal lobe (MTL) during the formation and retrieval of memories with complex narratives. Here, we show that neurons responded to abstract cognitive boundaries between different episodes. Boundary-induced neural state changes during encoding predicted subsequent recognition accuracy but impaired event order memory, mirroring a fundamental behavioral tradeoff between content and time memory. Furthermore, the neural state following boundaries was reinstated during both successful retrieval and false memories. These findings reveal a neuronal substrate for detecting cognitive boundaries that transform experience into mnemonic episodes and structure mental time travel during retrieval. Continuous experience is segmented into discrete mnemonic episodes. The authors identify neurons in the human brain whose responses to cognitive boundaries predict memory encoding success and mark timepoints that are reinstated during retrieval.

40 citations


Journal Article
TL;DR: Li et al. proposed a multi-scale residual encoding and decoding network (Ms RED) for skin lesion segmentation, which can segment a variety of lesions accurately, reliably, and efficiently.

37 citations


Journal Article
TL;DR: Wang et al. proposed an accident detection approach based on spatio-temporal feature encoding with a multilayer neural network. The approach achieves promising detection accuracy and efficiency for traffic accident detection and meets the real-time detection requirement of the VANET environment.
Abstract: In the Vehicular Ad hoc Networks (VANET) environment, recognizing traffic accident events in the driving videos captured by vehicle-mounted cameras is an essential task. Traffic accidents generally occupy only a short span of a driving video, and the backgrounds of driving videos are dynamic and complex. These factors make traffic accident detection quite challenging. To detect accidents from driving videos effectively and efficiently, we propose an accident detection approach based on spatio–temporal feature encoding with a multilayer neural network. Specifically, the multilayer neural network is used to encode the temporal features of the video for clustering the video frames. From the obtained frame clusters, we detect the border frames as the potential accident frames. Then, we capture and encode the spatial relationships of the objects detected in these potential accident frames to confirm whether they are accident frames. Extensive experiments demonstrate that the proposed approach achieves promising detection accuracy and efficiency for traffic accident detection and meets the real-time detection requirement in the VANET environment.

35 citations
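The detection pipeline first clusters video frames on their encoded temporal features. It then treats frames at cluster borders as candidate accident frames. Assume the clustering step has already produced one label per frame in temporal order. Locating the borders then reduces to finding label changes. The sketch below returns the index of the first frame of each new cluster. This is an illustrative choice, since the abstract does not define "border frames" precisely.

```python
def border_frames(labels):
    """Return indices i where labels[i] differs from labels[i - 1].

    `labels` holds one cluster label per frame, in temporal order; each returned
    index is the first frame of a new cluster and is treated as a candidate
    accident frame for the later spatial check.
    """
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


frame_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
print(border_frames(frame_clusters))  # [3, 5]
```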


Journal Article
TL;DR: In this paper, a template-based single-step retrosynthesis model based on Modern Hopfield Networks is introduced. The model learns an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule.
Abstract: Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.

29 citations
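In a modern Hopfield network, a query pattern retrieves stored patterns through the softmax association softmax(beta * X^T xi). Here X holds the stored patterns and beta is an inverse temperature. Below is a minimal sketch that uses this association as a template-relevance score. It assumes the molecule and template embeddings have already been produced by learned encoders; here they are hand-written toy vectors. It also treats beta as a free hyperparameter. It is not the code from the linked repository.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def association(query, patterns, beta=1.0):
    """Modern Hopfield association weights softmax(beta * X^T xi) for one query xi."""
    sims = [sum(q * p for q, p in zip(query, pat)) for pat in patterns]
    return softmax([beta * s for s in sims])


def retrieve(query, patterns, beta=1.0):
    """One Hopfield update: the association-weighted average of the stored patterns."""
    weights = association(query, patterns, beta)
    dim = len(patterns[0])
    return [sum(w * pat[k] for w, pat in zip(weights, patterns)) for k in range(dim)]


molecule = [1.0, 0.2]                              # toy molecule embedding
templates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy template embeddings
relevance = association(molecule, templates, beta=4.0)
best = max(range(len(templates)), key=relevance.__getitem__)  # most relevant template
```

A larger beta makes the softmax sharper, so retrieval converges toward the single best-matching stored pattern.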


Proceedings Article
01 Jun 2022
TL;DR: NICE-SLAM incorporates multi-level local information by introducing a hierarchical scene representation. Optimizing this representation with pre-trained geometric priors enables detailed reconstruction of large indoor scenes.
Abstract: Neural implicit representations have recently shown encouraging results in various domains, including promising progress in simultaneous localization and mapping (SLAM). Nevertheless, existing methods produce over-smoothed scene reconstructions and have difficulty scaling up to large scenes. These limitations are mainly due to their simple fully-connected network architecture, which does not incorporate local information in the observations. In this paper, we present NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation. Optimizing this representation with pre-trained geometric priors enables detailed reconstruction of large indoor scenes. Compared to recent neural implicit SLAM systems, our approach is more scalable, efficient, and robust. Experiments on five challenging datasets demonstrate competitive results of NICE-SLAM in both mapping and tracking quality. Project page: https://pengsongyou.github.io/nice-slam.

29 citations


Journal Article
TL;DR: In this article, a deep privacy-encoding-based federated learning (FL) framework is proposed to achieve privacy in smart agriculture applications. It adopts a perturbation-based encoding and a long short-term memory (LSTM) autoencoder technique.
Abstract: Smart agriculture (SA) incorporates low-cost, low-energy sensors and devices to enhance quantitative and qualitative agricultural production. However, these devices use an open communication channel, i.e., the Internet, and generate large amounts of data in real time, so the data have the potential to be misused. As a consequence, the major concern in implementing SA is minimizing the risk of security and data privacy violations (e.g., adversaries performing inference attacks). To address these challenges, we propose PEFL, a deep privacy-encoding-based federated learning (FL) framework that adopts a perturbation-based encoding and long short-term memory autoencoder technique to achieve privacy. Then, an FL-based gated recurrent unit neural network algorithm (FedGRU) is designed to perform intrusion detection on the encoded data. Experimental results on the ToN-IoT data set reveal that PEFL identifies normal and attack patterns in the transformed data more efficiently than other non-FL and FL methods.

28 citations


Journal Article
TL;DR: In this article, a dual-mechanism framework of value-directed remembering is proposed. In this framework, both strategic and automatic processes lead to differential encoding of valuable information in ways that can better support retrieval.
Abstract: The ability to prioritize valuable information is critical for the efficient use of memory in daily life. When information is important, we engage more effective encoding mechanisms that can better support retrieval. Here, we describe a dual-mechanism framework of value-directed remembering in which both strategic and automatic processes lead to differential encoding of valuable information. Strategic processes rely on metacognitive awareness of effective deep encoding strategies that allow younger and healthy older adults to selectively remember important information. In contrast, some high-value information may also be encoded automatically in the absence of an intention to remember, but this may be more impaired in older age. These mechanisms are subserved by different neural substrates. Left-hemisphere semantic processing regions are active during the strategic encoding of high-value items, whereas automatic enhancement of the encoding of high-value items may be supported by activation of midbrain dopaminergic projections to the hippocampal region.

27 citations


Journal Article
TL;DR: In this article, an approach based on variational encoding is proposed for evaluating aircraft engine monitoring data. However, it is not suitable for analyzing the data collected from the turbofan engine.

Journal Article
TL;DR: In this paper, the distributed recursive fault estimation problem is investigated for a class of discrete time-varying systems with binary encoding schemes over a sensor network. A fault signal with zero second-order difference is taken into account to reflect sensor failures.
Abstract: In this paper, we investigate the distributed recursive fault estimation problem for a class of discrete time-varying systems with binary encoding schemes over a sensor network. The fault signal with zero second-order difference is taken into account to reflect the sensor failures. Since the communication bandwidth in practice is constrained, the binary encoding schemes are exploited to regulate the signal transmission from the neighbouring sensors to the local fault estimator. In addition, due to the influence of channel noises, each bit might change with a small crossover probability. In the presence of sensor faults and bit errors, an upper bound for the estimation error covariance matrix is ensured and minimized at each time step via designing the gain matrices of the estimator. Finally, the effectiveness of the method is verified by a simulation.
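The abstract pairs binary encoding with a noisy channel in which each transmitted bit may flip with a small crossover probability. Below is a sketch of that transmission path. It assumes a uniform quantizer over a known range [lo, hi] and a binary symmetric channel. The estimator design from the paper (gain matrices, covariance bound) is not reproduced.

```python
import random


def quantize(value, lo, hi, bits):
    """Uniformly map a value in [lo, hi] to an integer level in 0 .. 2**bits - 1."""
    levels = 2 ** bits - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def to_bits(level, bits):
    """Most-significant-bit-first binary codeword for an integer level."""
    return [(level >> k) & 1 for k in reversed(range(bits))]


def from_bits(codeword):
    """Inverse of to_bits."""
    level = 0
    for b in codeword:
        level = (level << 1) | b
    return level


def dequantize(level, lo, hi, bits):
    return lo + level * (hi - lo) / (2 ** bits - 1)


def binary_symmetric_channel(codeword, p, rng):
    """Flip each bit independently with crossover probability p."""
    return [b ^ 1 if rng.random() < p else b for b in codeword]


rng = random.Random(0)
sent = to_bits(quantize(0.3, lo=0.0, hi=1.0, bits=8), bits=8)
received = binary_symmetric_channel(sent, p=0.01, rng=rng)
estimate = dequantize(from_bits(received), lo=0.0, hi=1.0, bits=8)
```

A flipped high-order bit corrupts the estimate far more than a flipped low-order bit. This is one reason such analyses bound the estimation error covariance rather than assume error-free transmission.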

Journal Article
TL;DR: This paper proposes an evolving block-based CNN (EB-CNN) that uses a genetic algorithm to automatically search for the optimal architecture for HSI classification, giving it better usability than handcrafted CNNs.
Abstract: Deep Convolutional Neural Networks (CNNs) show excellent effectiveness in hyperspectral image (HSI) classification. However, designing a CNN architecture requires abundant expert knowledge and experience, which greatly hinders its wide application in real-world engineering. To alleviate this issue, this paper proposes an evolving block-based CNN (EB-CNN) that automatically searches for the optimal architecture using a genetic algorithm. Specifically, two kinds of basic blocks with six different configurations in total are first designed to construct the search space. Then, a flexible encoding strategy is devised for the genetic algorithm to allow different chromosomes to evolve with different lengths. In this manner, the width of each layer and the depth of the architecture can be optimized simultaneously. Furthermore, a novel swapping mutation operator is proposed for the genetic algorithm to speed up the search and save computing resources. With the above techniques, the proposed algorithm automatically seeks the optimal CNN architecture for HSI classification, giving it better usability than handcrafted CNNs. Finally, extensive experiments on five commonly used HSI datasets demonstrate that the proposed EB-CNN achieves highly competitive or even better performance compared to state-of-the-art peer algorithms.
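A toy genetic-algorithm representation can illustrate variable-length chromosomes and a swap mutation. In the sketch below, a chromosome is a list of (block type, width) genes, and its length sets the network depth. All of the following are hypothetical stand-ins, not the operators defined in the paper: the block names, the width choices, the one-point crossover, and the specific swap mutation.

```python
import random

# Hypothetical gene alphabet: two block kinds x three widths = six configurations.
BLOCK_TYPES = ("block_a", "block_b")
WIDTHS = (16, 32, 64)


def random_chromosome(rng, min_len=2, max_len=6):
    """A chromosome is a variable-length list of (block_type, width) genes."""
    length = rng.randint(min_len, max_len)
    return [(rng.choice(BLOCK_TYPES), rng.choice(WIDTHS)) for _ in range(length)]


def crossover(p1, p2, rng):
    """One-point crossover with an independent cut in each parent.

    Because the cuts differ, children can be shorter or longer than either parent,
    which is what lets depth evolve. Assumes both parents have at least two genes.
    """
    i = rng.randint(1, len(p1) - 1)
    j = rng.randint(1, len(p2) - 1)
    return p1[:i] + p2[j:], p2[:j] + p1[i:]


def swap_mutation(chrom, rng):
    """Exchange the positions of two randomly chosen genes (length is unchanged)."""
    child = list(chrom)
    if len(child) >= 2:
        a, b = rng.sample(range(len(child)), 2)
        child[a], child[b] = child[b], child[a]
    return child


rng = random.Random(7)
parent1, parent2 = random_chromosome(rng), random_chromosome(rng)
child1, child2 = crossover(parent1, parent2, rng)
mutant = swap_mutation(child1, rng)
```

Swapping genes reorders blocks without changing the gene multiset, so it explores architectures near the parent at low cost. The paper's own swapping operator may differ.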

Journal Article
TL;DR: In this article, an 8 × 8 array of robust, low-power, and bio-inspired crypto engines was integrated with IoT edge sensors based on two-dimensional (2D) memtransistors.
Abstract: In the emerging era of the internet of things (IoT), ubiquitous sensors continuously collect, consume, store, and communicate a huge volume of information, which is becoming increasingly vulnerable to theft and misuse. Modern software cryptosystems require extensive computational infrastructure to implement ciphering algorithms. This makes them difficult to adopt in IoT edge sensors, which operate with limited hardware resources and low energy budgets. Here we propose and experimentally demonstrate an “all-in-one” 8 × 8 array of robust, low-power, and bio-inspired crypto engines monolithically integrated with IoT edge sensors based on two-dimensional (2D) memtransistors. Each engine comprises five 2D memtransistors to accomplish sensing and encoding functionalities. The ciphered information is shown to be secure from an eavesdropper with finite resources and access to deep neural networks. Our hardware platform consists of a total of 320 fully integrated monolayer MoS2-based memtransistors, consumes energy in the range of hundreds of picojoules, and offers near-sensor security.

Journal Article
06 May 2022-Science
TL;DR: The authors found that neurons encoding conflict probability, conflict, and error in one or both tasks were intermixed, forming a representational geometry that simultaneously allowed task specialization and generalization.
Abstract: Controlling behavior to flexibly achieve desired goals depends on the ability to monitor one’s own performance. It is unknown how performance monitoring can be both flexible, to support different tasks, and specialized, to perform each task well. We recorded single neurons in the human medial frontal cortex while subjects performed two tasks that involve three types of cognitive conflict. Neurons encoding conflict probability, conflict, and error in one or both tasks were intermixed, forming a representational geometry that simultaneously allowed task specialization and generalization. Neurons encoding conflict retrospectively served to update internal estimates of conflict probability. Population representations of conflict were compositional. These findings reveal how representations of evaluative signals can be both abstract and task-specific and suggest a neuronal mechanism for estimating control demand.

Journal Article
TL;DR: Zhang et al. proposed an anisotropic Gaussian coordinate coding method to describe the skeleton direction cues among adjacent keypoints. They also proposed a multi-loss function that constrains the output to prevent overfitting.
Abstract: Human pose estimation (HPE) has a wide range of applications, such as multimedia processing, behavior understanding, and human-computer interaction. Most previous studies have encountered many constraints, such as restricted scenarios and RGB inputs. To mitigate these constraints when estimating human poses in general scenarios, we present an efficient human pose estimation model (EHPE) with joint direction cues and Gaussian coordinate encoding. Specifically, we propose an anisotropic Gaussian coordinate coding method to describe the skeleton direction cues among adjacent keypoints. To the best of our knowledge, this is the first time that skeleton direction cues have been introduced into heatmap encoding for the HPE task. Then, a multi-loss function is proposed to constrain the output and prevent overfitting. The Kullback-Leibler divergence is introduced to measure the divergence between the predicted label and its ground truth. The performance of EHPE is evaluated on two HPE datasets: MS COCO and MPII. Experimental results demonstrate that EHPE obtains robust results and significantly outperforms existing state-of-the-art HPE methods. Lastly, we extend the experiments to infrared images captured by our research group, where EHPE achieved impressive results despite insufficient color and texture information.
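Below is a rough illustration of heatmap encoding with direction cues. The sketch builds an anisotropic Gaussian around a keypoint, elongated along the direction toward an adjacent keypoint. It then compares two heatmaps with the Kullback-Leibler divergence after normalizing each to sum to one. The two standard deviations, the rotation-based parameterization, and the normalization are assumptions, because the abstract does not give EHPE's exact formula.

```python
import math


def anisotropic_heatmap(width, height, center, neighbor, sigma_along=3.0, sigma_across=1.0):
    """Gaussian heatmap around `center`, stretched along the direction toward `neighbor`."""
    cx, cy = center
    theta = math.atan2(neighbor[1] - cy, neighbor[0] - cx)
    c, s = math.cos(theta), math.sin(theta)
    heat = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            along = dx * c + dy * s    # offset projected onto the keypoint-to-neighbor direction
            across = -dx * s + dy * c  # offset perpendicular to that direction
            row.append(math.exp(-(along ** 2 / (2 * sigma_along ** 2)
                                  + across ** 2 / (2 * sigma_across ** 2))))
        heat.append(row)
    return heat


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred) after normalizing both heatmaps to sum to 1."""
    t = [v for row in target for v in row]
    q = [v for row in pred for v in row]
    st, sq = sum(t), sum(q)
    return sum((a / st) * math.log((a / st + eps) / (b / sq + eps)) for a, b in zip(t, q))


# Keypoint at (5, 5) whose neighbor lies to the right, so the blob stretches along x.
target = anisotropic_heatmap(11, 11, center=(5, 5), neighbor=(10, 5))
isotropic = anisotropic_heatmap(11, 11, center=(5, 5), neighbor=(10, 5),
                                sigma_along=1.0, sigma_across=1.0)
loss = kl_divergence(target, isotropic)
```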

Journal Article
TL;DR: In this paper, a color image is encoded into a DNA sequence using randomly selected row-level encoding rules. A 4D hyperchaotic system is used to generate pseudo-random sequences that permute image information at the bit level and block level.
Abstract:
• A hyperchaotic system is used to shuffle the image at the block level and bit level.
• Row-level dynamic and random DNA encoding of every color plane of the image.
• Hardness of the elliptic curve discrete logarithm problem.
• A novel digital signature, mutual authentication, and key exchange scheme.
• Low computational complexity and robustness to various types of security attacks.
This paper takes a holistic approach to propose a comprehensive framework for color image encryption with some novel features. The framework is designed around a secure encryption and decryption scheme leveraging dynamic DNA encoding, a hyperchaotic system, and elliptic curve cryptography. The novel features are mutually authenticated key generation and exchange based on split shares, and a digital signature for image authentication. A color image is encoded into a DNA sequence using randomly selected row-level encoding rules. A novel 4D hyperchaotic system is used to generate pseudo-random sequences that permute image information at the bit level and block level. The multidimensional hyperchaotic system offers greater non-periodicity, ergodicity, and unpredictability than a simple chaotic system. An elliptic curve-based substitution system is employed to achieve computationally efficient encryption and authentication. Different subkeys are employed to increase the key space and embed confusion and diffusion into the proposed scheme. Results and analysis show that the proposed framework is computationally efficient and robust to different types of attacks and cryptanalysis carried out on the images.
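DNA-based image ciphers commonly rely on eight coding rules. Each rule maps every 2-bit pair to one of the bases A, C, G, T, while keeping complementary bit pairs (00/11, 01/10) on complementary bases (A/T, C/G). Below is a sketch of row-level dynamic encoding under that assumption: each row of an 8-bit color plane gets its own randomly chosen rule, and each pixel becomes four bases. The rule table is the conventional one. In a real scheme, the per-row rule indices would need to be reproducible by the receiver, for example by deriving them from the key. The hyperchaotic permutation and elliptic-curve steps are omitted.

```python
import random

# Eight complementary DNA coding rules: DNA_RULES[r][v] is the base for 2-bit value v
# (v = 0b00, 0b01, 0b10, 0b11). In every rule, v and 3 - v map to complementary bases.
DNA_RULES = ["AGCT", "ACGT", "GATC", "CATG", "GTAC", "CTAG", "TGCA", "TCGA"]


def encode_byte(value, rule):
    """Encode one 8-bit pixel as four bases, most significant bit pair first."""
    bases = DNA_RULES[rule]
    return "".join(bases[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


def decode_bases(seq, rule):
    """Inverse of encode_byte for a four-base string."""
    lookup = {base: v for v, base in enumerate(DNA_RULES[rule])}
    value = 0
    for base in seq:
        value = (value << 2) | lookup[base]
    return value


def encode_plane(plane, rng):
    """Encode each row of a color plane with its own randomly chosen rule."""
    rules = [rng.randrange(len(DNA_RULES)) for _ in plane]
    rows = ["".join(encode_byte(px, r) for px in row) for row, r in zip(plane, rules)]
    return rows, rules


def decode_plane(rows, rules):
    """Recover pixel values from DNA rows, given the rule index used for each row."""
    return [[decode_bases(row[i:i + 4], r) for i in range(0, len(row), 4)]
            for row, r in zip(rows, rules)]


red_plane = [[0, 27, 255], [128, 64, 200]]
dna_rows, row_rules = encode_plane(red_plane, random.Random(42))
```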

Journal Article
TL;DR: Li et al. proposed a deep-learning encoder-decoder model to establish the mapping between battery charging curves and the value of SOH. The encoder is a hybrid network of a two-dimensional convolution module, an ultra-lightweight subspace attention mechanism (ULSAM) module, and a simple recurrent unit (SRU) module.
Abstract: Accurate estimation of the state of health (SOH) of lithium-ion batteries is an important guarantee of their safe and stable operation and a key technology in battery management systems (BMS). Lithium-ion batteries with different degrees of aging also have different charging curves. Based on this fact, this paper proposes an encoder-decoder model based on deep learning to establish the mapping between battery charging curves and the value of SOH. The model consists of an encoder and a decoder. The encoder is a hybrid neural network composed of a two-dimensional convolution module, an ultra-lightweight subspace attention mechanism (ULSAM) module, and a simple recurrent unit (SRU) module. It can effectively encode the sampled data of the charging curves and generate the encoding sequence. The decoder is mainly composed of a back propagation (BP) neural network, which decodes the encoding sequence and outputs an estimate of the SOH. For long encoding sequences, a decoder with an attention mechanism is proposed to improve the estimation accuracy of the model. Experimental results show that the proposed model adapts well to different types of batteries and to various sampling modes of the charging curves, and has high estimation accuracy.

Journal Article
TL;DR: This paper proposes a dynamic texture feature extraction method based on 3D filter learning and Fisher vector coding. The method aims to achieve good performance by applying learning techniques in the big data environment, without requiring large computing resources to solve the big data classification problem.

Journal Article
TL;DR: In this article, an event-based encoding strategy is developed to encode the measurement signals into 1-bit codewords so as to reduce network resource consumption. A necessary condition is also proposed to derive a lower bound on the bit rate, below which the decoding error diverges.

Proceedings Article
23 May 2022
TL;DR: This paper introduces an online per-title encoding scheme (OPTE) for live video streaming applications. OPTE predicts each target bitrate's optimal resolution from any pre-defined set of resolutions, using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for each video segment.
Abstract: Current per-title encoding schemes encode the same video content at various bitrates and spatial resolutions to find an optimized bitrate ladder for each video content in Video on Demand (VoD) applications. However, in live streaming applications, a bitrate ladder with fixed bitrate-resolution pairs is used to avoid the additional latency incurred in finding optimum bitrate-resolution pairs for every video content. This paper introduces an online per-title encoding scheme (OPTE) for live video streaming applications. In this scheme, each target bitrate's optimal resolution is predicted from any pre-defined set of resolutions using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for each video segment. Experimental results show that, on average, OPTE yields bitrate savings of 20.45% and 28.45% to maintain the same PSNR and VMAF, respectively, compared to a fixed bitrate ladder scheme (as adopted in current live streaming deployments), without any noticeable additional latency in streaming.

Journal Article
TL;DR: The authors found that neurons with high Fos induction form ensembles of cells with highly correlated activity, exhibit reliable place fields that evenly tile the environment and have more stable tuning across days than nearby non-Fos-induced cells.
Abstract: In the hippocampus, spatial maps are formed by place cells, while contextual memories are thought to be encoded as engrams [1–6]. Engrams are typically identified by expression of the immediate early gene Fos, but little is known about the neural activity patterns that drive, and are shaped by, Fos expression in behaving animals [7–10]. Thus, it is unclear whether Fos-expressing hippocampal neurons also encode spatial maps and whether Fos expression correlates with and affects specific features of the place code [11]. Here we measured the activity of CA1 neurons with calcium imaging while monitoring Fos induction in mice performing a hippocampus-dependent spatial learning task in virtual reality. We find that neurons with high Fos induction form ensembles of cells with highly correlated activity, exhibit reliable place fields that evenly tile the environment and have more stable tuning across days than nearby non-Fos-induced cells. Comparing neighbouring cells with and without Fos function using a sparse genetic loss-of-function approach, we find that neurons with disrupted Fos function have less reliable activity, decreased spatial selectivity and lower across-day stability. Our results demonstrate that Fos-induced cells contribute to hippocampal place codes by encoding accurate, stable and spatially uniform maps and that Fos itself has a causal role in shaping these place codes. Fos ensembles may therefore link two key aspects of hippocampal function: engrams for contextual memories and place codes that underlie cognitive maps.

Journal Article
01 Feb 2022
TL;DR: In this article, a multi-objective hybrid driving algorithm (HDA) based on a three-layer encoding method with a heuristic rule is proposed to address the partial disassembly line balancing problem with multi-robot workstations that can synchronously disassemble multiple products.
Abstract: To address the considerable generation of waste electromechanical products, a partial disassembly line balancing problem with multi-robot workstations that can synchronously disassemble multiple products (MPR-PDLBP) is investigated to improve the product capacity and efficiency of existing disassembly lines. First, an exact mixed-integer programming model is established to accurately obtain the minimum disassembly objectives: cycle time, energy consumption, and improved hazardous index. Second, compared with the conventional disassembly line balancing problem (DLBP), the solution space and optimization difficulty of MPR-PDLBP increase significantly. Thus, a multi-objective hybrid driving algorithm (HDA) based on a three-layer encoding method with a heuristic rule is proposed to address MPR-PDLBP effectively, and a driving strategy is proposed to improve the exploitation ability and convergence speed of HDA. Finally, the validity of the proposed model and algorithm is verified by comparing the calculation results of GUROBI and HDA for two small-scale cases. The superiority of HDA is demonstrated by comparing its optimization results on a large-scale case with those of three other classic algorithms.

Journal Article
TL;DR: In this paper, the authors used behavioral correlates of two large-scale heteromodal networks at rest, the default mode network (DMN) and the frontoparietal network (FPN), to understand their contributions to distinct features of WM.
Abstract: Working memory (WM) allows goal-relevant information to be encoded and maintained in mind, even when the contents of WM are incongruent with the immediate environment. While regions of heteromodal cortex are important for WM, the neural mechanisms that relate to individual differences in the encoding and maintenance of goal-relevant information remain unclear. Here, we used behavioral correlates of two large-scale heteromodal networks at rest, the default mode (DMN) and frontoparietal (FPN) networks, to understand their contributions to distinct features of WM. We assessed each individual's ability to resist distracting information during the encoding and maintenance phases of a visuospatial WM task. Individuals with stronger connectivity of DMN with medial visual and retrosplenial cortex were less affected by encoding distraction. Conversely, weaker connectivity of both DMN and FPN with visual regions was associated with better WM performance when target information was no longer in the environment and distractors were presented in the maintenance phase. Our study suggests that stronger coupling between heteromodal cortex and visual-spatial regions supports WM encoding by reducing the influence of concurrently presented distractors, while weaker visual coupling is associated with better maintenance of goal-relevant information because it relates to the capacity to ignore task-irrelevant changes in the environment.

Proceedings ArticleDOI
19 Apr 2022
TL;DR: This paper focuses on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task, and tries to find an encoding that the model actually uses, introducing a usage-based probing setup.
Abstract: A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious—i.e., the model might not rely on it when making predictions. In this paper, we try to find an encoding that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model’s representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.
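The removal step can be illustrated with the simplest linear intervention: estimate one direction that separates singular from plural representations and project it out. This plain-Python sketch is an assumption for illustration only. The direction here is the difference of class means, whereas the paper's actual intervention may remove several directions or estimate them differently, and the toy two-dimensional vectors stand in for BERT hidden states.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def number_direction(reps, labels):
    """Difference of class means: plural (label 1) minus singular (label 0)."""
    plural = mean_vector([h for h, y in zip(reps, labels) if y == 1])
    singular = mean_vector([h for h, y in zip(reps, labels) if y == 0])
    return [p - s for p, s in zip(plural, singular)]


def project_out(h, w):
    """Remove the component of h along w, keeping its orthogonal complement."""
    scale = dot(h, w) / dot(w, w)
    return [a - scale * b for a, b in zip(h, w)]


def probe_accuracy(reps, labels, w):
    """Accuracy of a linear probe that thresholds the projection onto w."""
    scores = [dot(h, w) for h in reps]
    threshold = sum(scores) / len(scores)
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy "hidden states": the first coordinate carries number, the second is noise.
reps = [[1.0, 0.5], [1.2, -0.3], [-1.0, 0.4], [-0.8, -0.2]]
labels = [1, 1, 0, 0]   # 1 = plural, 0 = singular

w = number_direction(reps, labels)
before = probe_accuracy(reps, labels, w)      # number is linearly decodable

cleaned = [project_out(h, w) for h in reps]
residual = number_direction(cleaned, labels)  # class-mean gap is now zero up to rounding
```

Projection is linear, so the class means of the cleaned vectors are the projected class means, and their difference along `w` vanishes. The usage-based question is then behavioral: whether agreement accuracy drops when such cleaned representations are fed back into the model, which this toy does not simulate.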

Proceedings ArticleDOI
27 Jan 2022
TL;DR: This model utilizes a maximum-mean-discrepancy (MMD) based domain alignment approach to impose domain-invariance for encoded representations, which outperforms state-of-the-art approaches in EEG-based emotion classification.
Abstract: Deep learning-based electroencephalography (EEG) signal processing methods are known to suffer from poor test-time generalization due to changes in data distribution. This becomes a more challenging problem when privacy-preserving representation learning is of interest, such as in clinical settings. To that end, we propose a multi-source learning architecture where we extract domain-invariant representations from dataset-specific private encoders. Our model utilizes a maximum-mean-discrepancy (MMD) based domain alignment approach to impose domain-invariance for encoded representations, which outperforms state-of-the-art approaches in EEG-based emotion classification. Furthermore, representations learned in our pipeline preserve domain privacy, as dataset-specific private encoding alleviates the need for conventional, centralized EEG-based deep neural network training approaches with shared parameters.
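The MMD alignment term can be sketched in a few lines. This minimal plain-Python version assumes a Gaussian (RBF) kernel with a fixed bandwidth `sigma` and the biased squared-MMD estimator, summed over every pair of source domains. The abstract does not state the authors' kernel, bandwidth, or weighting, and the toy feature lists are hypothetical stand-ins for the private encoders' outputs.

```python
import math
from itertools import combinations


def rbf(x, y, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""
    return (mean_kernel(xs, xs, sigma) + mean_kernel(ys, ys, sigma)
            - 2 * mean_kernel(xs, ys, sigma))


def alignment_loss(domain_features, sigma=1.0):
    """Sum of pairwise MMD^2 between the encoded features of all source domains."""
    return sum(mmd2(a, b, sigma) for a, b in combinations(domain_features, 2))


# Toy encoded EEG features from three hypothetical datasets (2-D for brevity).
dataset_a = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
dataset_b = [[0.1, 0.0], [-0.2, 0.1], [0.0, -0.1]]
dataset_c = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]   # shifted distribution

print(alignment_loss([dataset_a, dataset_b]))   # similar domains: small value
print(alignment_loss([dataset_a, dataset_c]))   # shifted domain: larger value
```

In training, a term like `alignment_loss` would be added to the classification loss so that its gradients push the encoders toward a shared feature distribution. With a characteristic kernel such as the Gaussian, the population MMD is zero only when the two distributions coincide.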

Journal ArticleDOI
TL;DR: In this paper, the authors compared the encoding of learned reward locations in dCA1 and iCA1 during spatial navigation and found that both of these location-invariant codes persisted over time and together provided a dual hippocampal reward location code.

Book ChapterDOI
21 Mar 2022
TL;DR: This paper proposes a new task, Semantic-to-NeRF translation, which aims to reconstruct a 3D scene modelled by NeRF, conditioned on a single-view semantic mask as input, together with the Sem2NeRF framework to tackle it.
Abstract: Image translation and manipulation have gained increasing attention along with the rapid development of deep generative models. Although existing approaches have achieved impressive results, they mainly operate in 2D space. In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, which aims to reconstruct a 3D scene modelled by NeRF, conditioned on a single-view semantic mask as input. To kick off this novel task, we propose the Sem2NeRF framework. In particular, Sem2NeRF addresses this highly challenging task by encoding the semantic mask into the latent code that controls the 3D scene representation of a pre-trained decoder. To further improve the accuracy of the mapping, we integrate a new region-aware learning strategy into the design of both the encoder and the decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that it outperforms several strong baselines on two benchmark datasets. Code and video are available at https://donydchen.github.io/sem2nerf/

Journal ArticleDOI
TL;DR: Zhang et al. proposed a semantics-consistent representation learning (SCRL) method for remote sensing image-voice retrieval, which takes the pairwise, intra-modality, and non-paired inter-modality relationships into account simultaneously, thereby improving the semantic consistency of the learned representations.
Abstract: With the development of earth observation technology, massive amounts of remote sensing (RS) images are being acquired. Cross-modal RS image-voice retrieval offers a new way to find useful information in these images. This paper studies the task of RS image-voice retrieval so as to retrieve useful information from massive amounts of RS data. Existing methods for RS image-voice retrieval rely primarily on the pairwise relationship to narrow the heterogeneous semantic gap between images and voices. However, apart from the pairwise relationship included in the datasets, the intra-modality and non-paired inter-modality relationships should also be taken into account simultaneously, since the semantic consistency among non-paired representations plays an important role in the RS image-voice retrieval task. Inspired by this, a semantics-consistent representation learning (SCRL) method is proposed for RS image-voice retrieval. The main novelty is that the proposed method takes the pairwise, intra-modality, and non-paired inter-modality relationships into account simultaneously, thereby improving the semantic consistency of the learned representations for RS image-voice retrieval. The proposed SCRL method consists of two main steps: 1) semantics encoding and 2) semantics-consistent representation learning. First, an image encoding network is adopted to extract high-level image features with a transfer learning strategy, and a voice encoding network with dilated convolution is devised to obtain high-level voice features. Second, a consistent representation space is constructed by modeling the three kinds of relationships to narrow the heterogeneous semantic gap and learn semantics-consistent representations across the two modalities. Extensive experimental results on three challenging RS image-voice datasets show the effectiveness of the proposed method.
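Once both modalities are mapped into a common representation space, retrieval itself reduces to ranking by similarity. The plain-Python sketch below assumes cosine similarity and a Recall@K metric, and uses hand-made 2-D vectors in place of the outputs of the image and voice encoding networks. It does not model the three relationship terms SCRL uses to shape that space, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query, gallery, k):
    """Indices of the k gallery embeddings most similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    ranked = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose paired item (same index) appears in the top k."""
    hits = sum(i in retrieve(q, gallery, k) for i, q in enumerate(queries))
    return hits / len(queries)


# Hypothetical shared-space embeddings: voice query i describes RS image i.
voice_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_embeddings = [[0.9, 0.1], [0.1, 0.8], [0.7, 0.7]]

print(retrieve(voice_embeddings[0], image_embeddings, k=2))
print(recall_at_k(voice_embeddings, image_embeddings, k=1))
```

Cosine similarity is a common choice for such shared spaces because it ignores embedding magnitude, so only the direction learned by each encoder affects the ranking.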