
Showing papers on "Encoding (memory)" published in 2018


Journal ArticleDOI
TL;DR: A closed-loop system is used to decode and stimulate periods of ineffective encoding, showing that stimulation of lateral temporal cortex can enhance memory and suggesting that such systems may provide a therapeutic approach for treating memory dysfunction.
Abstract: Memory failures are frustrating and often the result of ineffective encoding. One approach to improving memory outcomes is through direct modulation of brain activity with electrical stimulation. Previous efforts, however, have reported inconsistent effects when using open-loop stimulation and often target the hippocampus and medial temporal lobes. Here we use a closed-loop system to monitor and decode neural activity from direct brain recordings in humans. We apply targeted stimulation to lateral temporal cortex and report that this stimulation rescues periods of poor memory encoding. This system also improves later recall, revealing that the lateral temporal cortex is a reliable target for memory enhancement. Taken together, our results suggest that such systems may provide a therapeutic approach for treating memory dysfunction.
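
As a rough illustration of the closed-loop idea, the sketch below couples a simple spectral-feature readout of the ongoing recording with a stimulation trigger that fires only when predicted encoding quality is low. The frequency bands, threshold, readout weights, and stimulate() call are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

# Hypothetical closed-loop controller: a linear readout over spectral
# features of the ongoing recording predicts whether the current brain
# state supports effective encoding; stimulation is triggered only when
# the predicted probability of successful encoding is low.

def spectral_features(window, fs=1000):
    """Toy feature extraction: log band power in a few canonical bands."""
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    power = np.abs(np.fft.rfft(window)) ** 2
    bands = [(3, 8), (8, 12), (30, 70)]          # theta, alpha, gamma (assumed)
    return np.array([np.log(power[(freqs >= lo) & (freqs < hi)].mean() + 1e-12)
                     for lo, hi in bands])

def closed_loop_step(weights, bias, window, stimulate, threshold=0.4):
    """Decode the current state; stimulate during predicted poor encoding."""
    p_good = 1.0 / (1.0 + np.exp(-(weights @ spectral_features(window) + bias)))
    if p_good < threshold:                       # predicted ineffective encoding
        stimulate()                              # e.g. brief train to lateral temporal cortex
    return p_good

# In practice the weights would come from a classifier fit offline on
# labelled epochs (later-recalled vs. later-forgotten items); here they
# are arbitrary placeholders.
w, b = np.array([0.5, -0.2, 0.8]), 0.0
p = closed_loop_step(w, b, np.random.randn(1000), stimulate=lambda: print("stim"))
```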

232 citations


Journal ArticleDOI
TL;DR: In this paper, the authors use algebraic topology to study mesoscale network structures that arise from the arrangement of densely connected substructures called cliques in otherwise sparsely connected brain networks.
Abstract: Encoding brain regions and their connections as a network of nodes and edges captures many of the possible paths along which information can be transmitted as humans process and perform complex behaviors. Because cognitive processes involve large, distributed networks of brain areas, principled examinations of multi-node routes within larger connection patterns can offer fundamental insights into the complexities of brain function. Here, we investigate both densely connected groups of nodes that could perform local computations as well as larger patterns of interactions that would allow for parallel processing. Finding such structures necessitates that we move from considering exclusively pairwise interactions to capturing higher order relations, concepts naturally expressed in the language of algebraic topology. These tools can be used to study mesoscale network structures that arise from the arrangement of densely connected substructures called cliques in otherwise sparsely connected brain networks. We detect cliques (all-to-all connected sets of brain regions) in the average structural connectomes of 8 healthy adults scanned in triplicate and discover the presence of more large cliques than expected in null networks constructed via wiring minimization, providing an architecture through which the brain network can perform rapid, local processing. We then locate topological cavities of different dimensions, around which information may flow in either diverging or converging patterns. These cavities exist consistently across subjects, differ from those observed in null model networks, and --- importantly --- link regions of early and late evolutionary origin in long loops, underscoring their unique role in controlling brain function. These results offer a first demonstration that techniques from algebraic topology provide a novel perspective on structural connectomics, highlighting loop-like paths as crucial features in the human brain's structural architecture.
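
The clique-detection step can be illustrated with networkx on a thresholded connectivity matrix; the random matrix, threshold, and clique-size cutoff below are placeholders for the averaged structural connectomes and statistics used in the study.

```python
import numpy as np
import networkx as nx

# Sketch: detect all-to-all connected sets of regions (cliques) in a
# thresholded structural connectome. The matrix here is random; in the
# study it would be an averaged streamline-count connectome.
rng = np.random.default_rng(0)
W = rng.random((40, 40))
W = (W + W.T) / 2                           # symmetric connectivity matrix
np.fill_diagonal(W, 0)

threshold = 0.75                            # keep only the strongest edges (assumed)
G = nx.from_numpy_array((W > threshold).astype(int))

maximal_cliques = list(nx.find_cliques(G))
large_cliques = [c for c in maximal_cliques if len(c) >= 4]
print(f"{len(large_cliques)} maximal cliques with 4+ regions")
```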

230 citations


Proceedings ArticleDOI
02 Jun 2018
TL;DR: This paper investigates widely used DNNs and finds that the major contributors to memory footprint are intermediate layer outputs (feature maps), and introduces a framework for DNN-layer-specific optimizations that significantly reduce this source of main memory pressure on GPUs.
Abstract: Training modern deep neural networks (DNNs) typically relies on GPUs to train complex hundred-layer deep networks. A significant problem facing both researchers and industry practitioners is that, as the networks get deeper, the available GPU main memory becomes a primary bottleneck, limiting the size of the networks that can be trained. In this paper, we investigate widely used DNNs and find that the major contributors to memory footprint are intermediate layer outputs (feature maps). We then introduce a framework for DNN-layer-specific optimizations (e.g., convolution, ReLU, pool) that significantly reduce this source of main memory pressure on GPUs. We find that a feature map typically has two uses that are spread far apart temporally. Our key approach is to store an encoded representation of feature maps for this temporal gap and decode this data for use in the backward pass; the full-fidelity feature maps are used in the forward pass and relinquished immediately. Based on this approach, we present Gist, our system that employs two classes of layer-specific encoding schemes -- lossless and lossy -- to exploit existing value redundancy in DNN training and significantly reduce the memory consumption of targeted feature maps. For example, one insight is that, by taking advantage of the computational nature of back propagation from the pool to the ReLU layer, we can store the intermediate feature map using just 1 bit instead of 32 bits per value. We deploy these mechanisms in a state-of-the-art DNN framework (CNTK) and observe that Gist reduces the memory footprint by up to 2X across 5 state-of-the-art image classification DNNs, with an average of 1.8X at only 4% performance overhead. We also show that further software (e.g., CuDNN) and hardware (e.g., dynamic allocation) optimizations can result in even larger footprint reductions (up to 4.1X).
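
The 1-bit example from the abstract can be sketched directly: when a ReLU feeds a pooling layer, its backward pass only needs to know which inputs were positive, so a packed binary mask can stand in for the full-precision feature map during the gap between forward and backward passes. This numpy sketch illustrates the idea and is not the CNTK/Gist implementation.

```python
import numpy as np

# Sketch of the lossless 1-bit trick: when a ReLU feeds a pooling layer,
# the ReLU backward pass only needs to know which inputs were positive,
# so a packed binary mask can be kept for the gap between forward and
# backward passes while the full-precision feature map is released.

def relu_forward(x):
    y = np.maximum(x, 0.0)
    mask = np.packbits(x > 0)            # 1 bit per value instead of 32
    return y, mask                       # y is consumed immediately, then freed

def relu_backward(grad_out, mask, shape):
    bits = np.unpackbits(mask, count=int(np.prod(shape))).reshape(shape)
    return grad_out * bits               # pass gradients only where input was > 0

x = np.random.randn(2, 3, 4, 4).astype(np.float32)
y, m = relu_forward(x)
dx = relu_backward(np.ones_like(y), m, x.shape)
```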

136 citations


Posted Content
TL;DR: This paper proposes a model, called "bi-directional block self-attention network (Bi-BloSAN)", for RNN/CNN-free sequence encoding that achieves or improves upon state-of-the-art accuracy, and shows better efficiency-memory trade-off than existing RNN/CNN/SAN.
Abstract: Recurrent neural networks (RNN), convolutional neural networks (CNN) and self-attention networks (SAN) are commonly used to produce context-aware representations. RNN can capture long-range dependency but is hard to parallelize and not time-efficient. CNN focuses on local dependency but does not perform well on some tasks. SAN can model both such dependencies via highly parallelizable computation, but memory requirement grows rapidly in line with sequence length. In this paper, we propose a model, called "bi-directional block self-attention network (Bi-BloSAN)", for RNN/CNN-free sequence encoding. It requires as little memory as RNN but with all the merits of SAN. Bi-BloSAN splits the entire sequence into blocks, and applies an intra-block SAN to each block for modeling local context, then applies an inter-block SAN to the outputs for all blocks to capture long-range dependency. Thus, each SAN only needs to process a short sequence, and only a small amount of memory is required. Additionally, we use feature-level attention to handle the variation of contexts around the same word, and use forward/backward masks to encode temporal order information. On nine benchmark datasets for different NLP tasks, Bi-BloSAN achieves or improves upon state-of-the-art accuracy, and shows better efficiency-memory trade-off than existing RNN/CNN/SAN.
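
A minimal numpy sketch of the block idea, using plain dot-product attention and mean-pooled block summaries in place of the paper's feature-level attention and positional masks; block size and dimensions are arbitrary.

```python
import numpy as np

def attention(q, k, v):
    """Plain dot-product attention (single head, no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def block_self_attention(x, block_size):
    """Intra-block attention within each block, then inter-block attention
    over per-block summaries, so no attention matrix is larger than
    block_size x block_size or num_blocks x num_blocks."""
    blocks = x.reshape(-1, block_size, x.shape[-1])
    local = np.stack([attention(b, b, b) for b in blocks])    # local context
    summaries = local.mean(axis=1)                            # one vector per block
    global_ctx = attention(summaries, summaries, summaries)   # long-range context
    return (local + global_ctx[:, None, :]).reshape(x.shape)  # broadcast back

x = np.random.randn(64, 16)              # sequence length 64, model dim 16
out = block_self_attention(x, block_size=8)
```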

121 citations


Journal ArticleDOI
14 Nov 2018-Nature
TL;DR: It is shown that the formation of long-term representations in a rat model of non-hippocampus-dependent memory depends not only on sleep but also on activation of a hippocampus-dependent mechanism during sleep, indicating that the hippocampus has an important role in long-term consolidation during sleep even for memories that have previously been considered hippocampus-independent.
Abstract: There is a long-standing division in memory research between hippocampus-dependent memory and non-hippocampus-dependent memory, as only the latter can be acquired and retrieved in the absence of normal hippocampal function1,2. Consolidation of hippocampus-dependent memory, in particular, is strongly supported by sleep3–5. Here we show that the formation of long-term representations in a rat model of non-hippocampus-dependent memory depends not only on sleep but also on activation of a hippocampus-dependent mechanism during sleep. Rats encoded non-hippocampus-dependent (novel-object recognition6–8) and hippocampus-dependent (object–place recognition) memories before a two-hour period of sleep or wakefulness. Memory was tested either immediately thereafter or remotely (after one or three weeks). Whereas object–place recognition memory was stronger for rats that had slept after encoding (rather than being awake) at both immediate and remote testing, novel-object recognition memory profited from sleep only three weeks after encoding, at which point it was preserved in rats that had slept after encoding but not in those that had been awake. Notably, inactivation of the hippocampus during post-encoding sleep by intrahippocampal injection of muscimol abolished the sleep-induced enhancement of remote novel-object recognition memory. By contrast, muscimol injection before remote retrieval or memory encoding had no effect on test performance, confirming that the encoding and retrieval of novel-object recognition memory are hippocampus-independent. Remote novel-object recognition memory was associated with spindle activity during post-encoding slow-wave sleep, consistent with the view that neuronal memory replay during slow-wave sleep contributes to long-term memory formation. Our results indicate that the hippocampus has an important role in long-term consolidation during sleep even for memories that have previously been considered hippocampus-independent. Hippocampal activity during a period of sleep after memory encoding is crucial for forming long-term memories in rats, even for types of memory considered not to be hippocampus-dependent.

118 citations


Journal ArticleDOI
TL;DR: The first successful implementation in humans of a proof-of-concept system for restoring and improving memory function via facilitation of memory encoding using the patient's own hippocampal spatiotemporal neural codes for memory is demonstrated.
Abstract: Objective. We demonstrate here the first successful implementation in humans of a proof-of-concept system for restoring and improving memory function via facilitation of memory encoding using the patient's own hippocampal spatiotemporal neural codes for memory. Memory in humans is subject to disruption by drugs, disease and brain injury, yet previous attempts to restore or rescue memory function in humans typically involved only nonspecific modulation of brain areas and neural systems related to memory retrieval. Approach. We have constructed a model of processes by which the hippocampus encodes memory items via spatiotemporal firing of neural ensembles that underlie the successful encoding of short-term memory. A nonlinear multi-input, multi-output (MIMO) model of hippocampal CA3 and CA1 neural firing is computed that predicts activation patterns of CA1 neurons during the encoding (sample) phase of a delayed match-to-sample (DMS) human short-term memory task. Main results. MIMO model-derived electrical stimulation delivered to the same CA1 locations during the sample phase of DMS trials facilitated short-term/working memory by 37% during the task. Longer term memory retention was also tested in the same human subjects with a delayed recognition (DR) task that utilized images from the DMS task, along with images that were not from the task. Across the subjects, the stimulated trials exhibited significant improvement (35%) in both short-term and long-term retention of visual information. Significance. These results demonstrate the facilitation of memory encoding, which is an important feature for the construction of an implantable neural prosthetic to improve human memory.

116 citations


Book ChapterDOI
08 Sep 2018
TL;DR: This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer that allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems.
Abstract: This work introduces double-mapping Gated Recurrent Units (dGRU), an extension of standard GRUs where the input is considered as a recurrent state. An extra set of logic gates is added to update the input given the output. Stacking multiple such layers results in a recurrent auto-encoder: the operators updating the outputs comprise the encoder, while the ones updating the inputs form the decoder. Since the states are shared between corresponding encoder and decoder layers, the representation is stratified during learning: some information is not passed to the next layers. We test our model on future video prediction. Main challenges for this task include high variability in videos, temporal propagation of errors, and non-specificity of future frames. We show how only the encoder or decoder needs to be applied for encoding or prediction. This reduces the computational cost and avoids re-encoding predictions when generating multiple frames, mitigating error propagation. Furthermore, it is possible to remove layers from a trained model, giving an insight to the role of each layer. Our approach improves state of the art results on MMNIST and UCF101, being competitive on KTH with 2 and 3 times less memory usage and computational cost than the best scored approach.
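
A toy sketch of the double-mapping idea, assuming a minimal GRU-style cell without biases, stacking, or training; the gate parameterization and the encode/decode split are illustrative, not the authors' exact architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DGRUSketch:
    """Toy double-mapping GRU cell: one set of gates updates the state h
    from the input x (encoder direction) and a mirrored set updates x
    from h (decoder direction), so the two directions share the state."""

    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        self.We = rng.standard_normal((3, d, 2 * d)) * 0.1   # encoder gates
        self.Wd = rng.standard_normal((3, d, 2 * d)) * 0.1   # decoder gates

    def _gru(self, W, state, inp):
        zr_in = np.concatenate([state, inp])
        z = sigmoid(W[0] @ zr_in)                            # update gate
        r = sigmoid(W[1] @ zr_in)                            # reset gate
        h_tilde = np.tanh(W[2] @ np.concatenate([r * state, inp]))
        return (1 - z) * state + z * h_tilde

    def encode(self, x, h):              # update the output state from the input
        return self._gru(self.We, h, x)

    def decode(self, h, x):              # update the input state from the output
        return self._gru(self.Wd, x, h)

cell = DGRUSketch(d=16)
x, h = np.random.randn(16), np.zeros(16)
h = cell.encode(x, h)                    # only the encoder is needed to encode
x_next = cell.decode(h, x)               # only the decoder is needed to predict
```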

98 citations


Proceedings ArticleDOI
01 Nov 2018
TL;DR: Simulation results show that partitioning a DNN coupled with feature space encoding enables significant improvement in the energy-efficiency and throughput over the baseline configurations that perform the entire inference at the edge or at the host.
Abstract: This paper introduces partitioning an inference task of a deep neural network between an edge and a host platform in the IoT environment. We present a DNN as an encoding pipeline, and propose to transmit the output feature space of an intermediate layer to the host. Encoding of the feature space is proposed to enhance the maximum input rate supported by the edge platform and/or reduce the energy of the edge platform. Simulation results show that partitioning a DNN coupled with feature space encoding enables significant improvement in the energy-efficiency and throughput over the baseline configurations that perform the entire inference at the edge or at the host.
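
A small sketch of the split-inference idea: the edge runs the feature-extraction layers, compresses the intermediate feature map, and transmits it, while the host decodes and finishes inference. The uniform 8-bit quantizer and the commented edge_layers/host_layers helpers are assumptions standing in for the encoders and partitions evaluated in the paper.

```python
import numpy as np

# Sketch: the edge device runs the feature-extraction layers, compresses
# the intermediate feature map, and transmits it; the host decodes it and
# finishes inference. A uniform 8-bit quantizer stands in for the
# feature-space encoders evaluated in the paper.

def quantize(feature_map, bits=8):
    lo, hi = float(feature_map.min()), float(feature_map.max())
    scale = (hi - lo) / (2 ** bits - 1)
    if scale == 0.0:
        scale = 1.0
    codes = np.round((feature_map - lo) / scale).astype(np.uint8)
    return codes, (lo, scale)            # codes are what the edge transmits

def dequantize(codes, meta):
    lo, scale = meta
    return codes.astype(np.float32) * scale + lo

features = np.random.randn(1, 64, 28, 28).astype(np.float32)   # intermediate layer output
codes, meta = quantize(features)                                # ~4x smaller than float32
recovered = dequantize(codes, meta)
# edge side:  payload = quantize(edge_layers(image))         # hypothetical helpers
# host side:  prediction = host_layers(dequantize(*payload))
```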

97 citations


Proceedings ArticleDOI
02 Sep 2018
TL;DR: In this article, the authors apply self-attention to acoustic modeling, propose a Gaussian biasing approach that allows explicit control over the context range, and demonstrate that interpretability is a strength of self-attentional acoustic models.
Abstract: Self-attention is a method of encoding sequences of vectors by relating these vectors to each other based on pairwise similarities. These models have recently shown promising results for modeling discrete sequences, but they are non-trivial to apply to acoustic modeling due to computational and modeling issues. In this paper, we apply self-attention to acoustic modeling, proposing several improvements to mitigate these issues: First, self-attention memory grows quadratically in the sequence length, which we address through a downsampling technique. Second, we find that previous approaches to incorporate position information into the model are unsuitable and explore other representations and hybrid models to this end. Third, to stress the importance of local context in the acoustic signal, we propose a Gaussian biasing approach that allows explicit control over the context range. Experiments find that our model approaches a strong baseline based on LSTMs with network-in-network connections while being much faster to compute. Besides speed, we find that interpretability is a strength of self-attentional acoustic models, and demonstrate that self-attention heads learn a linguistically plausible division of labor.
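
The Gaussian biasing can be sketched as a distance-dependent penalty added to the attention logits so that nearby frames dominate; the single-head, projection-free form and the sigma value are simplifying assumptions.

```python
import numpy as np

def gaussian_biased_attention(x, sigma=15.0):
    """Self-attention whose logits are penalized by the squared distance
    between query and key positions, biasing each frame toward its local
    acoustic context; sigma controls the effective context range."""
    T, d = x.shape
    logits = x @ x.T / np.sqrt(d)
    pos = np.arange(T)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    logits = logits - dist2 / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

frames = np.random.randn(200, 40)        # 200 acoustic frames, 40-dim features
out = gaussian_biased_attention(frames)
```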

91 citations


Journal ArticleDOI
TL;DR: It is suggested that the ability to influence associative memory may be related to the fidelity of hippocampal TMS targeting, and the notion that pIPC TMS may serve as a potential tool for manipulating hippocampal function in healthy participants is supported.
Abstract: The hippocampus plays a critical role in episodic memory, among other cognitive functions. However, few tools exist to causally manipulate hippocampal function in healthy human participants. Recent work has targeted hippocampal-cortical networks by performing TMS to a region interconnected with the hippocampus, posterior inferior parietal cortex (pIPC). Such hippocampal-targeted TMS enhances associative memory and influences hippocampal functional connectivity. However, it is currently unknown which stages of mnemonic processing (encoding or retrieval) are affected by hippocampal-targeted TMS. Here, we examined whether hippocampal-targeted TMS influences the initial encoding of associations (vs. items) into memory. To selectively influence encoding and not retrieval, we performed continuous theta-burst TMS before participants encoded object-location associations and assessed memory after the direct effect of stimulation dissipated. Relative to control TMS and baseline memory, pIPC TMS enhanced associative memory success and confidence. Item memory was unaffected, demonstrating a selective influence on associative versus item memory. The strength of hippocampal-pIPC functional connectivity predicted TMS-related memory benefits, which was mediated by parahippocampal and retrosplenial cortices. Our findings indicate that hippocampal-targeted TMS can specifically modulate the encoding of new associations into memory without directly influencing retrieval processes and suggest that the ability to influence associative memory may be related to the fidelity of hippocampal TMS targeting. These results support the notion that pIPC TMS may serve as a potential tool for manipulating hippocampal function in healthy participants. Nonetheless, future work combining hippocampal-targeted continuous theta-burst TMS with neuroimaging is needed to better understand the neural basis of TMS-induced memory changes.

90 citations


Posted ContentDOI
22 Apr 2018-bioRxiv
TL;DR: It is found that STP can support the short-term maintenance of information provided that the memory delay period is sufficiently short, and that the amount of persistent neuronal activity scales with the degree of manipulation required.
Abstract: Recently it has been proposed that information in short-term memory may not always be stored in persistent neuronal activity, but can be maintained in "activity-silent" hidden states such as synaptic efficacies endowed with short-term plasticity (STP). However, working memory involves manipulation as well as maintenance of information in the absence of external stimuli. In this work, we investigated working memory representation using recurrent neural network (RNN) models trained to perform several working memory dependent tasks. We found that STP can support the short-term maintenance of information provided that the memory delay period is sufficiently short. However, in tasks that require actively manipulating information, persistent neuronal activity naturally emerges from learning, and the amount of persistent neuronal activity scales with the degree of manipulation required. These results provide insight into the current debate on working memory encoding, and suggest that persistent neural activity can vary markedly between tasks used in different experiments.
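
As a hedged illustration of "activity-silent" storage, the sketch below uses the standard Tsodyks-Markram facilitation/depression dynamics that such models typically build on; the parameters and discrete-time update are assumptions, not the exact RNNs trained in this study.

```python
import numpy as np

def stp_step(u, x, r, dt=0.01, U=0.15, tau_f=1.5, tau_d=0.2):
    """One Euler step of synaptic facilitation (u) and depression (x)
    driven by a presynaptic firing rate r (Hz); u * x scales the
    effective synaptic weight."""
    du = (U - u) / tau_f + U * (1.0 - u) * r
    dx = (1.0 - x) / tau_d - u * x * r
    return u + dt * du, x + dt * dx

u, x = 0.15, 1.0
for t in range(300):                      # 0.5 s stimulus, then a silent delay
    rate = 20.0 if t < 50 else 0.0
    u, x = stp_step(u, x, rate)
    efficacy = u * x
# After the stimulus ends, u decays slowly (tau_f >> tau_d), so the elevated
# efficacy briefly "remembers" the input even with no ongoing spiking.
```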

Proceedings ArticleDOI
31 Oct 2018
TL;DR: This work proposes an alternative approach that instead relies on a single 2D convolutional neural network across both sequences and outperforms state-of-the-art encoder-decoder systems, while being conceptually simpler and having fewer parameters.
Abstract: Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence, and then generate an output sequence based on the input encoding. Both are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach which instead relies on a single 2D convolutional neural network across both sequences. Each layer of our network re-codes source tokens on the basis of the output sequence produced so far. Attention-like properties are therefore pervasive throughout the network. Our model yields excellent results, outperforming state-of-the-art encoder-decoder systems, while being conceptually simpler and having fewer parameters.

Journal ArticleDOI
TL;DR: In this paper, the authors review evidence for a selective removal process that operates on outdated information to limit working memory load and facilitate the maintenance of goal-relevant information, and propose two forms of removal: one is temporary and reversible, modifying working memory content without impacting content-to-context bindings, and the other is permanent, unbinding the content from its context in working memory (without necessarily impacting long-term forgetting).
Abstract: What happens to goal-relevant information in working memory after it is no longer needed? Here, we review evidence for a selective removal process that operates on outdated information to limit working memory load and hence facilitates the maintenance of goal-relevant information. Removal alters the representations of irrelevant content so as to reduce access to it, thereby improving access to the remaining relevant content and also facilitating the encoding of new information. Both behavioral and neural evidence support the existence of a removal process that is separate from forgetting due to decay or interference. We discuss the potential mechanisms involved in removal and characterize the time course and duration of the process. In doing so, we propose the existence of two forms of removal: one is temporary and reversible, modifying working memory content without impacting content-to-context bindings, and the other is permanent, unbinding the content from its context in working memory (without necessarily impacting long-term forgetting). Finally, we discuss limitations on removal and prescribe conditions for evaluating evidence for or against this process.

Journal ArticleDOI
TL;DR: During encoding of word lists, words that were later successfully recalled showed significant differences in pupil response compared to those that were forgotten – the pupil was more constricted before and more dilated after the onset of word presentation.
Abstract: Pupil responses are known to indicate brain processes involved in perception, attention and decision-making. They can provide an accessible biomarker of human memory performance and cognitive states in general. Here we investigated changes in the pupil size during encoding and recall of word lists. Consistent patterns in the pupil response were found across and within distinct phases of the free recall task. The pupil was most constricted in the initial fixation phase and was gradually more dilated through the subsequent encoding, distractor and recall phases of the task, as the word items were maintained in memory. Within the final recall phase, retrieving memory for individual words was associated with pupil dilation in absence of visual stimulation. Words that were successfully recalled showed significant differences in pupil response during their encoding compared to those that were forgotten – the pupil was more constricted before and more dilated after the onset of word presentation. Our results suggest pupil size as a potential biomarker for probing and modulation of memory processing.

Proceedings ArticleDOI
19 Jul 2018
TL;DR: In this article, a new memory augmented neural network was proposed to model the complex interactions between two asynchronous sequential views, where two encoders for reading from and writing to two external memories for encoding input views.
Abstract: One of the core tasks in multi-view learning is to capture relations among views. For sequential data, the relations not only span across views, but also extend throughout the view length to form long-term intra-view and inter-view interactions. In this paper, we present a new memory augmented neural network that aims to model these complex interactions between two asynchronous sequential views. Our model uses two encoders for reading from and writing to two external memories for encoding input views. The intra-view interactions and the long-term dependencies are captured by the use of memories during this encoding process. There are two modes of memory accessing in our system: late-fusion and early-fusion, corresponding to late and early inter-view interactions. In the late-fusion mode, the two memories are separated, containing only view-specific contents. In the early-fusion mode, the two memories share the same addressing space, allowing cross-memory accessing. In both cases, the knowledge from the memories will be combined by a decoder to make predictions over the output space. The resulting dual memory neural computer is demonstrated on a comprehensive set of experiments, including a synthetic task of summing two sequences and the tasks of drug prescription and disease progression in healthcare. The results demonstrate competitive performance over both traditional algorithms and deep learning methods designed for multi-view problems.

Journal ArticleDOI
TL;DR: A broad overview of different hippocampal coding schemes across species is presented to inspire future empirical and modeling research to consider how factors surrounding memory formation shape the representations in which memories are stored.

Proceedings ArticleDOI
01 Jan 2018
TL;DR: Phrase-level Self-Attention Networks (PSAN) that perform self-attention across words inside a phrase to capture context dependencies at the phrase level, and use the gated memory updating mechanism to refine each word’s representation hierarchically with longer-term context dependencies captured in a larger phrase are proposed.
Abstract: Universal sentence encoding is a hot topic in recent NLP research. Attention mechanism has been an integral part in many sentence encoding models, allowing the models to capture context dependencies regardless of the distance between the elements in the sequence. Fully attention-based models have recently attracted enormous interest due to their highly parallelizable computation and significantly less training time. However, the memory consumption of their models grows quadratically with the sentence length, and the syntactic information is neglected. To this end, we propose Phrase-level Self-Attention Networks (PSAN) that perform self-attention across words inside a phrase to capture context dependencies at the phrase level, and use the gated memory updating mechanism to refine each word’s representation hierarchically with longer-term context dependencies captured in a larger phrase. As a result, the memory consumption can be reduced because the self-attention is performed at the phrase level instead of the sentence level. At the same time, syntactic information can be easily integrated in the model. Experiment results show that PSAN can achieve the state-of-the-art performance across a plethora of NLP tasks including binary and multi-class classification, natural language inference and sentence similarity.

Posted Content
TL;DR: In this paper, a DNN is presented as an encoding pipeline, and the output feature space of an intermediate layer is transmitted to the host to enhance the maximum input rate supported by the edge platform and/or reduce the energy consumption of the edge platform.
Abstract: This paper introduces partitioning an inference task of a deep neural network between an edge and a host platform in the IoT environment. We present a DNN as an encoding pipeline, and propose to transmit the output feature space of an intermediate layer to the host. The lossless or lossy encoding of the feature space is proposed to enhance the maximum input rate supported by the edge platform and/or reduce the energy of the edge platform. Simulation results show that partitioning a DNN at the end of convolutional (feature extraction) layers coupled with feature space encoding enables significant improvement in the energy-efficiency and throughput over the baseline configurations that perform the entire inference at the edge or at the host.

Journal ArticleDOI
TL;DR: The concise pixel-permutation algorithm is used to address the drawbacks of the traditional CA encoding methods, and the effectiveness of the proposed video encoding method is demonstrated by simulation examples.

Journal ArticleDOI
TL;DR: Findings from presurgical epilepsy patients with bilateral hippocampal depth electrodes performing an object-location memory task that provided a broad range of spatial memory precision indicate that local processing in hippocampal CA1 and dorsolateral prefrontal cortex supports high-fidelity spatial memory representations.
Abstract: The hippocampus plays a critical role in spatial memory. However, the exact neural mechanisms underlying high-fidelity spatial memory representations are unknown. We report findings from presurgical epilepsy patients with bilateral hippocampal depth electrodes performing an object-location memory task that provided a broad range of spatial memory precision. During encoding, patients were shown a series of objects along the circumference of an invisible circle. At test, the same objects were shown at the top of the circle (0°), and patients used a dial to move the object to its location shown during encoding. Angular error between the correct location and the indicated location was recorded as a continuous measure of performance. By registering pre- and postimplantation MRI scans, we were able to localize the electrodes to specific hippocampal subfields. We found a correlation between increased gamma power, thought to reflect local excitatory activity, and the precision of spatial memory retrieval in hippocampal CA1 electrodes. Additionally, we found a similar relationship between gamma power and memory precision in the dorsolateral prefrontal cortex and a directional relationship between activity in this region and in the CA1, suggesting that the dorsolateral prefrontal cortex is involved in postretrieval processing. These results indicate that local processing in hippocampal CA1 and dorsolateral prefrontal cortex supports high-fidelity spatial memory representations.

Proceedings Article
01 Aug 2018
TL;DR: This article proposes an aggregation mechanism to obtain a fixed-size encoding with a dynamic routing policy, which dynamically decides what and how much information needs to be transferred from each word to the final encoding of the text sequence.
Abstract: While much progress has been made in how to encode a text sequence into a sequence of vectors, less attention has been paid to how to aggregate these preceding vectors (outputs of RNN/CNN) into a fixed-size encoding vector. Usually, a simple max or average pooling is used, which is a bottom-up and passive way of aggregation that lacks guidance from task information. In this paper, we propose an aggregation mechanism to obtain a fixed-size encoding with a dynamic routing policy. The dynamic routing policy dynamically decides what and how much information needs to be transferred from each word to the final encoding of the text sequence. Following the work of Capsule Network, we design two dynamic routing policies to aggregate the outputs of the RNN/CNN encoding layer into a final encoding vector. Compared to the other aggregation methods, dynamic routing can refine the messages according to the state of the final encoding vector. Experimental results on five text classification tasks show that our method outperforms other aggregating models by a significant margin. Related source code is released on our github page.
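
A compact sketch of routing-based aggregation in this spirit, using the simplest capsule-style update (softmax over agreement logits plus a squash nonlinearity); the two routing policies proposed in the paper may differ in detail.

```python
import numpy as np

def squash(v, eps=1e-8):
    n2 = float((v ** 2).sum())
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def dynamic_routing_aggregate(H, iters=3):
    """Aggregate encoder outputs H (T x d) into one fixed-size vector by
    iteratively re-weighting words according to their agreement with the
    current estimate of the final encoding."""
    T, _ = H.shape
    b = np.zeros(T)                      # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum()  # how much each word contributes
        v = squash((c[:, None] * H).sum(axis=0))
        b = b + H @ v                    # refine by agreement with v
    return v

H = np.random.randn(20, 64)              # outputs of an RNN/CNN encoding layer
sentence_vector = dynamic_routing_aggregate(H)
```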

Posted Content
TL;DR: This paper proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods, and discovers nodes with multiple recurrent paths and multiple memory cells, which lead to significant improvement in the standard language modeling benchmark task.
Abstract: Gated recurrent networks such as those composed of Long Short-Term Memory (LSTM) nodes have recently been used to improve state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement learning mechanisms have been employed to create new variations of this structure. This paper proposes a new method, evolution of a tree-based encoding of the gated memory nodes, and shows that it makes it possible to explore new variations more effectively than other methods. The method discovers nodes with multiple recurrent paths and multiple memory cells, which lead to significant improvement in the standard language modeling benchmark task. The paper also shows how the search process can be speeded up by training an LSTM network to estimate performance of candidate structures, and by encouraging exploration of novel solutions. Thus, evolutionary design of complex neural network structures promises to improve performance of deep learning architectures beyond human ability to do so.

Posted Content
TL;DR: In this article, the authors evaluate decoding and encoding models in terms of their generalization performance, which depends on the level of generalization a model achieves (e.g. to new response measurements for the same stimuli, to new stimuli from the same population, or to stimuli from a different population).
Abstract: Encoding and decoding models are widely used in systems, cognitive, and computational neuroscience to make sense of brain-activity data. However, the interpretation of their results requires care. Decoding models can help reveal whether particular information is present in a brain region in a format the decoder can exploit. Encoding models make comprehensive predictions about representational spaces. In the context of sensory systems, encoding models enable us to test and compare brain-computational models, and thus directly constrain computational theory. Encoding and decoding models typically include fitted linear-model components. Sometimes the weights of the fitted linear combinations are interpreted as reflecting, in an encoding model, the contribution of different sensory features to the representation or, in a decoding model, the contribution of different measured brain responses to a decoded feature. Such interpretations can be problematic when the predictor variables or their noise components are correlated and when priors (or penalties) are used to regularize the fit. Encoding and decoding models are evaluated in terms of their generalization performance. The correct interpretation depends on the level of generalization a model achieves (e.g. to new response measurements for the same stimuli, to new stimuli from the same population, or to stimuli from a different population). Significant decoding or encoding performance of a single model (at whatever level of generality) does not provide strong constraints for theory. Many models must be tested and inferentially compared for analyses to drive theoretical progress.
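
The generalization levels can be made concrete with a small simulation: a ridge encoding model is fit on responses to one set of stimuli and then scored on new measurements of the same stimuli versus held-out stimuli. The simulated data, estimator, and split below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_stim, n_feat, n_chan = 200, 50, 30
features = rng.standard_normal((n_stim, n_feat))                         # stimulus features
W_true = rng.standard_normal((n_feat, n_chan)) * 0.3
responses_a = features @ W_true + rng.standard_normal((n_stim, n_chan))  # session A
responses_b = features @ W_true + rng.standard_normal((n_stim, n_chan))  # session B

train = np.arange(n_stim) < 150
model = Ridge(alpha=10.0).fit(features[train], responses_a[train])

# Level 1: new response measurements for the same (training) stimuli
r_same_stimuli = r2_score(responses_b[train], model.predict(features[train]))
# Level 2: stimuli never seen during fitting
r_new_stimuli = r2_score(responses_a[~train], model.predict(features[~train]))
print(f"same stimuli, new data: {r_same_stimuli:.2f}; new stimuli: {r_new_stimuli:.2f}")
```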

Proceedings Article
01 Jan 2018
TL;DR: Bi-BloSAN as mentioned in this paper proposes a Bi-directional block self-attention network for RNN/CNN-free sequence encoding, which splits the entire sequence into blocks and applies an intra-block SAN to each block for modeling local context, then applies an inter-block SAN to the outputs for all blocks to capture long-range dependency.
Abstract: Recurrent neural networks (RNN), convolutional neural networks (CNN) and self-attention networks (SAN) are commonly used to produce context-aware representations. RNN can capture long-range dependency but is hard to parallelize and not time-efficient. CNN focuses on local dependency but does not perform well on some tasks. SAN can model both such dependencies via highly parallelizable computation, but memory requirement grows rapidly in line with sequence length. In this paper, we propose a model, called "bi-directional block self-attention network (Bi-BloSAN)", for RNN/CNN-free sequence encoding. It requires as little memory as RNN but with all the merits of SAN. Bi-BloSAN splits the entire sequence into blocks, and applies an intra-block SAN to each block for modeling local context, then applies an inter-block SAN to the outputs for all blocks to capture long-range dependency. Thus, each SAN only needs to process a short sequence, and only a small amount of memory is required. Additionally, we use feature-level attention to handle the variation of contexts around the same word, and use forward/backward masks to encode temporal order information. On nine benchmark datasets for different NLP tasks, Bi-BloSAN achieves or improves upon state-of-the-art accuracy, and shows better efficiency-memory trade-off than existing RNN/CNN/SAN.

Posted Content
TL;DR: This paper designs two dynamic routing policies to aggregate the outputs of RNN/CNN encoding layer into a final encoding vector and shows that this method outperforms other aggregating models by a significant margin.
Abstract: While much progress has been made in how to encode a text sequence into a sequence of vectors, less attention has been paid to how to aggregate these preceding vectors (outputs of RNN/CNN) into a fixed-size encoding vector. Usually, a simple max or average pooling is used, which is a bottom-up and passive way of aggregation that lacks guidance from task information. In this paper, we propose an aggregation mechanism to obtain a fixed-size encoding with a dynamic routing policy. The dynamic routing policy dynamically decides what and how much information needs to be transferred from each word to the final encoding of the text sequence. Following the work of Capsule Network, we design two dynamic routing policies to aggregate the outputs of the RNN/CNN encoding layer into a final encoding vector. Compared to the other aggregation methods, dynamic routing can refine the messages according to the state of the final encoding vector. Experimental results on five text classification tasks show that our method outperforms other aggregating models by a significant margin. Related source code is released on our github page.

Posted Content
TL;DR: It is shown that for a random model for stragglers, the proposed moment encoding based gradient descent method can be viewed as the stochastic gradient descent method, which allows for convergence guarantees for the proposed solution.
Abstract: This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of straggling processors. To mitigate the effect of the stragglers, it has been previously proposed to encode the data with an erasure-correcting code and decode at the master server at the end of the computation. We, instead, propose to encode the second-moment of the data with a low density parity-check (LDPC) code. The iterative decoding algorithms for LDPC codes have very low computational overhead and the number of decoding iterations can be made to automatically adjust with the number of stragglers in the system. We show that for a random model for stragglers, the proposed moment encoding based gradient descent method can be viewed as the stochastic gradient descent method. This allows us to obtain convergence guarantees for the proposed solution. Furthermore, the proposed moment encoding based method is shown to outperform the existing schemes in a real distributed computing setup.
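
A minimal sketch of why moment encoding suffices for least-squares gradient descent, with a single parity chunk standing in for the LDPC code and iterative decoder; the data and straggler pattern are simulated assumptions.

```python
import numpy as np

# Sketch: for least squares, the gradient needs only the second moments
# X^T X and X^T y, so those per-chunk moments are what gets encoded.
# A single parity chunk stands in here for the LDPC code in the paper.

rng = np.random.default_rng(1)
X, y = rng.standard_normal((300, 10)), rng.standard_normal(300)
chunks = np.array_split(np.arange(300), 3)

M = [X[c].T @ X[c] for c in chunks]          # per-worker moment matrices
v = [X[c].T @ y[c] for c in chunks]
M_parity, v_parity = sum(M), sum(v)          # extra coded (parity) worker

# Suppose worker 1 straggles: recover its contribution from the parity worker.
M1, v1 = M_parity - M[0] - M[2], v_parity - v[0] - v[2]
M_total, v_total = M[0] + M1 + M[2], v[0] + v1 + v[2]

w = np.zeros(10)
for _ in range(200):                          # gradient descent on 1/2 ||Xw - y||^2
    w -= 1e-3 * (M_total @ w - v_total)
```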

Journal ArticleDOI
TL;DR: The results support the idea that when prior knowledge is involved, the HPC, vmPFC, and aTPL, which support prior episodic, social‐evaluative/schematic, and semantic memories, respectively, continue to interact with each other and posterior perceptual brain regions during the post‐encoding rest to facilitate off‐line processing of the newly formed memory, and enhance memory consolidation.

Journal ArticleDOI
TL;DR: A novel astrophysics-based approach is proposed for automatically finding the clusters and features simultaneously by using a dynamic threshold technique for efficient searching.
Abstract: In this paper, a novel astrophysics-based approach is proposed for automatically finding the clusters and features simultaneously. A novel agent encoding scheme is used to encode both the number of...

Journal ArticleDOI
TL;DR: A neuro-inspired cognitive navigation model which integrates the cognitive mapping ability of entorhinal cortex (EC) and episodic memory ability of hippocampus to enable the robot to perform more versatile cognitive tasks is proposed.
Abstract: One of the important topics in the study of robotic cognition is to enable a robot to perceive, plan, and react to situations in a real-world environment. We present a novel angle on this subject, by integrating active navigation with sequence learning. We propose a neuro-inspired cognitive navigation model which integrates the cognitive mapping ability of the entorhinal cortex (EC) and the episodic memory ability of the hippocampus to enable the robot to perform more versatile cognitive tasks. The EC layer is modeled by a 3-D continuous attractor network structure to build the map of the environment. The hippocampus is modeled by a recurrent spiking neural network to store and retrieve task-related information. Information between the cognitive map and the memory network is exchanged through respective encoding and decoding schemes. The cognitive system is applied on a mobile robot platform, and robot exploration, localization, and navigation are investigated. The robotic experiments demonstrate the effectiveness of the proposed system.

Journal ArticleDOI
18 Jun 2018-eLife
TL;DR: Time-varying functional connectivity patterns across the human brain in periods of 30–40 s, which have recently been implicated in various cognitive functions, suggest that a diverse set of brain systems dynamically interact to support successful memory encoding.
Abstract: Although activation/deactivation of specific brain regions has been shown to be predictive of successful memory encoding, the relationship between time-varying large-scale brain networks and fluctuations of memory encoding performance remains unclear. Here, we investigated time-varying functional connectivity patterns across the human brain in periods of 30-40 s, which have recently been implicated in various cognitive functions. During functional magnetic resonance imaging, participants performed a memory encoding task, and their performance was assessed with a subsequent surprise memory test. A graph analysis of functional connectivity patterns revealed that increased integration of the subcortical, default-mode, salience, and visual subnetworks with other subnetworks is a hallmark of successful memory encoding. Moreover, multivariate analysis using the graph metrics of integration reliably classified the brain network states into the period of high (vs. low) memory encoding performance. Our findings suggest that a diverse set of brain systems dynamically interact to support successful memory encoding.
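
One common graph measure of this kind of integration is the participation coefficient, sketched below on a simulated connectivity matrix with arbitrary subnetwork labels; the study's specific graph metrics may differ.

```python
import numpy as np

def participation_coefficient(W, modules):
    """P_i = 1 - sum_s (k_is / k_i)^2: how evenly node i's connection
    strength k_i is spread over the subnetworks (modules) s."""
    k = W.sum(axis=1)
    P = np.ones(len(W))
    for s in np.unique(modules):
        k_is = W[:, modules == s].sum(axis=1)
        P -= (k_is / np.maximum(k, 1e-12)) ** 2
    return P

rng = np.random.default_rng(0)
W = rng.random((90, 90))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)                        # weighted functional connectivity (simulated)
modules = rng.integers(0, 7, size=90)         # e.g. 7 subnetwork labels
integration = participation_coefficient(W, modules)
```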