
Showing papers by "IBM" published in 2016


Book
Yuan Taur1, Tak H. Ning1
01 Jan 2016
TL;DR: In this article, the authors highlight the intricate interdependencies and subtle tradeoffs between various practically important device parameters, and also provide an in-depth discussion of device scaling and scaling limits of CMOS and bipolar devices.
Abstract: Learn the basic properties and designs of modern VLSI devices, as well as the factors affecting performance, with this thoroughly updated second edition. The first edition has been widely adopted as a standard textbook in microelectronics in many major US universities and worldwide. The internationally-renowned authors highlight the intricate interdependencies and subtle tradeoffs between various practically important device parameters, and also provide an in-depth discussion of device scaling and scaling limits of CMOS and bipolar devices. Equations and parameters provided are checked continuously against the reality of silicon data, making the book equally useful in practical transistor design and in the classroom. Every chapter has been updated to include the latest developments, such as MOSFET scale length theory, high-field transport model, and SiGe-base bipolar devices.

2,680 citations


Proceedings ArticleDOI
19 Feb 2016
TL;DR: This paper proposed several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
Abstract: In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute to further improvement in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research.

1,405 citations


Book ChapterDOI
08 Oct 2016
TL;DR: A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection; the network is learned end-to-end by optimizing a multi-task loss.
Abstract: A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.

1,342 citations


Posted Content
TL;DR: This paper proposed several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
Abstract: In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute to further improvement in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research.

1,141 citations


Journal ArticleDOI
TL;DR: This paper proposed three attention schemes that integrate mutual influence between sentences into CNNs, thus the representation of each sentence takes into consideration its counterpart, and achieved state-of-the-art performance on answer selection, paraphrase identification, and textual entailment.
Abstract: How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence's representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, task-specific linguistic features. This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences. We make three contributions. (i) The ABCNN can be applied to a wide variety of tasks that require modeling of sentence pairs. (ii) We propose three attention schemes that integrate mutual influence between sentences into CNNs; thus, the representation of each sentence takes into consideration its counterpart. These interdependent sentence pair representations are more powerful than isolated sentence representations. (iii) ABCNNs achieve state-of-the-art performance on AS, PI and TE tasks. We release code at: https://github.com/yinwenpeng/Answer_Selection.

935 citations
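The core of the attention schemes above is a pairwise match matrix between the two sentences' feature columns. A minimal NumPy sketch of that idea follows; the 1/(1 + Euclidean distance) match score is one common choice, and the feature maps and dimensions here are made up for illustration, not taken from the paper.

```python
import numpy as np

def attention_matrix(s1, s2):
    """Pairwise match scores between two sentences' feature columns.

    s1: (d, n1) feature map, s2: (d, n2) feature map.
    Uses 1 / (1 + Euclidean distance) as the match score.
    """
    n1, n2 = s1.shape[1], s2.shape[1]
    A = np.zeros((n1, n2))
    for i in range(n1):
        for j in range(n2):
            A[i, j] = 1.0 / (1.0 + np.linalg.norm(s1[:, i] - s2[:, j]))
    return A

# Each sentence's attention weights are row/column sums of A; in an
# attention-based CNN these reweight the convolution input or pooling.
s1 = np.random.randn(8, 5)   # made-up feature map for sentence 1
s2 = np.random.randn(8, 7)   # made-up feature map for sentence 2
A = attention_matrix(s1, s2)
attn_s1 = A.sum(axis=1)      # weight for each word in sentence 1
attn_s2 = A.sum(axis=0)      # weight for each word in sentence 2
```

Because the score is 1/(1 + distance), every entry of A lies in (0, 1], and each sentence's representation can take its counterpart into account through these weights.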


Journal ArticleDOI
TL;DR: This work shows that chalcogenide-based phase-change materials can be used to create an artificial neuron in which the membrane potential is represented by the phase configuration of the nanoscale phase-change device and shows that the temporal integration of postsynaptic potentials can be achieved on a nanosecond timescale.
Abstract: A nanoscale phase-change device can be used to create an artificial neuron that exhibits integrate-and-fire functionality with stochastic dynamics.

808 citations


Journal ArticleDOI
TL;DR: This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.
Abstract: Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.

719 citations


Journal ArticleDOI
TL;DR: This work shows that the highly stable, non-toxic and earth-abundant material, ZrSiS, has an electronic band structure that hosts several Dirac cones that form a Fermi surface with a diamond-shaped line of Dirac nodes, making it a very promising candidate to study Dirac electrons, as well as the properties of lines ofDirac nodes.
Abstract: Materials harbouring exotic quasiparticles, such as massless Dirac and Weyl fermions, have garnered much attention from physics and material science communities due to their exceptional physical properties such as ultra-high mobility and extremely large magnetoresistances. Here, we show that the highly stable, non-toxic and earth-abundant material, ZrSiS, has an electronic band structure that hosts several Dirac cones that form a Fermi surface with a diamond-shaped line of Dirac nodes. We also show that the square Si lattice in ZrSiS is an excellent template for realizing new types of two-dimensional Dirac cones recently predicted by Young and Kane. Finally, we find that the energy range of the linearly dispersed bands is as high as 2 eV above and below the Fermi level; much larger than that of other known Dirac materials. This makes ZrSiS a very promising candidate to study Dirac electrons, as well as the properties of lines of Dirac nodes.

661 citations


Posted Content
Ramesh Nallapati1, Feifei Zhai1, Bowen Zhou1
TL;DR: SummaRuNNer as mentioned in this paper is a recurrent neural network (RNN) based sequence model for extractive summarization of documents and achieves performance better than or comparable to state-of-the-art.
Abstract: We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels.

645 citations


Proceedings ArticleDOI
12 Aug 2016
TL;DR: The results of the WMT16 shared tasks are presented, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task.
Abstract: This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), and an automatic post-editing task and bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the Biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams, submitting 39 entries. The automatic post-editing task had a total of 6 teams, submitting 11 entries.

616 citations


Journal ArticleDOI
TL;DR: Noise measurements show that such BP photodetectors are capable of sensing mid-infrared light in the picowatt range, and the high photoresponse remains effective at kilohertz modulation frequencies, because of the fast carrier dynamics arising from BP's moderate bandgap.
Abstract: Recently, black phosphorus (BP) has joined the two-dimensional material family as a promising candidate for photonic applications due to its moderate bandgap, high carrier mobility, and compatibility with a diverse range of substrates. Photodetectors are probably the most explored BP photonic devices; however, their unique potential compared with other layered materials in the mid-infrared wavelength range has not been revealed. Here, we demonstrate BP mid-infrared detectors at 3.39 μm with high internal gain, resulting in an external responsivity of 82 A/W. Noise measurements show that such BP photodetectors are capable of sensing mid-infrared light in the picowatt range. Moreover, the high photoresponse remains effective at kilohertz modulation frequencies, because of the fast carrier dynamics arising from BP’s moderate bandgap. The high photoresponse at mid-infrared wavelengths and the large dynamic bandwidth, together with its unique polarization dependent response induced by low crystalline symmetry,...

Journal ArticleDOI
TL;DR: The conceptual and theoretical framework that explains the importance of mixed selectivity and the experimental evidence that recorded neural representations are high-dimensional are reviewed and the implications for the design of future experiments are discussed.

Journal ArticleDOI
TL;DR: Unlocking the full potential of biometrics through inter-disciplinary research in the above areas will not only lead to widespread adoption of this promising technology, but will also result in wider user acceptance and societal impact.

Book ChapterDOI
31 Oct 2016
TL;DR: This work presents an alternative formulation of the concept of concentrated differential privacy in terms of the Rényi divergence between the distributions obtained by running an algorithm on neighboring inputs, which proves sharper quantitative results, establishes lower bounds, and raises a few new questions.
Abstract: "Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations. We present an alternative formulation of the concept of concentrated differential privacy in terms of the Rényi divergence between the distributions obtained by running an algorithm on neighboring inputs. With this reformulation in hand, we prove sharper quantitative results, establish lower bounds, and raise a few new questions. We also unify this approach with approximate differential privacy by giving an appropriate definition of "approximate concentrated differential privacy".

Journal ArticleDOI
01 Jan 2016
TL;DR: Learning-to-hash, as surveyed in this paper, is one of the most popular approaches to approximate nearest neighbor (ANN) search in big data applications; such methods exploit information such as data distributions or class labels when optimizing the hash codes or functions.
Abstract: The explosive growth in Big Data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, approximate nearest neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., locality-sensitive hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning-to-hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros, and cons of the emerging techniques. We provide a comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area.
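The data-independent baseline that the survey contrasts with learned hashing is easy to sketch: sign random projections assign one bit per random hyperplane, so nearby vectors (in cosine distance) tend to share most bits. The dimensions and vectors below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, W):
    """Sign random projections: one bit per hyperplane (data-independent LSH)."""
    return (X @ W.T > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.sum(a != b))

d, n_bits = 64, 16
W = rng.standard_normal((n_bits, d))          # random hyperplanes
x = rng.standard_normal(d)
y_near = x + 0.01 * rng.standard_normal(d)    # almost identical vector
y_far = -x                                    # opposite direction

hx, hnear, hfar = (lsh_hash(v[None, :], W)[0] for v in (x, y_near, y_far))
# Nearby vectors collide on most bits; opposite vectors on almost none,
# so Hamming distance in code space tracks similarity in feature space.
```

Learning-to-hash methods replace the random matrix W with one optimized on data (or labels), which is where the accuracy gains described in the abstract come from.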

Posted Content
Steven J. Rennie1, Etienne Marcheret1, Youssef Mroueh1, Jarret Ross1, Vaibhava Goel1 
TL;DR: Self-Critical Sequence Training (SCST) as mentioned in this paper is a form of reinforcement learning that utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences.
Abstract: Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a "baseline" to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) are avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation server establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7.
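The SCST baseline is simple to state in code: weight each sampled caption's negative log-probability by its reward minus the reward of the model's own greedy decode. A toy sketch follows; the reward values and log-probabilities are made up for illustration.

```python
import numpy as np

def scst_advantage(reward_sample, reward_greedy):
    """SCST advantage: reward of a sampled caption minus the reward of the
    model's own greedy (test-time) caption, used in place of a learned baseline."""
    return reward_sample - reward_greedy

# Per-example REINFORCE losses for a batch of sampled captions.
log_probs = np.array([-2.3, -1.1, -4.0])   # log p(sampled caption)
r_sample = np.array([0.8, 0.5, 0.9])       # e.g. CIDEr of each sample
r_greedy = 0.6                             # CIDEr of the greedy decode
loss = -scst_advantage(r_sample, r_greedy) * log_probs
# Samples scoring above the greedy baseline get their probability increased
# by gradient descent; samples scoring below it are suppressed.
```

Because the baseline is the model's own test-time output rather than a separately estimated value, no critic network is needed, and training directly pushes sampled captions to beat the inference procedure actually used at test time.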

Proceedings Article
01 Jan 2016
TL;DR: In this paper, a deep recurrent convolutional network was proposed to learn robust representations from multi-channel EEG time-series, and demonstrated its advantages in the context of mental load classification task.
Abstract: One of the challenges in modeling cognitive events from electroencephalogram (EEG) data is finding representations that are invariant to inter- and intra-subject differences, as well as to inherent noise associated with such data. Herein, we propose a novel approach for learning such representations from multi-channel EEG time-series, and demonstrate its advantages in the context of mental load classification task. First, we transform EEG activities into a sequence of topology-preserving multi-spectral images, as opposed to standard EEG analysis techniques that ignore such spatial information. Next, we train a deep recurrent-convolutional network inspired by state-of-the-art video classification to learn robust representations from the sequence of images. The proposed approach is designed to preserve the spatial, spectral, and temporal structure of EEG which leads to finding features that are less sensitive to variations and distortions within each dimension. Empirical evaluation on the cognitive load classification task demonstrated significant improvements in classification accuracy over current state-of-the-art approaches in this field.

Journal ArticleDOI
TL;DR: In this article, the authors present improvements in both theoretical understanding and experimental implementation of the cross resonance (CR) gate that have led to shorter two-qubit gate times and interleaved randomized benchmarking fidelities exceeding 99%.
Abstract: We present improvements in both theoretical understanding and experimental implementation of the cross resonance (CR) gate that have led to shorter two-qubit gate times and interleaved randomized benchmarking fidelities exceeding 99%. The CR gate is an all-microwave two-qubit gate that does not require tunability and is therefore well suited to quantum computing architectures based on two-dimensional superconducting qubits. The performance of the gate has previously been hindered by long gate times and fidelities averaging 94--96%. We have developed a calibration procedure that accurately measures the full CR Hamiltonian. The resulting measurements agree with theoretical analysis of the gate and also elucidate the error terms that have previously limited gate fidelity. The increase in fidelity that we have achieved was accomplished by introducing a second microwave drive tone on the target qubit to cancel unwanted components of the CR Hamiltonian.

Book ChapterDOI
19 Sep 2016
TL;DR: This paper introduces the second major revision of SPMF 2, which provides more than 60 new algorithm implementations (including novel algorithms for sequence prediction), an improved user interface with pattern visualization, a novel plug-in system, improved performance, and support for text mining.
Abstract: SPMF is an open-source data mining library, specialized in pattern mining, offering implementations of more than 120 data mining algorithms. It has been used in more than 310 research papers to solve applied problems in a wide range of domains from authorship attribution to restaurant recommendation. Its implementations are also commonly used as benchmarks in research papers, and it has also been integrated in several data analysis software programs. After three years of development, this paper introduces the second major revision of the library, named SPMF 2, which provides (1) more than 60 new algorithm implementations (including novel algorithms for sequence prediction), (2) an improved user interface with pattern visualization (3) a novel plug-in system, (4) improved performance, and (5) support for text mining.

Journal ArticleDOI
An Chen1
TL;DR: High-performance and low-cost emerging NVMs may simplify memory hierarchy, introduce non-volatility in logic gates and circuits, reduce system power, and enable novel architectures, and Storage-class memory (SCM) based on high-density NVMs could fill the performance and density gap between memory and storage.
Abstract: This paper will review emerging non-volatile memory (NVM) technologies, with the focus on phase change memory (PCM), spin-transfer-torque random-access-memory (STTRAM), resistive random-access-memory (RRAM), and ferroelectric field-effect-transistor (FeFET) memory. These promising NVM devices are evaluated in terms of their advantages, challenges, and applications. Their performance is compared based on reported parameters of major industrial test chips. Memory selector devices and cell structures are discussed. Changing market trends toward low power ( e.g. , mobile, IoT) and data-centric applications create opportunities for emerging NVMs. High-performance and low-cost emerging NVMs may simplify memory hierarchy, introduce non-volatility in logic gates and circuits, reduce system power, and enable novel architectures. Storage-class memory (SCM) based on high-density NVMs could fill the performance and density gap between memory and storage. Some unique characteristics of emerging NVMs can be utilized for novel applications beyond the memory space, e.g. , neuromorphic computing, hardware security, etc . In the beyond-CMOS era, emerging NVMs have the potential to fulfill more important functions and enable more efficient, intelligent, and secure computing systems.

Proceedings ArticleDOI
26 Mar 2016
TL;DR: This article used two softmax layers in order to predict the next word in conditional language models: one predicts the location of a word in the source sentence, and the other predicts a word from the shortlist vocabulary.
Abstract: The problem of rare and unknown words is an important issue that can potentially affect the performance of many NLP systems, including traditional count-based and deep learning models. We propose a novel way to deal with the rare and unseen words for the neural network models using attention. Our model uses two softmax layers in order to predict the next word in conditional language models: one predicts the location of a word in the source sentence, and the other predicts a word in the shortlist vocabulary. At each timestep, the decision of which softmax layer to use is adaptively made by an MLP which is conditioned on the context. We motivate this work from psychological evidence that humans naturally have a tendency to point towards objects in the context or the environment when the name of an object is not known. Using our proposed model, we observe improvements on two tasks, neural machine translation on the Europarl English to French parallel corpora and text summarization on the Gigaword dataset.
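The two-softmax construction above can be sketched in a few lines: one softmax over the shortlist vocabulary, one over source positions, mixed by a switch probability. In the model that probability comes from a context-conditioned MLP; here it is a fixed number for illustration, and all logits are made up.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def combined_distribution(shortlist_logits, location_logits, switch_prob):
    """Mix a shortlist-vocabulary softmax with a source-location softmax.

    switch_prob is the probability of copying a source word by pointing
    rather than generating from the shortlist (fixed here for illustration;
    the paper predicts it with an MLP conditioned on the context).
    """
    p_vocab = softmax(shortlist_logits)
    p_point = softmax(location_logits)
    return (1.0 - switch_prob) * p_vocab, switch_prob * p_point

vocab_part, point_part = combined_distribution(
    np.array([2.0, 0.5, -1.0]),   # scores over a 3-word shortlist
    np.array([0.1, 1.5]),         # scores over 2 source positions
    switch_prob=0.3,
)
# The two parts together form one probability distribution over
# "generate word w" and "copy source position i" events.
```

Rare or unseen words thus never need a slot in the output vocabulary: when the switch favors pointing, the model copies them directly from the source sentence.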

Journal ArticleDOI
TL;DR: Ontop is presented, an open-source Ontology-Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology, to which the data sources are mapped.
Abstract: We present Ontop, an open-source Ontology-Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology, to which the data sources are mapped. Key features of Ontop are its solid theoretical foundations, a virtual approach to OBDA, which avoids materializing triples and is implemented through the query rewriting technique, extensive optimizations exploiting all elements of the OBDA architecture, its compliance to all relevant W3C recommendations (including SPARQL queries, R2RML mappings, and OWL2QL and RDFS ontologies), and its support for all major relational databases.

Journal ArticleDOI
TL;DR: It is shown that at low Péclet (Pe) numbers, at which diffusion and deterministic displacement compete, nano-DLD arrays separate particles between 20 and 110 nm based on size with sharp resolution, which opens up the potential for on-chip sorting and quantification of these important biocolloids.
Abstract: Lateral displacement pillar arrays can now be used to separate nanoscale colloids including exosomes, offering new opportunities for on-chip sorting and quantification of biocolloids by size. Deterministic lateral displacement (DLD) pillar arrays are an efficient technology to sort, separate and enrich micrometre-scale particles, which include parasites1, bacteria2, blood cells3 and circulating tumour cells in blood4. However, this technology has not been translated to the true nanoscale, where it could function on biocolloids, such as exosomes. Exosomes, a key target of ‘liquid biopsies’, are secreted by cells and contain nucleic acid and protein information about their originating tissue5. One challenge in the study of exosome biology is to sort exosomes by size and surface markers6,7. We use manufacturable silicon processes to produce nanoscale DLD (nano-DLD) arrays of uniform gap sizes ranging from 25 to 235 nm. We show that at low Péclet (Pe) numbers, at which diffusion and deterministic displacement compete, nano-DLD arrays separate particles between 20 and 110 nm based on size with sharp resolution. Further, we demonstrate the size-based displacement of exosomes, and so open up the potential for on-chip sorting and quantification of these important biocolloids.
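The Péclet number invoked above compares advective to diffusive transport, Pe = uL/D, where the diffusion coefficient of a nanoparticle in water follows from the Stokes-Einstein relation. A back-of-the-envelope sketch with illustrative particle size, gap, and flow speed (not values taken from the paper):

```python
import math

def stokes_einstein_D(radius_m, temp_K=298.0, viscosity_Pa_s=8.9e-4):
    """Diffusion coefficient of a sphere in water (Stokes-Einstein)."""
    k_B = 1.380649e-23  # Boltzmann constant, J/K
    return k_B * temp_K / (6 * math.pi * viscosity_Pa_s * radius_m)

def peclet(flow_speed_m_s, length_m, D):
    """Pe = advective transport / diffusive transport over a length scale."""
    return flow_speed_m_s * length_m / D

# A 50 nm particle in a 100 nm gap at 100 um/s flow (illustrative numbers):
D = stokes_einstein_D(25e-9)      # radius 25 nm, i.e. 50 nm diameter
Pe = peclet(100e-6, 100e-9, D)
# Pe of order 1 means diffusion competes with deterministic displacement,
# the low-Pe regime the abstract describes for nano-DLD arrays.
```

With these numbers Pe comes out near unity, which illustrates why, unlike in micrometre-scale DLD, diffusion cannot be ignored at the nanoscale.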

Proceedings Article
03 Nov 2016
TL;DR: The authors proposed a framework that facilitates better understanding of the encoded representations by defining prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input.
Abstract: There is a lot of research interest in encoding variable length sentences into fixed length vectors, in a way that preserves the sentence meanings. Two common methods include representations based on averaging word vectors, and representations based on the hidden states of recurrent neural networks such as LSTMs. The sentence vectors are used as features for subsequent machine learning tasks or for pre-training in the context of deep learning. However, not much is known about the properties that are encoded in these sentence representations and about the language information they capture. We propose a framework that facilitates better understanding of the encoded representations. We define prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input. We demonstrate the potential contribution of the approach by analyzing different sentence representation mechanisms. The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.
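The probing methodology above can be sketched end-to-end: build fixed-size sentence representations (here the word-averaging baseline with random word vectors), label each sentence with the property of interest (a coarse length bin), and hand the resulting dataset to any classifier. Everything below is synthetic and illustrative; the vocabulary, dimensions, and binning are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_embedding(sentence, vectors):
    """Sentence representation: mean of its word vectors (a common baseline)."""
    return np.mean([vectors[w] for w in sentence], axis=0)

# Probing setup: predict a sentence property (here a coarse length bin)
# from the fixed-size representation alone.
vocab = [f"w{i}" for i in range(100)]
vectors = {w: rng.standard_normal(16) for w in vocab}

def make_example():
    n = int(rng.integers(2, 21))            # sentence length 2..20
    sent = list(rng.choice(vocab, size=n))
    label = 0 if n <= 10 else 1             # short vs long bin
    return avg_embedding(sent, vectors), label

X, y = zip(*(make_example() for _ in range(200)))
X, y = np.stack(X), np.array(y)
# Any off-the-shelf classifier can now be trained and scored on (X, y);
# its accuracy measures how much length information the representation keeps.
```

The same scaffold works for the other two tasks in the paper by changing only the label function (word content: does the sentence contain word w; word order: did word a precede word b).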

Journal ArticleDOI
Tayfun Gokmen1, Yurii A. Vlasov1
TL;DR: A concept of resistive processing unit (RPU) devices is proposed that can potentially accelerate DNN training by orders of magnitude while using much less power, enabling Big Data problems with trillions of parameters that are impossible to address today.
Abstract: In recent years, deep neural networks (DNN) have demonstrated significant business impact in large scale analysis and classification tasks such as speech recognition, visual object detection, pattern extraction, etc. Training of large DNNs, however, is universally considered a time-consuming and computationally intensive task that demands datacenter-scale computational resources recruited for many days. Here we propose a concept of resistive processing unit (RPU) devices that can potentially accelerate DNN training by orders of magnitude while using much less power. The proposed RPU device can store and update the weight values locally, thus minimizing data movement during training and allowing the locality and parallelism of the training algorithm to be fully exploited. We evaluate the effect of various RPU device features/non-idealities and system parameters on performance in order to derive the device and system level specifications for implementation of an accelerator chip for DNN training in a realistic CMOS-compatible technology. For large DNNs with about 1 billion weights this massively parallel RPU architecture can achieve acceleration factors of 30,000X compared to state-of-the-art microprocessors while providing power efficiency of 84,000 GigaOps/s/W. Problems that currently require days of training on a datacenter-size cluster with thousands of machines can be addressed within hours on a single RPU accelerator. A system consisting of a cluster of RPU accelerators will be able to tackle Big Data problems with trillions of parameters that are impossible to address today, such as natural speech recognition and translation between all world languages, real-time analytics on large streams of business and scientific data, and integration and analysis of multimodal sensory data flows from a massive number of IoT (Internet of Things) sensors.

Journal ArticleDOI
Derrek P. Hibar1, Lars T. Westlye2, Lars T. Westlye3, T.G.M. van Erp4, Jerod M. Rasmussen4, Cassandra D. Leonardo1, Joshua Faskowitz1, Unn K. Haukvik3, Cecilie B. Hartberg3, Nhat Trung Doan3, Ingrid Agartz3, Anders M. Dale5, Oliver Gruber6, Oliver Gruber7, Bernd Krämer7, Sarah Trost7, Benny Liberg8, Christoph Abé8, C J Ekman8, Martin Ingvar8, Martin Ingvar9, Mikael Landén8, Mikael Landén10, Scott C. Fears11, Nelson B. Freimer11, Carrie E. Bearden11, Carrie E. Bearden12, Emma Sprooten13, David C. Glahn13, Godfrey D. Pearlson13, Louise Emsell14, Joanne Kenney14, Cathy Scanlon14, Colm McDonald14, Dara M. Cannon14, Jorge R. C. Almeida15, Amelia Versace16, Xavier Caseras17, Natalia Lawrence18, Mary L. Phillips17, Danai Dima19, Danai Dima20, G. Delvecchio20, Sophia Frangou19, Theodore D. Satterthwaite21, Daniel H. Wolf21, Josselin Houenou22, Josselin Houenou23, Chantal Henry24, Chantal Henry23, Ulrik Fredrik Malt2, Ulrik Fredrik Malt3, Erlend Bøen, Torbjørn Elvsåshagen, Allan H. Young20, Adrian J. Lloyd25, Guy M. Goodwin26, Clare E. Mackay26, C. Bourne27, C. Bourne26, Amy C. Bilderbeck26, L. Abramovic28, Marco P. Boks28, N.E.M. van Haren28, Roel A. Ophoff11, Roel A. Ophoff28, René S. Kahn28, Michael Bauer29, Andrea Pfennig29, Martin Alda30, Tomas Hajek31, Tomas Hajek30, Benson Mwangi, Jair C. Soares, Thomas Nickson32, Ralica Dimitrova32, Jess E. Sussmann32, Saskia P. Hagenaars32, Heather C. Whalley32, Andrew M. McIntosh32, Paul M. Thompson12, Paul M. Thompson1, Ole A. Andreassen3 
TL;DR: In this paper, the authors quantified case-control differences in intracranial volume (ICV) and each of eight subcortical brain measures: nucleus accumbens, amygdala, caudate, hippocampus, globus pallidus, putamen, thalamus, lateral ventricles.
Abstract: Considerable uncertainty exists about the defining brain changes associated with bipolar disorder (BD). Understanding and quantifying the sources of uncertainty can help generate novel clinical hypotheses about etiology and assist in the development of biomarkers for indexing disease progression and prognosis. Here we were interested in quantifying case–control differences in intracranial volume (ICV) and each of eight subcortical brain measures: nucleus accumbens, amygdala, caudate, hippocampus, globus pallidus, putamen, thalamus, and lateral ventricles. In a large study of 1710 BD patients and 2594 healthy controls, we found consistent volumetric reductions in BD patients for mean hippocampus (Cohen’s d=−0.232; P=3.50 × 10−7) and thalamus (d=−0.148; P=4.27 × 10−3), as well as enlarged lateral ventricles (d=−0.260; P=3.93 × 10−5). No significant effect of age at illness onset was detected. Stratifying patients based on clinical subtype (BD type I or type II) revealed that BDI patients had significantly larger lateral ventricles and smaller hippocampus and amygdala than controls. However, when comparing BDI and BDII patients directly, we did not detect any significant differences in brain volume; this likely reflects a shared etiology across BD subtypes. Exploratory analyses revealed significantly larger thalamic volumes in patients taking lithium compared with patients not taking lithium. We detected no significant differences between BDII patients and controls in the largest such comparison to date. Findings in this study should be interpreted with caution and with careful consideration of the limitations inherent to meta-analyzed neuroimaging comparisons.
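For readers unfamiliar with the effect sizes quoted above, this is a minimal sketch of Cohen's d with a pooled standard deviation, the standard case–control effect size. The sample sizes match the study, but the "volumes" below are purely synthetic and for illustration only:

```python
import numpy as np

def cohens_d(cases, controls):
    """Cohen's d with pooled standard deviation: the standardized
    difference between two group means."""
    n1, n2 = len(cases), len(controls)
    pooled_var = (((n1 - 1) * np.var(cases, ddof=1)
                   + (n2 - 1) * np.var(controls, ddof=1))
                  / (n1 + n2 - 2))
    return (np.mean(cases) - np.mean(controls)) / np.sqrt(pooled_var)

# Synthetic example: patient hippocampal volumes slightly smaller than
# controls' (arbitrary units; numbers invented, not study data).
rng = np.random.default_rng(42)
patients = rng.normal(4.0, 0.4, 1710)
controls = rng.normal(4.1, 0.4, 2594)
d = cohens_d(patients, controls)
assert d < 0  # smaller volumes in patients give a negative effect size
```

On this convention a negative d means the patient group's mean is lower, which is how the hippocampal and thalamic reductions above are reported.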

Journal ArticleDOI
TL;DR: A review of the clinical literature, mainly focusing on current outstanding issues, is given, followed by some innovative proposals for future improvements.
Abstract: The concept of diffusion magnetic resonance (MR) imaging emerged in the mid-1980s, together with the first images of water diffusion in the human brain, as a way to probe tissue structure at a microscopic scale, although the images were acquired at a millimetric scale. Since then, diffusion MR imaging has become a pillar of modern clinical imaging, used mainly to investigate neurologic disorders. A dramatic application of diffusion MR imaging has been acute brain ischemia, providing patients with the opportunity to receive suitable treatment at a stage when brain tissue might still be salvageable, thus avoiding severe handicaps. On the other hand, it was found that water diffusion is anisotropic in white matter, because axon membranes limit molecular movement perpendicularly to the nerve fibers. This feature can be exploited to produce stunning maps of the spatial orientation of white matter tracts and brain connections in just a few minutes. Diffusion MR imaging is now also rapidly expanding in oncology, for the detection, characterization, and monitoring of malignant lesions and metastases. Water diffusion is usually markedly decreased in malignant tissues, and body diffusion MR imaging, which does not require any tracer injection, is rapidly becoming a modality of choice to detect, characterize, or even stage malignant lesions, especially for breast or prostate cancer. After a brief summary of the key methodological concepts behind diffusion MR imaging, this article reviews the clinical literature, mainly focusing on current outstanding issues, and closes with some innovative proposals for future improvements.
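The signal attenuation behind diffusion-weighted contrast follows, in the simplest case, the mono-exponential model S(b) = S0·exp(−b·ADC), where b is the diffusion weighting and ADC the apparent diffusion coefficient. A short sketch with illustrative values only (not taken from the article) shows how the ADC is recovered from two acquisitions and why restricted diffusion in malignant tissue yields a lower ADC:

```python
import numpy as np

def adc(s0, s_b, b):
    """Apparent diffusion coefficient from the mono-exponential signal
    model S(b) = S0 * exp(-b * ADC), given the unweighted signal s0 and
    the diffusion-weighted signal s_b at weighting b."""
    return np.log(s0 / s_b) / b

# Illustrative values: b in s/mm^2; free-water-like diffusivity is around
# 3e-3 mm^2/s, while restricted (e.g. malignant) tissue is closer to 1e-3.
b = 1000.0
s0 = 100.0
normal = adc(s0, s0 * np.exp(-b * 3.0e-3), b)
restricted = adc(s0, s0 * np.exp(-b * 1.0e-3), b)
assert restricted < normal  # restricted diffusion -> lower ADC
```

The same two-point estimate, computed voxel by voxel, is what produces the quantitative ADC maps used clinically to flag ischemic or malignant tissue without any tracer injection.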

Proceedings ArticleDOI
07 May 2016
TL;DR: This paper presents the design and implementation of an interactive visual analytics system, Prospector, whose interactive partial dependence diagnostics and support for localized inspection allow data scientists to understand how and why specific datapoints are predicted as they are.
Abstract: Understanding predictive models, in terms of interpreting and identifying actionable insights, is a challenging task. Often the importance of a feature in a model is only a rough estimate condensed into one number. However, our research goes beyond these naive estimates through the design and implementation of an interactive visual analytics system, Prospector. By providing interactive partial dependence diagnostics, data scientists can understand how features affect the prediction overall. In addition, our support for localized inspection allows data scientists to understand how and why specific datapoints are predicted as they are, and to tweak feature values and see how the prediction responds. The system is evaluated in a case study involving a team of data scientists improving predictive models for detecting the onset of diabetes from electronic medical records.
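Prospector's partial dependence diagnostics rest on a standard computation: sweep one feature over a grid while leaving the data otherwise unchanged, and average the model's predictions at each grid value. A minimal sketch follows; the toy model and all names are ours, not Prospector's code:

```python
import numpy as np

def partial_dependence(model_predict, X, feature_idx, grid):
    """One-dimensional partial dependence: for each grid value, overwrite
    the inspected feature in every row and average the predictions."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = v   # hold the inspected feature fixed at v
        pd_values.append(model_predict(X_mod).mean())
    return np.array(pd_values)

def predict(X):
    # Toy model: quadratic in feature 0, weak linear effect of feature 1.
    return X[:, 0] ** 2 + 0.1 * X[:, 1]

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
grid = np.linspace(-2, 2, 5)
pd = partial_dependence(predict, X, 0, grid)
assert pd[0] > pd[2] and pd[-1] > pd[2]  # U-shape of feature 0 recovered
```

Plotting `pd` against `grid` gives the curve a tool like Prospector renders; the "localized inspection" described above corresponds to running the same sweep on a single row instead of averaging over all of them.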

Journal ArticleDOI
TL;DR: It is shown that nano servers in Fog computing can complement centralized DCs to serve certain applications, mostly IoT applications for which the source of data is in end-user premises, and lead to energy savings if the applications are off-loadable from centralized DCs and run on nDCs.
Abstract: Tiny computers located in end-user premises are becoming popular as local servers for Internet of Things (IoT) and Fog computing services. These highly distributed servers, which can host and distribute content and applications in a peer-to-peer (P2P) fashion, are known as nano data centers (nDCs). Despite the growing popularity of nano servers, their energy consumption is not well investigated. To study the energy consumption of nDCs, we propose and use flow-based and time-based energy consumption models for shared and unshared network equipment, respectively. To apply and validate these models, a set of measurements and experiments are performed to compare the energy consumption of a service provided by nDCs and by centralized data centers (DCs). A number of findings emerge from our study, including the factors in the system design that allow nDCs to consume less energy than their centralized counterparts. These include the type of access network attached to the nano servers and the nano server’s time utilization (the ratio of the idle time to active time). Additionally, the type of applications running on nDCs and factors such as the number of downloads, the number of updates, and the amount of preloaded copies of data influence the energy cost. Our results reveal that the number of hops between a user and content has little impact on the total energy consumption compared to the above-mentioned factors. We show that nano servers in Fog computing can complement centralized DCs to serve certain applications, mostly IoT applications for which the source of data is in end-user premises, and lead to energy savings if the applications (or a part of them) are off-loadable from centralized DCs and run on nDCs.
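The two model families above can be sketched very simply; all constants below are hypothetical and only illustrate the distinction the paper draws, namely that shared equipment is billed by the traffic a service pushes through it, while unshared equipment is billed by the time it spends active versus idle:

```python
def flow_based_energy(bytes_transferred, joules_per_byte):
    """Flow-based model for shared network equipment: a service's energy
    share is proportional to the traffic volume it generates."""
    return bytes_transferred * joules_per_byte

def time_based_energy(active_watts, active_seconds, idle_watts, idle_seconds):
    """Time-based model for unshared equipment (e.g. a nano server):
    energy is power integrated over the active and idle periods."""
    return active_watts * active_seconds + idle_watts * idle_seconds

# Illustrative comparison over one day (all numbers hypothetical):
centralized = flow_based_energy(5e9, 2e-7)            # 5 GB through shared gear
nano = time_based_energy(5.0, 600, 2.0, 86400 - 600)  # mostly idle nano server
assert centralized > 0 and nano > 0
```

The sketch also makes the paper's "time utilization" finding concrete: in the time-based model the idle term dominates whenever the server sits idle most of the day, which is why utilization matters more than hop count.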

Proceedings ArticleDOI
30 Jun 2016
TL;DR: A deep learning approach for phenotyping from patient EHRs builds a four-layer convolutional neural network model that extracts phenotypes and performs prediction; the proposed model is validated on a real-world EHR data warehouse under the specific scenario of predictive modeling of chronic diseases.
Abstract: The recent years have witnessed a surge of interest in data analytics with patient Electronic Health Records (EHR). Data-driven healthcare, which aims at effective utilization of big medical data, representing the collective learning in treating hundreds of millions of patients, to provide the best and most personalized care, is believed to be one of the most promising directions for transforming healthcare. EHR is one of the major carriers for making this data-driven healthcare revolution successful. There are many challenges in working directly with EHR, such as temporality, sparsity, noisiness, and bias. Thus, effective feature extraction, or phenotyping, from patient EHRs is a key step before any further applications. In this paper, we propose a deep learning approach for phenotyping from patient EHRs. We first represent the EHRs for every patient as a temporal matrix with time on one dimension and event on the other dimension. Then we build a four-layer convolutional neural network model for extracting phenotypes and performing prediction. The first layer is composed of those EHR matrices. The second layer is a one-side convolution layer that can extract phenotypes from the first layer. The third layer is a max pooling layer introducing sparsity on the detected phenotypes, so that only the significant phenotypes remain. The fourth layer is a fully connected softmax prediction layer. In order to incorporate the temporal smoothness of the patient EHR, we also investigated three different temporal fusion mechanisms in the model: early fusion, late fusion, and slow fusion. Finally, the proposed model is validated on a real-world EHR data warehouse under the specific scenario of predictive modeling of chronic diseases.
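The four-layer structure described above can be sketched in NumPy. Everything here (matrix sizes, filter counts, the binary toy EHR) is invented for illustration; only the layer sequence, EHR matrix, one-side convolution, max pooling over time, and softmax prediction, follows the paper's description:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_side_conv(ehr, filters):
    """'One-side' convolution: each filter spans the full event dimension
    and slides only along time, giving one temporal signal per phenotype."""
    n_events, n_times = ehr.shape
    n_filters, _, width = filters.shape
    out = np.zeros((n_filters, n_times - width + 1))
    for f in range(n_filters):
        for t in range(n_times - width + 1):
            out[f, t] = np.sum(filters[f] * ehr[:, t:t + width])
    return out

rng = np.random.default_rng(0)
# Layer 1: a toy patient EHR matrix (events x time), sparse binary events.
ehr = (rng.random((20, 30)) < 0.1).astype(float)
# Layer 2: one-side convolution with 8 phenotype filters of temporal width 5.
filters = rng.standard_normal((8, 20, 5)) * 0.1
conv = one_side_conv(ehr, filters)
# Layer 3: max pooling over time keeps each phenotype's strongest response.
pooled = conv.max(axis=1)
# Layer 4: fully connected softmax layer (e.g. disease onset yes/no).
W, b = rng.standard_normal((2, 8)) * 0.1, np.zeros(2)
probs = softmax(W @ pooled + b)
assert probs.shape == (2,)
```

A trained version would learn `filters`, `W`, and `b` by backpropagation; the sketch only shows why the convolution is "one-side": sliding across the event axis would mix unrelated medical events, so the filters cover it entirely and move along time alone.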