
Showing papers by "Veronika Cheplygina published in 2020"


Journal ArticleDOI
TL;DR: Ten simple rules to help researchers who are planning to start their journey on Twitter take their first steps and advance their careers.
Abstract: Twitter is one of the most popular social media platforms, with over 320 million active users as of February 2019. Twitter users can enjoy free content delivered by other users whom they actively decide to follow. However, unlike in other areas where Twitter is used passively (e.g., to follow influential figures and/or information agencies), in science it can be used in a much more active, collaborative way: to ask for advice, to form new bonds and scientific collaborations, to announce jobs and find employees, and to find new mentors and jobs. This is particularly important in the early stages of a scientific career, during which a lack of collaboration or delayed access to information can have the most impact. For these reasons, using Twitter appropriately [1] can be more than just a social media activity; it can be a real career incubator in which researchers can develop their professional circles, launch new research projects, and get help from the community at various stages of those projects. Twitter is a tool that facilitates decentralization in science: you are able to present yourself to the community, to develop your personal brand, to set up a dialogue with people inside and outside your research field, and to create or join professional environments in your field without mediators such as your direct boss. This article is written by a group of researchers who have a strong feeling that they have personally benefited from using Twitter, both research-wise and network-wise. We (@DrVeronikaCH, @Felienne, @CaAl, @nbielczyk_neuro, @ionicasmeets) share our personal experience and advice in the form of ten simple rules, and we hope that this material will help researchers who are planning to start their journey on Twitter to take their first steps and advance their careers.

48 citations


Journal ArticleDOI
01 Dec 2020
TL;DR: This survey reviews studies applying crowdsourcing to the analysis of medical images, published prior to July 2018, and identifies common approaches, challenges and considerations, providing guidance of utility to researchers adopting this approach.
Abstract: Rapid advances in image processing capabilities have been seen across many domains, fostered by the application of machine learning algorithms to "big data". However, within the realm of medical image analysis, advances have been curtailed, in part, due to the limited availability of large-scale, well-annotated datasets. One of the main reasons for this is the high cost often associated with producing large amounts of high-quality meta-data. Recently, there has been growing interest in the application of crowdsourcing for this purpose, a technique that has proven effective for creating large-scale datasets across a range of disciplines, from computer vision to astrophysics. Despite the growing popularity of this approach, there has not yet been a comprehensive literature review to provide guidance to researchers considering using crowdsourcing methodologies in their own medical imaging analysis. In this survey, we review studies applying crowdsourcing to the analysis of medical images, published prior to July 2018. We identify common approaches, challenges and considerations, providing guidance of utility to researchers adopting this approach. Finally, we discuss future opportunities for development within this emerging domain.

42 citations


Journal ArticleDOI
22 Apr 2020-Neuron
TL;DR: The Organization for Human Brain Mapping undertook a group effort to gather helpful advice on self-management for early career researchers (ECRs), who face a range of competing pressures in academia.

19 citations


Posted Content
TL;DR: A survey of the MICCAI 2018 proceedings is conducted to investigate common practice in medical image analysis; the authors show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup, which leads to balanced scores per subgroup.
Abstract: One of the critical challenges in machine learning applications is to have fair predictions. There are numerous recent examples in various domains that convincingly show that algorithms trained with biased datasets can easily lead to erroneous or discriminatory conclusions. This is even more crucial in clinical applications where predictive algorithms are designed mainly based on a limited or given set of medical images, and demographic variables such as age, sex and race are not taken into account. In this work, we conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications. Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used, and the diagnosis is purely based on images. In order to highlight the importance of considering the demographics in diagnosis tasks, we used a publicly available dataset of skin lesions. We then demonstrate that a classifier with an overall area under the curve (AUC) of 0.83 has variable performance between 0.76 and 0.91 on subgroups based on age and sex, even though the training set was relatively balanced. Moreover, we show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup, which leads to balanced scores per subgroup. Finally, we discuss the implications of these results and provide recommendations for further research.

13 citations
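The per-subgroup performance gap reported in this abstract can be illustrated with a small sketch (plain Python, not the authors' code; all function names and data are hypothetical): the AUC is computed via the Mann-Whitney pairwise-ranking statistic, once over all samples and once within each demographic subgroup.

```python
def auc(scores, labels):
    # AUC via the Mann-Whitney U statistic: the fraction of
    # (positive, negative) pairs that are ranked correctly, with
    # ties counting as half a correct ranking.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # AUC undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc(scores, labels, groups):
    # AUC computed separately within each demographic subgroup
    # (e.g. age bins or sex), as in the paper's stratified evaluation.
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auc([scores[i] for i in idx], [labels[i] for i in idx])
    return result
```

Comparing the values of `subgroup_auc` against the overall `auc` is the kind of check that would surface the 0.76-vs-0.91 spread the abstract describes.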


Book ChapterDOI
04 Oct 2020
TL;DR: In this article, the authors conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications and highlight the importance of considering the demographics in diagnosis tasks, using a publicly available dataset of skin lesions.
Abstract: One of the critical challenges in machine learning applications is to have fair predictions. There are numerous recent examples in various domains that convincingly show that algorithms trained with biased datasets can easily lead to erroneous or discriminatory conclusions. This is even more crucial in clinical applications where predictive algorithms are designed mainly based on a given set of medical images, and demographic variables such as age, sex and race are not taken into account. In this work, we conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications. Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used, and the diagnosis is purely based on images. In order to highlight the importance of considering the demographics in diagnosis tasks, we used a publicly available dataset of skin lesions. We then demonstrate that a classifier with an overall area under the curve (AUC) of 0.83 has variable performance between 0.76 and 0.91 on subgroups based on age and sex, even though the training set was relatively balanced. Moreover, we show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup, which leads to balanced scores per subgroup. Finally, we discuss the implications of these results and provide recommendations for further research.

13 citations


Journal ArticleDOI
TL;DR: This work proposes to use the "wisdom of the crowds" – internet users without specific expertise – to improve the predictions of the algorithms, and will validate these methods on three challenging detection tasks in chest computed tomography, histopathology images, and endoscopy video.
Abstract: Machine learning (ML) has great potential for early diagnosis of disease from medical scans, and at times, has even been shown to outperform experts. However, ML algorithms need large amounts of annotated data – scans with outlined abnormalities – for good performance. The time-consuming annotation process limits the progress of ML in this field. To address the annotation problem, multiple instance learning (MIL) algorithms were proposed, which learn from scans that have been diagnosed, but not annotated in detail. Unfortunately, these algorithms are not good enough at predicting where the abnormalities are located, which is important for diagnosis and prognosis of disease. This limits the application of these algorithms in research and in clinical practice. I propose to use the “wisdom of the crowds” – internet users without specific expertise – to improve the predictions of the algorithms. While the crowd does not have experience with medical imaging, recent studies and pilot data I collected show they can still provide useful information about the images, for example by saying whether images are visually similar or not. Such information has not been leveraged before in medical imaging applications. I will validate these methods on three challenging detection tasks in chest computed tomography, histopathology images, and endoscopy video. Understanding how the crowd can contribute to applications that typically require expert knowledge will allow harnessing the potential of large unannotated sets of data, training more reliable algorithms, and ultimately paving the way towards using ML algorithms in clinical practice.
Keywords: machine learning, artificial intelligence, medical imaging, crowdsourcing, computer-aided diagnosis

3 citations
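The multiple instance learning setup this proposal builds on can be sketched in a few lines (a minimal illustration under the standard MIL assumption, not the project's actual pipeline; names and the threshold are illustrative): a scan is a bag of patch-level abnormality scores, the bag-level prediction follows the max rule, and localisation amounts to thresholding the instance scores — the step the abstract identifies as MIL's weak point.

```python
def bag_score(instance_scores):
    # Standard MIL assumption: a scan (bag) is abnormal if at least one
    # patch (instance) is abnormal, so the bag score is the maximum
    # instance score.
    return max(instance_scores)

def localise(instance_scores, threshold=0.5):
    # Instances scoring at or above the threshold are the predicted
    # abnormality locations; MIL models trained only on bag labels
    # are often unreliable at exactly this step.
    return [i for i, s in enumerate(instance_scores) if s >= threshold]
```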


Book ChapterDOI
04 Oct 2020
TL;DR: This work investigates meta-learning for segmentation across ten datasets of different organs and modalities, proposing four ways to represent each dataset by meta-features (one based on statistical features of the images, three based on deep learning features) and using support vector regression and deep neural networks to predict prior model performance.
Abstract: Deep learning has led to state-of-the-art results for many medical imaging tasks, such as segmentation of different anatomical structures. With the increased numbers of deep learning publications and openly available code, the approach to choosing a model for a new task becomes more complicated, while time and (computational) resources are limited. A possible solution to choosing a model efficiently is meta-learning, a learning method in which prior performance of a model is used to predict the performance for new tasks. We investigate meta-learning for segmentation across ten datasets of different organs and modalities. We propose four ways to represent each dataset by meta-features: one based on statistical features of the images and three based on deep learning features. We use support vector regression and deep neural networks to learn the relationship between the meta-features and prior model performance. On three external test datasets these methods give Dice scores within 0.10 of the true performance. These results demonstrate the potential of meta-learning in medical imaging.

2 citations
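The meta-learning idea in this abstract — represent each dataset by a meta-feature vector, then regress from meta-features to expected model performance — can be sketched as follows. This is a toy illustration: simple intensity statistics stand in for the paper's statistical meta-features, and a 1-nearest-neighbour regressor stands in for its support vector regression and neural networks; all names and numbers are hypothetical.

```python
import statistics

def statistical_meta_features(images):
    # One dataset -> one fixed-length vector: mean and spread of all
    # pixel intensities, plus the number of images (a crude stand-in
    # for the paper's statistical meta-features).
    flat = [p for img in images for p in img]
    return (statistics.mean(flat), statistics.pstdev(flat), float(len(images)))

def predict_dice(train_feats, train_dice, query_feat):
    # 1-nearest-neighbour regression in meta-feature space: predict the
    # Dice score of the most similar previously seen dataset.
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_feats)),
               key=lambda i: sqdist(train_feats[i], query_feat))
    return train_dice[best]
```

In practice the meta-features would be computed per dataset and the regressor fit on (meta-features, observed Dice) pairs from prior segmentation runs, then queried for an unseen dataset.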


Posted Content
TL;DR: In this article, the authors focus on high-level priors embedded at the loss function level and categorize the articles according to the nature of the prior: the object shape, size, topology, and the inter-region constraints.
Abstract: Today, deep convolutional neural networks (CNNs) have demonstrated state-of-the-art performance for supervised medical image segmentation across various imaging modalities and tasks. Despite early success, segmentation networks may still generate anatomically aberrant segmentations, with holes or inaccuracies near the object boundaries. To mitigate this effect, recent research has focused on incorporating spatial information or prior knowledge to enforce anatomically plausible segmentation. While the integration of prior knowledge in image segmentation is not a new topic in classical optimization approaches, it is an increasing trend in CNN-based image segmentation, as shown by the growing literature on the topic. In this survey, we focus on high-level priors embedded at the loss function level. We categorize the articles according to the nature of the prior: the object shape, size, topology, and the inter-region constraints. We highlight the strengths and limitations of current approaches, discuss the challenges related to the design and integration of prior-based losses and the optimization strategies, and draw future research directions.

2 citations
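One common family of priors in this survey's taxonomy — the size prior — can be sketched as a soft-size penalty added to the segmentation loss (an illustrative example, not drawn from any specific surveyed paper; the size bounds and names are hypothetical).

```python
def size_prior_loss(foreground_probs, min_size, max_size):
    # "Soft size" of the predicted object: the sum of the per-pixel
    # foreground probabilities, which stays differentiable.
    size = sum(foreground_probs)
    # Quadratic penalty only when the soft size leaves the anatomically
    # plausible range [min_size, max_size]; plausible predictions
    # incur no extra loss.
    if size < min_size:
        return (min_size - size) ** 2
    if size > max_size:
        return (size - max_size) ** 2
    return 0.0
```

In a real training loop this term would be added, with a weighting factor, to a data-fidelity loss such as cross-entropy or a Dice loss.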


Posted Content
TL;DR: In this paper, the authors investigate meta-learning for segmentation across ten datasets of different organs and modalities and propose four ways to represent each dataset by meta-features: one based on statistical features of the images and three based on deep learning features.
Abstract: Deep learning has led to state-of-the-art results for many medical imaging tasks, such as segmentation of different anatomical structures. With the increased numbers of deep learning publications and openly available code, the approach to choosing a model for a new task becomes more complicated, while time and (computational) resources are limited. A possible solution to choosing a model efficiently is meta-learning, a learning method in which prior performance of a model is used to predict the performance for new tasks. We investigate meta-learning for segmentation across ten datasets of different organs and modalities. We propose four ways to represent each dataset by meta-features: one based on statistical features of the images and three based on deep learning features. We use support vector regression and deep neural networks to learn the relationship between the meta-features and prior model performance. On three external test datasets these methods give Dice scores within 0.10 of the true performance. These results demonstrate the potential of meta-learning in medical imaging.

1 citation


Posted Content
TL;DR: It is shown that multi-task models with individual crowdsourced features have limited effect on the model but, when combined in an ensemble, lead to improved generalisation.
Abstract: Machine learning has a recognised need for large amounts of annotated data. Due to the high cost of expert annotations, crowdsourcing, where non-experts are asked to label or outline images, has been proposed as an alternative. Although many promising results are reported, the quality of diagnostic crowdsourced labels is still unclear. We propose to address this by instead asking the crowd about visual features of the images, which can be provided more intuitively, and by using these features in a multi-task learning framework through ensemble strategies. We compare our proposed approach to a baseline model with a set of 2000 skin lesions from the ISIC 2017 challenge dataset. The baseline model only predicts a binary label from the skin lesion image, while our multi-task model also predicts one of the following features: asymmetry of the lesion, border irregularity and color. We show that multi-task models with individual crowdsourced features have limited effect on the model, but when combined in an ensemble lead to improved generalisation. The area under the receiver operating characteristic curve is 0.794 for the baseline model and 0.811 and 0.808 for the two multi-task ensembles, respectively. Finally, we discuss the findings, identify some limitations and recommend directions for further research. The code of the models is available at this https URL.

1 citation
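The ensemble step this abstract credits with the improved generalisation can be sketched simply (an illustrative stand-in, not the released code; names are hypothetical): the melanoma probabilities predicted by the individual multi-task models — each trained with a different crowdsourced auxiliary feature — are averaged per image.

```python
def ensemble_probs(model_probs):
    # model_probs: one list of per-image melanoma probabilities per
    # multi-task model (e.g. the asymmetry, border and colour variants).
    # The ensemble prediction is the per-image average across models.
    n_models = len(model_probs)
    n_images = len(model_probs[0])
    return [sum(m[i] for m in model_probs) / n_models
            for i in range(n_images)]
```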


Posted Content
28 Apr 2020
TL;DR: It is shown that crowd features in combination with multi-task learning lead to improved generalisation; limitations are identified and directions for further research recommended.
Abstract: Machine learning has a recognised need for large amounts of annotated data. Due to the high cost of expert annotations, crowdsourcing, where non-experts are asked to label or outline images, has been proposed as an alternative. Although many promising results are reported, the quality of diagnostic crowdsourced labels is still unclear. We propose to address this by instead asking the crowd about visual features of the images, which can be provided more intuitively, and by using these features in a multi-task learning framework through ensemble strategies. We compare our proposed approach to a baseline model with a set of 2000 skin lesions from the ISIC 2017 challenge dataset. The baseline model only predicts a binary label from the skin lesion image, while our multi-task model also predicts one of the following features: asymmetry of the lesion, border irregularity and color. We show that multi-task models with individual crowdsourced features have limited effect on the model, but when combined in an ensemble lead to improved generalisation. The area under the receiver operating characteristic curve is 0.794 for the baseline model and 0.811 and 0.808 for the two multi-task ensembles, respectively. Finally, we discuss the findings, identify some limitations and recommend directions for further research. The code of the models is available at this https URL.

Posted Content
TL;DR: This work suggests that a pre-trained feature extractor can be used as a primary tumor origin classifier for lung nodules, eliminating the need for elaborate fine-tuning of a new network and large datasets.
Abstract: Early detection of lung cancer has been proven to decrease mortality significantly. A recent development in computed tomography (CT), spectral CT, can potentially improve diagnostic accuracy, as it yields more information per scan than regular CT. However, the sheer workload involved in analyzing a large number of scans drives the need for automated diagnosis methods. Therefore, we propose a detection and classification system for lung nodules in CT scans. Furthermore, we want to observe whether spectral images can increase classifier performance. For the detection of nodules we trained a VGG-like 3D convolutional neural net (CNN). To obtain a primary tumor classifier for our dataset we pre-trained a 3D CNN with similar architecture on nodule malignancies of a large publicly available dataset, the LIDC-IDRI dataset. Subsequently, we used this pre-trained network as a feature extractor for the nodules in our dataset. The resulting feature vectors were classified into two (benign/malignant) and three (benign/primary lung cancer/metastases) classes using a support vector machine (SVM). This classification was performed at both nodule and scan level. We obtained state-of-the-art performance for detection and malignancy regression on the LIDC-IDRI database. Classification performance on our own dataset was higher for scan- than for nodule-level predictions. For the three-class scan-level classification we obtained an accuracy of 78%. Spectral features did increase classifier performance, but not significantly. Our work suggests that a pre-trained feature extractor can be used as a primary tumor origin classifier for lung nodules, eliminating the need for elaborate fine-tuning of a new network and large datasets. Code is available at this https URL.
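The scan-level aggregation mentioned in this abstract — turning per-nodule classifications into one prediction per scan — can be sketched as follows. This is one simple pooling choice (average the per-nodule class probabilities, then take the most probable class), labelled as an assumption since the abstract does not specify the exact aggregation rule; the function name is hypothetical.

```python
def scan_level_prediction(nodule_probs):
    # nodule_probs: one probability vector per detected nodule over the
    # three classes (benign / primary lung cancer / metastasis).
    # Average the probabilities across nodules, then return the index
    # of the most probable class as the scan-level label.
    n_classes = len(nodule_probs[0])
    avg = [sum(p[c] for p in nodule_probs) / len(nodule_probs)
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])
```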