Showing papers by "Veronika Cheplygina published in 2020"
••
TL;DR: Ten simple rules to help researchers who are planning to start their journey on Twitter take their first steps and advance their careers.
Abstract: Twitter is one of the most popular social media platforms, with over 320 million active users as of February 2019. Twitter users can enjoy free content delivered by other users whom they actively decide to follow. However, unlike in other areas where Twitter is used passively (e.g., to follow influential figures and/or information agencies), in science it can be used in a much more active, collaborative way: to ask for advice, to form new bonds and scientific collaborations, to announce jobs and find employees, and to find new mentors and positions. This is particularly important in the early stages of a scientific career, during which lack of collaboration or delayed access to information can have the most impact.
For these reasons, using Twitter appropriately [1] can be more than just a social media activity; it can be a real career incubator in which researchers can develop their professional circles, launch new research projects and receive help from the community at various stages of those projects. Twitter is a tool that facilitates decentralization in science: you are able to present yourself to the community, to develop your personal brand, to set up a dialogue with people inside and outside your research field and to create or join a professional environment in your field without mediators such as your direct boss.
This article is written by a group of researchers who have a strong feeling that they have personally benefited from using Twitter, both research-wise and network-wise. We (@DrVeronikaCH, @Felienne, @CaAl, @nbielczyk_neuro, @ionicasmeets) share our personal experience and advice in the form of ten simple rules, and we hope that this material will help a number of researchers who are planning to start their journey on Twitter to take their first steps and advance their careers using Twitter.
48 citations
••
01 Dec 2020
TL;DR: This survey reviews studies applying crowdsourcing to the analysis of medical images, published prior to July 2018, and identifies common approaches, challenges and considerations, providing guidance of utility to researchers adopting this approach.
Abstract: Rapid advances in image processing capabilities have been seen across many domains, fostered by the application of machine learning algorithms to "big-data". However, within the realm of medical image analysis, advances have been curtailed, in part, due to the limited availability of large-scale, well-annotated datasets. One of the main reasons for this is the high cost often associated with producing large amounts of high-quality meta-data. Recently, there has been growing interest in the application of crowdsourcing for this purpose; a technique that has proven effective for creating large-scale datasets across a range of disciplines, from computer vision to astrophysics. Despite the growing popularity of this approach, there has not yet been a comprehensive literature review to provide guidance to researchers considering using crowdsourcing methodologies in their own medical imaging analysis. In this survey, we review studies applying crowdsourcing to the analysis of medical images, published prior to July 2018. We identify common approaches, challenges and considerations, providing guidance of utility to researchers adopting this approach. Finally, we discuss future opportunities for development within this emerging domain.
42 citations
••
Heidelberg University1, Université de Montréal2, Queen Mary University of London3, Trinity College, Dublin4, University of Hong Kong5, ETH Zurich6, University of Zurich7, University of California, Los Angeles8, University of Southern California9, University of Michigan10, University of California, Berkeley11, Cairo University12, Harvard University13, Yale University14, MIND Institute15, University of Oxford16, Florey Institute of Neuroscience and Mental Health17, Cardiff University18, Florida International University19, Georgia Institute of Technology20, Eindhoven University of Technology21, University of São Paulo22, Leiden University Medical Center23, Nicolaus Copernicus University in Toruń24, University of New South Wales25, University of Birmingham26, Stanford University27, University of Münster28, Montreal Neurological Institute and Hospital29, McGill University30, University of Miami31, Massachusetts Eye and Ear Infirmary32, Max Planck Society33, University of Queensland34, Monash University35, Auburn University36, University of Electronic Science and Technology of China37
TL;DR: Early career researchers (ECRs) face a range of competing pressures in academia, so the Organization for Human Brain Mapping undertook a group effort to gather helpful advice for ECRs on self-management.
19 citations
•
04 Oct 2020
TL;DR: A survey of the MICCAI 2018 proceedings investigates common practice in medical image analysis applications and shows that unbiased features can be learned by explicitly using demographic variables in an adversarial training setup, leading to balanced scores per subgroup.
Abstract: One of the critical challenges in machine learning applications is to have fair predictions. There are numerous recent examples in various domains that convincingly show that algorithms trained with biased datasets can easily lead to erroneous or discriminatory conclusions. This is even more crucial in clinical applications, where predictive algorithms are designed mainly based on a limited or given set of medical images, while demographic variables such as age, sex and race are not taken into account. In this work, we conduct a survey of the MICCAI 2018 proceedings to investigate the common practice in medical image analysis applications. Surprisingly, we found that papers focusing on diagnosis rarely describe the demographics of the datasets used, and the diagnosis is purely based on images. In order to highlight the importance of considering demographics in diagnosis tasks, we used a publicly available dataset of skin lesions. We then demonstrate that a classifier with an overall area under the curve (AUC) of 0.83 has variable performance between 0.76 and 0.91 on subgroups based on age and sex, even though the training set was relatively balanced. Moreover, we show that it is possible to learn unbiased features by explicitly using demographic variables in an adversarial training setup, which leads to balanced scores per subgroup. Finally, we discuss the implications of these results and provide recommendations for further research.
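The subgroup analysis described above (overall AUC 0.83, but 0.76 to 0.91 across age and sex groups) amounts to computing the AUC separately per demographic group. A minimal stdlib sketch, with illustrative toy scores rather than the paper's data:

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc(scores, labels, groups):
    """AUC computed separately for each demographic subgroup
    (e.g. age bins or sex), exposing gaps the overall AUC hides."""
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = auc([scores[i] for i in idx], [labels[i] for i in idx])
    return out
```

A model can look strong overall while one subgroup's AUC is much lower; reporting the per-group scores alongside the overall one makes that visible.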
13 citations
••
TL;DR: This work proposes to use the "wisdom of the crowds" – internet users without specific expertise – to improve the predictions of the algorithms, and will validate these methods on three challenging detection tasks in chest computed tomography, histopathology images, and endoscopy video.
Abstract: Machine learning (ML) has great potential for early diagnosis of disease from medical scans, and at times, has even been shown to outperform experts. However, ML algorithms need large amounts of annotated data – scans with outlined abnormalities – for good performance. The time-consuming annotation process limits the progress of ML in this field. To address the annotation problem, multiple instance learning (MIL) algorithms were proposed, which learn from scans that have been diagnosed, but not annotated in detail. Unfortunately, these algorithms are not good enough at predicting where the abnormalities are located, which is important for diagnosis and prognosis of disease. This limits the application of these algorithms in research and in clinical practice. I propose to use the “wisdom of the crowds” – internet users without specific expertise – to improve the predictions of the algorithms. While the crowd does not have experience with medical imaging, recent studies and pilot data I collected show they can still provide useful information about the images, for example by saying whether images are visually similar or not. Such information has not been leveraged before in medical imaging applications. I will validate these methods on three challenging detection tasks in chest computed tomography, histopathology images, and endoscopy video. Understanding how the crowd can contribute to applications that typically require expert knowledge will allow harnessing the potential of large unannotated sets of data, training more reliable algorithms, and ultimately paving the way towards using ML algorithms in clinical practice.
Keywords: machine learning, artificial intelligence, medical imaging, crowdsourcing, computer-aided diagnosis
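The multiple instance learning setup mentioned above can be sketched under its standard assumption: a scan (bag) is abnormal iff at least one of its patches (instances) is, so the bag score is the maximum instance score and the argmax gives a crude localisation. The scorer output and threshold below are placeholders, not the proposal's actual models:

```python
def mil_bag_predict(instance_scores, threshold=0.5):
    """Max-pooling MIL: predict the scan-level label from per-patch
    abnormality scores, and return the index of the most suspicious
    patch as a rough localisation of the finding."""
    best = max(range(len(instance_scores)),
               key=lambda i: instance_scores[i])
    return instance_scores[best] >= threshold, best
```

The weak localisation is exactly where such models struggle, which is the gap the crowd information above is meant to help close.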
3 citations
••
04 Oct 2020
TL;DR: This work investigates meta-learning for segmentation across ten datasets of different organs and modalities, proposing four ways to represent each dataset by meta-features – one based on statistical features of the images and three based on deep learning features – and using support vector regression and deep neural networks to predict model performance from them.
Abstract: Deep learning has led to state-of-the-art results for many medical imaging tasks, such as segmentation of different anatomical structures. With the increased numbers of deep learning publications and openly available code, the approach to choosing a model for a new task becomes more complicated, while time and (computational) resources are limited. A possible solution to choosing a model efficiently is meta-learning, a learning method in which prior performance of a model is used to predict the performance for new tasks. We investigate meta-learning for segmentation across ten datasets of different organs and modalities. We propose four ways to represent each dataset by meta-features: one based on statistical features of the images and three based on deep learning features. We use support vector regression and deep neural networks to learn the relationship between the meta-features and prior model performance. On three external test datasets these methods give Dice scores within 0.10 of the true performance. These results demonstrate the potential of meta-learning in medical imaging.
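The statistical variant of the meta-feature idea above can be sketched in stdlib Python: summarise each dataset's intensities by a few statistics (the mean/std/min/max below are illustrative; the paper does not fix them to these four), then predict performance on a new dataset from previously observed ones. A 1-nearest-neighbour lookup stands in for the paper's support vector regression and deep networks:

```python
import statistics

def meta_features(pixel_values):
    """Represent a whole dataset by simple intensity statistics."""
    return (statistics.fmean(pixel_values),
            statistics.pstdev(pixel_values),
            min(pixel_values),
            max(pixel_values))

def predict_dice(query, known):
    """Predict the Dice score on a new dataset as the score observed on
    the most similar known dataset (squared Euclidean distance in
    meta-feature space). `known` maps meta-feature tuples to Dice."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return known[min(known, key=lambda f: dist(f, query))]
```

The point of the sketch is the pipeline shape: a cheap per-dataset summary feeds a regressor, so candidate models can be ranked without training each one.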
2 citations
•
TL;DR: In this article, the authors focus on high-level priors, embedded at the loss function level, and categorize the articles according to the nature of the prior: object shape, size, topology, and inter-region constraints.
Abstract: Today, deep convolutional neural networks (CNNs) have demonstrated state-of-the-art performance for supervised medical image segmentation, across various imaging modalities and tasks. Despite early success, segmentation networks may still generate anatomically aberrant segmentations, with holes or inaccuracies near the object boundaries. To mitigate this effect, recent research works have focused on incorporating spatial information or prior knowledge to enforce anatomically plausible segmentation. While the integration of prior knowledge in image segmentation is not a new topic in classical optimization approaches, it is today an increasing trend in CNN-based image segmentation, as shown by the growing literature on the topic. In this survey, we focus on high-level priors embedded at the loss function level. We categorize the articles according to the nature of the prior: object shape, size, topology, and inter-region constraints. We highlight strengths and limitations of current approaches, discuss the challenges related to the design, integration and optimization of prior-based losses, and draw future research directions.
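As one concrete instance of the loss-level size priors the survey covers, a differentiable penalty can keep the soft size of the predicted foreground (the sum of per-pixel probabilities) inside a plausible anatomical range; the quadratic form and the bounds in the test are illustrative, not taken from any surveyed paper:

```python
def size_prior_loss(fg_probs, lower, upper):
    """Penalise a predicted foreground whose soft size falls outside
    [lower, upper]. Using the probability sum rather than a hard mask
    keeps the term differentiable w.r.t. the network output."""
    size = sum(fg_probs)
    return max(0.0, lower - size) ** 2 + max(0.0, size - upper) ** 2
```

Added to the segmentation loss with a weight, such a term discourages anatomically implausible outputs such as an empty mask or an organ far larger than expected.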
2 citations
•
28 Apr 2020
TL;DR: It is shown that multi-task models with individual crowdsourced features have limited effect on the model, but when combined in an ensemble lead to improved generalisation.
Abstract: Machine learning has a recognised need for large amounts of annotated data. Due to the high cost of expert annotations, crowdsourcing, where non-experts are asked to label or outline images, has been proposed as an alternative. Although many promising results are reported, the quality of diagnostic crowdsourced labels is still unclear. We propose to address this by instead asking the crowd about visual features of the images, which can be provided more intuitively, and by using these features in a multi-task learning framework through ensemble strategies. We compare our proposed approach to a baseline model with a set of 2000 skin lesions from the ISIC 2017 challenge dataset. The baseline model only predicts a binary label from the skin lesion image, while our multi-task model also predicts one of the following features: asymmetry of the lesion, border irregularity and color. We show that multi-task models with individual crowdsourced features have limited effect on the model, but when combined in an ensemble lead to improved generalisation. The area under the receiver operating characteristic curve is 0.794 for the baseline model and 0.811 and 0.808 for the two multi-task ensembles, respectively. Finally, we discuss the findings, identify some limitations and recommend directions for further research. The code of the models is available at this https URL.
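The multi-task setup above can be sketched as a joint loss per lesion (diagnosis plus one weighted crowdsourced auxiliary target) and an ensemble that averages the diagnosis probabilities of the per-feature models; the weight `lam` and the probabilities in the usage are illustrative, not values from the paper:

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for a single prediction."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def multitask_loss(diag_p, diag_y, aux_p, aux_y, lam=0.5):
    """Diagnosis loss plus a weighted loss on one crowdsourced feature
    (asymmetry, border irregularity or colour)."""
    return bce(diag_p, diag_y) + lam * bce(aux_p, aux_y)

def ensemble_prob(per_model_probs):
    """Ensemble the per-feature multi-task models by averaging their
    melanoma probabilities."""
    return sum(per_model_probs) / len(per_model_probs)
```

Each auxiliary head acts as a regulariser on the shared representation; the averaging step is where the paper reports the generalisation gain appears.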
1 citation
•
TL;DR: This work suggests that a pre-trained feature extractor can be used as a primary tumor origin classifier for lung nodules, eliminating the need for elaborate fine-tuning of a new network and large datasets.
Abstract: Early detection of lung cancer has been proven to decrease mortality significantly. A recent development in computed tomography (CT), spectral CT, can potentially improve diagnostic accuracy, as it yields more information per scan than regular CT. However, the sheer workload involved with analyzing a large number of scans drives the need for automated diagnosis methods. Therefore, we propose a detection and classification system for lung nodules in CT scans. Furthermore, we want to observe whether spectral images can increase classifier performance. For the detection of nodules we trained a VGG-like 3D convolutional neural net (CNN). To obtain a primary tumor classifier for our dataset we pre-trained a 3D CNN with a similar architecture on nodule malignancies of a large publicly available dataset, the LIDC-IDRI dataset. Subsequently we used this pre-trained network as a feature extractor for the nodules in our dataset. The resulting feature vectors were classified into two (benign/malignant) and three (benign/primary lung cancer/metastases) classes using a support vector machine (SVM). This classification was performed at both nodule and scan level. We obtained state-of-the-art performance for detection and malignancy regression on the LIDC-IDRI database. Classification performance on our own dataset was higher for scan- than for nodule-level predictions. For the three-class scan-level classification we obtained an accuracy of 78%. Spectral features did increase classifier performance, but not significantly. Our work suggests that a pre-trained feature extractor can be used as a primary tumor origin classifier for lung nodules, eliminating the need for elaborate fine-tuning of a new network and large datasets. Code is available at this https URL.
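The transfer-learning recipe above (freeze a network pre-trained on LIDC-IDRI, then classify its feature vectors) can be sketched with a small linear classifier trained on frozen features; a stdlib perceptron stands in for the paper's SVM, and the toy feature vectors in the usage are placeholders for real CNN embeddings:

```python
def train_linear(features, labels, epochs=20, lr=0.1):
    """Train only a small linear classifier (perceptron) on top of frozen
    feature vectors; the pre-trained extractor itself is never updated.
    Labels are -1 (benign) / +1 (malignant)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            # Perceptron rule: update only on a misclassified example.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict_label(w, b, x):
    """Classify one frozen-feature vector as -1 or +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

Because only this thin classifier is trained, the approach needs far less data and compute than fine-tuning the whole network, which is the point the abstract makes.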