
Showing papers on "Annotation" published in 2021


Journal ArticleDOI
TL;DR: The Interactive Tree Of Life (ITOL) as mentioned in this paper is an online tool for the display, manipulation and annotation of phylogenetic and other trees, which allows users to draw shapes, labels and other features directly onto the trees.
Abstract: The Interactive Tree Of Life (https://itol.embl.de) is an online tool for the display, manipulation and annotation of phylogenetic and other trees. It is freely available and open to everyone. iTOL version 5 introduces a completely new tree display engine, together with numerous new features. For example, a new dataset type has been added (MEME motifs), while annotation options have been expanded for several existing ones. Node metadata display options have been extended and now also support non-numerical categorical values, as well as multiple values per node. Direct manual annotation is now available, providing a set of basic drawing and labeling tools, allowing users to draw shapes, labels and other features by hand directly onto the trees. Support for tree and dataset scales has been extended, providing fine control over line and label styles. Unrooted tree displays can now use the equal-daylight algorithm, providing much greater display clarity. The user account system has been streamlined and expanded with new navigation options and currently handles >1 million trees from >70 000 individual users.

2,856 citations


Journal ArticleDOI
TL;DR: A historical archive covering the past 15 years of GO data, with a consistent format and file structure for both the ontology and annotations, has been made available to support traceability and reproducibility.
Abstract: The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.

1,988 citations


Journal ArticleDOI
06 Jan 2021
TL;DR: The BRAKER2 pipeline as mentioned in this paper generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS, and it is favorably compared with other pipelines, e.g. MAKER2, in terms of accuracy and performance.
Abstract: The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, where self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. It compares favorably under equal conditions with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate solving the task of harmonization of annotation of protein-coding genes in genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies as well as in algorithmic development to reach the goal of highly accurate annotation of eukaryotic genomes.

455 citations


Journal ArticleDOI
TL;DR: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms; an evaluation shows that some tools are comprehensive and mature enough to be used on most annotation projects.
Abstract: MOTIVATION: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools. METHODS: We searched for annotation tools and selected a subset of them according to five requirements with which they should comply, such as being Web-based or supporting the definition of a schema. We installed the selected tools (when necessary), carried out hands-on experiments and evaluated them using 26 criteria that covered functional and technical aspects. We defined each criterion on three levels of matches and a score for the final evaluation of the tools. RESULTS: We evaluated 78 tools and selected the following 15 for a detailed evaluation: BioQRator, brat, Catma, Djangology, ezTag, FLAT, LightTag, MAT, MyMiner, PDFAnno, prodigy, tagtog, TextAE, WAT-SL and WebAnno. Full compliance with our 26 criteria ranged from only 9 up to 20 criteria, which demonstrated that some tools are comprehensive and mature enough to be used on most annotation projects. The highest score of 0.81 was obtained by WebAnno (of a maximum value of 1.0).

78 citations
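
The abstract summarises the scoring only at a high level; the sketch below shows one way per-criterion match levels could be aggregated into a normalised score between 0 and 1. The three level values (0, 0.5, 1) and the equal weighting are assumptions for illustration, not the paper's exact scheme.

```python
# Hypothetical aggregation of per-criterion match levels into a tool score.
# The level values and equal weighting are assumptions for illustration;
# the paper defines its own three-level scheme over 26 criteria.

LEVEL_VALUES = {"no": 0.0, "partial": 0.5, "full": 1.0}

def tool_score(criteria_matches):
    """criteria_matches: dict mapping criterion name -> 'no'/'partial'/'full'."""
    if not criteria_matches:
        return 0.0
    total = sum(LEVEL_VALUES[level] for level in criteria_matches.values())
    return total / len(criteria_matches)  # normalised to a maximum of 1.0

# Example: a tool fully complying with 20 of 26 criteria and partially with 2.
matches = {f"criterion_{i}": "full" for i in range(20)}
matches.update({f"criterion_{i}": "partial" for i in range(20, 22)})
matches.update({f"criterion_{i}": "no" for i in range(22, 26)})
print(round(tool_score(matches), 2))
```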


DOI
01 Nov 2021
TL;DR: Bakta as discussed by the authors is a command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes, including the detection of small proteins taking into account replicon metadata.
Abstract: Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.

76 citations


Journal ArticleDOI
TL;DR: This paper proposes an innovative method in which the visual features of the image are presented by the intermediate layer features of deep learning, while semantic concepts are represented by mean vectors of positive samples.
Abstract: Automatic image annotation is an effective computer operation that predicts the annotation of an unknown image by automatically learning potential relationships between the semantic concept space and the visual feature space in the annotation image dataset. Usually, automatic image labeling involves two stages: a learning stage and a labeling stage. Existing image annotation methods that employ convolutional features of deep learning methods have a number of limitations, including complex training and high space/time expenses associated with the image annotation procedure. Accordingly, this paper proposes an innovative method in which the visual features of the image are represented by the intermediate-layer features of deep learning, while semantic concepts are represented by mean vectors of positive samples. Firstly, the convolutional result is directly output in the form of low-level visual features through the mid-level of the pre-trained deep learning model, with the image being represented by sparse coding. Secondly, the positive mean vector method is used to construct visual feature vectors for each text vocabulary item, so that a visual feature vector database is created. Finally, the visual feature vector similarity between the testing image and all text vocabulary is calculated, and the vocabulary with the largest similarity is used for annotation. Experiments on the datasets demonstrate the effectiveness of the proposed method; in terms of F1 score, the proposed method’s performance on the Corel5k dataset and IAPR TC-12 dataset is superior to that of MBRM, JEC-AF, JEC-DF, and 2PKNN with end-to-end deep features.

57 citations
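
A minimal sketch of the positive-mean-vector idea described above, assuming image features (e.g., intermediate-layer CNN activations, possibly after sparse coding) have already been extracted; the function names and toy data are illustrative, not the paper's code.

```python
# Sketch: represent each vocabulary word by the mean feature vector of its
# positive training images, then annotate a test image with the words whose
# mean vectors are most similar (cosine similarity).
import numpy as np

def build_label_vectors(features, labels, vocabulary):
    """Mean feature vector of the positive training samples for each word."""
    label_vectors = {}
    for word in vocabulary:
        positives = features[[word in lab for lab in labels]]
        if len(positives):
            label_vectors[word] = positives.mean(axis=0)
    return label_vectors

def annotate(image_feature, label_vectors, top_k=5):
    """Return the top_k vocabulary words by cosine similarity."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scores = {w: cosine(image_feature, v) for w, v in label_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy usage with random 512-d "intermediate layer" features.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 512))
train_labels = [set(rng.choice(["sky", "sea", "tree", "car"], 2, replace=False))
                for _ in range(100)]
vectors = build_label_vectors(train_feats, train_labels, ["sky", "sea", "tree", "car"])
print(annotate(rng.normal(size=512), vectors, top_k=2))
```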


Journal ArticleDOI
TL;DR: This update presents the major improvements since the initial release of OpenProt, which enables a more comprehensive annotation of eukaryotic genomes and fosters functional proteomic discoveries.
Abstract: OpenProt (www.openprot.org) is the first proteogenomic resource supporting a polycistronic annotation model for eukaryotic genomes. It provides a deeper annotation of open reading frames (ORFs) while mining experimental data for supporting evidence using cutting-edge algorithms. This update presents the major improvements since the initial release of OpenProt. All species support recent NCBI RefSeq and Ensembl annotations, with changes in annotations being reported in OpenProt. Using the 131 ribosome profiling datasets re-analysed by OpenProt to date, non-AUG initiation starts are reported alongside a confidence score of the initiating codon. From the 177 mass spectrometry datasets re-analysed by OpenProt to date, the unicity of the detected peptides is controlled at each implementation. Furthermore, to guide the users, detectability statistics and protein relationships (isoforms) are now reported for each protein. Finally, to foster access to deeper ORF annotation independently of one's bioinformatics skills or computational resources, OpenProt now offers a data analysis platform. Users can submit their dataset for analysis and receive the results from the analysis by OpenProt. All data on OpenProt are freely available and downloadable for each species, the release-based format ensuring continuous access to the data. Thus, OpenProt enables a more comprehensive annotation of eukaryotic genomes and fosters functional proteomic discoveries.

57 citations


Journal ArticleDOI
TL;DR: The MuSe-CaR dataset as discussed by the authors is a large-scale multimodal dataset for sentiment and emotion research, which includes audio-visual and language modalities, and has been used as a testbed for the 1st Multimodal Sentiment Analysis Challenge.
Abstract: Truly real-life data presents a strong but exciting challenge for sentiment and emotion research. The high variety of possible ‘in-the-wild’ properties makes large datasets such as these indispensable with respect to building robust machine learning models. A sufficient quantity of data covering a deep variety in the challenges of each modality to force the exploratory analysis of the interplay of all modalities has not yet been made available in this context. In this contribution, we present MuSe-CaR, a first-of-its-kind multimodal dataset. The data is publicly available, as it recently served as the testing bed for the 1st Multimodal Sentiment Analysis Challenge, which focused on the tasks of emotion, emotion-target engagement, and trustworthiness recognition by means of comprehensively integrating the audio-visual and language modalities. Furthermore, we give a thorough overview of the dataset in terms of collection and annotation, including annotation tiers not used in this year's MuSe 2020. In addition, for one of the sub-challenges -- predicting the level of trustworthiness -- no participant outperformed the baseline model, and so we propose a simple but highly efficient Multi-Head-Attention network that, using multimodal fusion, exceeds the baseline by around 0.2 CCC (almost a 50% improvement).

32 citations
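
As a rough illustration of multi-head-attention fusion over audio-visual and language features, the sketch below fuses three per-segment feature sequences and regresses a single score per clip. The dimensions, fusion layout, and regression head are assumptions, not the authors' exact architecture.

```python
# Illustrative multi-head-attention fusion; feature sequences are assumed to
# be pre-extracted per modality. All sizes are placeholders.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, text, audio, visual):
        # Text queries attend over the concatenated audio-visual sequence.
        context = torch.cat([audio, visual], dim=1)
        fused, _ = self.attn(text, context, context)
        return self.head(fused.mean(dim=1)).squeeze(-1)  # one score per clip

model = AttentionFusion()
t, a, v = (torch.randn(8, 20, 64) for _ in range(3))  # batch of 8 clips
print(model(t, a, v).shape)  # torch.Size([8])
```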


Journal ArticleDOI
15 Apr 2021
TL;DR: The authors developed an annotation schema and a benchmark for automated claim detection that is more consistent across time, topics, and annotators than are previous approaches, achieving an F1 score of 0.83 with over 5% relative improvement over the state-of-the-art methods ClaimBuster and ClaimRank.
Abstract: In an effort to assist factcheckers in the process of factchecking, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. It consists of identifying the set of sentences, out of a long text, deemed capable of being factchecked. This article is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Leveraging the expertise of professional factcheckers, we develop an annotation schema and a benchmark for automated claim detection that is more consistent across time, topics, and annotators than are previous approaches. Our annotation schema has been used to crowdsource the annotation of a dataset with sentences from UK political TV shows. We introduce an approach based on universal sentence representations to perform the classification, achieving an F1 score of 0.83, with over 5% relative improvement over the state-of-the-art methods ClaimBuster and ClaimRank. The system was deployed in production and received positive user feedback.

30 citations
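
As a rough illustration of claim detection as binary classification over universal sentence representations, the sketch below trains a classifier on sentence embeddings. The encoder is a stand-in stub for a real pretrained universal sentence encoder, and the example sentences are invented.

```python
# Sketch: sentence embeddings -> binary "checkable claim" classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(sentences, dim=128):
    """Placeholder embedding: a pseudo-random vector per sentence (stand-in
    for a pretrained universal sentence encoder)."""
    return np.stack([np.random.default_rng(abs(hash(s)) % 2**32).normal(size=dim)
                     for s in sentences])

train_sents = ["Unemployment fell by 3% last year.",          # checkable claim
               "What a wonderful day it has been.",            # not a claim
               "The NHS budget was cut by 2 billion pounds.",  # checkable claim
               "I really admire her dedication."]              # not a claim
train_y = [1, 0, 1, 0]

clf = LogisticRegression(max_iter=1000).fit(encode(train_sents), train_y)
test = ["Crime rose by 10% in London.", "Thanks everyone for watching."]
print(clf.predict(encode(test)))
```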


Proceedings ArticleDOI
Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, Caroline Pantofaru
21 Jul 2021
TL;DR: More Inclusive Annotations for People (MIAP) as discussed by the authors is a subset of the Open Images dataset, containing bounding boxes and attributes for all of the people visible in those images.
Abstract: The Open Images Dataset contains approximately 9 million images and is a widely accepted dataset for computer vision research. As is common practice for large datasets, the annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image. In this paper, we present a new set of annotations on a subset of the Open Images dataset called the MIAP (More Inclusive Annotations for People) subset, containing bounding boxes and attributes for all of the people visible in those images. The attributes and labeling methodology for the MIAP subset were designed to enable research into model fairness. In addition, we analyze the original annotation methodology for the person class and its subclasses, discussing the resulting patterns in order to inform future annotation efforts. By considering both the original and exhaustive annotation sets, researchers can also now study how systematic patterns in training annotations affect modeling.

26 citations


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a deep label-specific feature (Deep-LIFT) learning model to build the explicit and exact correspondence between the label and the local visual region, which improves the effectiveness of feature learning and enhances the interpretability of the model itself.
Abstract: Image annotation aims to jointly predict multiple tags for an image. Although significant progress has been achieved, existing approaches usually overlook aligning specific labels and their corresponding regions due to the weak supervised information (i.e., "bag of labels" for regions), thus failing to explicitly exploit the discrimination from different classes. In this article, we propose the deep label-specific feature (Deep-LIFT) learning model to build the explicit and exact correspondence between the label and the local visual region, which improves the effectiveness of feature learning and enhances the interpretability of the model itself. Deep-LIFT extracts features for each label by aligning each label and its region. Specifically, Deep-LIFTs are achieved through learning multiple correlation maps between image convolutional features and label embeddings. Moreover, we construct two variant graph convolutional networks (GCNs) to further capture the interdependency among labels. Empirical studies on benchmark datasets validate that the proposed model achieves superior performance on multilabel classification over other existing state-of-the-art methods.
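
A minimal sketch of computing label-specific features from correlation maps between convolutional features and label embeddings, in the spirit of the approach described above; the shapes, softmax pooling, and names are illustrative assumptions, and the GCN over labels is omitted.

```python
# Sketch: per-label correlation maps and label-specific pooled features.
import torch

def label_specific_features(feature_map, label_embeddings):
    """feature_map: (C, H, W); label_embeddings: (L, C).
    Returns per-label correlation maps (L, H, W) and pooled features (L, C)."""
    corr = torch.einsum("lc,chw->lhw", label_embeddings, feature_map)
    weights = torch.softmax(corr.flatten(1), dim=1).view_as(corr)  # spatial attention
    pooled = torch.einsum("lhw,chw->lc", weights, feature_map)
    return corr, pooled

fmap = torch.randn(256, 14, 14)       # e.g., backbone output for one image
labels = torch.randn(20, 256)         # 20 label embeddings
corr_maps, feats = label_specific_features(fmap, labels)
print(corr_maps.shape, feats.shape)   # torch.Size([20, 14, 14]) torch.Size([20, 256])
```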

Journal ArticleDOI
TL;DR: In this article, four working groups were formed from a pool of participants that included clinicians, engineers, and data scientists, and were focused on four themes: (1) temporal models, (2) actions and tasks, (3) tissue characteristics and general anatomy, and (4) software and data structure.
Abstract: The growing interest in analysis of surgical video through machine learning has led to increased research efforts; however, common methods of annotating video data are lacking. There is a need to establish recommendations on the annotation of surgical video data to enable assessment of algorithms and multi-institutional collaboration. Four working groups were formed from a pool of participants that included clinicians, engineers, and data scientists. The working groups were focused on four themes: (1) temporal models, (2) actions and tasks, (3) tissue characteristics and general anatomy, and (4) software and data structure. A modified Delphi process was utilized to create a consensus survey based on suggested recommendations from each of the working groups. After three Delphi rounds, consensus was reached on recommendations for annotation within each of these domains. A hierarchy for annotation of temporal events in surgery was established. While additional work remains to achieve accepted standards for video annotation in surgery, the consensus recommendations on a general framework for annotation presented here lay the foundation for standardization. This type of framework is critical to enabling diverse datasets, performance benchmarks, and collaboration.

Posted Content
TL;DR: The authors proposed a multi-task based approach to resolve annotator disagreements and derive single ground truth labels from multiple annotations, where predicting each annotator's judgements is treated as a separate subtask, while sharing a common learned representation of the task.
Abstract: Majority voting and averaging are common approaches employed to resolve annotator disagreements and derive single ground truth labels from multiple annotations. However, annotators may systematically disagree with one another, often reflecting their individual biases and values, especially in the case of subjective tasks such as detecting affect, aggression, and hate speech. Annotator disagreements may capture important nuances in such tasks that are often ignored while aggregating annotations to a single ground truth. In order to address this, we investigate the efficacy of multi-annotator models. In particular, our multi-task based approach treats predicting each annotator's judgements as separate subtasks, while sharing a common learned representation of the task. We show that this approach yields the same or better performance than aggregating labels in the data prior to training across seven different binary classification tasks. Our approach also provides a way to estimate uncertainty in predictions, which we demonstrate correlates better with annotation disagreements than traditional methods. Being able to model uncertainty is especially useful in deployment scenarios where knowing when not to make a prediction is important.
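
A minimal sketch of the multi-annotator idea described above: a shared encoder with one classification head per annotator. The bag-of-embeddings encoder and all sizes are placeholders, not the authors' implementation; in training, each head would be supervised only with the labels its annotator actually provided, and the spread across heads can serve as an uncertainty estimate.

```python
# Sketch: shared representation, one binary classification head per annotator.
import torch
import torch.nn as nn

class MultiAnnotatorModel(nn.Module):
    def __init__(self, vocab_size, n_annotators, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)           # shared representation
        self.heads = nn.ModuleList(nn.Linear(dim, 2) for _ in range(n_annotators))

    def forward(self, token_ids):
        shared = self.embed(token_ids)
        return torch.stack([head(shared) for head in self.heads], dim=1)  # (B, A, 2)

model = MultiAnnotatorModel(vocab_size=1000, n_annotators=5)
tokens = torch.randint(0, 1000, (8, 12))               # batch of 8 texts, 12 tokens each
logits = model(tokens)                                  # per-annotator predictions
disagreement = logits.softmax(-1)[..., 1].std(dim=1)    # proxy for prediction uncertainty
print(logits.shape, disagreement.shape)
```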


Journal ArticleDOI
TL;DR: Exact as discussed by the authors is an open-source online platform that enables the collaborative interdisciplinary analysis of images from different domains online and offline, including medical whole-slide images and image series with thousands of images.
Abstract: In many research areas, scientific progress is accelerated by multidisciplinary access to image data and their interdisciplinary annotation. However, keeping track of these annotations to ensure a high-quality multi-purpose data set is a challenging and labour intensive task. We developed the open-source online platform EXACT (EXpert Algorithm Collaboration Tool) that enables the collaborative interdisciplinary analysis of images from different domains online and offline. EXACT supports multi-gigapixel medical whole slide images as well as image series with thousands of images. The software utilises a flexible plugin system that can be adapted to diverse applications such as counting mitotic figures with a screening mode, finding false annotations on a novel validation view, or using the latest deep learning image analysis technologies. This is combined with a version control system which makes it possible to keep track of changes in the data sets and, for example, to link the results of deep learning experiments to specific data set versions. EXACT is freely available and has already been successfully applied to a broad range of annotation tasks, including highly diverse applications like deep learning supported cytology scoring, interdisciplinary multi-centre whole slide image tumour annotation, and highly specialised whale sound spectroscopy clustering.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed weakly supervised ReID (SYSU-30k), which groups the images into bags in terms of time and assigns a bag-level label for each bag.
Abstract: Person reidentification (Re-ID) benefits greatly from the accurate annotations of existing data sets (e.g., CUHK03 and Market-1501), which are quite expensive because each image in these data sets has to be assigned with a proper label. In this work, we ease the annotation of Re-ID by replacing the accurate annotation with inaccurate annotation, i.e., we group the images into bags in terms of time and assign a bag-level label for each bag. This greatly reduces the annotation effort and leads to the creation of a large-scale Re-ID benchmark called SYSU-30k. The new benchmark contains 30k individuals, which is about 20 times larger than CUHK03 (1.3k individuals) and Market-1501 (1.5k individuals), and 30 times larger than ImageNet (1k categories). It sums up to 29,606,918 images. Learning a Re-ID model with bag-level annotation is called the weakly supervised Re-ID problem. To solve this problem, we introduce a differentiable graphical model to capture the dependencies from all images in a bag and generate a reliable pseudolabel for each person’s image. The pseudolabel is further used to supervise the learning of the Re-ID model. Compared with the fully supervised Re-ID models, our method achieves state-of-the-art performance on SYSU-30k and other data sets. The code, data set, and pretrained model will be available at https://github.com/wanggrun/SYSU-30k.

Journal ArticleDOI
TL;DR: Mantis as mentioned in this paper is a protein annotation tool that uses database identifiers intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output, allowing for customization of reference data and execution parameters, and is reproducible across different research goals and user environments.
Abstract: BACKGROUND: The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to cease limiting our findings to a single reference, coalescing knowledge from different data sources, and thus overcoming some limitations in overly relying on computationally generated data from single sources. RESULTS: We implemented a protein annotation tool, Mantis, which uses database identifiers intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also outputting high coverage and high-quality protein function annotations. CONCLUSIONS: Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis
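
The following is an illustrative consensus step, not Mantis's actual algorithm: annotations from several reference sources are merged per protein, and the function supported by the most sources wins, with the best score as a tie-breaker. Source names and scores are invented.

```python
# Illustrative consensus over per-protein hits from multiple reference sources.
from collections import defaultdict

def consensus_annotation(hits):
    """hits: list of (source, function_label, score) tuples for one protein."""
    support, best_score = defaultdict(set), defaultdict(float)
    for source, label, score in hits:
        support[label].add(source)
        best_score[label] = max(best_score[label], score)
    return max(support, key=lambda lab: (len(support[lab]), best_score[lab]))

hits = [("pfam", "kinase domain", 0.92),
        ("kofam", "kinase domain", 0.88),
        ("eggnog", "ATP binding", 0.95)]
print(consensus_annotation(hits))  # 'kinase domain' (supported by two sources)
```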

Proceedings ArticleDOI
Alina Kuznetsova, Aakrati Talati, Yiwen Luo, Keith Simmons, Vittorio Ferrari
01 Jan 2021
TL;DR: In this article, a unified framework for generic video annotation with bounding boxes is proposed, which has both interpolating and extrapolating capabilities and a guiding mechanism that sequentially generates suggestions for what frame to annotate next, based on the annotations made previously.
Abstract: We introduce a unified framework for generic video annotation with bounding boxes. Video annotation is a longstanding problem, as it is a tedious and time-consuming process. We tackle two important challenges of video annotation: (1) automatic temporal interpolation and extrapolation of bounding boxes provided by a human annotator on a subset of all frames, and (2) automatic selection of frames to annotate manually. Our contribution is two-fold: first, we propose a model that has both interpolating and extrapolating capabilities; second, we propose a guiding mechanism that sequentially generates suggestions for what frame to annotate next, based on the annotations made previously. We extensively evaluate our approach on several challenging datasets in simulation and demonstrate a reduction in terms of the number of manual bounding boxes drawn by 60% over linear interpolation and by 35% over an off-the-shelf tracker. Moreover, we also show 10% annotation time improvement over a state-of-the-art method for video annotation with bounding boxes [25]. Finally, we run human annotation experiments and provide extensive analysis of the results, showing that our approach reduces actual measured annotation time by 50% compared to commonly used linear interpolation.
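
The linear-interpolation baseline mentioned in the abstract is easy to make concrete; the sketch below densifies keyframe bounding boxes by linear interpolation (the paper's model replaces this with learned interpolation/extrapolation plus frame suggestions). Names and the toy boxes are illustrative.

```python
# Sketch: linearly interpolate bounding boxes between manually annotated keyframes.
import numpy as np

def interpolate_boxes(keyframes):
    """keyframes: dict frame_index -> box (x1, y1, x2, y2). Returns a dense dict."""
    frames = sorted(keyframes)
    dense = {}
    for a, b in zip(frames, frames[1:]):
        box_a, box_b = np.asarray(keyframes[a], float), np.asarray(keyframes[b], float)
        for f in range(a, b + 1):
            t = (f - a) / (b - a)
            dense[f] = tuple((1 - t) * box_a + t * box_b)
    return dense

annotated = {0: (10, 10, 50, 60), 10: (30, 12, 72, 64)}
print(interpolate_boxes(annotated)[5])  # box halfway between the two keyframes
```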

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a semi-supervised approach based on adaptive weighted fusion for automatic image annotation that can simultaneously utilize the labeled data and unlabeled data to improve the annotation performance.
Abstract: To learn a well-performed image annotation model, a large number of labeled samples are usually required. Although the unlabeled samples are readily available and abundant, it is a difficult task for humans to annotate large numbers of images manually. In this article, we propose a novel semi-supervised approach based on adaptive weighted fusion for automatic image annotation that can simultaneously utilize the labeled data and unlabeled data to improve the annotation performance. At first, two different classifiers, constructed based on support vector machine and convolutional neural network, respectively, are trained by different features extracted from the labeled data. Therefore, these two classifiers are independently represented as different feature views. Then, the corresponding features of unlabeled images are extracted and input into these two classifiers, and the semantic annotation of images can be obtained respectively. At the same time, the confidence of corresponding image annotation can be measured by an adaptive weighted fusion strategy. After that, the images and their semantic annotations with high confidence are submitted to the classifiers for retraining until a certain stop condition is reached. As a result, we can obtain a strong classifier that can make full use of unlabeled data. Finally, we conduct experiments on four datasets, namely, Corel 5K, IAPR TC12, ESP Game, and NUS-WIDE. In addition, we measure the performance of our approach with standard criteria, including precision, recall, F-measure, N+, and mAP. The experimental results show that our approach has superior performance and outperforms many state-of-the-art approaches.
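
A compact sketch of the co-training loop described above: two classifiers on different feature views pseudo-label the unlabeled pool, and only samples with high fused confidence are added back for retraining. The second classifier stands in for the CNN view, and the fixed equal weights stand in for the adaptive weighting; data and thresholds are invented.

```python
# Sketch: two-view self-training with weighted fusion of predicted probabilities.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(60, 20))
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unl = rng.normal(size=(300, 20))
w1, w2, threshold = 0.5, 0.5, 0.9             # fusion weights and confidence cut-off

for _ in range(3):                             # a few self-training rounds
    clf1 = SVC(probability=True).fit(X_lab[:, :10], y_lab)     # view 1
    clf2 = LogisticRegression().fit(X_lab[:, 10:], y_lab)      # view 2
    fused = (w1 * clf1.predict_proba(X_unl[:, :10])
             + w2 * clf2.predict_proba(X_unl[:, 10:]))
    conf, pseudo = fused.max(axis=1), fused.argmax(axis=1)
    keep = conf > threshold
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, pseudo[keep]])
    X_unl = X_unl[~keep]

print(len(y_lab), "labeled samples after self-training")
```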

Journal ArticleDOI
TL;DR: The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards as mentioned in this paper.
Abstract: The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards. MGI’s mission is to facilitate the use of the mouse as an experimental model for understanding the genetic and genomic basis of human health and disease. MGI is the authoritative source for mouse gene, allele, and strain nomenclature and is the primary source of mouse phenotype annotations, functional annotations, developmental gene expression information, and annotations of mouse models with human diseases. MGI maintains mouse anatomy and phenotype ontologies and contributes to the development of the Gene Ontology and Disease Ontology and uses these ontologies as standard terminologies for annotation. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are MGI’s two major knowledgebases. Here, we highlight some of the recent changes and enhancements to MGD and GXD that have been implemented in response to changing needs of the biomedical research community and to improve the efficiency of expert curation. MGI can be accessed freely at http://www.informatics.jax.org .

Journal ArticleDOI
TL;DR: In this paper, the authors provide a step-by-step guideline for the FAIR data and metadata management specific to grapevine and wine science, including meaningful information on experimental design and phenotyping, sample collection, sample preparation, chemotype analysis, data analysis and metabolite annotation.
Abstract: In the era of big and omics data, good organization, management, and description of experimental data are crucial for achieving high-quality datasets. This, in turn, is essential for the export of robust results, to publish reliable papers, make data more easily available, and unlock the huge potential of data reuse. More and more journals now require authors to share data and metadata according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. This work aims to provide a step-by-step guideline for the FAIR data and metadata management specific to grapevine and wine science. In detail, the guidelines include recommendations for the organization of data and metadata regarding (i) meaningful information on experimental design and phenotyping, (ii) sample collection, (iii) sample preparation, (iv) chemotype analysis, (v) data analysis, (vi) metabolite annotation, and (vii) basic ontologies. We hope that these guidelines will be helpful for the grapevine and wine metabolomics community and that they will help it realize the true potential of data usage in creating new knowledge.

Journal ArticleDOI
TL;DR: The authors evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee.
Abstract: Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee. We find that standard methods annotate over 10% of replicates as belonging to different subfamilies, despite the fact that they are expected to be annotated as belonging to the same subfamily. Point mutations and homologous recombination appear to be responsible for some of this discordant annotation (particularly in the young Alu family), but are unlikely to fully explain the annotation unreliability. The surprisingly high level of disagreement in subfamily annotation of homologous sequences highlights a need for further research into definition of TE subfamilies, methods for representing subfamily annotation confidence of TE instances, and approaches to better utilizing such nuanced annotation data in downstream analysis.
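
The reliability measure at the core of the study can be illustrated in a few lines: the fraction of replicate pairs whose two copies are assigned different subfamilies. The example pairs below are invented.

```python
# Sketch: discordance rate of subfamily annotations across replicate copies.
def discordance_rate(pairs):
    """pairs: iterable of (subfamily_of_copy_1, subfamily_of_copy_2)."""
    pairs = list(pairs)
    disagree = sum(1 for a, b in pairs if a != b)
    return disagree / len(pairs) if pairs else 0.0

replicate_pairs = [("AluYa5", "AluYa5"), ("AluYa5", "AluY"),
                   ("L1PA3", "L1PA3"), ("AluSx", "AluSx1")]
print(f"{discordance_rate(replicate_pairs):.0%} of replicate pairs disagree")
```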

Proceedings ArticleDOI
06 May 2021
TL;DR: EventAnchor as mentioned in this paper uses machine learning models in computer vision to help users acquire essential events from videos (e.g., serve, the ball bouncing on the court) and offers users a set of interactive tools for data annotation.
Abstract: The popularity of racket sports (e.g., tennis and table tennis) leads to high demands for data analysis, such as notational analysis, on player performance. While sports videos offer many benefits for such analysis, retrieving accurate information from sports videos could be challenging. In this paper, we propose EventAnchor, a data analysis framework to facilitate interactive annotation of racket sports video with the support of computer vision algorithms. Our approach uses machine learning models in computer vision to help users acquire essential events from videos (e.g., serve, the ball bouncing on the court) and offers users a set of interactive tools for data annotation. An evaluation study on a table tennis annotation system built on this framework shows significant improvement in user performance on simple annotation tasks on objects of interest and on complex annotation tasks requiring domain knowledge.

Journal ArticleDOI
TL;DR: CorGAT as discussed by the authors is a tool for functional annotation of SARS-CoV-2 genomic variants, which can facilitate the identification of evolutionary patterns in the genome of the SARS co-v-2.
Abstract: SUMMARY: While over 150 thousand genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparisons with other state of the art methods we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. AVAILABILITY: Galaxy: http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: In this paper, a method, Quantitative Annotation of Unknown Structure (QAUST), is proposed to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a robust deep learning based single-cell multiple reference annotator (scMRA), which aggregated multiple datasets into a meta-dataset on which annotation is conducted.
Abstract: Motivation: Single-cell RNA-seq (scRNA-seq) has been widely used to resolve cellular heterogeneity. After collecting scRNA-seq data, the natural next step is to integrate the accumulated data to achieve a common ontology of cell types and states. Thus, an effective and efficient cell-type identification method is urgently needed. Meanwhile, high quality reference data remain a necessity for precise annotation. However, such tailored reference data are always lacking in practice. To address this, we aggregated multiple datasets into a meta-dataset on which annotation is conducted. Existing supervised or semi-supervised annotation methods suffer from batch effects caused by different sequencing platforms, the effect of which increases in severity with multiple reference datasets. Results: Herein, a robust deep learning based single-cell Multiple Reference Annotator (scMRA) is introduced. In scMRA, a knowledge graph is constructed to represent the characteristics of cell types in different datasets, and a graph convolutional network (GCN) serves as a discriminator based on this graph. scMRA keeps intra-cell-type closeness and the relative position of cell types across datasets. scMRA is remarkably powerful at transferring knowledge from multiple reference datasets to the unlabeled target domain, thereby gaining an advantage over other state-of-the-art annotation methods in multi-reference data experiments. Furthermore, scMRA can remove batch effects. To the best of our knowledge, this is the first attempt to use multiple insufficient reference datasets to annotate target data, and it is, comparatively, the best annotation method for multiple scRNA-seq datasets. Availability: An implementation of scMRA is available from https://github.com/ddb-qiwang/scMRA-torch. Supplementary information: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
20 Nov 2021 - Displays
TL;DR: Li et al. as discussed by the authors adopt the teacher-student framework to generate pseudo-labels from unlabeled training data, and use a label filtering method to improve the pseudo label quality.

Posted Content
TL;DR: Wang et al. as discussed by the authors used only patch-level classification labels to achieve tissue semantic segmentation on histopathology images, finally reducing the annotation efforts, and proposed a two-step model including a classification and a segmentation phases.
Abstract: Tissue-level semantic segmentation is a vital step in computational pathology. Fully-supervised models have already achieved outstanding performance with dense pixel-level annotations. However, drawing such labels on the giga-pixel whole slide images is extremely expensive and time-consuming. In this paper, we use only patch-level classification labels to achieve tissue semantic segmentation on histopathology images, finally reducing the annotation efforts. We proposed a two-step model including a classification phase and a segmentation phase. In the classification phase, we proposed a CAM-based model to generate pseudo masks by patch-level labels. In the segmentation phase, we achieved tissue semantic segmentation by our proposed Multi-Layer Pseudo-Supervision. Several technical novelties have been proposed to reduce the information gap between pixel-level and patch-level annotations. As a part of this paper, we introduced a new weakly-supervised semantic segmentation (WSSS) dataset for lung adenocarcinoma (LUAD-HistoSeg). We conducted several experiments to evaluate our proposed model on two datasets. Our proposed model outperforms two state-of-the-art WSSS approaches. Note that we can achieve comparable quantitative and qualitative results with the fully-supervised model, with only around a 2% gap for MIoU and FwIoU. By comparing with manual labeling, our model can greatly save the annotation time from hours to minutes. The source code is available at: this https URL.
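
A minimal class-activation-map (CAM) sketch for turning a patch-level classifier into pixel-level pseudo masks, in the spirit of the classification phase described above; the tiny CNN, threshold, and names are illustrative only, not the authors' model.

```python
# Sketch: CAM from a global-average-pooling classifier, thresholded to a pseudo mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchClassifier(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.fc = nn.Linear(64, n_classes)            # applied after global average pooling

    def forward(self, x):
        f = self.features(x)
        logits = self.fc(f.mean(dim=(2, 3)))
        return logits, f

def cam_pseudo_mask(model, patch, threshold=0.4):
    logits, fmap = model(patch)
    cls = logits.argmax(dim=1)                         # predicted tissue class per patch
    weights = model.fc.weight[cls]                     # (B, 64) class-specific weights
    cam = torch.einsum("bc,bchw->bhw", weights, fmap)
    cam = (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8)  # normalise to [0, 1]
    cam = F.interpolate(cam.unsqueeze(1), size=patch.shape[-2:], mode="bilinear")
    return (cam.squeeze(1) > threshold).long()         # binary pseudo mask per pixel

model = PatchClassifier()
patches = torch.randn(2, 3, 128, 128)
print(cam_pseudo_mask(model, patches).shape)           # torch.Size([2, 128, 128])
```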

Proceedings ArticleDOI
22 May 2021
TL;DR: In this paper, the authors perform a fine-grained empirical study on test annotation changes and present a taxonomy by manually inspecting and classifying a sample of 368 test annotations and documenting the motivations driving these changes.
Abstract: Since the introduction of annotations in Java 5, the majority of testing frameworks, such as JUnit, TestNG, and Mockito, have adopted annotations in their core design. This adoption affected the testing practices in every step of the test life-cycle, from fixture setup and test execution to fixture teardown. Despite the importance of test annotations, most research on test maintenance has mainly focused on test code quality and test assertions. As a result, there is little empirical evidence on the evolution and maintenance of test annotations. To fill this gap, we perform the first fine-grained empirical study on annotation changes. We developed a tool to mine 82,810 commits and detect 23,936 instances of test annotation changes from 12 open-source Java projects. Our main findings are: (1) Test annotation changes are more frequent than rename and type change refactorings. (2) We recover various migration efforts within the same testing framework or between different frameworks by analyzing common annotation replacement patterns. (3) We create a taxonomy by manually inspecting and classifying a sample of 368 test annotation changes and documenting the motivations driving these changes. Finally, we present a list of actionable implications for developers, researchers, and framework designers.
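
As a rough illustration of the kind of instances mined in the study, the sketch below diffs the Java annotations found in two versions of a test file; the regex-based extraction is a simplification of the authors' commit-level tooling, and the example snippets are invented.

```python
# Sketch: detect added/removed Java annotations between two file versions.
import re
from collections import Counter

ANNOTATION = re.compile(r"@[A-Za-z_]\w*(?:\([^)]*\))?")

def annotation_changes(old_src, new_src):
    old, new = Counter(ANNOTATION.findall(old_src)), Counter(ANNOTATION.findall(new_src))
    return {"removed": list(old - new), "added": list(new - old)}

old_version = """
    @Test(expected = IOException.class)
    public void readsFile() { }
"""
new_version = """
    @Test
    @Timeout(5)
    public void readsFile() { }
"""
print(annotation_changes(old_version, new_version))
# {'removed': ['@Test(expected = IOException.class)'], 'added': ['@Test', '@Timeout(5)']}
```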

Journal ArticleDOI
TL;DR: MultiPhATE2 as mentioned in this paper performs gene finding using multiple algorithms, compares the results of the algorithms, performs functional annotation of coding sequences, and incorporates additional search algorithms and databases to extend the search space of the original code.
Abstract: To address a need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of multiPhATE, a functional annotation code released previously, multiPhATE2 performs gene finding using multiple algorithms, compares the results of the algorithms, performs functional annotation of coding sequences, and incorporates additional search algorithms and databases to extend the search space of the original code. MultiPhATE2 performs gene matching among sets of closely related bacteriophage genomes, and uses multiprocessing to speed computations. MultiPhATE2 can be re-started at multiple points within the workflow to allow the user to examine intermediate results and adjust the subsequent computations accordingly. In addition, multiPhATE2 accommodates custom gene calls and sequence databases, again adding flexibility. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or MAC operating systems. Full documentation is provided as a README file and a Wiki website.