Journal ArticleDOI

Quality of Deaf and Hard-of-Hearing Mobile Apps: Evaluation Using the Mobile App Rating Scale (MARS) With Additional Criteria From a Content Expert

30 Oct 2019 - JMIR mHealth and uHealth - Vol. 7, Iss: 10
TL;DR: Evaluation of population-specific mHealth apps can benefit from content-specific measurement criteria developed by a content expert in the field; the study found a clear distinction in Mobile App Rating Scale (MARS) scores among apps across its three app categories.
Abstract: Background: The spread of technology and the dissemination of knowledge across the World Wide Web have prompted the development of apps for American Sign Language (ASL) translation, interpretation, and syntax recognition. There is limited literature regarding the quality, effectiveness, and appropriateness of mobile health (mHealth) apps for the deaf and hard-of-hearing (DHOH) that aim to aid the DHOH in their everyday communication and activities. Other than the star-rating system with minimal comments regarding quality, the evaluation metrics used to rate mobile apps are commonly subjective. Objective: This study aimed to evaluate the quality and effectiveness of DHOH apps using a standardized scale. In addition, it aimed to identify content-specific criteria to improve the evaluation process by using a content expert, and to use the content expert to more accurately evaluate apps and features supporting the DHOH. Methods: A list of potential apps for evaluation was generated after a preliminary screening for apps related to the DHOH. Inclusion and exclusion criteria were developed to refine the master list of apps. The study modified a standardized rating scale with additional content-specific criteria applicable to the DHOH population for app evaluation. This was accomplished by including a DHOH content expert in the design of the content-specific criteria. Results: Of the 217 apps obtained from the search criteria, 21 apps met the inclusion and exclusion criteria. The results indicate a clear distinction in Mobile App Rating Scale scores among apps within the study's 3 app categories: ASL translators (highest score=3.72), speech-to-text (highest score=3.60), and hard-of-hearing assistants (highest score=3.90). Furthermore, the limited consideration of measures specific to the target population, along with a high app turnover rate, suggests opportunities for improved app effectiveness and evaluation. Conclusions: As more mHealth apps enter the market for the DHOH population, more criteria-based evaluation is needed to ensure the safety and appropriateness of the apps for the intended users. Evaluation of population-specific mHealth apps can benefit from content-specific measurement criteria developed by a content expert in the field.
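To make the modified-scale approach concrete, here is a minimal sketch of how a MARS-style evaluation extended with expert criteria might be aggregated. The subscale names follow the published MARS; the content-expert criteria and all scores below are hypothetical illustrations, not values from the paper.

```python
# Hedged sketch: aggregating a modified MARS evaluation.
# Subscale names follow the published MARS; the content-expert
# criteria and every number here are hypothetical illustrations.

from statistics import mean

# Standard MARS objective subscales; each item rated 1 (inadequate) to 5 (excellent).
mars_subscales = {
    "engagement":    [3, 4, 3, 4, 3],
    "functionality": [4, 4, 5, 4],
    "aesthetics":    [4, 3, 4],
    "information":   [3, 3, 4, 3],
}

# Hypothetical content-expert criteria for DHOH apps (not the study's actual items).
expert_criteria = {
    "asl_accuracy":         4,
    "caption_readability":  3,
    "visual_alert_support": 4,
}

subscale_means = {name: mean(items) for name, items in mars_subscales.items()}
mars_total = mean(subscale_means.values())    # MARS total = mean of subscale means
expert_score = mean(expert_criteria.values()) # expert criteria averaged separately

print(f"MARS total: {mars_total:.2f}")
print(f"Content-expert score: {expert_score:.2f}")
```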


Citations
01 Jan 2017
TL;DR: In this paper, the authors introduce several probabilistic techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and text ratings, using review metadata such as the star rating and the tense, as well as text classification, natural language processing, and sentiment analysis techniques.

Abstract: App stores like Google Play and the Apple App Store have over 3 million apps covering nearly every kind of software and service. Billions of users regularly download, use, and review these apps. Recent studies have shown that reviews written by users represent a rich source of information for app vendors and developers, as they include information about bugs, ideas for new features, or documentation of released features. The majority of the reviews, however, are rather non-informative, just praising the app and repeating the star rating in words. This paper introduces several probabilistic techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and text ratings. For this, we use review metadata such as the star rating and the tense, as well as text classification, natural language processing, and sentiment analysis techniques. We conducted a series of experiments to compare the accuracy of the techniques and compared them with simple string matching. We found that metadata alone results in poor classification accuracy. When combined with simple text classification and natural language preprocessing of the text, particularly with bigrams and lemmatization, the classification precision for all review types reached 88-92% and the recall 90-99%. Multiple binary classifiers outperformed single multiclass classifiers. Our results inspired the design of a review analytics tool, which should help app vendors and developers deal with the large amount of reviews, filter critical reviews, and assign them to the appropriate stakeholders. We describe the tool's main features and summarize nine interviews with practitioners on how review analytics tools, including ours, could be used in practice.
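As a rough illustration of the kind of pipeline the paper describes, here is a hedged sketch using bag-of-words features with bigrams and one binary classifier per review type. The training reviews are invented, and the authors' additional features (tense, star rating, sentiment, lemmatization) are omitted for brevity.

```python
# Hedged sketch of the general approach: classify app reviews into types
# with unigram+bigram text features and one binary classifier per type
# (multiple binary classifiers outperformed a single multiclass model
# in the paper). Training data is invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

reviews = [
    "App crashes every time I open the camera",
    "Please add an offline mode",
    "I use this daily to caption lectures, works well",
    "Great app, five stars",
]
# One binary label vector per review type.
labels = {
    "bug_report":      [1, 0, 0, 0],
    "feature_request": [0, 1, 0, 0],
    "user_experience": [0, 0, 1, 0],
    "rating":          [0, 0, 0, 1],
}

classifiers = {}
for review_type, y in labels.items():
    clf = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
        ("model", LogisticRegression()),
    ])
    clf.fit(reviews, y)
    classifiers[review_type] = clf

new_review = ["The app freezes when I switch to speech-to-text"]
for review_type, clf in classifiers.items():
    print(review_type, clf.predict(new_review)[0])
```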

96 citations

Journal ArticleDOI
TL;DR: The available mobile applications related to interventions for low back pain generally have good overall quality, especially in terms of functionality and aesthetics, but engagement and information should be improved in most of the apps.

Abstract: Digital health interventions may improve different behaviours. However, the rapid proliferation of technological solutions often does not allow for a correct assessment of the quality of the tools. This study aims to review and assess the quality of the available mobile applications (apps) related to interventions for low back pain. In September 2019, two reviewers searched the official stores of Android (Play Store) and iOS (App Store), localised for Spain and the United Kingdom, for apps related to interventions for low back pain. Seventeen apps were finally included. The quality of the apps was measured using the Mobile App Rating Scale (MARS). The scores of each section and the final score of the apps were retrieved, and the mean and standard deviation were obtained. The average quality ranged between 2.83 and 4.57 (mean 3.82) on a scale from 1 (inadequate) to 5 (excellent). The best scores were found in functionality (mean 4.7), followed by aesthetics (mean 4.1). Information (2.93) and engagement (3.58) were the worst-rated sections. Apps generally have good overall quality, especially in terms of functionality and aesthetics. Engagement and information should be improved in most of the apps. Moreover, scientific evidence is necessary to support the use of applied health tools.

23 citations

Journal ArticleDOI
TL;DR: This rapid review aims to identify current methodologies in the literature for assessing the quality of mHealth apps, understand what aspects of quality these methodologies address, determine what input has been made by authors from low- and middle-income countries (LMICs), and examine the applicability of such methodologies in LMICs.
Abstract: Background: In recent years, there has been rapid growth in the availability and use of mobile health (mHealth) apps around the world. A consensus regarding an accepted standard to assess the quality of such apps has yet to be reached. A factor that exacerbates the challenge of mHealth app quality assessment is variations in the interpretation of quality and its subdimensions. Consequently, it has become increasingly difficult for health care professionals worldwide to distinguish apps of high quality from those of lower quality. This exposes both patients and health care professionals to unnecessary risks. Despite progress, limited understanding of the contributions of researchers in low- and middle-income countries (LMICs) exists on this topic. Furthermore, the applicability of quality assessment methodologies in LMIC settings remains relatively unexplored. Objective: This rapid review aims to identify current methodologies in the literature to assess the quality of mHealth apps, understand what aspects of quality these methodologies address, determine what input has been made by authors from LMICs, and examine the applicability of such methodologies in LMICs. Methods: This review was registered with PROSPERO (International Prospective Register of Systematic Reviews). A search of PubMed, EMBASE, Web of Science, and Scopus was performed for papers related to mHealth app quality assessment methodologies, which were published in English between 2005 and 2020. By taking a rapid review approach, a thematic and descriptive analysis of the papers was performed. Results: Electronic database searches identified 841 papers. After the screening process, 52 papers remained for inclusion. Of the 52 papers, 5 (10%) proposed novel methodologies that could be used to evaluate mHealth apps of diverse medical areas of interest, 8 (15%) proposed methodologies that could be used to assess apps concerned with a specific medical focus, and 39 (75%) used methodologies developed by other published authors to evaluate the quality of various groups of mHealth apps. The authors in 6% (3/52) of papers were solely affiliated to institutes in LMICs. A further 15% (8/52) of papers had at least one coauthor affiliated to an institute in an LMIC. Conclusions: Quality assessment of mHealth apps is complex in nature and at times subjective. Despite growing research on this topic, to date, an all-encompassing appropriate means for evaluating the quality of mHealth apps does not exist. There has been engagement with authors affiliated to institutes across LMICs; however, limited consideration of current generic methodologies for application in LMIC settings has been identified. Trial Registration: PROSPERO CRD42020205149; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=205149

15 citations

Journal ArticleDOI
TL;DR: In this article, the authors conducted a user-centered design analysis of popular consumer apps with scientific backing utilizing the well-validated Mobile Application Rating Scale (MARS) to evaluate the objective and subjective quality of apps that are successful across both research and consumer sectors.
Abstract: Background: There is a robust market for mobile health (mHealth) apps focused on self-guided interventions to address a high prevalence of mental health disorders and behavioral health needs in the general population. Disseminating mental health interventions via mHealth technologies may help overcome barriers in access to care and has broad consumer appeal. However, development and testing of mental health apps in formal research settings are limited and far outpaced by everyday consumer use. In addition to prioritizing efficacy and effectiveness testing, researchers should examine and test app design elements that impact the user experience, increase engagement, and lead to sustained use over time. Objective: The aim of this study was to evaluate the objective and subjective quality of apps that are successful across both research and consumer sectors, and the relationships between objective app quality, subjective user ratings, and evidence-based behavior change techniques. This will help inform user-centered design considerations for mHealth researchers to maximize design elements and features associated with consumer appeal, engagement, and sustainability. Methods: We conducted a user-centered design analysis of popular consumer apps with scientific backing utilizing the well-validated Mobile Application Rating Scale (MARS). Popular consumer apps with research support were identified via a systematic search of the App Store iOS (Apple Inc) and Google Play (Google LLC) and literature review. We evaluated the quality metrics of 19 mental health apps along 4 MARS subscales, namely, Engagement, Functionality, Aesthetics, and Information Quality. MARS total and subscale scores range from 1 to 5, with higher scores representing better quality. We then extracted user ratings from app download platforms and coded apps for evidence-based treatment components. We calculated Pearson correlation coefficients to identify associations between MARS scores, App Store iOS/Google Play consumer ratings, and number of evidence-based treatment components. Results: The mean MARS score was 3.52 (SD 0.71), consumer rating was 4.22 (SD 0.54), and number of evidence-based treatment components was 2.32 (SD 1.42). Consumer ratings were significantly correlated with the MARS Functionality subscale (r=0.74, P<.001), Aesthetics subscale (r=0.70, P<.01), and total score (r=0.58, P=.01). Number of evidence-based intervention components was not associated with MARS scores (r=0.085, P=.73) or consumer ratings (r=–0.329, P=.16). Conclusions: In our analysis of popular research-supported consumer apps, objective app quality and subjective consumer ratings were generally high. App functionality and aesthetics were highly consistent with consumer appeal, whereas evidence-based components were not. In addition to designing treatments that work, we recommend that researchers prioritize aspects of app design that impact the user experience for engagement and sustainability (eg, ease of use, navigation, visual appeal). This will help translate evidence-based interventions to the competitive consumer app market, thus bridging the gap between research development and real-world implementation.
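A hedged sketch of the correlation analysis described above: computing a Pearson correlation between per-app MARS scores and consumer star ratings. All numbers are invented for illustration.

```python
# Hedged sketch: Pearson correlation between MARS subscale scores and
# consumer star ratings, as in the study's analysis. Data is invented.

from scipy.stats import pearsonr

mars_functionality = [4.2, 3.8, 4.5, 3.1, 4.0, 4.7, 3.5]
consumer_rating    = [4.5, 4.0, 4.8, 3.6, 4.2, 4.9, 3.9]

r, p = pearsonr(mars_functionality, consumer_rating)
print(f"r = {r:.2f}, p = {p:.3f}")
```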

12 citations

Journal ArticleDOI
TL;DR: There is a need to involve expert healthcare professionals in the development of mental health apps, and for healthcare providers to empower patients by discussing apps that are useful and discerning them from those that can potentially cause harm.

11 citations

References
Journal ArticleDOI
TL;DR: In this article, the authors present guidelines for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges, and review confidence intervals for each of the forms.

Abstract: Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. Relevant to the choice of the coefficient are the appropriate statistical model for reliability and the application to be made of the reliability results. Confidence intervals for each of the forms are reviewed.
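For concreteness, here is a sketch of one of the six Shrout and Fleiss forms, ICC(2,1) (two-way random effects, absolute agreement, single rater), computed from ANOVA mean squares. The ratings are invented.

```python
# Hedged sketch: Shrout & Fleiss ICC(2,1) from ANOVA mean squares.
# Invented data: n = 5 targets rated by k = 3 judges.

import numpy as np

ratings = np.array([
    [4, 4, 5],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 5, 4],
], dtype=float)
n, k = ratings.shape

grand = ratings.mean()
row_means = ratings.mean(axis=1)  # per-target means
col_means = ratings.mean(axis=0)  # per-judge means

ss_total = ((ratings - grand) ** 2).sum()
ss_rows = k * ((row_means - grand) ** 2).sum()
ss_cols = n * ((col_means - grand) ** 2).sum()
ss_error = ss_total - ss_rows - ss_cols

ms_rows = ss_rows / (n - 1)                  # between-targets mean square
ms_cols = ss_cols / (k - 1)                  # between-judges mean square
ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square

icc_2_1 = (ms_rows - ms_error) / (
    ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
)
print(f"ICC(2,1) = {icc_2_1:.3f}")
```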

21,185 citations

Journal ArticleDOI
TL;DR: It is found that both methods of computing the scale-level index (S-CVI) are being used by nurse researchers, although it was not always possible to infer the calculation method.
Abstract: Scale developers often provide evidence of content validity by computing a content validity index (CVI), using ratings of item relevance by content experts. We analyzed how nurse researchers have defined and calculated the CVI, and found considerable consistency for item-level CVIs (I-CVIs). However, there are two alternative, but unacknowledged, methods of computing the scale-level index (S-CVI). One method requires universal agreement among experts, but a less conservative method averages the item-level CVIs. Using backward inference with a purposive sample of scale development studies, we found that both methods are being used by nurse researchers, although it was not always possible to infer the calculation method. The two approaches can lead to different values, making it risky to draw conclusions about content validity. Scale developers should indicate which method was used to provide readers with interpretable content validity information.
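A minimal sketch of the two scale-level computations the article contrasts, assuming the usual convention that an item is "relevant" if an expert rates it 3 or 4 on a 4-point scale. The ratings are invented.

```python
# Hedged sketch: I-CVI and the two alternative S-CVI computations.
# rows = items, columns = experts (1-4 relevance ratings); data invented.

ratings = [
    [4, 4, 3, 4],
    [3, 4, 4, 3],
    [4, 2, 4, 4],
    [4, 4, 4, 4],
]

# I-CVI per item: proportion of experts rating the item 3 or 4.
i_cvis = [sum(r >= 3 for r in item) / len(item) for item in ratings]

# S-CVI/UA: proportion of items on which ALL experts agree (I-CVI == 1).
s_cvi_ua = sum(v == 1.0 for v in i_cvis) / len(i_cvis)

# S-CVI/Ave: the less conservative method, averaging the I-CVIs.
s_cvi_ave = sum(i_cvis) / len(i_cvis)

print("I-CVIs:", i_cvis)      # [1.0, 1.0, 0.75, 1.0]
print("S-CVI/UA:", s_cvi_ua)  # 0.75
print("S-CVI/Ave:", s_cvi_ave)  # 0.9375
```

The two indices diverge whenever any item falls short of unanimous agreement, which is why the article urges scale developers to report which method they used.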

3,554 citations

Journal ArticleDOI
TL;DR: The MARS is a simple, objective, and reliable tool for classifying and assessing the quality of mobile health apps and can also be used to provide a checklist for the design and development of new high quality health apps.
Abstract: Background: The use of mobile apps for health and well-being promotion has grown exponentially in recent years. Yet, there is currently no app-quality assessment tool beyond “star” ratings. Objective: The objective of this study was to develop a reliable, multidimensional measure for trialling, classifying, and rating the quality of mobile health apps. Methods: A literature search was conducted to identify articles containing explicit Web or app quality rating criteria published between January 2000 and January 2013. Existing criteria for the assessment of app quality were categorized by an expert panel to develop the new Mobile App Rating Scale (MARS) subscales, items, descriptors, and anchors. Sixty well-being apps were randomly selected using an iTunes search for MARS rating; ten were used to pilot the rating procedure, and the remaining 50 provided data on interrater reliability. Results: A total of 372 explicit criteria for assessing Web or app quality were extracted from 25 published papers, conference proceedings, and Internet resources. Five broad categories of criteria were identified, comprising four objective quality scales (engagement, functionality, aesthetics, and information quality) and one subjective quality scale; these were refined into the 23-item MARS. The MARS demonstrated excellent internal consistency (alpha = .90) and interrater reliability (intraclass correlation coefficient, ICC = .79). Conclusions: The MARS is a simple, objective, and reliable tool for classifying and assessing the quality of mobile health apps. It can also be used to provide a checklist for the design and development of new high-quality health apps.
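As an illustration of the internal-consistency statistic reported above, here is a hedged sketch of Cronbach's alpha computed from an invented apps-by-items score matrix.

```python
# Hedged sketch: Cronbach's alpha, the internal-consistency statistic the
# MARS study reports (alpha = .90). Invented data: rows are apps,
# columns are scale items.

import numpy as np

scores = np.array([
    [4, 4, 3, 4, 4],
    [2, 3, 2, 3, 2],
    [5, 4, 5, 4, 5],
    [3, 3, 3, 2, 3],
    [4, 5, 4, 4, 4],
], dtype=float)
n_items = scores.shape[1]

item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores

alpha = (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```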

1,293 citations

Journal ArticleDOI
TL;DR: In this paper, the authors study the existing applications for mobile devices exclusively dedicated to the eight most prevalent health conditions according to the latest update (2004) of the Global Burden of Disease of the World Health Organization (WHO): iron-deficiency anemia, hearing loss, migraine, low vision, asthma, diabetes mellitus, osteoarthritis (OA), and unipolar depressive disorders.
Abstract: Background: New possibilities for mHealth have arisen by means of the latest advances in mobile communications and technologies. With more than 1 billion smartphones and 100 million tablets around the world, these devices can be a valuable tool in health care management. Every aid for health care is welcome and necessary, as shown by the more than 50 million estimated deaths caused by illnesses or health conditions in 2008. Some of these conditions have additional importance depending on their prevalence. Objective: To study the existing applications for mobile devices exclusively dedicated to the eight most prevalent health conditions according to the latest update (2004) of the Global Burden of Disease (GBD) of the World Health Organization (WHO): iron-deficiency anemia, hearing loss, migraine, low vision, asthma, diabetes mellitus, osteoarthritis (OA), and unipolar depressive disorders. Methods: Two reviews were carried out. The first is a review of mobile applications in published articles retrieved from the following systems: IEEE Xplore, Scopus, ScienceDirect, Web of Knowledge, and PubMed. The second was carried out by searching the most important commercial app stores: Google Play, iTunes, BlackBerry World, Windows Phone Apps+Games, and Nokia's Ovi Store. Finally, two applications for each condition, one from each review, were selected for an in-depth analysis. Results: Search queries up to April 2013 located 247 papers and more than 3673 apps related to the most prevalent conditions. The conditions in descending order by the number of applications found in the literature are diabetes, asthma, depression, hearing loss, low vision, OA, anemia, and migraine. However, when ordered by the number of commercial apps found, the list is diabetes, depression, migraine, asthma, low vision, hearing loss, OA, and anemia. Excluding OA from the former list, the four most prevalent conditions have fewer apps and research than the final four. Several results are extracted from the in-depth analysis: most of the apps are designed for monitoring, assisting, or informing about the condition. Typically, an Internet connection is not required, and most of the apps are aimed at the general public and nonclinical use. The preferred type of data visualization is text, followed by charts and pictures. Assistive and monitoring apps are shown to be frequently used, whereas informative and educational apps are only occasionally used. Conclusions: Distribution of work on mobile applications is not equal for the eight most prevalent conditions. Whereas some conditions such as diabetes and depression have an overwhelming number of apps and research, there is a lack of apps related to other conditions, such as anemia, hearing loss, or low vision, which must be filled. [J Med Internet Res 2013;15(6):e120]

459 citations

Journal ArticleDOI
TL;DR: Sign language use in the U.S. has been studied extensively in the literature as mentioned in this paper, with a focus on two demographic research categories: (1) ASL as a language of national origin and (2) deafness.
Abstract: In the United States, home language use surveys are now commonplace. The decennial census has included inquiries about home language use within immigrant households since 1890 and within all U.S. homes since 1970 (see U.S. Census Bureau 2002, hereafter cited as Measuring America). Public schools, originally to comply with the Bilingual Education Act of 1968, authorized in Title VII, Part A, of the Elementary and Secondary Education Act, routinely collect home language use data for each student enrolled. The number of languages used in homes in the United States, as identified by the various federal and state surveys, is quite large. However, American Sign Language (ASL) is not on the list of non-English languages used in the home, and no state in the union counts its users in either the general or the school population. Conspicuous by its absence in U.S. language census data is an estimate of how many people use American Sign Language in the United States. We have found that California records sign language use in the home when children enter school (e.g., California Department of Education 2004); the Annual Survey of Deaf and Hard of Hearing Children and Youth (hereafter cited as Annual Survey) collects data on sign language use by family members with their deaf or hard of hearing children (e.g., see Mitchell and Karchmer 2005). However, there is no systematic and routine collection of data on sign language or ASL use in the general population. Given that estimates of the number of people who use ASL are relatively easy to find in research and practitioner publications, as well as scattered across the Internet, and range from 100,000 to 15,000,000, we decided to track down their sources. In this review of the literature on the prevalence of ASL use in the United States, we identify a number of misunderstandings. To make sense of them, we focus on two documents in particular: first, a statement presented during the U.S. Senate hearings for the Bilingual Courts Act of 1974 about how sign language use ranks in comparison to other non-English languages in the United States (Beale 1974) and, second, the findings from the National Census of the Deaf Population (NCDP; see Schein and Delk Jr. 1974). This in-depth review clarifies the meaning of the original statement for the Bilingual Courts Act of 1974 hearings and provides a more justifiable estimate of the number of signers. This number does not necessarily include all ASL users, based upon the NCDP, which is the only research study from which data-based estimates may be derived. Before we consider these earlier works, however, we offer some background on the problems of obtaining accurate (let alone current) estimates of how many people use ASL in the United States from large-scale, ongoing national data collection efforts. These include the decennial census of the U.S. population and its companion projects, the Current Population Survey (CPS) and the American Community Survey (ACS), as well as surveys commissioned by other federal agencies, in particular, the National Health Survey (NHS) and the Survey of Income and Program Participation (SIPP). Demography of Language and Deafness: We focus on two demographic research categories: (1) ASL as a language of national origin and (2) deafness. For more than a century, the federal government has mandated national census counts, or census-based survey estimates, of non-English language use in the U.S. population. Also, originally as an activity of the U.S. Bureau of the Census and then, after a delay of several decades, a U.S. Public Health Service responsibility, there have been regular estimates of the prevalence of deafness and other disabilities in the country. In this section we review some of the specifics of these two demographic categories, language and deafness, and suggest that these distinct projects require a unified perspective before ASL use is likely to be included as part of the demographic description of the U. …

264 citations