Proceedings ArticleDOI

Associating the visual representation of user interfaces with their internal structures and metadata

TL;DR: This paper presents a hybrid framework, PAX, which associates the visual representation of user interfaces (i.e. the pixels) with their internal hierarchical metadata (i.e. the content, role, and value of GUI widgets).
Abstract: Pixel-based methods are emerging as a new and promising way to develop new interaction techniques on top of existing user interfaces. However, in order to maintain platform independence, these methods intentionally neglect other available low-level information about GUI widgets, such as accessibility metadata. In this paper, we present a hybrid framework, PAX, which associates the visual representation of user interfaces (i.e. the pixels) with their internal hierarchical metadata (i.e. the content, role, and value). We identify the challenges to building such a framework. We also develop and evaluate two new algorithms: one for detecting text at arbitrary places on the screen, and one for segmenting a text image into individual word blobs. Finally, we validate our framework in implementations of three applications. We enhance an existing pixel-based system, Sikuli Script, while preserving the readability of its script code. Further, we create two novel applications, Screen Search and Screen Copy, to demonstrate how PAX can be applied to the development of desktop-level interactive systems.
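The word segmentation step described above can be illustrated generically: the sketch below (Python/OpenCV) binarizes a text image, dilates it horizontally so letters within a word merge, and treats each connected component as a word blob. This is a minimal sketch of the general idea, not the paper's algorithm; the threshold choice and kernel size are illustrative assumptions.

```python
# Generic word-blob segmentation sketch (not PAX's exact algorithm):
# binarize a text image, dilate horizontally so the letters of a word
# merge, then take connected components as candidate word blobs.
import cv2

def segment_word_blobs(text_image_path):
    gray = cv2.imread(text_image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu threshold; assume dark text on a light background and invert.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Horizontal dilation merges adjacent letters but not adjacent words
    # (kernel width is an illustrative assumption, tuned per font size).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 3))
    merged = cv2.dilate(binary, kernel, iterations=1)
    # Each remaining connected component is treated as one word blob.
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) per blob

if __name__ == "__main__":
    for box in segment_word_blobs("screenshot_text_region.png"):
        print(box)
```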


Citations
Journal ArticleDOI
TL;DR: Conference proceedings for an event held April 24-29, 2004, in Vienna, Austria.
Abstract: April 24-29, 2004
Vienna, Austria

227 citations

Proceedings ArticleDOI
09 Nov 2015
TL;DR: The first technique to automatically Reverse Engineer Mobile Application User Interfaces (REMAUI) is introduced, which identifies user interface elements such as images, texts, containers, and lists, via computer vision and optical character recognition (OCR) techniques.
Abstract: When developing the user interface code of a mobile application, a large gap exists in practice between the digital conceptual drawings of graphic artists and working user interface code. Currently, programmers bridge this gap manually by reimplementing the conceptual drawings in code, which is cumbersome and expensive. To bridge this gap, we introduce the first technique to automatically Reverse Engineer Mobile Application User Interfaces (REMAUI). On a given input bitmap, REMAUI identifies user interface elements such as images, texts, containers, and lists via computer vision and optical character recognition (OCR) techniques. In our experiments on 488 screenshots of over 100 popular third-party Android and iOS applications, REMAUI-generated user interfaces were similar to the originals, both pixel-by-pixel and in terms of their runtime user interface hierarchies. REMAUI's average overall runtime on a standard desktop computer was 9 seconds.
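The combination of OCR and contour analysis that REMAUI builds on can be sketched roughly as follows, assuming OpenCV and pytesseract. This is a simplified illustration of the general approach, not REMAUI's actual pipeline (which also merges, filters, and detects lists); the Canny thresholds and minimum-area cutoff are assumptions.

```python
# Rough sketch of combining OCR word boxes and contour detection on a
# UI screenshot (illustrative only; not REMAUI's real pipeline).
import cv2
import pytesseract

def detect_ui_elements(screenshot_path):
    image = cv2.imread(screenshot_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # 1. Text candidates via OCR word boxes.
    ocr = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
    texts = [
        (ocr["left"][i], ocr["top"][i], ocr["width"][i], ocr["height"][i], ocr["text"][i])
        for i in range(len(ocr["text"]))
        if ocr["text"][i].strip()
    ]

    # 2. Non-text container candidates via edges and external contours.
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    containers = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]

    return texts, containers
```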

173 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present an approach that enables accurate prototyping of graphical user interfaces (GUIs) via three tasks: detection, classification, and assembly, where the logical components of a GUI are detected from a mock-up artifact using either computer vision techniques or mock-up metadata.
Abstract: It is common practice for developers of user-facing software to transform a mock-up of a graphical user interface (GUI) into code. This process takes place both at an application's inception and in an evolutionary context as GUI changes keep pace with evolving features. Unfortunately, this practice is challenging and time-consuming. In this paper, we present an approach that automates this process by enabling accurate prototyping of GUIs via three tasks: detection, classification, and assembly. First, logical components of a GUI are detected from a mock-up artifact using either computer vision techniques or mock-up metadata. Then, software repository mining, automated dynamic analysis, and deep convolutional neural networks are utilized to accurately classify GUI components into domain-specific types (e.g., toggle-button). Finally, a data-driven, K-nearest-neighbors algorithm generates a suitable hierarchical GUI structure from which a prototype application can be automatically assembled. We implemented this approach for Android in a system called ReDraw. Our evaluation illustrates that ReDraw achieves an average GUI-component classification accuracy of 91 percent and assembles prototype applications that closely mirror target mock-ups in terms of visual affinity while exhibiting reasonable code structure. Interviews with industrial practitioners illustrate ReDraw's potential to improve real development workflows.
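The detection-classification-assembly split can be illustrated with a deliberately simplified sketch. A K-nearest-neighbors classifier is substituted here for ReDraw's convolutional neural network, and a plain containment heuristic stands in for its data-driven assembly, so the code only shows the shape of the pipeline; the feature vectors, labels, and bounding boxes are made-up examples.

```python
# Simplified classify-then-assemble sketch (not ReDraw itself): classify
# detected components into types, then nest them by bounding-box
# containment. Features are (x, y, width, height, aspect_ratio).
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled components for training.
train_features = [
    [10, 5, 300, 40, 7.5],    # toolbar
    [20, 60, 120, 30, 4.0],   # button
    [20, 100, 260, 30, 8.7],  # text field
]
train_labels = ["toolbar", "button", "text_field"]

classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(train_features, train_labels)

def contains(outer, inner):
    """True if bounding box `outer` fully contains `inner` (x, y, w, h)."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ox + ow >= ix + iw and oy + oh >= iy + ih

# Components detected in a mock-up: classify, then group by containment.
detected = [(15, 55, 130, 35, 3.7), (10, 50, 280, 200, 1.4)]
types = classifier.predict(detected)
boxes = [d[:4] for d in detected]
children = {i: [j for j in range(len(boxes)) if i != j and contains(boxes[i], boxes[j])]
            for i in range(len(boxes))}
print(list(types), children)
```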

141 citations

Proceedings ArticleDOI
27 Apr 2013
TL;DR: Patina is an application-independent system for collecting and visualizing software application usage data; it requires no instrumentation of the target application, and all data is collected through standard window metrics and accessibility APIs.
Abstract: We present Patina, an application-independent system for collecting and visualizing software application usage data. Patina requires no instrumentation of the target application; all data is collected through standard window metrics and accessibility APIs. The primary visualization is a dynamic heatmap overlay which adapts to match the content, location, and shape of the user interface controls visible in the active application. We discuss a set of design guidelines for the Patina system, describe our implementation of the system, and report on an initial evaluation based on a short-term deployment of the system.
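The underlying bookkeeping, mapping recorded clicks onto widget bounds and counting hits per control, can be sketched minimally as follows. The widget bounds and click list are assumed inputs standing in for the window metrics and accessibility APIs Patina actually queries, and the heatmap rendering itself is omitted.

```python
# Minimal sketch of accumulating per-widget usage counts for a heatmap
# overlay (illustrative; Patina obtains widget bounds via accessibility
# APIs and renders a dynamic overlay, which is not shown here).
from collections import Counter

def hit_widget(click, widgets):
    """Return the name of the widget whose bounds contain the click, if any."""
    x, y = click
    for name, (wx, wy, ww, wh) in widgets.items():
        if wx <= x < wx + ww and wy <= y < wy + wh:
            return name
    return None

# Hypothetical widget bounds (x, y, width, height) and recorded clicks.
widgets = {"save_button": (10, 10, 80, 30), "search_box": (120, 10, 200, 30)}
clicks = [(30, 20), (150, 25), (40, 15), (300, 400)]

usage = Counter(filter(None, (hit_widget(c, widgets) for c in clicks)))
print(usage)  # e.g. Counter({'save_button': 2, 'search_box': 1})
```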

69 citations


Cites methods from "Associating the visual representati..."

  • ...Projects such as Sikuli [7, 8], Prefab [10, 11], and Hurst et al.’s automatic target identification system [15], are all impressive demonstrations of how vision can be used to interpret interface layout and usage and could be used in concert with Patina to recognize UI widgets without sufficient accessibility information present....

  • ...[15], the PAX framework [7], and Façades [24] combine image techniques with accessibility data collected from the publicly exposed accessibility APIs....

  • ...Prefab [10, 11] and Sikuli [8] use a vision based approach to locate user interface elements based on their appearance....

Proceedings ArticleDOI
07 Oct 2012
TL;DR: The Waken Video Player is presented, which allows users to directly interact with UI components displayed in the video, showcasing the design opportunities introduced by having this additional meta-data.
Abstract: We present Waken, an application-independent system that recognizes UI components and activities from screen captured videos, without any prior knowledge of that application. Waken can identify the cursors, icons, menus, and tooltips that an application contains, and when those items are used. Waken uses frame differencing to identify occurrences of behaviors that are common across graphical user interfaces. Candidate templates are built, and then other occurrences of those templates are identified using a multi-phase algorithm. An evaluation demonstrates that the system can successfully reconstruct many aspects of a UI without any prior application-dependent knowledge. To showcase the design opportunities that are introduced by having this additional meta-data, we present the Waken Video Player, which allows users to directly interact with UI components that are displayed in the video.
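The low-level frame-differencing step that Waken builds on can be illustrated generically with OpenCV. The sketch below simply reports regions that changed between consecutive frames of a screen-captured video; it is not Waken's multi-phase template algorithm, and the difference threshold and minimum area are illustrative assumptions.

```python
# Generic frame-differencing sketch over a screen-captured video:
# find regions that changed between consecutive frames.
import cv2

def changed_regions(video_path, diff_threshold=25, min_area=50):
    capture = cv2.VideoCapture(video_path)
    ok, previous = capture.read()
    regions_per_frame = []
    while ok:
        ok, frame = capture.read()
        if not ok:
            break
        # Pixels that changed between the previous and current frame.
        diff = cv2.absdiff(cv2.cvtColor(previous, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        regions_per_frame.append(
            [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area])
        previous = frame
    capture.release()
    return regions_per_frame
```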

66 citations


Cites methods from "Associating the visual representati..."

  • ...To minimize the time required for collecting training data, past research [3, 5, 18, 21] explored abstracting identification of different GUI elements and decoupling GUI element representation from predefined image templates....

  • ...[3] proposed an accessibility and pixel-based framework, which also allowed for detecting text and arbitrary word blobs in user interfaces....

References
Proceedings ArticleDOI
04 Oct 2009
TL;DR: Sikuli allows users to take a screenshot of a GUI element and query a help system using the screenshot instead of the element's name, and provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events.
Abstract: We present Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element's name. Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. We report a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords. We also demonstrate several automation tasks suitable for visual scripting, such as map navigation and bus tracking, and show how visual scripting can improve interactive help systems previously proposed in the literature.
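A visual script in this style drives the GUI through screenshot patterns rather than widget names. The snippet below uses core Sikuli Script commands (click, type, exists, doubleClick); the image file names are placeholders for screenshots the user would capture.

```python
# A small Sikuli-style script: screenshot patterns stand in for widget
# names when directing mouse and keyboard events. The .png names are
# placeholders for user-captured screenshots.
click("search_box.png")           # click the widget that looks like this image
type("bus schedule")              # type into it
click("search_button.png")
if exists("no_results.png"):      # branch on what appears on screen
    click("retry_button.png")
else:
    doubleClick("first_result.png")
```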

358 citations

Journal ArticleDOI
TL;DR: Conference proceedings for an event held April 24-29, 2004, in Vienna, Austria.
Abstract: April 24-29, 2004
Vienna, Austria

227 citations

Proceedings ArticleDOI
24 Apr 2004
TL;DR: A novel interaction technique that allows users to replicate arbitrary regions of existing windows into independent windows called WinCuts, where each WinCut is a live view of a region of the source window with which users can interact.
Abstract: Each window on our computer desktop provides a view into some information. Although users can currently manipulate multiple windows, we assert that being able to spatially arrange smaller regions of these windows could help users perform certain tasks more efficiently. In this paper, we describe a novel interaction technique that allows users to replicate arbitrary regions of existing windows into independent windows called WinCuts. Each WinCut is a live view of a region of the source window with which users can interact. We also present an extension that allows users to share WinCuts across multiple devices. Next, we classify the set of tasks for which WinCuts may be useful, both in single as well as multiple device scenarios. We present high-level implementation details so that other researchers can replicate this work. Finally, we discuss future work that we will pursue in extending these ideas.

215 citations


"Associating the visual representati..." refers background in this paper

  • ...WinCuts allowed users to cut a sub-region of an existing window and create an independent live view of the source, but did not interpret its content [12]....

    [...]

Proceedings ArticleDOI
10 Apr 2010
TL;DR: This paper presents a new approach to GUI testing using computer vision for testers to automate their tasks and shows how this approach can facilitate good testing practices such as unit testing, regression testing, and test-driven development.
Abstract: Testing a GUI's visual behavior typically requires human testers to interact with the GUI and to observe whether the expected results of interaction are presented. This paper presents a new approach to GUI testing that uses computer vision to let testers automate their tasks. Testers can write a visual test script that uses images to specify which GUI components to interact with and what visual feedback to observe. Testers can also generate visual test scripts by demonstration. By recording both input events and screen images, it is possible to extract the images of components interacted with and the visual feedback seen by the demonstrator, and generate a visual test script automatically. We show that a variety of GUI behavior can be tested using this approach. Also, we show how this approach can facilitate good testing practices such as unit testing, regression testing, and test-driven development.
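A visual test in this style interacts through screenshot patterns and then checks the expected visual feedback. The sketch below uses plain Python assertions together with Sikuli-style commands as a stand-in for the framework's own visual assertion facilities; the image names are placeholders.

```python
# Minimal sketch of a visual test in Sikuli-style syntax: drive the GUI
# via screenshot patterns, then check the expected visual feedback.
# Plain asserts stand in for the framework's dedicated visual assertions.
click("color_picker.png")
click("red_swatch.png")
click("fill_tool.png")
click("canvas_center.png")

# Expected visual feedback: the canvas region now shows the red fill.
assert exists("canvas_filled_red.png"), "fill did not produce the expected pixels"
assert not exists("error_dialog.png"), "an unexpected error dialog appeared"
```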

214 citations
