Proceedings ArticleDOI

Associating the visual representation of user interfaces with their internal structures and metadata

TL;DR: This paper presents a hybrid framework, PAX, which associates the visual representation of user interfaces (i.e. the pixels) with their internal hierarchical metadata (i.e. the content, role, and value of GUI widgets).
Abstract: Pixel-based methods are emerging as a new and promising way to develop new interaction techniques on top of existing user interfaces. However, in order to maintain platform independence, these methods intentionally neglect other available low-level information about GUI widgets, such as accessibility metadata. In this paper, we present a hybrid framework, PAX, which associates the visual representation of user interfaces (i.e. the pixels) with their internal hierarchical metadata (i.e. the content, role, and value). We identify the challenges to building such a framework. We also develop and evaluate two new algorithms: one for detecting text at arbitrary places on the screen, and one for segmenting a text image into individual word blobs. Finally, we validate our framework in implementations of three applications. We enhance an existing pixel-based system, Sikuli Script, while preserving the readability of its script code. Further, we create two novel applications, Screen Search and Screen Copy, to demonstrate how PAX can be applied to the development of desktop-level interactive systems.
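The word segmentation step described above can be illustrated generically: the sketch below (Python/OpenCV) binarizes a text image, dilates it horizontally so letters within a word merge, and treats each connected component as a word blob. This is a minimal sketch of the general idea, not the paper's algorithm; the threshold choice and kernel size are illustrative assumptions.

```python
# Generic word-blob segmentation sketch (not PAX's exact algorithm):
# binarize a text image, dilate horizontally so the letters of a word
# merge, then take connected components as candidate word blobs.
import cv2

def segment_word_blobs(text_image_path):
    gray = cv2.imread(text_image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu threshold; assume dark text on a light background and invert.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Horizontal dilation merges adjacent letters but not adjacent words
    # (kernel width is an illustrative assumption, tuned per font size).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 3))
    merged = cv2.dilate(binary, kernel, iterations=1)
    # Each remaining connected component is treated as one word blob.
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) per blob

if __name__ == "__main__":
    for box in segment_word_blobs("screenshot_text_region.png"):
        print(box)
```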


Citations
Journal ArticleDOI
TL;DR: Conference proceedings for an event held April 24-29, 2004, in Vienna, Austria.
Abstract: April 24-29, 2004
Vienna, Austria

227 citations

Proceedings ArticleDOI
09 Nov 2015
TL;DR: The first technique to automatically Reverse Engineer Mobile Application User Interfaces (REMAUI) is introduced, which identifies user interface elements such as images, texts, containers, and lists, via computer vision and optical character recognition (OCR) techniques.
Abstract: When developing the user interface code of a mobile application, a large gap exists in practice between the digital conceptual drawings of graphic artists and working user interface code. Currently, programmers bridge this gap manually by reimplementing the conceptual drawings in code, which is cumbersome and expensive. To bridge this gap, we introduce the first technique to automatically Reverse Engineer Mobile Application User Interfaces (REMAUI). On a given input bitmap, REMAUI identifies user interface elements such as images, texts, containers, and lists via computer vision and optical character recognition (OCR) techniques. In our experiments on 488 screenshots of over 100 popular third-party Android and iOS applications, REMAUI-generated user interfaces were similar to the originals, both pixel-by-pixel and in terms of their runtime user interface hierarchies. REMAUI's average overall runtime on a standard desktop computer was 9 seconds.
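The combination of OCR and contour analysis that REMAUI builds on can be sketched roughly as follows, assuming OpenCV and pytesseract. This is a simplified illustration of the general approach, not REMAUI's actual pipeline (which also merges, filters, and detects lists); the Canny thresholds and minimum-area cutoff are assumptions.

```python
# Rough sketch of combining OCR word boxes and contour detection on a
# UI screenshot (illustrative only; not REMAUI's real pipeline).
import cv2
import pytesseract

def detect_ui_elements(screenshot_path):
    image = cv2.imread(screenshot_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # 1. Text candidates via OCR word boxes.
    ocr = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
    texts = [
        (ocr["left"][i], ocr["top"][i], ocr["width"][i], ocr["height"][i], ocr["text"][i])
        for i in range(len(ocr["text"]))
        if ocr["text"][i].strip()
    ]

    # 2. Non-text container candidates via edges and external contours.
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    containers = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]

    return texts, containers
```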

173 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present an approach that enables accurate prototyping of graphical user interfaces (GUIs) via three tasks: detection, classification, and assembly, where the logical components of a GUI are detected from a mock-up artifact using either computer vision techniques or mock-up metadata.
Abstract: It is common practice for developers of user-facing software to transform a mock-up of a graphical user interface (GUI) into code. This process takes place both at an application's inception and in an evolutionary context as GUI changes keep pace with evolving features. Unfortunately, this practice is challenging and time-consuming. In this paper, we present an approach that automates this process by enabling accurate prototyping of GUIs via three tasks: detection, classification, and assembly. First, logical components of a GUI are detected from a mock-up artifact using either computer vision techniques or mock-up metadata. Then, software repository mining, automated dynamic analysis, and deep convolutional neural networks are utilized to accurately classify GUI components into domain-specific types (e.g., toggle-button). Finally, a data-driven, K-nearest-neighbors algorithm generates a suitable hierarchical GUI structure from which a prototype application can be automatically assembled. We implemented this approach for Android in a system called ReDraw. Our evaluation illustrates that ReDraw achieves an average GUI-component classification accuracy of 91 percent and assembles prototype applications that closely mirror target mock-ups in terms of visual affinity while exhibiting reasonable code structure. Interviews with industrial practitioners illustrate ReDraw's potential to improve real development workflows.
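The detection-classification-assembly split can be illustrated with a deliberately simplified sketch. A K-nearest-neighbors classifier is substituted here for ReDraw's convolutional neural network, and a plain containment heuristic stands in for its data-driven assembly, so the code only shows the shape of the pipeline; the feature vectors, labels, and bounding boxes are made-up examples.

```python
# Simplified classify-then-assemble sketch (not ReDraw itself): classify
# detected components into types, then nest them by bounding-box
# containment. Features are (x, y, width, height, aspect_ratio).
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled components for training.
train_features = [
    [10, 5, 300, 40, 7.5],    # toolbar
    [20, 60, 120, 30, 4.0],   # button
    [20, 100, 260, 30, 8.7],  # text field
]
train_labels = ["toolbar", "button", "text_field"]

classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(train_features, train_labels)

def contains(outer, inner):
    """True if bounding box `outer` fully contains `inner` (x, y, w, h)."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ox + ow >= ix + iw and oy + oh >= iy + ih

# Components detected in a mock-up: classify, then group by containment.
detected = [(15, 55, 130, 35, 3.7), (10, 50, 280, 200, 1.4)]
types = classifier.predict(detected)
boxes = [d[:4] for d in detected]
children = {i: [j for j in range(len(boxes)) if i != j and contains(boxes[i], boxes[j])]
            for i in range(len(boxes))}
print(list(types), children)
```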

141 citations

Proceedings ArticleDOI
27 Apr 2013
TL;DR: Patina is an application-independent system for collecting and visualizing software application usage data; it requires no instrumentation of the target application, and all data is collected through standard window metrics and accessibility APIs.
Abstract: We present Patina, an application-independent system for collecting and visualizing software application usage data. Patina requires no instrumentation of the target application; all data is collected through standard window metrics and accessibility APIs. The primary visualization is a dynamic heatmap overlay which adapts to match the content, location, and shape of the user interface controls visible in the active application. We discuss a set of design guidelines for the Patina system, describe our implementation of the system, and report on an initial evaluation based on a short-term deployment of the system.
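The underlying bookkeeping, mapping recorded clicks onto widget bounds and counting hits per control, can be sketched minimally as follows. The widget bounds and click list are assumed inputs standing in for the window metrics and accessibility APIs Patina actually queries, and the heatmap rendering itself is omitted.

```python
# Minimal sketch of accumulating per-widget usage counts for a heatmap
# overlay (illustrative; Patina obtains widget bounds via accessibility
# APIs and renders a dynamic overlay, which is not shown here).
from collections import Counter

def hit_widget(click, widgets):
    """Return the name of the widget whose bounds contain the click, if any."""
    x, y = click
    for name, (wx, wy, ww, wh) in widgets.items():
        if wx <= x < wx + ww and wy <= y < wy + wh:
            return name
    return None

# Hypothetical widget bounds (x, y, width, height) and recorded clicks.
widgets = {"save_button": (10, 10, 80, 30), "search_box": (120, 10, 200, 30)}
clicks = [(30, 20), (150, 25), (40, 15), (300, 400)]

usage = Counter(filter(None, (hit_widget(c, widgets) for c in clicks)))
print(usage)  # e.g. Counter({'save_button': 2, 'search_box': 1})
```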

69 citations


Cites methods from "Associating the visual representati..."

  • ...Projects such as Sikuli [7, 8], Prefab [10, 11], and Hurst et al.’s automatic target identification system [15], are all impressive demonstrations of how vision can be used to interpret interface layout and usage and could be used in concert with Patina to recognize UI widgets without sufficient accessibility information present....

  • ...[15], the PAX framework [7], and Façades [24] combine image techniques with accessibility data collected from the publicly exposed accessibility APIs....

  • ...Prefab [10, 11] and Sikuli [8] use a vision based approach to locate user interface elements based on their appearance....

Proceedings ArticleDOI
07 Oct 2012
TL;DR: The Waken Video Player is presented, which allows users to directly interact with UI components displayed in the video, showcasing the design opportunities introduced by having this additional meta-data.
Abstract: We present Waken, an application-independent system that recognizes UI components and activities from screen captured videos, without any prior knowledge of that application. Waken can identify the cursors, icons, menus, and tooltips that an application contains, and when those items are used. Waken uses frame differencing to identify occurrences of behaviors that are common across graphical user interfaces. Candidate templates are built, and then other occurrences of those templates are identified using a multi-phase algorithm. An evaluation demonstrates that the system can successfully reconstruct many aspects of a UI without any prior application-dependent knowledge. To showcase the design opportunities that are introduced by having this additional meta-data, we present the Waken Video Player, which allows users to directly interact with UI components that are displayed in the video.
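The low-level frame-differencing step that Waken builds on can be illustrated generically with OpenCV. The sketch below simply reports regions that changed between consecutive frames of a screen-captured video; it is not Waken's multi-phase template algorithm, and the difference threshold and minimum area are illustrative assumptions.

```python
# Generic frame-differencing sketch over a screen-captured video:
# find regions that changed between consecutive frames.
import cv2

def changed_regions(video_path, diff_threshold=25, min_area=50):
    capture = cv2.VideoCapture(video_path)
    ok, previous = capture.read()
    regions_per_frame = []
    while ok:
        ok, frame = capture.read()
        if not ok:
            break
        # Pixels that changed between the previous and current frame.
        diff = cv2.absdiff(cv2.cvtColor(previous, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        regions_per_frame.append(
            [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area])
        previous = frame
    capture.release()
    return regions_per_frame
```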

66 citations


Cites methods from "Associating the visual representati..."

  • ...To minimize the time required for collecting training data, past research [3, 5, 18, 21] explored abstracting identification of different GUI elements and decoupling GUI element representation from predefined image templates....

  • ...[3] proposed an accessibility and pixel-based framework, which also allowed for detecting text and arbitrary word blobs in user interfaces....

References
Proceedings ArticleDOI
04 Oct 2009
TL;DR: Sikuli allows users to take a screenshot of a GUI element and query a help system using the screenshot instead of the element's name, and provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events.
Abstract: We present Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element's name. Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. We report a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords. We also demonstrate several automation tasks suitable for visual scripting, such as map navigation and bus tracking, and show how visual scripting can improve interactive help systems previously proposed in the literature.
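A visual script in this style drives the GUI through screenshot patterns rather than widget names. The snippet below uses core Sikuli Script commands (click, type, exists, doubleClick); the image file names are placeholders for screenshots the user would capture.

```python
# A small Sikuli-style script: screenshot patterns stand in for widget
# names when directing mouse and keyboard events. The .png names are
# placeholders for user-captured screenshots.
click("search_box.png")           # click the widget that looks like this image
type("bus schedule")              # type into it
click("search_button.png")
if exists("no_results.png"):      # branch on what appears on screen
    click("retry_button.png")
else:
    doubleClick("first_result.png")
```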

358 citations

Journal ArticleDOI
TL;DR: Conference proceedings for an event held April 24-29, 2004, in Vienna, Austria.
Abstract: April 24-29, 2004
Vienna, Austria

227 citations

Proceedings ArticleDOI
24 Apr 2004
TL;DR: A novel interaction technique that allows users to replicate arbitrary regions of existing windows into independent windows called WinCuts, where each WinCut is a live view of a region of the source window with which users can interact.
Abstract: Each window on our computer desktop provides a view into some information. Although users can currently manipulate multiple windows, we assert that being able to spatially arrange smaller regions of these windows could help users perform certain tasks more efficiently. In this paper, we describe a novel interaction technique that allows users to replicate arbitrary regions of existing windows into independent windows called WinCuts. Each WinCut is a live view of a region of the source window with which users can interact. We also present an extension that allows users to share WinCuts across multiple devices. Next, we classify the set of tasks for which WinCuts may be useful, both in single as well as multiple device scenarios. We present high-level implementation details so that other researchers can replicate this work. Finally, we discuss future work that we will pursue in extending these ideas.

215 citations


"Associating the visual representati..." refers background in this paper

  • ...WinCuts allowed users to cut a sub-region of an existing window and create an independent live view of the source, but did not interpret its content [12]....

    [...]

Proceedings ArticleDOI
10 Apr 2010
TL;DR: This paper presents a new approach to GUI testing using computer vision for testers to automate their tasks and shows how this approach can facilitate good testing practices such as unit testing, regression testing, and test-driven development.
Abstract: Testing a GUI's visual behavior typically requires human testers to interact with the GUI and to observe whether the expected results of interaction are presented. This paper presents a new approach to GUI testing that uses computer vision to let testers automate their tasks. Testers can write a visual test script that uses images to specify which GUI components to interact with and what visual feedback to observe. Testers can also generate visual test scripts by demonstration. By recording both input events and screen images, it is possible to extract the images of components interacted with and the visual feedback seen by the demonstrator, and generate a visual test script automatically. We show that a variety of GUI behavior can be tested using this approach. Also, we show how this approach can facilitate good testing practices such as unit testing, regression testing, and test-driven development.
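A visual test in this style interacts through screenshot patterns and then checks the expected visual feedback. The sketch below uses plain Python assertions together with Sikuli-style commands as a stand-in for the framework's own visual assertion facilities; the image names are placeholders.

```python
# Minimal sketch of a visual test in Sikuli-style syntax: drive the GUI
# via screenshot patterns, then check the expected visual feedback.
# Plain asserts stand in for the framework's dedicated visual assertions.
click("color_picker.png")
click("red_swatch.png")
click("fill_tool.png")
click("canvas_center.png")

# Expected visual feedback: the canvas region now shows the red fill.
assert exists("canvas_filled_red.png"), "fill did not produce the expected pixels"
assert not exists("error_dialog.png"), "an unexpected error dialog appeared"
```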

214 citations
