
Towards Comprehensive and Collaborative
Forensics on Email Evidence
Justin Paglierani, Mike Mabey and Gail-Joon Ahn
Arizona State University
{jpaglier,mmabey,gahn}@asu.edu
Abstract—The digital forensics community has neglected email
forensics as a process, despite the fact that email remains an
important tool in the commission of crime. At present, there
exists little support for discovering, acquiring, and analyzing web-
based email, despite its widespread use. In this paper we present
a systematic process for email forensics which we integrate into
the normal forensic analysis workflow, and which accommodates
the distinct characteristics of email evidence. Our process focuses
on detecting the presence of non-obvious artifacts related to
email accounts, retrieving the data from the service provider,
and representing email in a well-structured format based on
existing standards. As a result, developers and organizations can
collaboratively create and use analysis tools that can analyze
email evidence from any source in the same fashion and the
examiner can access additional data relevant to their forensic
cases.
Index Terms—Email, forensics, collaboration.
I. INTRODUCTION
The recent investigation of a senior U.S. intelligence official reaffirmed the importance of email forensics [1]. The investigation relied on the simple inspection of the drafts folder of a shared email account, but if the suspects had taken the time to make their correspondence more clandestine, a more sophisticated approach would have been necessary to discover relevant email evidence. Current methodologies do not address the possible intricacies introduced when an investigation centers around the analysis of various email sources and, furthermore, simple inspection does not aid an examiner in detecting the presence of email which is not locally stored [2, pg. 471].
Consider a scenario in which a suspected computer criminal
has communicated with many parties about the nature and
means of their actions using various communication methods,
including locally stored emails and webmail accounts. When
an examiner seizes a suspect’s hard drive, only the locally
stored or cached email would be directly available and
the webmail accounts would remain undiscovered without
substantial manual effort. These missing portions of data could
lead to an incomplete investigation report with respect to
the suspected act. Even if evidence resides locally in diverse
formats, it is likely that an examiner would need separate tools
and methods to analyze each of them.
In addition, an emerging notion called the Internet of Things [3] (IoT) presents an environment in which any number of “things,” or devices, connect to each other through the Internet. Such a model clearly presents an even greater challenge for attempting manual analysis of device data or behavior as part of an investigation.

This work was partially supported by a grant from the National Science Foundation. All correspondence should be addressed to Dr. Gail-Joon Ahn at gahn@asu.edu.
We aim to provide a systematic and forensically sound methodology for discovering, extracting, and analyzing evidence from email which resides in emerging technologies. Our approach has several unique properties to support collaborative email forensics. First, we propose a process-driven email forensics approach composed of several sub-tasks for collaboratively analyzing email evidence. Since each examiner has different capabilities, this process-driven workflow can help them conduct forensics tasks more efficiently and effectively by reducing the number of backlogged cases and allowing for the sharing of work in a collaborative manner. Second, we attempt to build pluggable modules so that our framework can be realized as a service to multiple examiners without asking them to alter their forensic environment or tools. Third, examiners and tools each have their own regulatory reporting requirements and proprietary data formats, which have been critical barriers to collaboration. Hence, we introduce an extended XML-based format to represent email evidence with a uniform and interoperable evidence container for collaborative email forensics. We hope that with the aid of our approach, the digital forensics community can begin establishing best practice standards to acquire, process, authenticate, analyze, and present this distinct family of evidence.
A. A Note on Legality
We emphasize that our approach requires special consideration of laws regarding the search and seizure of evidence. In many territories, it may be necessary to secure a subpoena or warrant before using an approach like ours. However, in the event that the necessary procedures have been followed and the service provider remains uncooperative, our method provides examiners with an alternative means of acquisition for the sake of prompt response, as discussed by Richard Littlehale¹ in his testimony before the U.S. House Judiciary Subcommittee on Crime, Terrorism, Homeland Security & Investigations on March 19, 2013 [4]. We urge practitioners to consult proficient legal counsel before utilizing the information contained herein.

¹ Assistant Special Agent in Charge, Technical Services Unit, Tennessee Bureau of Investigation.
WK,(((,QWHUQDWLRQDO&RQIHUHQFHRQ&ROODERUDWLYH&RPSXWLQJ1HWZRUNLQJ$SSOLFDWLRQVDQG:RUNVKDULQJ&ROODERUDWH&RP
k,&67
'2,LFVWFROODERUDWHFRP

TABLE I: Predicted daily email traffic in billions from 2012-2016, as published by the Radicati Group

Year                              2012    2013    2014    2015    2016
Total worldwide emails/day (B)    144.8   154.6   165.8   178.3   192.2
% Change                          --      7%      7%      8%      8%
II. RELATED WORK
In case there was ever any doubt as to how important email is to private and corporate communication, the Executive Summary [5] of the Radicati Group’s report titled “Email Market, 2012-2016” states that the total number of worldwide emails sent each day in 2012 was about 144.8 billion, with steady growth predicted for years to come as shown in Table I. Furthermore, the report states that “the installed base of Corporate Webmail Clients is expected to grow from 629 million in 2012 to over 1 billion by year-end 2016.” Clearly webmail is a significant communication medium.
As more interactions become digitized, ranging from communications to finances, a number of forensic hurdles present themselves [6]. Service-oriented computing presents an interesting challenge, as we no longer see unified bodies of evidence aggregated within traditional forensic mediums. Significant research has gone into approaching specific services [7], [8], but little work has gone into establishing a best practice approach to such evidence starting with the initial acquisition of disk-based evidence².
Best practices have emerged in the forensic representation of evidence. In particular, XML has become known as a medium for the creation of well-structured forms of evidence representation [9]. A well-known example of this is the Digital Forensics XML (DFXML) format, used for representing disk media as a combination of disk partitions, file systems, and file metadata in XML [10]. These formats facilitate the storage, authentication, and analysis of evidence in various ways.
Improvements in the forensic analysis of email have largely followed those in big data; recent contributions to the field include statistical and machine learning techniques used to combine stylometric analyses, author attribution, and more into a cohesive analysis technique [11]. While these methods improve the analytic process of email forensics, a holistic approach is still lacking.
Similar to disk forensics, email also contains indexable metadata, in the form of headers, which can be useful to direct the focus of an analysis. From this metadata alone, an examiner can detect communication flows and evidence tampering, among other things. As shown by Banday in [12], the email headers are a valuable source of information in a forensic investigation involving email.
III. METHODOLOGY/FRAMEWORK
We seek to help reduce the disparity between the analysis methods available for disk-based evidence and those available for web-based evidence such as email. To that end, our process for the acquisition and storage of online evidence makes available the means whereby analysis tools can handle and analyze such evidence.

² By “disk-based evidence” we mean to include all forms of digital evidence that have more traditionally been part of an investigation, not just hard drives.
In brief, our process consists of discovering online credentials
from acquired evidence, mapping those credentials to their
corresponding services, extracting evidence from each service,
authenticating and processing that evidence into a standardized
representation format, and then performing the actual analysis.
Fig. 1 depicts this flow. We now discuss each part of the
process.
A. Initial Acquisition
As in any investigation, once the examiner has secured the evidence, the first step is to acquire a “forensic copy,” which for all purposes is an exact duplicate of the original. Forensic copies serve as a protection for the original evidence since the examiner works with these instead of the originals, allowing them to perform analyses without the risk of compromising the integrity of the evidence.
During the initial acquisition of a hard drive, the data that is
available is mostly limited to the structure of the drive as found
in the Master Boot Record (MBR) or in one of the Volume
Boot Records (VBRs). From these structures, the examiner
can also extract additional information about the file system
for a particular volume, but again this only provides structural
information, such as which sectors on the disk store parts
of a file. At this stage, there is no indication of where any
information related to the suspect’s online activity and accounts
may reside on the disk. For this reason, initial acquisition
requires the additional steps of credential discovery, evidence
mapping, and supplemental acquisition, as we will now describe
in Sections III-B, III-C, and III-D, respectively.
B. Credential Discovery
In our process we use the term “credential” to denote any data which can identify the owner of the credential (e.g. the suspect) in some useful way. The breadth of this definition allows for its application without respect to the format in which the data is stored or the type of service to which it is mapped. Also, while other terms indicate a similar idea, such as “footprint” [13], “fingerprint”, and “profile”, none of these convey their purpose within our process, which is to reestablish a connection with online services to extract evidence. To this end we define a phase, “credential discovery”, in which we detect credentials stored in a piece of evidence which an examiner can use to further recover additional evidence for the investigation.

The formats of credentials range from simplistic (text files containing user names and passwords) to complex (session cookies), and discovering all types of credentials will require an equally diverse set of approaches. Some possible credential discovery approaches include:
Brute force: Given a set of criteria (such as a regular expression) for what may possibly be credentials, linearly search the evidence, including any file slack or bad sectors (a minimal sketch of this approach follows the list). This has the disadvantage of being neither intelligent nor efficient.

Search known locations: Search for files known to regularly store credentials, such as key ring databases, cookie files or databases, registry entries, etc. While more efficient, this has a much narrower scope and may overlook legitimate credentials.

Heuristic-based: Using machine learning or similar techniques, learn through past experiences and feedback from the examiner what constitutes a usable credential when investigating the raw data.
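To make the brute-force approach concrete, the following sketch (ours, for illustration only, and not part of any tool described in this paper) linearly scans a raw evidence image for strings shaped like email addresses; the regular expression, chunk size, and overlap are assumptions chosen for the example.

import re

# Illustrative "brute force" scan: linearly search a raw evidence image for
# strings shaped like email addresses. The pattern, chunk size, and overlap
# are assumptions made for this sketch, not part of the original tooling.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def scan_image(path, chunk_size=4 * 1024 * 1024, overlap=256):
    """Return a sorted list of (byte offset, candidate string) pairs."""
    hits = set()
    with open(path, "rb") as image:
        offset = 0
        tail = b""
        while True:
            chunk = image.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            base = offset - len(tail)
            for match in EMAIL_RE.finditer(window):
                hits.add((base + match.start(), match.group().decode("ascii", "replace")))
            tail = window[-overlap:]  # carry a tail so matches spanning two chunks are still found
            offset += len(chunk)
    return sorted(hits)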
Practitioners may develop other approaches, and each
approach may have varying levels of success on different
data sets. As such, it may be necessary to use all available
approaches on each data set, depending on the computational
resources available. In the best case, a large-scale, distributed
system would be utilized with as many approaches as possible
including proprietary internal tools, remotely hosted tools,
and open source tools to discover the largest possible set
of credentials.
C. Evidence Mapping
Following the discovery of credentials, it is necessary to map
them to a source of evidence before performing any further
acquisition. In other words, this mapping determines what
online service the suspect accesses using the credential. Depending on the approach taken to discover a set of credentials, the form in which they were stored, and any accompanying data stored with the credentials, the difficulty of the mapping process may vary. Examples include email addresses or cookies which specify the domain to which they belong, spreadsheets organized in a manner which makes this information evident to an examiner, or a text file storing a user name and password with no indication of the service for which they are valid; in this last case, manual examination may be necessary to complete the mapping process.
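As a simple illustration of this mapping step, a credential that carries a domain can be matched against a table of known service providers, with anything unmatched flagged for manual examination. The table, field names, and credential layout below are our own assumptions, not a prescribed format.

# Hypothetical mapping table; real deployments would maintain a much larger,
# curated list of domains per service.
KNOWN_SERVICES = {
    "mail.google.com": "gmail",
    "google.com": "google",
    "outlook.com": "outlook",
    "yahoo.com": "yahoo-mail",
}

def map_credential(credential):
    """Return (service, needs_manual_review) for a discovered credential dict.

    `credential` is assumed to look like the discovery output, e.g.
    {"format": "cookie", "domain": ".mail.google.com"} or
    {"format": "email-address", "value": "suspect@yahoo.com"}.
    """
    domain = ""
    if credential.get("format") == "cookie":
        domain = credential.get("domain", "").lstrip(".")
    elif credential.get("format") == "email-address":
        domain = credential.get("value", "").rpartition("@")[2]

    for known, service in KNOWN_SERVICES.items():
        if domain == known or domain.endswith("." + known):
            return service, False
    return None, True  # no automatic mapping; requires manual examination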
D. Supplemental Acquisition
After mapping a credential to a service, the next step
is to acquire a forensically sound copy of the service’s
data. To help justify the use of certain acquisition methods
during this phase, it may be useful to draw parallels between
different types of traditional forensic acquisition and the
circumstances characterizing supplemental acquisition from
online sources. The two types of acquisition are static and
live, which correspond to acquiring data from unchanging or
volatile systems, respectively. When applied to performing
forensics on an email account, a static acquisition is equivalent
to acquiring data from stored Personal Storage Table (PST)
files, frozen accounts, or logs from journaling or Simple Mail
Transfer Protocol (SMTP) servers, whereas a live acquisition
is equivalent to acquisition performed on active email accounts
via Internet Message Access Protocol (IMAP) or using other
methods, running servers (SMTP, journaling, etc.), or webmail
services.
During the acquisition process, examiners must follow established best practices for any data source from which they extract evidence; this can best be achieved by defining an engine that utilizes modules which meet the requirements of the rules of evidence to acquire this data. To ensure the process is repeatable, we treat this portion of the process as a black-box engine which follows a set of steps to present the recovered data in a source-agnostic form so that the next module in the engine can process the evidence without regard to the source from which it originates. This engine should reuse the credentials previously discovered, acquire the most complete representation of the email (including headers and body), and then store them as a separate copy in an intermediate format for the purpose of evidence processing into a format which examiners will later use. Once these steps are well defined, automation becomes trivial and should be implemented as a means to prove that the process is repeatable and forensically sound.
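One possible shape for such an engine is sketched below; the fetch_messages callable and the intermediate record layout are assumptions of this sketch rather than requirements of the process, and any source-specific details (IMAP, PST parsing, webmail scraping) are hidden behind that callable.

import json
import time

def acquire_to_intermediate(credential, fetch_messages, out_path):
    """Sketch of the black-box acquisition step: reuse a discovered credential,
    pull the most complete representation of each message (headers and body),
    and store a separate copy in a simple intermediate format (one JSON record
    per line). `fetch_messages` is an assumed callable that hides whether the
    source is IMAP, a PST file, or a webmail session.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for msg in fetch_messages(credential):
            record = {
                "acquired": time.strftime("%Y-%m-%dT%H:%M:%S"),
                "service": credential.get("service", "unknown"),
                "headers": msg["headers"],
                "body": msg["body"],
            }
            out.write(json.dumps(record) + "\n")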
The nature of online storage requires careful consideration
of data integrity and authentication issues; it is infeasible to
represent the data exactly as stored at the remote location using
our process; however, the focus of this process hinges on the
text-based content of email evidence (including the headers
and the body of the message) and not on the structure of the
data stored on disk. As each email is a discrete, individually
identifiable piece of data, we assert that a checksum of the plain
text content of the original form of an email message is the most
useful check against data integrity. As the acquisition process is automated and repeatable, and the data yielded is verifiable using a hash, we present the evidence acquired in this phase as a forensic copy of the evidence in an intermediate format.
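For instance, a per-message integrity check along these lines might hash the plain-text content (headers plus body); the canonical form used here (UTF-8 with a CRLF separator) is an assumption of this sketch, not a requirement of the process.

import hashlib

def message_digest(headers: str, body: str) -> str:
    """SHA-256 over the plain-text content of a single message."""
    canonical = (headers + "\r\n\r\n" + body).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def verify_message(headers: str, body: str, recorded_digest: str) -> bool:
    """True if the message content still matches the digest taken at acquisition."""
    return message_digest(headers, body) == recorded_digest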
E. Evidence Processing and Authentication
To increase the value of the intermediate representation
mentioned in the previous step, it is necessary to facilitate
a common evidence representation for the acquired data. To
simplify working with this data, the representation should retain
metadata about the data source as well as the data itself, so


as to clearly identify any specific characteristics of the data
that would be important during the reporting process. Using
a common representation also adds the benefit of being able
to develop tools which treat the evidence in a source-agnostic
manner, since the representation abstracts away the differences
between webmail and locally stored emails, simplifying the
development and validation/verification process of forensic
tools.
A well-structured format lends itself to the above goals,
allowing for easy searching, classification, and general use
of the evidence while providing an extra layer of abstraction
from the raw evidence to help maintain the forensic integrity
of the information. When using a structured representation,
an examiner can employ verification techniques (such as
schema verification) to prove the accurate representation of the
evidence. Such a format and verification techniques also lend themselves to the use of our process in a collaborative environment; for example, an organization may provide its implementation of an analysis tool remotely through a Service Oriented Architecture (SOA) implementation, or multiple organizations may create separate, yet interoperable, tools using a common evidence representation.
In order to properly authenticate the data after acquisition
and processing, the evidence representation format used should
store the checksum generated during the acquisition phase. By
storing this checksum, examiners will be able to confirm that
the integrity of the data has not changed since it was first
acquired, or if necessary and possible, they can perform a
subsequent acquisition against the online service to check for
changes to the available data or verify the accuracy of the first
acquisition attempt.
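By way of illustration only, a structured container can carry the acquisition-time checksum alongside the message so that later stages can re-verify it; the element names below are placeholders and do not reflect the actual schema used by the authors.

import hashlib
import xml.etree.ElementTree as ET

def to_structured(headers: str, body: str, acquired_sha256: str) -> ET.Element:
    """Wrap one message in a hypothetical XML container that records its
    acquisition-time checksum."""
    msg = ET.Element("message", attrib={"sha256": acquired_sha256})
    ET.SubElement(msg, "headers").text = headers
    ET.SubElement(msg, "body").text = body
    return msg

def still_authentic(msg: ET.Element) -> bool:
    """Recompute the checksum from the stored content and compare it with the
    value recorded at acquisition time."""
    headers = msg.findtext("headers") or ""
    body = msg.findtext("body") or ""
    digest = hashlib.sha256((headers + "\r\n\r\n" + body).encode("utf-8")).hexdigest()
    return digest == msg.get("sha256")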
F. Analysis
The next step after acquiring, processing, and authenticating
the evidence is to perform forensic analyses that will be
informative for the purposes of the investigation. Since our
methodology only provides a process for the acquisition and
storage of supplemental evidence, the implementation of new
analysis tools is beyond the scope of this work. However,
our methodology reduces the amount of effort required for
analysis of online evidence in two ways. First, our methodology
removes the need to manually acquire supplemental evidence
as part of the examination workflow. The steps of discovering
credentials, mapping them to online services, acquiring data
from the service, and processing the data into a standard format
are all performed automatically, saving time while providing
increased breadth to the incident report.
Second, our methodology specifies that the standardized
data storage format should have a way by which to validate
its structure. Three benefits arise from this requirement:
1) acquired data in a validated format gives tool developers confidence in the structure and type of the data; 2) developers do not need to write analysis tools to handle multiple formats, since it is possible to convert evidence to the format used in the process, making tools more reusable; 3) a comparison of the output from multiple tools allows for checking accuracy³ or for evaluating performance.
With these benefits, our approach provides significant advan-
tages in collaboratively discovering, collecting, and analyzing
evidence stored by online services.
IV. IMPLEMENTATION DETAILS
To demonstrate our framework, we now give the details of our plugin-based forensics framework for online evidence, called PlugsE⁴.

PlugsE is a framework implemented in Python meant to act as the black box engine mentioned in Section III-D, consisting of separate modules to handle each step of the forensic process and a backbone which integrates them into a seamless tool. It has been developed with extensibility in mind, where adding a specific implementation of any step in our process is achieved by a system administrator manually adding entries to one of four manifest files which specify to the PlugsE backbone the name of the module, the type of data (DFXML file, Google cookies, keyring file, etc.) it handles, as well as how to access the module from a programmatic standpoint. The access vector could be, for example, a command-line executable or a service available via a Remote Procedure Call (RPC) interface such as REST. PlugsE stores a manifest file for each step in the forensic process and parses them to create a vector table which the backbone uses to map the different types of data it is presented with to a specific implementation of a step. Through the use of these manifests, each step in the forensic process can be viewed as a collection of modules which implement differing approaches to the specific forensic task at hand.
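To illustrate how such a vector table might operate, the sketch below loads one hypothetical manifest per step and dispatches on the data type each module declares; the manifest layout shown in the comment is our own guess, since the actual PlugsE manifest format is not reproduced here.

import json

# Hypothetical manifest entry layout: one JSON file per forensic step, e.g.
# discovery.json might contain:
#   [{"module": "henson", "handles": "dfxml", "access": "cli:henson.py"}]
STEPS = ("discovery", "mapping", "acquisition", "processing")

def load_vector_table(manifest_dir):
    """Build {(step, data_type): module descriptor} from the manifest files."""
    table = {}
    for step in STEPS:
        with open(f"{manifest_dir}/{step}.json", encoding="utf-8") as fh:
            for entry in json.load(fh):
                table[(step, entry["handles"])] = entry
    return table

def dispatch(table, step, data_type):
    """Return the module registered for this step and data type, or None so the
    backbone can fall back to manual handling by the examiner."""
    return table.get((step, data_type))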
Each module must both accept as input and return as output
JavaScript Object Notation (JSON) representations of the data
being acted upon coupled with logging information (start/end
times, checksums, module name and version), which aids in
providing a common representation of data within the system,
facilitating interoperability of modules written by different
developers, organizations, or even in distinct languages.
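A wrapper consistent with that contract could look like the following sketch; the exact field names in the envelope are assumptions rather than the framework's specification.

import hashlib
import json
import time

def run_module(module_fn, name, version, payload):
    """Invoke a module on a JSON-compatible payload and attach the logging
    information described above (start/end times, checksum, name, version)."""
    started = time.strftime("%Y-%m-%dT%H:%M:%S")
    result = module_fn(payload)                     # module's own JSON-compatible output
    finished = time.strftime("%Y-%m-%dT%H:%M:%S")
    body = json.dumps(result, sort_keys=True)
    return {
        "module": name,
        "version": version,
        "started": started,
        "finished": finished,
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "data": result,
    }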
This modular approach offers a number of interesting benefits: a developer can implement a number of different data flows within our forensic process, and a step in the forensic process may be offloaded to a remote server via RPC to a module provided by another organization in a SOA fashion, with the backbone and examiner being oblivious to the geographical location or implementation details of the web service. These qualities may benefit a distributed, collaborative approach to forensics, such as the one laid out in the CUFF framework [15].
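Offloading a step in this fashion can be as simple as posting the module's JSON envelope to a remote endpoint; the endpoint URL in the comment is hypothetical and error handling is omitted for brevity.

import json
import urllib.request

def call_remote_module(endpoint, envelope, timeout=60):
    """POST a JSON envelope to a remotely hosted module and return its reply."""
    request = urllib.request.Request(
        endpoint,                                   # e.g. "https://partner.example.org/plugse/mapping"
        data=json.dumps(envelope).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))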
As a proof of concept, we now show how to use PlugsE and our online evidence acquisition steps from Section III to retrieve the contents of a Gmail account using cookies containing session information that is still valid.

³ As discussed in [14], validating forensic tools by comparing their output is important, but requires executing the tools against the same evidence.
⁴ Available at https://bitbucket.org/jpaglier/plugse


Figure 2: The PlugsE framework
A. Initial Acquisition
In our implementation, we make the assumption that an examiner has already completed the work of initial acquisition (as described in Section III-A) of a hard drive from a desktop computer and created a forensic copy. Ideally, this would be performed using a system such as the one presented in [15], which allows for the analysis of evidence to automatically begin after acquisition. Also, the modules in our implementation depend on the DFXML representation of the evidence, so the examiner (or the tools used by the examiner) must ensure its creation in this phase.
B. Credential Discovery
With a forensic copy of the target device accessible, it is now possible to begin searching for credentials. By creating a PlugsE discovery module, Henson⁵, that searches for browser cookies utilizing a “Search known locations” approach, we easily discover the cookies for the Chrome browser on a Windows machine at %USERPROFILE%\AppData\Local\Google\Chrome\User Data\Default\Cookies. While other browsers’ cookies are also in known locations, this file is the focus of our proof of concept.

We adopt a straightforward approach to searching for the existence of a known path. It takes as input a list of paths for which to search. First, it decodes all of the paths, meaning it resolves any Windows system variables to all matching explicit paths. Then it splits each path into its subdirectories and iterates through them, checking for their presence in a representation of the filesystem’s structure created beforehand from the DFXML file. If the full path exists, this is recorded for later use.
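A simplified version of this search is sketched below; it assumes the DFXML file has already been flattened into a set of file paths, which is our simplification of the directory-structure representation the module actually builds.

# Simplified sketch of a "known locations" search. We assume the DFXML output
# has been flattened into a set of forward-slash file paths such as
# "Users/alice/AppData/Local/Google/Chrome/User Data/Default/Cookies".
KNOWN_PATHS = [
    "%USERPROFILE%/AppData/Local/Google/Chrome/User Data/Default/Cookies",
]

def user_names(dfxml_paths):
    """Infer profile names from any path under the Users directory."""
    return {p.split("/")[1] for p in dfxml_paths if p.startswith("Users/")}

def resolve(path, users):
    """Expand Windows variables to every matching explicit path."""
    if "%USERPROFILE%" in path:
        return [path.replace("%USERPROFILE%", "Users/" + u) for u in users]
    return [path]

def find_known_files(dfxml_paths):
    """Return every known credential location present in the evidence."""
    users = user_names(dfxml_paths)
    hits = []
    for known in KNOWN_PATHS:
        hits.extend(c for c in resolve(known, users) if c in dfxml_paths)
    return hits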
The complexity of our algorithm is O(n · m), where n = |resolved paths| and m = |dir contents|⁶. We make the assumption that the number of subdirectories in a given path will be limited and add no more than a constant multiplier to our algorithm’s complexity, because we are searching for known paths of common programs, meaning it does not have the same capacity for expansion the way that n or m do. For example, in the case of Chrome cookie files in Windows 7, the path specified previously will resolve to C:\Users\<user name>\AppData\Local\Google\Chrome\User Data\Default\Cookies, which is a total of 9 directories before reaching the target file (Cookies).
⁵ Available at https://bitbucket.org/jpaglier/henson
⁶ Due to the page limit, we omit the details of our algorithm.
[
  {
    "source": "dfxml://file37",
    "format": "cookie",
    "md5": "1e6c344157eb14a79fefc07a9800695c",
    "found": "2013-02-28T16:55:42-07:00"
  }
]
Figure 3: Initial mapping structure created by a discovery
module
After discovering the cookie database, the last task the
module performs is to store important information about
the possible credential source in a JSON file for use in the
Evidence Mapping phase. While the discovery module cannot
map the credential to a service because it did not search the
contents of the database for service-specific information, it
stores the source, format, checksum, and time of discovery of
the credential in a JSON file as illustrated in Fig. 3. With this,
PlugsE’s logging process has the information it needs and the
relevant evidence mapping modules know which files to use
when carrying out their discovery attempts. The module then
returns the JSON file to the main PlugsE process which passes
it to any modules registered for handling a cookie credential.
C. Evidence Mapping
Now that the cookies have been discovered, PlugsE invokes
all evidence mapping modules registered to work with cookie
databases, passing the possible credential sources to each of
them. In some cases, it may be necessary at this point for
the examiner to manually map the credentials to a service, as
mentioned in Section III-C. PlugsE will determine that this
is the case when one of two things happens: 1) no mapping
module has been registered to work with the source and format
of a credential, or 2) none of the registered modules were
successful in mapping it to a service.
In our example, identifying the cookies for a Gmail account is straightforward because the fields shown in Fig. 4 will be present. Our mapping module for PlugsE searches for these fields and upon detecting them creates an entry in the mapping table which identifies this cookie database as containing credentials for Gmail. Fig. 5 shows what this entry looks like. Although the complexity of our module depends on the efficiency of the Python sqlite3⁷ library, it only searches

⁷ http://docs.python.org/2/library/sqlite3.html; complexity of individual operations not provided.
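Since Fig. 4 is not reproduced here, the sketch below simply queries the Chrome cookie database for Google-owned hosts; the specific cookie names that signal a Gmail session are an assumption of ours, and the sketch presumes cookie values are stored unencrypted, as they were in Chrome builds contemporary with this work.

import sqlite3

def find_gmail_cookies(cookie_db_path):
    """Return cookies whose host suggests a Gmail/Google session.

    Chrome stores cookies in an SQLite database with a `cookies` table whose
    columns include `host_key`, `name`, and `value`.
    """
    conn = sqlite3.connect(cookie_db_path)
    try:
        rows = conn.execute(
            "SELECT host_key, name, value FROM cookies "
            "WHERE host_key LIKE '%google.com%'"
        ).fetchall()
    finally:
        conn.close()
    # Treat the database as a Gmail credential source if any mail-related
    # cookies are present; the exact names to look for are assumed here.
    return [(host, name) for host, name, _ in rows if "mail" in host or name in ("SID", "HSID")]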


References
Kevin Ashton, “That ‘Internet of Things’ Thing.”
“Bringing science to digital forensics with standardized forensic corpora” (journal article).
“Dropbox analysis: Data remnants on user machines” (journal article).
“Guide to Computer Forensics and Investigations” (book).
“Automating Disk Forensic Processing with SleuthKit, XML and Python” (proceedings paper).
Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper “Towards comprehensive and collaborative forensics on email evidence”?

In this paper the authors present a systematic process for email forensics which they integrate into the normal forensic analysis workflow, and which accommodates the distinct characteristics of email evidence. Their process focuses on detecting the presence of non-obvious artifacts related to email accounts, retrieving the data from the service provider, and representing email in a well-structured format based on existing standards. As a result, developers and organizations can collaboratively create and use analysis tools that can analyze email evidence from any source in the same fashion and the examiner can access additional data relevant to their forensic cases.
A current trend in digital forensics is the use of XML as a data representation format, allowing for a firm layer of abstraction “between feature extraction and analysis” and “a single, XML-based output format for forensic analysis tools” [9]. 

The final challenge to acquiring data from Gmail is that the only method for retrieving the raw email data is to essentially “screen scrape” the pages returned during a web session, parsing through the HTML and using regular expression patterns or searching through the Document Object Model (DOM) for the desired elements. 

Following the step of processing the evidence into EFXML, a number of verification tasks were carried out including reproducing checksums and comparing counts of messages between the original and EFXML representations. 

While the optimal acquisition method for retrieving a copy of all emails is to do so via IMAP, cookies are specific to the HTTP protocol and will not work to authenticate through IMAP. 

Due to the nature of email data, it is possible to observe a size increase in particularly imperfect cases (e.g. where the volume of header data exceeds the volume of body data) after the addition of the EFXML tags to the data²⁰; however, their evaluations point toward an average case which does not approach this situation.

The authors recognize that a few circumstances have to be ideal in order for this acquisition process to work, namely that the owner of the credentials is always signed in, that the cookies have not yet expired and are discoverable by some means, and that the notification banner of having added the delegate account will not compromise the investigation. 

Their implementation ensures the reliability and accuracy of evidence it handles by measuring the integrity of each message by taking its checksum during supplemental acquisition and evidence processing. 

Because of this, the authors have defined two new representations which are more suitable for email forensics, but maintain some of the standard elements introduced in DFXML, such as byte runs of discrete pieces of evidence. 

It opens a browser and connects to Gmail, and as long as the cookies are still valid it performs each of the steps for adding a delegate as outlined in the Google help pages¹⁰, which takes O(1) time.