Analysis of EZproxy server logs to visualise research activity in Curtin’s online library
Summary
Introduction
- Curtin Library has a substantial dataset of logged, authenticated use of its online library collection, comprising databases, eJournals and eBooks dating from 2013.
- Making sense of and drawing meanings from the raw data, which is mainly in the textual format of URL codes, is nearly impossible.
- EZproxy offered a rich source of data for analysis, containing a detailed log of HTTP requests processed through the library’s authentication servers.
What is an EZproxy server?
- EZproxy is an internet proxy server that is primarily used to first authenticate then allow computers outside a library network to access content provided by the library without requiring additional logins.
- Figure 1 illustrates how this works: when a user accesses the Curtin catalogue and attempts to access content on an Online Database Vendor’s website, for example ProQuest, they are authenticated by the library’s authentication system, which then creates a session with the EZproxy server and redirects the user’s browser there, allowing access to all subscribed library content.
- EZproxy handles access by URL rewriting: taking the URL requested by a user and modifying it so that the web server holding the content accepts the request.
- An example is when a user tries to access the online database ‘proquest.com’: the user’s browser will be informed of the URL that contains the cookie, ‘proquest.dbgq.lis.curtin.edu.au’.
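The hostname rewriting described above can be illustrated with a minimal Python sketch. It assumes that the proxy suffix is exactly the one shown in the ProQuest example (`.dbgq.lis.curtin.edu.au`) and that the vendor label is whatever precedes it in the hostname. The function name `vendor_label` is hypothetical and is not part of EZproxy.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: suffix taken from the single example hostname in the text.
PROXY_SUFFIX = ".dbgq.lis.curtin.edu.au"


def vendor_label(proxied_url: str, proxy_suffix: str = PROXY_SUFFIX) -> Optional[str]:
    """Return the hostname part in front of the proxy suffix.

    Returns None when the URL does not point at the proxy at all.
    """
    host = urlsplit(proxied_url).hostname
    if host is None or not host.endswith(proxy_suffix):
        return None
    # Strip the proxy suffix, leaving the vendor-specific prefix.
    return host[: -len(proxy_suffix)]
```

Applied to a URL on `proquest.dbgq.lis.curtin.edu.au`, the sketch returns `proquest`; a URL that does not pass through the proxy yields `None`.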
Dataset
- A literature review shows how other academic libraries handle their EZproxy data logs.
- These studies demonstrate that it is possible to extract useful information about trends and usage patterns from EZproxy datasets, and that these can be used to improve library services in universities.
- The grep tool is a Unix based pattern matcher that searches through plain text data sets in the target file and outputs all lines that match a regular expression of these patterns.
- This project offers a new and friendlier way for librarians to analyse datasets that have traditionally been difficult for them to understand.
- Joseph et al. (2013) state that information search behaviour comprises ‘search processes’ and ‘search activities’.
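As a rough illustration of the grep-style matching mentioned in this list, the following Python sketch yields every line that matches a regular expression. It mimics the core behaviour of grep only; it is not the tool itself, and the name `grep_lines` is hypothetical.

```python
import re
from typing import Iterable, Iterator


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line in which the regular expression matches anywhere."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line
```

Because the function accepts any iterable of strings, an open text file can be passed in directly, so a large daily log is streamed line by line rather than loaded into memory.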
Research aims
- The current research does not have research questions; however, the project’s research aims will enable the development of research tools that answer questions raised in later research.
- The authors’ research aims are to develop interactive and immersive data visualisation prototypes with user-friendly interfaces to enable the Curtin library team to explore how the online library is being used by its patrons.
Research methodology
- Three sequential steps were performed to develop the desired visualisations for this project.
- First, the EZproxy dataset was curated into a format that enabled easy extraction of useful data from the logs and the removal of all lines in the log file that did not represent actual search data.
- The removed lines included requests for required webpage data (HTML or CSS) and images/logos (GIFs, JPEGs, PNGs).
- Unity scripts were written using the C# programming language to query and extract the data required to develop the visualisations.
- Unity’s simple user interface made it user-friendly and easy for beginners to use and learn.
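The curation step, which drops requests for page furniture rather than library content, could look roughly like the Python sketch below. The extension list follows the file types named in the text (HTML, CSS, GIF, JPEG, PNG); deciding by file extension alone is an assumption, and the function names are hypothetical rather than taken from the authors' scripts.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# File types named in the text as not representing actual search data.
STATIC_EXTENSIONS = (".html", ".htm", ".css", ".gif", ".jpg", ".jpeg", ".png")


def is_static_asset(url: str) -> bool:
    """True when the URL path ends in one of the static file extensions."""
    path = urlsplit(url).path.lower()  # the query string is ignored
    return path.endswith(STATIC_EXTENSIONS)


def keep_content_requests(urls: Iterable[str]) -> List[str]:
    """Drop static assets and keep everything else, preserving order."""
    return [url for url in urls if not is_static_asset(url)]
```

Checking only the URL path means that a versioned stylesheet such as `main.css?v=3` is still recognised as a static asset.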
Curation of the EZproxy Log File Dataset
- The authors’ project dataset includes five years of EZproxy log data; Curtin University Library has been logging data since 2013.
- The data is stored within text files, one file for each day, to 2016.
- HTTP status codes starting with 2 indicate successful requests: the data that the user requested arrives.
- The user agent field refers to the user’s web browser (e.g. Google Chrome or Internet Explorer) sending the request, and identifies the browser and device being used to make the request.
- For privacy and anonymity reasons, patrons’ identities cannot be revealed.
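The fields described in this list (status code, user agent, user identity) can be pulled out of a log line with a sketch like the one below. The line layout is an assumption modelled on the common web server log format with a trailing quoted user agent; the actual Curtin log format may differ. The salt value, the sample line and all function names are hypothetical.

```python
import hashlib
import re
from typing import Optional

# Assumed layout: host, ident, user, [time], "request", status, bytes, "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<agent>[^"]*)"'
)

# Standard HTTP status classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def anonymise(user: str, salt: str = "hypothetical-salt") -> str:
    """Replace a user name with a short salted hash so patrons stay anonymous."""
    return hashlib.sha256((salt + user).encode("utf-8")).hexdigest()[:12]


def parse_line(line: str) -> Optional[dict]:
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    status = int(match["status"])
    size = match["size"]
    return {
        "user": anonymise(match["user"]),
        "time": match["time"],
        "request": match["request"],
        "status": status,
        "status_class": STATUS_CLASSES.get(status // 100, "unknown"),
        "size": 0 if size == "-" else int(size),
        "agent": match["agent"],
    }


# Hypothetical sample line, invented for illustration only.
SAMPLE = (
    '203.0.113.7 - u1234 [01/Mar/2015:10:15:32 +0800] '
    '"GET https://proquest.dbgq.lis.curtin.edu.au/docview/1 HTTP/1.1" '
    '200 5120 "Mozilla/5.0"'
)
```

Hashing the user name keeps requests from the same patron linkable to each other without storing the identity itself.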
Issues with Data
- A few key issues with the dataset had to be resolved before the authors could begin work on any visualisations of the dataset.
- All but one specific dataset continued to present issues in retrieving meaningful data from the log files.
- The first of these was that EZproxy logs all HTTP requests, and many of these are for parts of a website that would not be useful to visualise as they do not represent a user using the library but rather parts of a webpage being sent to the users’ browser, such as images on the webpage or CSS stylesheets.
- This process took approximately 60 hours to run over a weekend, given the large amount of data, roughly 1.2 Terabytes of raw text data.
- The third issue, that of difficulty in extracting meaningful data, was the format of the URLs.
Results – Visualisation Prototype Developments
- The authors’ project developed two main visualisations to showcase research activity in Curtin’s online library space, described next.
- It uses a geographical map of the world as a platform to show where each research request comes from, the time the request is made, and the file size of the request.
- Square icons are also used to represent the status of users’ requests, colour coded to indicate the HTTP status of the request.
- User interface displaying specific details about an individual search request.
Database Usage Visualisation
- The first is to build a search interface that enables queries about the use of specific online databases currently subscribed.
- This feature would allow users to view what is most important to them, quickly and easily.
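A query about the use of specific databases comes down to counting requests per database. The sketch below does this in Python under the assumption that each database can be identified by the hostname prefix in front of the proxy suffix, as in the ProQuest example earlier in the text. The function name and the second vendor name in the usage example are hypothetical.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def database_usage(
    urls: Iterable[str], proxy_suffix: str = ".dbgq.lis.curtin.edu.au"
) -> Counter:
    """Count proxied requests per database, keyed by hostname prefix."""
    counts: Counter = Counter()
    for url in urls:
        host = urlsplit(url).hostname or ""
        if host.endswith(proxy_suffix):
            counts[host[: -len(proxy_suffix)]] += 1
    return counts


# Usage with invented URLs; "examplevendor" is a placeholder name.
usage = database_usage(
    [
        "https://proquest.dbgq.lis.curtin.edu.au/docview/1",
        "https://proquest.dbgq.lis.curtin.edu.au/docview/2",
        "https://examplevendor.dbgq.lis.curtin.edu.au/article/9",
        "https://www.example.org/not-proxied",
    ]
)
```

`usage.most_common()` then lists the databases from most to least requested, which is the ordering a search interface would need.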
Discussion
- Given the limited research on dynamic 3D visualisation of EZproxy datasets, it was not possible to compare this project with other work.
- The literature review reports the use of data visualisation software like Tableau for visualising EZproxy data and 2D graphing software to create simple visualisations from EZproxy datasets.
- In comparison, the two Global and Database 3D visualisation prototypes developed for this project are more immersive and dynamic than the 2D presentations reported by Bhaskar et al. (2014).
- The ability to move around a scene provides a user-friendly interface to navigate and digest the complex EZproxy dataset information, as a large set of bar graphs cannot.
Design Considerations for the Prototypes
- When designing the visualisations for this project, the authors considered how the data would be viewed and how different interactive environments could be developed using the Unity software.
- Unity was chosen for its ability to create immersive environments quickly and easily, and to present the content on multiple platforms using the same scripts and assets; it also has options to develop and present content for all computing and mobile platforms and via web browsers.
- In short, the specifications were to design immersive and user-friendly interfaces to explore and make sense of the rich information dataset that the EZproxy server log files contain.
- The use of the HIVE screens assists in the presentation of visualisations, more than is possible with a regular desktop screen display.
- The environments in which the data is showcased are open spaces, allowing the user to move freely around and investigate the data.
Further technical development
- Continue development using the Unity 3D platform.
- Once all the data has been stored and hosted on a database that can hold large data quantities, a few changes will need to be made so that rather than reading files, Unity can query the database and then use that data to develop the visualisations.
- Time-based queries would also be possible, allowing viewers to select any time window, rather than starting at midnight and viewing only one day at a time.
- Existing Visualisation Models, further development of the Global Visualisation: currently one icon is instantiated every frame, which causes issues when moving through the timeframe.
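The database-backed, time-based querying proposed in this list can be sketched as follows. The text mentions MySQL; Python's built-in `sqlite3` module is used here purely as a stand-in so that the example is self-contained. The table name, column names and sample rows are all hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few invented request rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests ("
        "requested_at TEXT, database_name TEXT, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO requests VALUES (?, ?, ?)",
        [
            ("2015-03-01T00:05:00", "proquest", 200),
            ("2015-03-01T09:30:00", "proquest", 200),
            ("2015-03-01T23:59:00", "examplevendor", 404),
            ("2015-03-02T00:10:00", "proquest", 200),
        ],
    )
    return conn


def requests_between(
    conn: sqlite3.Connection, start: str, end: str
) -> List[Tuple[str, str, int]]:
    """Return requests with start <= requested_at < end, oldest first.

    Timestamps are ISO 8601 strings, which sort correctly as plain text.
    """
    cursor = conn.execute(
        "SELECT requested_at, database_name, status FROM requests "
        "WHERE requested_at >= ? AND requested_at < ? "
        "ORDER BY requested_at",
        (start, end),
    )
    return cursor.fetchall()
```

A window such as 09:00 on one day to 01:00 on the next is expressed by passing the two boundary timestamps, so the query is not tied to a midnight start or a single calendar day.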
Inclusion of Faculty information
- Adding academic faculty information to the EZproxy dataset, such as student enrolment data by academic staff, by faculty, or by discipline area, would enable insights into information behaviour and resource use by these faculties.
- Adding this faculty information would offer a deeper understanding of how different faculties use the library.
- Precedent research in other universities (Chan, 2014; Coombs, 2005; Grace and Bremner, 2004) using EZproxy datasets indicates that different faculties use the library differently, with each having preferences for databases and eBooks, and different levels of use.
- Having such information would enable the provision of targeted e-library services like those reported by Chan (2014), and Grace and Bremner (2004, p. 164): for instance, the development of personalised library websites/portals for different disciplines.
- This model of library service delivery aligns with Curtin University’s strategic vision to ‘deliver a seamless, responsive and innovative digital environment’ for learning and student experience (Curtin University, 2017, p. 5).
Profiling the average Curtin Library User
- Profiling a visual image of Curtin University’s ‘average’ user will assist with planning initiatives in library service.
- Given that EZproxy logs contain all the search history of all who use the library, the ability to track a single user’s resource usage patterns over a period of time would be an interesting visualisation to develop.
- It would provide a glimpse into research activity over a longer period; and if performed with users from different discipline areas and faculties, could lead to the formulation of archetypes of researchers in different faculties, discipline areas and campuses.
- As Curtin operates in several locations internationally, it would be a useful exercise to compare the usage of a user at the Bentley campus with that of a user at the Singapore campus.
- It could be fruitful to compare and contrast the ‘average’ user for different faculties and campuses to visualise their differences and similarities.
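One simple reading of an 'average' user is the mean number of requests per user within each campus or faculty. The Python sketch below computes this from anonymised records. It assumes that each request has already been labelled with a campus, which the text describes as future work rather than something the logs currently contain; the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_requests_per_user(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, float]:
    """Mean requests per user for each campus.

    records holds one (anonymised_user_id, campus) pair per logged request.
    """
    per_user = Counter(records)  # (user, campus) -> number of requests
    per_campus = defaultdict(list)
    for (_user, campus), count in per_user.items():
        per_campus[campus].append(count)
    return {campus: mean(counts) for campus, counts in per_campus.items()}
```

The same grouping works for faculties or discipline areas by swapping the second element of each pair.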
Understanding Users’ Information Search Habits
- From the EZproxy server logs, it is possible to observe aspects of users’ ‘search processes’—to examine the paths users take to reach databases.
- One key factor of visualising research activity is to see what users search for, as well as what they actually download, when they browse the online library.
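Where a vendor places the search terms in the query string of the URL, they can be recovered with the standard library, as in the sketch below. The parameter names `q` and `query` are assumptions made for illustration; vendors differ, and the text notes that URL format was one of the difficulties with this dataset.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names; real vendor URLs may use others.
SEARCH_PARAMS = ("q", "query")


def search_terms(url: str) -> Optional[str]:
    """Return the decoded search string from a URL, or None if absent."""
    params = parse_qs(urlsplit(url).query)
    for name in SEARCH_PARAMS:
        if name in params:
            return params[name][0]
    return None
```

Requests that return `None` but point at a document path would correspond to downloads rather than searches, which is the distinction the final bullet draws.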
Conclusion
- This short 10-week HIVE Intern research project highlights opportunities for developing interactive, user-friendly and immersive ways to visualise and make sense of the rich EZproxy dataset.
- It also indicates various avenues for expansion.
- Both the Global Visualisation and Database Usage Visualisation prototypes provide visual evidence of the high volume of usage of Curtin Library’s digital resources—eBooks and databases, and of the accessibility and usage of the library’s digital contents at any time and from anywhere.
- It offers evidence of how the library supports the university’s strategic goal of becoming a global campus by 2020, delivering courses internationally (Curtin University, 2016).
- First it curated EZproxy log files into formats required to feed into Unity software and develop visualisation prototypes.
Frequently Asked Questions (11)
Q2. What are the future works in this paper?
As the foundational work for this project was developed in Unity, it is recommended that future work continues using this platform. Preliminary work has been done to export the data to a MySQL database, so this is an area that could be further developed. Once all the data has been stored and hosted on a database that can hold large data quantities, a few changes will need to be made so that rather than reading files, Unity can query the database and then use that data to develop the visualisations.
Q3. What is the name of the first visualisation prototype?
The first visualisation prototype is referred to as the Global Visualisation of Curtin Research Activity (henceforth Global Visualisation).
Q4. What can be done to help improve the quality of the library’s services?
It can also enable the delivery of e-library services, such as the personalisation of library websites/portals for students and staff in specific discipline areas to provide rapid access to their most frequently used online resources (Chan, 2013; Grace and Bremner, 2004, p. 162).
Q5. What is the effect of the cylinder screen?
The cylinder screen’s 180-degree, 3D field of view enables more content to be presented on screen, and in a more immersive manner, than a normal desktop computer screen, as shown earlier in Figures 9, 10 and 15.
Q6. What is the common way to access the library’s online databases?
A user may access the library’s online databases from a personal browser by going to the library catalogue or A – Z list of subscription databases, or through Google Scholar.
Q7. What is the use of the cylinder in the Database Visualisation prototype?
The use of the cylinder is helpful in the Database Visualisation prototype as it allows a wider field of view, giving the feeling of being immersed in the data.
Q8. What was the primary choice for the Database Visualisation prototype?
The dome display, presented in Figures 16 and 17, was a secondary choice for the Database Visualisation prototype, as it fully encompasses the viewer’s field of vision to create a feeling of being fully immersed (Figure 17) with the data.
Q11. What is the definition of information retrieval?
The authors consider information retrieval a crucial activity in the information-seeking process, especially in the context of this research: that users wish to fulfil their informational needs from the widest possible pool of electronic resources.