
Showing papers on "Web page" published in 2020


Journal ArticleDOI
TL;DR: Presentation of criteria for evaluating Web pages: accuracy, credit to be given to the author, objectivity, currency of updates, hypertext links...
Abstract: Presentation of criteria for evaluating Web pages: accuracy, credit to be given to the author, objectivity, currency of updates, hypertext links...

135 citations


Journal ArticleDOI
TL;DR: It is hypothesized that traditional methods for mitigating XSS attacks have been proposed more often than methods that use artificial intelligence techniques.

87 citations


Posted ContentDOI
05 May 2020-bioRxiv
TL;DR: igv.js is an embeddable JavaScript implementation of the Integrative Genomics Viewer that can be easily dropped into any web page with a single line of code and has no external dependencies.
Abstract: igv.js is an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). It can be easily dropped into any web page with a single line of code and has no external dependencies. The viewer runs completely in the web browser, with no backend server and no data pre-processing required.

60 citations


Proceedings ArticleDOI
01 May 2020
TL;DR: AdGraph as mentioned in this paper uses a graph representation of the HTML structure, network requests, and JavaScript behavior of a webpage, and uses this unique representation to train a classifier for identifying advertising and tracking resources.
Abstract: User demand for blocking advertising and tracking online is large and growing. Existing tools, both deployed and described in research, have proven useful, but lack either the completeness or robustness needed for a general solution. Existing detection approaches generally focus on only one aspect of advertising or tracking (e.g. URL patterns, code structure), making existing approaches susceptible to evasion. In this work we present AdGraph, a novel graph-based machine learning approach for detecting advertising and tracking resources on the web. AdGraph differs from existing approaches by building a graph representation of the HTML structure, network requests, and JavaScript behavior of a webpage, and using this unique representation to train a classifier for identifying advertising and tracking resources. Because AdGraph considers many aspects of the context a network request takes place in, it is less susceptible to the single-factor evasion techniques that flummox existing approaches. We evaluate AdGraph on the Alexa top-10K websites, and find that it is highly accurate, able to replicate the labels of human-generated filter lists with 95.33% accuracy, and can even identify many mistakes in filter lists. We implement AdGraph as a modification to Chromium. AdGraph adds only minor overhead to page loading and execution, and is actually faster than stock Chromium on 42% of websites and AdBlock Plus on 78% of websites. Overall, we conclude that AdGraph is both accurate enough and performant enough for online use, breaking comparable or fewer websites than popular filter list based approaches.

51 citations
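
As a rough illustration of the graph-based idea described in the AdGraph abstract above, the sketch below builds a tiny page graph with networkx, derives a few structural features per request node, and trains a classifier on filter-list-style labels. The graph, features, and labels are invented for illustration and are not AdGraph's actual representation or feature set.

```python
# Illustrative sketch of graph-based ad/tracker classification in the spirit of
# AdGraph: build a graph of page elements and network requests, derive simple
# structural features per request node, and fit a classifier on toy labels.
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

# Toy page graph: HTML nodes, script nodes, and network-request nodes.
G = nx.DiGraph()
G.add_edge("html:body", "script:tracker.js")            # script included by the page
G.add_edge("script:tracker.js", "req:ads.example/px")   # request initiated by a script
G.add_edge("html:body", "img:logo")
G.add_edge("img:logo", "req:cdn.example/logo.png")

def node_features(g, node):
    """Simple structural features for a request node (illustrative only)."""
    parents = list(g.predecessors(node))
    return [
        g.in_degree(node),                                   # number of initiators
        g.out_degree(node),                                  # downstream activity
        int(any(p.startswith("script:") for p in parents)),  # script-initiated?
        len(node),                                           # crude URL-length proxy
    ]

request_nodes = [n for n in G if n.startswith("req:")]
X = [node_features(G, n) for n in request_nodes]
y = [1, 0]  # labels as a filter list might give them: 1 = ad/tracking, 0 = benign

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(dict(zip(request_nodes, clf.predict(X))))
```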


Journal ArticleDOI
TL;DR: Markdown is a flat text file that can have software code embedded in it; it is employed for writing reproducible research documents that can be converted into web pages, presentations, or other documents.
Abstract: Markdown is a flat text file that can have software code embedded in it. It is employed for writing reproducible research documents which can be converted into web pages, presentations, or other de...

50 citations
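
The entry above concerns Markdown-based reproducible documents (the paper's workflow is R Markdown). Purely to illustrate the flat-text-to-web-page idea, here is a minimal sketch using the Python markdown package, which is an assumption of convenience and not the paper's toolchain.

```python
# Convert a flat Markdown text with an embedded (indented) code block into HTML.
# Illustrative only; R Markdown additionally executes the embedded code.
import markdown

doc = "# Analysis report\n\nEmbedded code:\n\n    print(sum([1, 2, 3]) / 3)\n"
html = markdown.markdown(doc)      # indented block becomes <pre><code>...</code></pre>
with open("report.html", "w") as fh:
    fh.write(html)
print(html)
```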


Journal ArticleDOI
TL;DR: There is a need to understand both the beneficial and harmful effects of web-based life on our education sectors as well as on the generations to come, and to make appropriate plans in either case, whether its use turns out to be positive or negative.

49 citations


Journal ArticleDOI
TL;DR: This paper not only surveys the proposed methodologies in the literature, but also traces their evolution and portrays different perspectives toward this problem, finding that developing a detailed testbed along with evaluation metrics and establishing standard benchmarks remain a gap in assessing Web page classifiers.
Abstract: The explosive growth of the amount of information on the Internet has made Web page classification essential for Web information management, retrieval, and integration, Web page indexing, topic-specific Web crawling, topic-specific information extraction models, advertisement removal, filtering out unwanted, futile, or harmful contents, and parental control systems. Owing to the recent staggering growth of performance and memory space in computing machines, along with specialization of machine learning models for text and image classification, many researchers have begun to target the Web page classification problem. Yet, automatic Web page classification remains at its early stages because of its complexity, diversity of Web pages’ contents (images of different sizes, text, hyperlinks, etc.), and its computational cost. This paper not only surveys the proposed methodologies in the literature, but also traces their evolution and portrays different perspectives toward this problem. Our study investigates the following: (a) metadata and contextual information surrounding the terms are mostly ignored in textual content classification, (b) the structure and distribution of text in HTML tags and hyperlinks are understudied in textual content classification, (c) measuring the effectiveness of features in distinguishing among Web page classes or measuring the contribution of each feature to the classification accuracy is a prominent research gap, (d) image classification methods rely heavily on computationally intensive and problem-specific analyses for feature extraction, (e) semi-supervised learning is understudied, despite its importance in Web page classification because of the massive amount of unlabeled Web pages and the high cost of labeling, (f) deep learning, convolutional and recurrent networks, and reinforcement learning remain underexplored but intriguing for Web page classification, and last but not least (g) developing a detailed testbed along with evaluation metrics and establishing standard benchmarks remain a gap in assessing Web page classifiers.

49 citations
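
To make the surveyed task concrete, here is a minimal baseline of the kind the survey covers: a bag-of-words classifier over page text using scikit-learn. The toy pages and labels are invented; real systems would also use the HTML structure, hyperlink, and metadata features the survey discusses.

```python
# Minimal textual Web page classification baseline: TF-IDF features + linear model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pages = [
    "latest football scores and league tables",
    "breaking news on the stock market and interest rates",
    "match preview: the home team names its starting lineup",
    "central bank signals a change in monetary policy",
]
labels = ["sports", "finance", "sports", "finance"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(pages, labels)
print(clf.predict(["the striker scored twice in the final match"]))
```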


Journal ArticleDOI
TL;DR: Webina is a new version of Vina that runs Vina entirely in a web browser, so users need only visit a Webina-enabled webpage to complete docking calculations.
Abstract: Motivation: Molecular docking is a computational technique for predicting how a small molecule might bind a macromolecular target. Among docking programs, AutoDock Vina is particularly popular. Like many docking programs, Vina requires users to download/install an executable file and to run that file from a command-line interface. Choosing proper configuration parameters and analyzing Vina output is also sometimes challenging. These issues are particularly problematic for students and novice researchers. Results: We created Webina, a new version of Vina, to address these challenges. Webina runs Vina entirely in a web browser, so users need only visit a Webina-enabled webpage. The docking calculations take place on the user's own computer rather than a remote server. Availability and implementation: A working version of the open-source Webina app can be accessed free of charge from http://durrantlab.com/webina. Supplementary information: Supplementary data are available at Bioinformatics online.

46 citations
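
For contrast with Webina's in-browser workflow, below is a hedged sketch of the kind of local, command-line Vina run the abstract describes as burdensome. The receptor/ligand file names and search-box values are placeholders, and the flag names follow the standard AutoDock Vina CLI; check your installed version's help output before relying on them.

```python
# Hedged sketch: a typical local AutoDock Vina invocation scripted from Python.
# File names and docking-box parameters are illustrative placeholders.
import subprocess

cmd = [
    "vina",
    "--receptor", "receptor.pdbqt",   # prepared macromolecular target
    "--ligand", "ligand.pdbqt",       # prepared small molecule
    "--center_x", "10.0", "--center_y", "12.5", "--center_z", "-3.2",
    "--size_x", "20", "--size_y", "20", "--size_z", "20",
    "--exhaustiveness", "8",
    "--out", "docked_poses.pdbqt",
]
subprocess.run(cmd, check=True)       # requires a local Vina installation
```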


Journal ArticleDOI
TL;DR: This study proposes a novel approach, namely UzunExt, which extracts content quickly using string methods and additional information without creating a DOM tree, and which can easily be adapted to other DOM-based studies/parsers to enhance their time efficiency.
Abstract: Web scraping is a process of extracting valuable and interesting text information from web pages. Most of the current studies targeting this task are about automated web data extraction. In the extraction process, these studies first create a DOM tree and then access the necessary data through this tree. The construction process of this tree increases the time cost depending on the data structure of the DOM tree. In the current web scraping literature, it is observed that time efficiency is ignored. This study proposes a novel approach, namely UzunExt, which extracts content quickly using string methods and additional information without creating a DOM tree. The string methods consist of the following consecutive steps: searching for a given pattern, then calculating the number of closing HTML elements for this pattern, and finally extracting content for the pattern. In the crawling process, our approach collects additional information, including the starting position for enhancing the searching process, the number of inner tags for improving the extraction process, and tag repetition for terminating the extraction process. The string methods of this novel approach are about 60 times faster than extracting with the DOM-based method. Moreover, using this additional information improves extraction time by 2.35 times compared to using only the string methods. Furthermore, this approach can easily be adapted to other DOM-based studies/parsers in this task to enhance their time efficiency.

43 citations
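
The three string-method steps named in the abstract (search for a pattern, count closing elements, extract content) can be illustrated with a simplified re-implementation. This is not the authors' code: it handles only well-formed nesting of a single tag and ignores comments and attributes containing '>'.

```python
# Simplified illustration of string-based extraction without building a DOM tree.
def extract(html, pattern, tag="div"):
    start = html.find(pattern)                 # step 1: search for the pattern
    if start == -1:
        return None
    pos = html.find(">", start) + 1            # content starts after the opening tag
    depth, i = 1, pos
    open_tag, close_tag = "<" + tag, "</" + tag
    while depth and i < len(html):             # step 2: count nested closing tags
        nxt_open = html.find(open_tag, i)
        nxt_close = html.find(close_tag, i)
        if nxt_close == -1:
            return None
        if nxt_open != -1 and nxt_open < nxt_close:
            depth, i = depth + 1, nxt_open + len(open_tag)
        else:
            depth, i = depth - 1, nxt_close + len(close_tag)
    return html[pos:html.rfind(close_tag, pos, i)]   # step 3: extract the content

page = '<div class="c"><div>inner</div> main text</div><div class="other">x</div>'
print(extract(page, '<div class="c">'))        # -> '<div>inner</div> main text'
```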


Journal ArticleDOI
TL;DR: OntoBestFit, an RDF-driven approach, is proposed for minimizing ambiguity in search results and increasing the diversity of results, thereby addressing both the context-irrelevance and the serendipity problem.

41 citations


Proceedings ArticleDOI
18 May 2020
TL;DR: A WF classifier that can scale to any open world set size is shown, and the use of precise classifiers to tackle realistic objectives in website fingerprinting is investigated, including different types of websites, identification of sensitive clients, and defeating website fingerprinting defenses.
Abstract: Traffic analysis attacks to identify which web page a client is browsing, using only her packet metadata — known as website fingerprinting (WF) — have been proven effective in closed-world experiments against privacy technologies like Tor. We want to investigate their usefulness in the real open world. Several WF attacks claim to have high recall and low false positive rate, but they have only been shown to succeed against high base rate pages. We explicitly incorporate the base rate into precision and call it r-precision. Using this metric, we show that the best previous attacks have poor precision when the base rate is realistically low; we study such a scenario (r = 1000), where the maximum r-precision achieved was only 0.14. To improve r-precision, we propose three novel classes of precision optimizers that can be applied to any classifier to increase precision. For r = 1000, our best optimized classifier can achieve a precision of at least 0.86, representing a precision increase by more than 6 times. For the first time, we show a WF classifier that can scale to any open world set size. We also investigate the use of precise classifiers to tackle realistic objectives in website fingerprinting, including different types of websites, identification of sensitive clients, and defeating website fingerprinting defenses.
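
A small sketch of the idea behind r-precision as described in the abstract: ordinary precision reflects the test set's class balance, so the base rate r (read here as the ratio of non-monitored to monitored page loads) is folded into the false-positive term. This formulation is a plausible reading of the abstract, not necessarily the paper's exact definition.

```python
# Precision when negative instances are r times as common as positive ones.
def r_precision(tpr, fpr, r):
    denom = tpr + r * fpr
    return tpr / denom if denom else 0.0

# A classifier with 95% recall and a 0.5% false-positive rate looks precise on a
# balanced test set, but not when unmonitored pages dominate (r = 1000).
print(r_precision(0.95, 0.005, 1))      # ~0.99
print(r_precision(0.95, 0.005, 1000))   # ~0.16
```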

Journal ArticleDOI
TL;DR: A companion scheme is proposed to recognize the brands of "zero-hour" phishing web pages by localizing and classifying the target brand logos in page screenshots, using only computer vision methods in an object-detection manner.

Journal ArticleDOI
TL;DR: Two novel frameworks are proposed to compare the performance of web accessibility evaluation tools in detecting web accessibility issues based on WCAG 2.0; the findings showed that the homepage of Taibah University is more accessible than the homepages of other Saudi universities.
Abstract: With the growth of e-services in the past two decades, the concept of web accessibility has been given attention to ensure that every individual can benefit from these services without any barriers. Web accessibility is considered one of the main factors that should be taken into consideration while developing webpages. Web Content Accessibility Guidelines 2.0 (WCAG 2.0) have been developed to guide web developers to ensure that web content is accessible to all users, especially disabled users. Many automatic tools have been developed to check the compliance of websites with accessibility guidelines such as WCAG 2.0 and to help web developers and content creators with designing webpages without barriers for disabled people. Despite the popularity of accessibility evaluation tools in practice, there is no systematic way to compare the performance of web accessibility evaluators. This paper first presents two novel frameworks. The first one is proposed to compare the performance of web accessibility evaluation tools in detecting web accessibility issues based on WCAG 2.0. The second framework is utilized to evaluate how well webpages meet these guidelines. Six homepages of Saudi universities were chosen as case studies to substantiate the concept of the proposed frameworks. Furthermore, two popular web accessibility evaluators, WAVE and SiteImprove, are selected to compare their performance. The outcomes of studies conducted using the first proposed framework showed that SiteImprove outperformed WAVE. According to the outcomes of the studies conducted, we can conclude that web administrators would benefit from the first framework in selecting an appropriate tool based on its performance to evaluate their websites based on accessibility criteria and guidelines. Moreover, the findings of the studies conducted using the second proposed framework showed that the homepage of Taibah University is more accessible than the homepages of other Saudi universities. Based on the findings of this study, the second framework can be used by web administrators and developers to measure the accessibility of their websites. This paper also discusses the most common accessibility issues reported by WAVE and SiteImprove.
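
As an illustration of the kind of comparison the first framework enables, the sketch below scores two tools' reported issues against a manually verified ground-truth set. The issue identifiers and reports are invented examples, not results from WAVE or SiteImprove.

```python
# Compare accessibility evaluators by precision/recall against verified issues.
def precision_recall(reported, ground_truth):
    tp = len(reported & ground_truth)
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

ground_truth = {"img-missing-alt", "low-contrast", "empty-link", "missing-lang"}
tool_a_report = {"img-missing-alt", "empty-link"}                              # invented
tool_b_report = {"img-missing-alt", "low-contrast", "empty-link", "duplicate-id"}  # invented

for name, report in [("tool A", tool_a_report), ("tool B", tool_b_report)]:
    p, r = precision_recall(report, ground_truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```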

Journal ArticleDOI
TL;DR: In this paper, the authors present a search engine for D3 visualizations that supports queries based on visual style and underlying structure, and show how researchers can use it to identify commonly used visual design patterns, performing such a demographic design analysis across their collection of D3 charts.
Abstract: We present a search engine for D3 visualizations that allows queries based on their visual style and underlying structure. To build the engine we crawl a collection of 7860 D3 visualizations from the Web and deconstruct each one to recover its data, its data-encoding marks and the encodings describing how the data is mapped to visual attributes of the marks. We also extract axes and other non-data-encoding attributes of marks (e.g., typeface, background color). Our search engine indexes this style and structure information as well as metadata about the webpage containing the chart. We show how visualization developers can search the collection to find visualizations that exhibit specific design characteristics and thereby explore the space of possible designs. We also demonstrate how researchers can use the search engine to identify commonly used visual design patterns and we perform such a demographic design analysis across our collection of D3 charts. A user study reveals that visualization developers found our style and structure based search engine to be significantly more useful and satisfying for finding different designs of D3 charts, than a baseline search engine that only allows keyword search over the webpage containing a chart.

Journal ArticleDOI
TL;DR: The design and implementation of an automated smart hydroponics system using the Internet of Things is presented, and a bot has been introduced to control the supply chain and provide notifications, helping the system achieve its overall aim.
Abstract: This paper presents the design and implementation of an automated smart hydroponics system using the Internet of Things. The challenges to be addressed with this system are the increasing food demand in the world and the market's need for a new sustainable method of farming based on the Internet of Things. The design was implemented using NodeMCU, Node-RED, MQTT, and sensors chosen during component selection based on the required parameters, with readings sent to the cloud to be monitored and processed. Previous work was investigated and a review of the Internet of Things and hydroponic systems was conducted. The prototype was first constructed, programmed, and tested, and sensor data from two different environments were collected and monitored on a cloud-based web page with a mobile application. Moreover, a bot has been introduced to control the supply chain and for notification purposes. The system's performance allows it to successfully achieve the aim of the entire implementation. Some limitations can be addressed in future work, such as applying data science and artificial intelligence to further improve the crops and obtain better outcomes, and designing an end-user platform that eases user interaction through an attractive design with no technical configuration involved.
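
A hedged sketch of the sensor-to-cloud path the abstract describes (a sensor node publishing over MQTT, with Node-RED or a web dashboard subscribing downstream), using the paho-mqtt helper. The broker host, topic, and readings are placeholders.

```python
# Publish a sensor reading to an MQTT broker; a Node-RED flow or web dashboard
# subscribed to the topic can then display and process it.
import json
import paho.mqtt.publish as publish

reading = {"ph": 6.1, "water_temp_c": 22.4, "ec_ms_cm": 1.8}   # example values

publish.single(
    "hydroponics/greenhouse1/sensors",   # placeholder topic
    json.dumps(reading),
    hostname="broker.example.org",       # placeholder broker
    port=1883,
    qos=1,
)
```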

Journal ArticleDOI
TL;DR: Using ANOVA and OLS regression procedures, results suggest effects of both dimensions of visual complexity on all outcome variables, and that perceptions of visual informativeness and cues for engagement mediate the relationship between visual complexity and favourable initial impressions and behavioural intentions.
Abstract: Shortly after fixating on webpages, users form initial impressions. These initial impressions influence how much users will use and return to websites. Researchers have understudied how objective d...

Proceedings ArticleDOI
Shen Gao, Xiuying Chen, Zhaochun Ren, Dongyan Zhao, Rui Yan
09 Jul 2020
TL;DR: This paper focuses on a survey of these new summarization tasks and approaches arising in real-world applications of text summarization algorithms.
Abstract: Text summarization is the research area aiming at creating a short and condensed version of the original document, which conveys the main idea of the document in a few words. This research topic has started to attract the attention of a large community of researchers, and it is nowadays counted as one of the most promising research areas. In general, text summarization algorithms aim at using a plain text document as input and then output a summary. However, in real-world applications, most of the data is not in a plain text format. Instead, there is manifold information to be summarized, such as the summary for a web page based on a query in a search engine, extremely long documents (e.g., academic papers), dialog history, and so on. In this paper, we focus on surveying these new summarization tasks and approaches as they arise in real-world applications.

Journal ArticleDOI
24 Dec 2020
TL;DR: In addition to delivering e-learning teaching, such systems can monitor student performance, report student progress, and help learners achieve specific results.
Abstract: This paper discusses the advantages of online learning. Online learning (also known as electronic learning or e-learning) is the result of teaching delivered electronically using computer-based media. The material is frequently accessed via a network, including websites, the internet, intranets, CDs and DVDs. E-learning not only provides access to information (e.g., putting up web pages), but also helps learners achieve specific results (e.g., reaching goals). In addition to delivering teaching, it can monitor student performance and report student progress.

Proceedings ArticleDOI
04 May 2020
TL;DR: This work proposes a novel deep learning architecture, Texception, that takes a URL as input and predicts whether it belongs to a phishing attack and is able to significantly outperform a traditional text classification method.
Abstract: Phishing is the starting point for many cyberattacks that threaten the confidentiality, availability and integrity of enterprises’ and consumers’ data. The URL of a web page that hosts the attack provides a rich source of information to determine the maliciousness of the web server. In this work, we propose a novel deep learning architecture, Texception, that takes a URL as input and predicts whether it belongs to a phishing attack. Architecturally, Texception uses both character-level and word-level information from the incoming URL and does not depend on manually crafted features or feature engineering. This makes it different from classical approaches. In addition, Texception benefits from multiple parallel convolutional layers and can grow deeper or wider. We show that this flexibility enables Texception to generalize better for new URLs. Our results on production data show that Texception is able to significantly outperform a traditional text classification method by increasing the true positive rate by 126.7% at an extremely low false positive rate (0.01%) which is crucial for our model’s healthy operation at internet scale.
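
To make the "parallel character-level and word-level branches" idea concrete, here is a minimal PyTorch sketch of a URL classifier in that spirit. The vocabulary sizes, layer widths, and tokenization are invented for illustration and are not the Texception architecture.

```python
# Minimal sketch: two parallel convolutional branches (character ids and word ids)
# over a URL, concatenated into a single phishing probability. Not the paper's model.
import torch
import torch.nn as nn

class UrlConvBranch(nn.Module):
    def __init__(self, vocab_size, emb_dim=16, channels=32, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, channels, kernel, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, ids):                       # ids: (batch, seq_len)
        x = self.emb(ids).transpose(1, 2)         # -> (batch, emb_dim, seq_len)
        return self.pool(torch.relu(self.conv(x))).squeeze(-1)   # (batch, channels)

class UrlClassifier(nn.Module):
    def __init__(self, char_vocab=128, word_vocab=10_000):
        super().__init__()
        self.char_branch = UrlConvBranch(char_vocab)
        self.word_branch = UrlConvBranch(word_vocab)
        self.head = nn.Linear(64, 1)              # 32 channels from each branch

    def forward(self, char_ids, word_ids):
        feats = torch.cat([self.char_branch(char_ids), self.word_branch(word_ids)], dim=1)
        return torch.sigmoid(self.head(feats))    # probability of phishing

model = UrlClassifier()
char_ids = torch.randint(1, 128, (2, 80))         # 2 URLs as character ids (toy)
word_ids = torch.randint(1, 10_000, (2, 12))      # the same URLs as token ids (toy)
print(model(char_ids, word_ids).shape)            # torch.Size([2, 1])
```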

Journal ArticleDOI
TL;DR: SnpHub is presented, a Shiny/R-based server framework for retrieving, analysing, and visualizing large-scale genomic variation data that can be easily set up on any Linux server and can be applied to any species.
Abstract: Background: The cost of high-throughput sequencing is rapidly decreasing, allowing researchers to investigate genomic variations across hundreds or even thousands of samples in the post-genomic era. The management and exploration of these large-scale genomic variation data require programming skills. The public genotype querying databases of many species are usually centralized and implemented independently, making them difficult to update with new data over time. Currently, there is a lack of a widely used framework for setting up user-friendly web servers to explore new genomic variation data in diverse species. Results: Here, we present SnpHub, a Shiny/R-based server framework for retrieving, analysing, and visualizing large-scale genomic variation data that can be easily set up on any Linux server. After a pre-building process based on the provided VCF files and genome annotation files, the local server allows users to interactively access single-nucleotide polymorphisms and small insertions/deletions with annotation information by locus or gene and to define sample sets through a web page. Users can freely analyse and visualize genomic variations in heatmaps, phylogenetic trees, haplotype networks, or geographical maps. Sample-specific sequences, with detected sequence variations applied, can also be retrieved. Conclusions: SnpHub can be applied to any species, and we have built a SnpHub portal website for wheat and its progenitors based on published data from recent studies. SnpHub and its tutorial are available at http://guoweilong.github.io/SnpHub/. The wheat-SnpHub-portal website can be accessed at http://wheat.cau.edu.cn/Wheat_SnpHub_Portal/.

Journal ArticleDOI
TL;DR: A Deep Neural Network (DNN) based emotionally aware campus virtual assistant that provides a simple voice response interface, without the need for users to find information in complex web pages or app menus.
Abstract: With the advent of the 5G and Artificial Intelligence of Things (AIoT) era, related technologies such as the Internet of Things, big data analysis, cloud applications, and artificial intelligence have brought broad prospects to many application fields, such as smart homes, autonomous vehicles, smart cities, healthcare, and smart campuses. At present, most university campus apps are presented in the form of static web pages or app menus. This study mainly developed a Deep Neural Network (DNN) based emotionally aware campus virtual assistant. The main contributions of this research are: (1) This study introduces Chinese word embeddings to the robot dialogue system, effectively improving dialogue tolerance and semantic interpretation. (2) The traditional method of emotion identification must first tokenize the Chinese sentence, analyze the clauses and parts of speech, and capture the emotional keywords before they are interpreted by an expert system. Different from the traditional method, this study classifies the input directly through a convolutional neural network after the input sentence is converted into a spectrogram by Fourier transform. (3) This study is presented in app form, which is easier to use and more economical. (4) This system provides a simple voice response interface, without the need for users to find information in complex web pages or app menus.

Journal ArticleDOI
TL;DR: Evaluating the effects of the visual aesthetics of Web pages on users’ behavior in an online shopping environment revealed that website aesthetics had the greatest direct impact on “perceived quality of online services,” “trust,” “satisfaction” and “arousal,” respectively.
Abstract: The success of e-commerce websites depends on their effective communication and influence on their users. At first glance, the users are impressed by the website design and, if inspired, they would continue their operations on the website. This paper aims to evaluate the effects of the visual aesthetics of Web pages on users’ behavior in an online shopping environment. In particular, the paper aims to evaluate the effects of the elements of visual aesthetics on the organism variables (i.e. “satisfaction,” “arousal,” “perceived on-line service quality” and “trust”) and measure them on the users’ response (i.e. purchase, comparison and re-visit). Using the stimulus–organism–response (S-O-R) framework, the authors first assessed direct and indirect effects of visual aesthetics of e-commerce websites on customer responses. Then, the Visual Aesthetics of Websites Inventory (VisAWI) method was used to examine the effects of four dimensions (i.e. craftsmanship, simplicity, diversity and colorfulness) on users’ perceived website aesthetics. To do so, DigiKala.com, a famous Iranian e-commerce website, was selected and the questionnaires were distributed among its users. The study results revealed that the website aesthetics in the S-O-R evaluation had the greatest direct impact on “perceived quality of online services,” “trust,” “satisfaction” and “arousal,” respectively. These variables also indirectly affected “shopping,” “revisit” and “comparison to similar products on other websites.” Regarding the evaluations based on the VisAWI, the component “craftsmanship” played the most central role in expressing the website aesthetics, followed by the variables “simplicity,” “diversity” and “colorfulness,” respectively. Although the considerable effect of Web aesthetics on customers’ purchase behavior has been identified in previous research, it has not been accurately measured. Furthermore, studies on Web aesthetics are mostly limited to information systems’ users and do not concern consumers. Therefore, considering the increasing growth in online shopping and the significance of Web aesthetics to online consumers, investigating how consumers respond to Web aesthetics is of vital importance.

Journal ArticleDOI
01 Jun 2020
TL;DR: The proposed method uses latent semantic analysis to retrieve significant information from the question raised by the user or from bulk documents, and demonstrates its superiority in terms of precision, recall and F-score.
Abstract: Retrieving information from the huge amount of data flowing from day-to-day developments in technology has become popular, as it assists in searching for valuable information in structured, unstructured or semi-structured data sets such as text, databases, multimedia, documents, and the internet. Information retrieval is performed employing one of several models, starting from the simple Boolean model, or using other frameworks such as probabilistic, vector space and natural language models. This paper emphasizes natural-language-model-based information retrieval to recover meaningful insights from enormous amounts of data. The proposed method uses latent semantic analysis to retrieve significant information from the question raised by the user or from bulk documents. The method utilizes the fundamentals of semantic factors occurring in the data set to identify useful insights. The experimental analysis of the proposed method is carried out with a few state-of-the-art datasets such as TIME, LISA, CACM and NPL, and the results obtained demonstrate the superiority of the proposed method in terms of precision, recall and F-score.
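
A compact sketch of latent-semantic-analysis retrieval of the general kind the paper describes: a TF-IDF term-document matrix is reduced with truncated SVD and the projected query is ranked against documents by cosine similarity. The corpus and the number of components are toy choices.

```python
# LSA-style retrieval: TF-IDF matrix -> truncated SVD -> cosine ranking of a query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "information retrieval from large document collections",
    "latent semantic analysis maps terms and documents to concepts",
    "growing tomatoes in a greenhouse requires stable temperature",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

svd = TruncatedSVD(n_components=2, random_state=0)
X_latent = svd.fit_transform(X)                     # documents in concept space

query = ["semantic retrieval of documents"]
q_latent = svd.transform(vec.transform(query))      # project the query too

scores = cosine_similarity(q_latent, X_latent)[0]
ranking = scores.argsort()[::-1]
print([docs[i] for i in ranking])                   # most relevant first
```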

Proceedings ArticleDOI
27 Oct 2020
TL;DR: The differences between landing and internal (i.e., non-root) pages of 1000 web sites are characterized to demonstrate that the structure and content of internal pages differ substantially from those of landing pages, as well as from one another.
Abstract: There is a rich body of literature on measuring and optimizing nearly every aspect of the web, including characterizing the structure and content of web pages, devising new techniques to load pages quickly, and evaluating such techniques. Virtually all of this prior work used a single page, namely the landing page (i.e., root document, "/"), of each web site as the representative of all pages on that site. In this paper, we characterize the differences between landing and internal (i.e., non-root) pages of 1000 web sites to demonstrate that the structure and content of internal pages differ substantially from those of landing pages, as well as from one another. We review more than a hundred studies published at top-tier networking conferences between 2015 and 2019, and highlight how, in light of these differences, the insights and claims of nearly two-thirds of the relevant studies would need to be revised for them to apply to internal pages. Going forward, we urge the networking community to include internal pages for measuring and optimizing the web. This recommendation, however, poses a non-trivial challenge: How do we select a set of representative internal web pages from a web site? To address the challenge, we have developed Hispar, a "top list" of 100,000 pages updated weekly comprising both the landing pages and internal pages of around 2000 web sites. We make Hispar and the tools to recreate or customize it publicly available.

Journal ArticleDOI
TL;DR: A set of novel graph features is proposed to improve phishing detection accuracy; these features leverage inherent phishing patterns that are only visible at a higher level of abstraction, making the approach robust and difficult to evade through direct manipulation of webpage contents.

Proceedings ArticleDOI
27 Jun 2020
TL;DR: This study highlights the challenges posed in automatically inferring a model for any given web app and shows that even with the best thresholds, no algorithm is able to accurately detect all functional near-duplicates within apps, without sacrificing coverage.
Abstract: Automated web testing techniques infer models from a given web app, which are used for test generation. From a testing viewpoint, such an inferred model should contain the minimal set of states that are distinct, yet, adequately cover the app's main functionalities. In practice, models inferred automatically are affected by near-duplicates, i.e., replicas of the same functional webpage differing only by small insignificant changes. We present the first study of near-duplicate detection algorithms used in within app model inference. We first characterize functional near-duplicates by classifying a random sample of state-pairs, from 493k pairs of webpages obtained from over 6,000 websites, into three categories, namely clone, near-duplicate, and distinct. We systematically compute thresholds that define the boundaries of these categories for each detection technique. We then use these thresholds to evaluate 10 near-duplicate detection techniques from three different domains, namely, information retrieval, web testing, and computer vision on nine open-source web apps. Our study highlights the challenges posed in automatically inferring a model for any given web app. Our findings show that even with the best thresholds, no algorithm is able to accurately detect all functional near-duplicates within apps, without sacrificing coverage.
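
The thresholding idea in the study (boundaries that split state-pairs into clone, near-duplicate, and distinct) can be sketched with any pairwise similarity measure. The measure below (difflib ratio over serialized DOM strings) and the threshold values are illustrative stand-ins, not the paper's techniques or computed thresholds.

```python
# Classify a pair of webpage states by similarity against two thresholds.
from difflib import SequenceMatcher

CLONE_T, NEAR_DUP_T = 0.98, 0.85        # example thresholds, not the paper's

def classify_pair(dom_a, dom_b):
    sim = SequenceMatcher(None, dom_a, dom_b).ratio()
    if sim >= CLONE_T:
        return "clone", sim
    if sim >= NEAR_DUP_T:
        return "near-duplicate", sim
    return "distinct", sim

page_a = "<ul><li>item 1</li><li>item 2</li><li>item 3</li></ul>"
page_b = "<ul><li>item 1</li><li>item 2</li><li>item 4</li></ul>"   # small change
page_c = "<form><input name='q'><button>Search</button></form>"

print(classify_pair(page_a, page_b))    # high similarity: clone or near-duplicate
print(classify_pair(page_a, page_c))    # low similarity: distinct
```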

Proceedings ArticleDOI
20 Apr 2020
TL;DR: In this article, the authors measured the effect of Do53, DoT, and DoH on query response times and page load times from five global vantage points and provided several recommendations to improve DNS performance, such as opportunistic partial responses and wire format caching.
Abstract: Nearly every service on the Internet relies on the Domain Name System (DNS), which translates a human-readable name to an IP address before two endpoints can communicate. Today, DNS traffic is unencrypted, leaving users vulnerable to eavesdropping and tampering. Past work has demonstrated that DNS queries can reveal a user’s browsing history and even what smart devices they are using at home. In response to these privacy concerns, two new protocols have been proposed: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). Instead of sending DNS queries and responses in the clear, DoH and DoT establish encrypted connections between users and resolvers. By doing so, these protocols provide privacy and security guarantees that traditional DNS (Do53) lacks. In this paper, we measure the effect of Do53, DoT, and DoH on query response times and page load times from five global vantage points. We find that although DoH and DoT response times are generally higher than Do53, both protocols can perform better than Do53 in terms of page load times. However, as throughput decreases and substantial packet loss and latency are introduced, web pages load fastest with Do53. Additionally, web pages load successfully more often with Do53 and DoT than DoH. Based on these results, we provide several recommendations to improve DNS performance, such as opportunistic partial responses and wire format caching.
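
A small sketch of the kind of timing comparison the paper performs: resolve a name through the system resolver (a stand-in for Do53) and through a DoH endpoint, here Cloudflare's public JSON API. A real study would pin the resolver, repeat trials across vantage points, and also measure page load times; the domain and endpoint are just examples.

```python
# Time a system-resolver lookup versus a DoH lookup over Cloudflare's JSON API.
import socket
import time
import requests

domain = "example.com"

t0 = time.perf_counter()
socket.getaddrinfo(domain, 443)                       # system/stub resolver path
t_system = time.perf_counter() - t0

t0 = time.perf_counter()
resp = requests.get(
    "https://cloudflare-dns.com/dns-query",
    params={"name": domain, "type": "A"},
    headers={"accept": "application/dns-json"},
    timeout=5,
)
t_doh = time.perf_counter() - t0

print(f"system resolver: {t_system * 1000:.1f} ms")
print(f"DoH (JSON API):  {t_doh * 1000:.1f} ms, answers={len(resp.json().get('Answer', []))}")
```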

Journal ArticleDOI
TL;DR: This survey explores state-of-the-art citation recommendation models, which are categorized using the following seven criteria: platform used, data factors/features, data representation methods, methodologies and models, recommendation types, problems addressed, and personalization.
Abstract: Recommender systems have been used since the beginning of the Web to assist users with personalized suggestions related to past preferences for items or products including books, movies, images, research papers and web pages. The availability of millions of research articles in various digital libraries makes it difficult for a researcher to find articles relevant to his/her research. In recent years, a lot of research has been conducted on models and algorithms that personalize paper recommendations. With this survey, we explore the state-of-the-art citation recommendation models, which we categorize using the following seven criteria: platform used, data factors/features, data representation methods, methodologies and models, recommendation types, problems addressed, and personalization. In addition, we present a novel k-partite graph-based taxonomy that examines the relationships among surveyed algorithms and corresponding k-partite graphs used. Moreover, we present (a) the domain’s popular issues, (b) adopted metrics, and (c) commonly used datasets. Finally, we provide some research trends and future directions.

Proceedings ArticleDOI
20 Apr 2020
TL;DR: JSCleaner, a JavaScript de-cluttering engine that aims at simplifying webpages without compromising their content or functionality, relies on a rule-based classification algorithm that classifies JS into three main categories: non-critical, replaceable, and critical.
Abstract: A significant fraction of the World Wide Web suffers from the excessive usage of JavaScript (JS). Based on an analysis of popular webpages, we observed that a considerable number of JS elements utilized by these pages are not essential for their visual and functional features. In this paper, we propose JSCleaner, a JavaScript de-cluttering engine that aims at simplifying webpages without compromising their content or functionality. JSCleaner relies on a rule-based classification algorithm that classifies JS into three main categories: non-critical, replaceable, and critical. JSCleaner removes non-critical JS from a webpage, translates replaceable JS elements with their HTML outcomes, and preserves critical JS. Our quantitative evaluation of 500 popular webpages shows that JSCleaner achieves around 30% reduction in page load times coupled with a 50% reduction in the number of requests and the page size. In addition, our qualitative user study of 103 evaluators shows that JSCleaner preserves 95% of the page content similarity, while maintaining nearly 88% of the page functionality (the remaining 12% did not have a major impact on the user browsing experience).
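
A toy sketch of the three-way, rule-based classification that JSCleaner's abstract describes (non-critical, replaceable, critical). The rules and script descriptors below are invented placeholders rather than JSCleaner's actual rule set.

```python
# Label each script as non-critical, replaceable, or critical using simple rules.
def classify_script(script):
    if script.get("third_party") and script.get("purpose") in {"analytics", "ads"}:
        return "non-critical"        # removable without visual/functional change
    if script.get("writes_static_html") and not script.get("handles_events"):
        return "replaceable"         # can be swapped for its rendered HTML output
    return "critical"                # interactive or essential; keep as-is

scripts = [
    {"src": "tracker.js", "third_party": True, "purpose": "analytics"},
    {"src": "render-menu.js", "writes_static_html": True, "handles_events": False},
    {"src": "checkout.js", "handles_events": True},
]
for s in scripts:
    print(s["src"], "->", classify_script(s))
```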

Proceedings ArticleDOI
30 Oct 2020
TL;DR: Slimium is presented, a debloating framework for a browser (i.e., Chromium) that harnesses a hybrid approach for fast and reliable binary instrumentation, which helps in focusing on security-oriented features and reasonably addressing a non-deterministic path problem raised by code complexity.
Abstract: Today, a web browser plays a crucial role in offering a broad spectrum of web experiences. The most popular browser, Chromium, has become an extremely complex application to meet ever-increasing user demands, exposing unavoidably large attack vectors due to its large code base. Code debloating attracts attention as a means of reducing such a potential attack surface by eliminating unused code. However, it is very challenging to perform sophisticated code removal without breaking needed functionalities because Chromium operates on a large number of closely connected and complex components, such as a renderer and JavaScript engine. In this paper, we present Slimium, a debloating framework for a browser (i.e., Chromium) that harnesses a hybrid approach for a fast and reliable binary instrumentation. The main idea behind Slimium is to determine a set of features as a debloating unit on top of a hybrid (i.e., static, dynamic, heuristic) analysis, and then leverage feature subsetting to code debloating. It aids in i) focusing on security-oriented features, ii) discarding unneeded code simply without complications, and iii) reasonably addressing a non-deterministic path problem raised from code complexity. To this end, we generate a feature-code map with a relation vector technique and prompt webpage profiling results. Our experimental results demonstrate the practicality and feasibility of Slimium for 40 popular websites, as on average it removes 94 CVEs (61.4%) by cutting down 23.85 MB code (53.1%) from defined features (21.7% of the whole) in Chromium.