WebCDN - A Content Distribution Network from web browsers using WebRTC

 

Abstract

The aim of this project is to evaluate the potential resource savings, performance improvements and security concerns of a content distribution network (CDN) built from client web browsers. The implementation leverages the WebRTC framework to cache and distribute content between peers without relaying the data through an intermediate web server. The evaluation uses a cloud computing infrastructure to access websites from automated browser instances at multiple locations across the globe. Furthermore, a proxy server is used to inject the system’s client component and simulate peer-to-peer content delivery with real websites.

Motivation

The web's popularity as a mechanism for content distribution and its borderless nature force website operators to rely more and more on content delivery networks (CDNs). CDN servers collaboratively exchange data to deliver content as close as possible to the requesting client. Reducing latency, the time it takes for a message to travel from its point of origin to its destination, significantly improves website delivery times. However, serving content worldwide on a large scale is also a significant monetary burden for website operators. As a possible solution, operators could use peer-to-peer communication capabilities to distribute the costs and required bandwidth of serving content, extend their existing distribution network with additional client nodes and reduce latency by storing content potentially closer to the requesting user.

The idea of coupling CDNs and P2P networks is well known in the literature. Several works have addressed techniques and architectures for peering individual CDNs and interconnecting content networks for better overall performance [1]. However, these systems are still mainly operated by the CDN provider. Recent approaches have worked towards spreading the distribution costs and required bandwidth among the actual content consumers [2,3,4]. This content distribution architecture is particularly interesting for temporary, high-traffic scenarios, for instance those caused by large content-curation websites like Reddit. A peer-to-peer CDN could exploit such a sudden load peak by turning the numerous website visitors into additional content providers that help the operator distribute content.

Goal

The goal of the project is to implement a content distribution network (CDN) from client browsers via WebRTC and to evaluate the system’s potential resource savings and performance improvements within a cloud-based testing environment.

Methodology and Timeline

● Implement a WebRTC-based CDN from client browsers similar to that of Maygh [2].

● Set up an emulation system based on geographically distributed virtual machines running various web browsers and a proxy server to test the system with a set of real-world, image-heavy websites.

● Investigate the system’s impact on website users and operators around three aspects: performance, resource savings and security.

Timeline

● June 2015: WebCDN system implementation

● July: Evaluation system implementation

● August: Simulation

● September: Data analysis and evaluation

● October: Final report

Solution Idea

The system implementation follows a centralized directory model. When a client wants to load a static resource, it either finds the resource in its own cache or issues a request to the coordinator. The coordinator either serves the requested content itself or returns a list of clients that already cache this specific item. In the latter case, the client issues a sideload request to one of the clients in the list. Along with the content, the client may receive an optional caching directive from the coordinator, including a list of clients which should cache the content for future requests. The implementation establishes a peer-to-peer communication channel between web browsers via WebRTC and uses declarative HTML markup to mark website resources for P2P sharing [Figure 2 from [2]].
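The lookup flow of the centralized directory model can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the names `Coordinator`, `Client`, `lookup`, `register` and `load` are hypothetical, a plain dictionary stands in for the WebRTC data channels, and the caching directive is simplified so that every client caches every resource it loads.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical directory: maps resource IDs to the peers caching them."""
    origin: dict[str, bytes]  # content the coordinator can serve itself
    directory: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, resource_id: str) -> tuple[str, object]:
        """Return ("peers", [peer ids]) if cached by peers, else ("content", bytes)."""
        peers = self.directory.get(resource_id, [])
        if peers:
            return ("peers", list(peers))
        return ("content", self.origin[resource_id])

    def register(self, resource_id: str, peer_id: str) -> None:
        """Record that a peer now caches the resource."""
        peers = self.directory.setdefault(resource_id, [])
        if peer_id not in peers:
            peers.append(peer_id)


@dataclass
class Client:
    peer_id: str
    coordinator: Coordinator
    network: dict[str, "Client"]  # stand-in for WebRTC data channels (assumption)
    cache: dict[str, bytes] = field(default_factory=dict)

    def load(self, resource_id: str) -> tuple[bytes, str]:
        """Load a resource and report where it came from."""
        if resource_id in self.cache:
            return self.cache[resource_id], "local"
        kind, payload = self.coordinator.lookup(resource_id)
        if kind == "peers":
            # Sideload from the first peer in the list returned by the coordinator.
            content = self.network[payload[0]].cache[resource_id]
            source = "peer"
        else:
            content = payload
            source = "coordinator"
        # Simplification: always cache and announce, instead of following
        # an optional caching directive from the coordinator.
        self.cache[resource_id] = content
        self.coordinator.register(resource_id, self.peer_id)
        return content, source


coordinator = Coordinator(origin={"logo.png": b"PNG-bytes"})
network: dict[str, Client] = {}
alice = Client("alice", coordinator, network)
bob = Client("bob", coordinator, network)
network.update({"alice": alice, "bob": bob})

print(alice.load("logo.png")[1])  # coordinator
print(bob.load("logo.png")[1])    # peer
print(bob.load("logo.png")[1])    # local
```

The first request falls through to the coordinator, the second is sideloaded from the peer that already holds the resource, and the third is answered from the local cache. A real implementation would additionally have to handle peers that leave the network and verify the integrity of sideloaded content.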

Evaluation

The research evaluation uses the following key metrics:

● Cache hit ratio: the number of cached documents served by participating clients versus the total number of documents requested.

● Bandwidth: the bandwidth used by the website operator.

● Latency: the response time perceived by the client.

● Client utilization: the network overhead, storage capacity used and number of requests served.
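As a rough sketch of how some of these metrics could be derived from logged requests, the following example computes the cache hit ratio, the bytes served by the operator and the mean client-perceived latency. The `RequestRecord` layout, its field names and the sample values are assumptions made for illustration, not the project's log format or measured results. Client utilization is left out because it needs per-peer counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical log entry for one resource request."""
    resource_id: str
    served_by: str     # "peer" or "origin" (assumed labels)
    size_bytes: int
    latency_ms: float


def cache_hit_ratio(records: list[RequestRecord]) -> float:
    """Documents served by peers divided by all documents requested."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r.served_by == "peer")
    return hits / len(records)


def operator_bytes(records: list[RequestRecord]) -> int:
    """Bytes the website operator had to serve itself."""
    return sum(r.size_bytes for r in records if r.served_by == "origin")


def mean_latency_ms(records: list[RequestRecord]) -> float:
    """Average client-perceived response time over all requests."""
    if not records:
        return 0.0
    return sum(r.latency_ms for r in records) / len(records)


# Made-up sample values, only to show the arithmetic.
records = [
    RequestRecord("a.png", "origin", 2000, 120.0),
    RequestRecord("a.png", "peer", 2000, 40.0),
    RequestRecord("b.png", "origin", 1000, 100.0),
    RequestRecord("a.png", "peer", 2000, 60.0),
]

print(cache_hit_ratio(records))  # 0.5
print(operator_bytes(records))   # 3000
print(mean_latency_ms(records))  # 80.0
```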

The infrastructure used to execute our tests is composed of various virtual machines running on the Amazon Elastic Compute Cloud (EC2). The instances are hosted in multiple locations worldwide, which are organized into regions and availability zones. A collection of machines distributed over the globe enables us to deploy the system to an environment which closely reflects the actual Internet environment. To simulate our testing scenarios we make use of the Selenium browser automation framework. The idea is to use real browser environments and interact with them through a custom test script. To facilitate testing our system in different environments, we make use of the Selenium Grid infrastructure, which consists of a hub and several nodes. The hub assigns a node with the target platform to the test script. Hosts participating in the simulation spawn multiple browser instances, each representing a peer. Each peer connects to an emulation mediator component that centralizes the log information and stores messages received from the participating peers in a NoSQL database for later analysis. We rely on a proxy server standing between the peer and the web server hosting the website under test. The proxy intercepts the response and injects our client script into the page to simulate peer-to-peer content delivery with a set of real-world, image-heavy websites.
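The injection step performed by the proxy could look roughly like the sketch below, which rewrites an intercepted HTML response so that it loads the client script. The function name `inject_client_script` and the script URL `/webcdn-client.js` are hypothetical, and placing the tag right before the last closing `</body>` tag is an assumed strategy rather than the one the project necessarily uses.

```python
import html
import re


def inject_client_script(page: str, script_url: str) -> str:
    """Insert a <script> tag before the last </body> tag of an HTML page.

    If the page has no </body> tag, the script tag is appended at the end.
    """
    tag = f'<script src="{html.escape(script_url, quote=True)}"></script>'
    last_match = None
    for last_match in re.finditer(r"</body\s*>", page, flags=re.IGNORECASE):
        pass  # keep iterating so that last_match ends up as the final occurrence
    if last_match is None:
        return page + tag
    return page[:last_match.start()] + tag + page[last_match.start():]


original = "<html><body><img src='cat.png'></body></html>"
print(inject_client_script(original, "/webcdn-client.js"))
```

Working on the response body as a string keeps the sketch independent of any particular proxy software. A real proxy would also need to handle compressed responses and adjust the `Content-Length` header after modifying the body.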

Contact

  • Patrick Michelberger, patrick.michelberger(at)tum.de

References

  1. F. Bronzino, R. Gaeta, M. Grangetto, and G. Pau. An adaptive hybrid CDN/P2P solution for content delivery networks. Visual Communications and Image Processing (VCIP), IEEE, 2012.
  2. L. Zhang, F. Zhou, A. Mislove, and R. Sundaram. Maygh: Building a CDN from client web browsers. 2013.
  3. Jeff Terrace, Harold Laidlaw, Hao Eric Liu, Sean Stern, and Michael J. Freedman. Bringing P2P to the Web: Security and Privacy in the Firecoral Network. Proceedings of the 8th International Conference on Peer-to-Peer Systems (IPTPS), pages 16, 2009.
  4. Manal El Dick, Esther Pacitti, and Bettina Kemme. Flower-CDN: a hybrid P2P overlay for efficient query processing in CDN. Proceedings of the 12th International, pages 427-438, 2009.