PDF.js Fetch Stream

By Mukul Mishra

Hello !!

This is the sixth post in the series about my Google Summer of Code 2017 experience. In the last post, I gave an overview of the implementation of node_stream and how it helps to stream PDF data in the node.js environment. In this post, I am going to give updates on my project, Streams API in PDF.js.

In this post I am going to talk about the implementation details of fetch_stream and why we need it. PDF.js uses XHR to request PDF data (both from remote servers and from the local system), but the response of XHR is not streamable. This prevents us from streaming data in small chunks in the networking part of PDF.js. Although we implemented node_stream, and its response is streamable, we can only use it in the node.js environment.

This is solvable using the Fetch API, an even better substitute for XHR. fetch_stream implements IPDFStream to request and read PDF data using the Fetch API. As the fetch response is streamable, we can read it in small chunks and control the flow of the internal buffer. PDFFetchStream implements a stream reader that uses the Fetch API internally to read PDF data.
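To make "reading in small chunks" a bit more concrete, here is a minimal sketch of pulling data from a streamable response one chunk at a time. This is not the actual fetch_stream code from PDF.js; the helper name `readInChunks`, its `onChunk` callback, and the URL in the usage comment are all made up for illustration. It only relies on the standard `ReadableStream` reader methods.

```javascript
// Hypothetical helper (not part of PDF.js): pulls chunks from a
// ReadableStream one at a time and hands each one to `onChunk`.
// Resolves with the total number of bytes read.
async function readInChunks(readableStream, onChunk) {
  const reader = readableStream.getReader();
  let loaded = 0;
  try {
    while (true) {
      // Each read() resolves with the next chunk, or with
      // `done === true` once the stream has ended.
      const { value, done } = await reader.read();
      if (done) {
        return loaded;
      }
      loaded += value.byteLength;
      // Waiting for the consumer before calling read() again means
      // more data is only pulled after this chunk has been handled.
      await onChunk(value);
    }
  } finally {
    // Release the lock so the stream can be used by another reader.
    reader.releaseLock();
  }
}

// Usage (the URL is a placeholder):
// fetch('example.pdf').then((response) =>
//   readInChunks(response.body, (chunk) => console.log(chunk.byteLength)));
```

Because the next `read()` is only issued after `onChunk` has finished, the consumer decides how fast data is pulled, which is the kind of flow control that a non-streamable XHR response cannot offer.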

Comparison between XHR and Fetch API:

If we compare XHR and Fetch, we see a huge difference in terms of API design. Fetch gives us a clean set of APIs that are easier to use than XHR. Here is a small example that requests JSON using these two technologies:

var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.responseType = 'json';

xhr.onload = () => {
  // Do something when the request completes successfully.
};

xhr.onerror = () => {
  // Do something when the request fails.
};

// The request is only dispatched once send() is called.
xhr.send();

fetch(url).then((response) => {
  // Do something with the retrieved data.
  // The `body` property of the response returns a `ReadableStream`.
  let readableStream = response.body;
  let reader = readableStream.getReader();
}).catch((error) => {
  // Do something when an error occurs.
});

For the full discussion and code, please see fetch stream logic for networking task of PDF.js.