Reading Data from PDF File using PDFBox in Selenium Web-Driver


We often have a requirement in our projects to read and verify all or part of the data in a PDF file that is located either on our local machine or on a web page.

To achieve this through automation, we use a third-party library called Apache PDFBox, which is widely used for creating PDF files and for adding, removing, and reading text in them.

To learn more about Apache PDFBox, please click on this link.

To read PDF data from a web page, we need to follow the steps below:

1)     Download the PDFBox JARs or the Maven dependency from this link.
In this blog I am using the Maven dependency below.

2)     Below is the code snapshot that reads data from the web page.
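As a rough text version of this step, here is a minimal sketch of reading a PDF served at a URL. It assumes PDFBox 2.x (Maven artifact `org.apache.pdfbox:pdfbox`) is on the classpath; in PDFBox 3.x the loading call is different. The class name `PdfTextSketch` and its method names are hypothetical and chosen only for illustration, so this is not necessarily the exact code used in this post.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class; the names are illustrative only.
final class PdfTextSketch {

    // Parses a PDF from an input stream and returns all of its text.
    // Assumption: PDFBox 2.x, where PDDocument.load(InputStream) is available.
    static String extractText(InputStream pdfStream) throws IOException {
        try (PDDocument document = PDDocument.load(pdfStream)) {
            return new PDFTextStripper().getText(document);
        }
    }

    // Opens the given URL directly and extracts the text of the PDF it serves.
    static String readPdfFromUrl(String pdfUrl) throws IOException {
        try (InputStream in = new BufferedInputStream(new URL(pdfUrl).openStream())) {
            return extractText(in);
        }
    }
}
```

In a Selenium test, the URL passed to `readPdfFromUrl` would typically be the value of `driver.getCurrentUrl()` after navigating to the PDF. Note that this approach downloads the file outside the browser, so it works only when the PDF is reachable without the browser's session cookies.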

Below is the output of the code execution: my web page is on the left-hand side, and the output generated on my Eclipse console is on the right.

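Please note that this library reads the text together with all of its whitespace, including text inside tables within the PDF. Text that exists only as pixels inside an embedded image is not extracted by the text stripper, because PDFBox does not perform OCR.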
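Because the extracted text keeps its spaces and line breaks, an exact string comparison can fail even when the content is right. One possible way to handle this is to collapse whitespace before verifying. The sketch below uses only the Java standard library; the class name `PdfTextAssertions` and its methods are hypothetical helpers, not part of PDFBox or Selenium.

```java
// Hypothetical helper for verifying text extracted from a PDF.
final class PdfTextAssertions {

    // Collapses every run of whitespace (spaces, tabs, line breaks)
    // into a single space and trims both ends.
    static String normalizeWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns true if the expected text occurs in the extracted text
    // once both have been whitespace-normalized.
    static boolean containsNormalized(String extractedText, String expected) {
        return normalizeWhitespace(extractedText).contains(normalizeWhitespace(expected));
    }
}
```

For example, `containsNormalized(pdfText, "Total Amount 100")` would match even if the PDF places "Total" and "Amount" on separate lines.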



My name is Ankur Jain and I am currently working as an Automation Test Architect. I am an ISTQB Certified Test Manager, a Certified UiPath RPA Developer, and a Certified Scrum Master, with a total of 12 years of working experience with many large banking clients around the globe. I love to design automation testing frameworks with Selenium, Appium, Protractor, Cucumber, Rest-Assured, and Katalon Studio, and I am currently exploring DevOps as well. I am currently staying in Mumbai, Maharashtra. Please connect with me through the Contact Us page of this website.

April 24, 2020 at 8:44 PM

Hi Ankur,

By any chance, did you try the same thing in Protractor, i.e. reading the contents of a PDF that opened in a new tab? I could not find any library in npm equivalent to URLConnection.

Vishal Ramteke