We often have a requirement in a project to read and verify all or part of the data in a PDF file that is located either on the local machine or on a web page.
To automate this, we use a third-party API known as Apache PDFBox, which is widely used for creating PDF files and for adding, removing, and reading text in them.
To know more about Apache PDFBox, please click on this link.
To read PDF data from a web page, follow the steps below:
1) Download the PDFBox JARs or the Maven dependency from this link. In this blog I am using the Maven dependency below.
2) Below is the code snapshot to read data from the web page.
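As a rough illustration of step 2, here is a minimal sketch of reading a PDF from a URL. It assumes the PDFBox 2.x API (the `org.apache.pdfbox:pdfbox` artifact, where `PDDocument.load` accepts an `InputStream`); PDFBox 3.x loads documents differently. The class name `PdfUrlReader` and its method names are hypothetical, and the URL is supplied by the caller.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class, written against the PDFBox 2.x API.
public class PdfUrlReader {

    // Extracts the text of every page from a PDF supplied as a stream.
    public static String extractText(InputStream pdfStream) throws IOException {
        try (PDDocument document = PDDocument.load(pdfStream)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    // Opens a connection to the given URL and extracts the text of the PDF it serves.
    public static String readPdfFromUrl(String pdfUrl) throws IOException {
        URLConnection connection = new URL(pdfUrl).openConnection();
        try (InputStream in = new BufferedInputStream(connection.getInputStream())) {
            return extractText(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: PdfUrlReader <url-of-pdf>");
            return;
        }
        // Print the extracted text to the console.
        System.out.println(readPdfFromUrl(args[0]));
    }
}
```

Splitting the download from the extraction keeps `extractText` usable for a local file as well, for example by passing it a `FileInputStream`.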
Below is the output of the code execution: my web page is on the left-hand side, and the output generated on my Eclipse console is on the other side.
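Please note that this library reads the text along with any whitespace, including text laid out in tables within the PDF; text that exists only as pixels inside an image is not extracted, because PDFBox's text extraction does not perform OCR.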
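Because the extracted text keeps the PDF's spacing and line breaks, an exact string comparison can fail even when the content is right. One possible way to verify the data is to collapse whitespace before comparing. The sketch below uses only the Java standard library; the class `PdfTextVerifier`, its methods, and the sample string are made up for illustration.

```java
// Hypothetical helper for verifying text extracted from a PDF.
public class PdfTextVerifier {

    // Collapses every run of whitespace (spaces, tabs, line breaks) into a single space.
    public static String normalizeWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns true if the expected phrase occurs in the extracted text, ignoring layout whitespace.
    public static boolean containsIgnoringLayout(String extractedText, String expected) {
        return normalizeWhitespace(extractedText).contains(normalizeWhitespace(expected));
    }

    public static void main(String[] args) {
        // Made-up sample standing in for text returned by the PDF library.
        String extracted = "Invoice  No:\t1001\r\nTotal   Amount :  250.00\n";
        System.out.println(containsIgnoringLayout(extracted, "Invoice No: 1001"));      // prints true
        System.out.println(containsIgnoringLayout(extracted, "Total Amount : 250.00")); // prints true
    }
}
```

This trades strictness for robustness: it will not notice a missing line break, so it suits checks on content rather than on layout.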
Hi Ankur,
By any chance, did you try the same thing in Protractor, like reading the contents of a PDF that opened in a new tab? I could not find any library in npm comparable to URLConnection.
Thanks,
Vishal Ramteke