How do you download a PDF file with Selenium in Python when the URL opens the PDF in your Chrome browser, without needing to print the page or use special key presses? And how can you set the location where the PDF is saved?
The trick to downloading a PDF file using Selenium, without Chrome opening the file in the browser window, is to set the browser’s preferences so that it does not open PDFs automatically.
Once you’ve disabled Chrome’s ability to display the PDF file within the browser, the last setting to change is the location where you want the file to be stored.
Here are the essential items you need to configure in your Selenium instance and Chrome browser to do this.
Set Chrome Browser Preferences
The first thing you need to do is to change the default behavior of your Chrome browser so that it doesn’t automatically open the downloaded PDF in the browser window.
To edit the Chrome browser’s preferences, import the ChromeOptions class:
from selenium.webdriver import Chrome, ChromeOptions
With this import statement I am importing both the Chrome driver (the browser) and ChromeOptions (the Options class).
Next, create a new instance of the ChromeOptions class, followed by a dictionary of the preferences that need to be changed:
from selenium.webdriver import Chrome, ChromeOptions
options = ChromeOptions()
chrome_prefs = {
    "download.prompt_for_download": False,
    "plugins.always_open_pdf_externally": True,
    "download.open_pdf_in_system_reader": False,
    "profile.default_content_settings.popups": 0,
}
options.add_experimental_option("prefs", chrome_prefs)
As you can see from the above code, the options variable holds a new instance of the ChromeOptions class. Next I create a dictionary of key: value pairs that define how PDFs are handled when a link opens one. The key setting is plugins.always_open_pdf_externally, which tells Chrome to download PDFs rather than display them in its built-in viewer.
The last line then sets these preferences using the .add_experimental_option() method.
Set Folder Location Of Downloaded PDF Files
If you want to set the location where the PDF files will be downloaded, add one more property to the chrome_prefs variable, as seen here:
from selenium.webdriver import Chrome, ChromeOptions
# import os module...
import os
# set location using os.path.join or set it manually if needed...
path_loc = os.path.join(os.getcwd(), "temp")
options = ChromeOptions()
chrome_prefs = {
    "download.prompt_for_download": False,
    "plugins.always_open_pdf_externally": True,
    "download.open_pdf_in_system_reader": False,
    "profile.default_content_settings.popups": 0,
    # add location preference...
    "download.default_directory": path_loc
}
options.add_experimental_option("prefs", chrome_prefs)
In the above code I’ve imported the os module and added a new variable, path_loc, to set where the downloaded PDFs should be stored. If you are going to rename the downloaded PDF, I would highly encourage sending the PDFs to a temporary folder, like temp in my code above, so that you can perform all the required changes and move the file later.
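One way to keep this tidy is to wrap the preferences in a small helper. The sketch below is an illustration only: build_pdf_download_prefs is a hypothetical name, not part of Selenium, and it assumes you want the folder created ahead of time and handed to Chrome as an absolute path.

```python
import os


def build_pdf_download_prefs(download_dir):
    """Return a Chrome prefs dict that saves PDFs into download_dir.

    Hypothetical helper for illustration; not part of Selenium.
    """
    # Assumption: an absolute path is the safest value for Chrome's
    # download.default_directory preference, so normalise it here.
    abs_dir = os.path.abspath(download_dir)
    # Create the folder up front so it exists before the first download.
    os.makedirs(abs_dir, exist_ok=True)
    return {
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
        "download.open_pdf_in_system_reader": False,
        "profile.default_content_settings.popups": 0,
        "download.default_directory": abs_dir,
    }
```

The returned dictionary could then be passed along in the same way as chrome_prefs, for example options.add_experimental_option("prefs", build_pdf_download_prefs("temp")).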
Once you’ve set the location where the downloaded files will go, it’s just a matter of creating a new driver:
from selenium.webdriver import Chrome, ChromeOptions
# Service and ChromeDriverManager are needed to create the driver below
# (webdriver-manager is a separate package: pip install webdriver-manager)
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import os
path_loc = os.path.join(os.getcwd(), "temp")
options = ChromeOptions()
chrome_prefs = {
    "download.prompt_for_download": False,
    "plugins.always_open_pdf_externally": True,
    "download.open_pdf_in_system_reader": False,
    "profile.default_content_settings.popups": 0,
    "download.default_directory": path_loc
}
options.add_experimental_option("prefs", chrome_prefs)
# create new driver
driver = Chrome(service=Service(ChromeDriverManager().install()), options=options)
The last line in the above code creates a new Chrome browser instance. The service parameter uses Webdriver Manager to download and supply the matching driver automatically, so you don’t have to install it by hand every time a new Chrome browser is launched.
The options parameter is set to the options variable, which carries the preferences established above.
To test that you’ve structured everything properly, fetch a URL that would open a PDF file in your browser window by appending the following to your code:
driver.get("YOUR-PDF-URL")
When you run your script you should see a PDF land in the directory you set with the path_loc variable.
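Keep in mind that driver.get() may return before the file has finished downloading, so a script that uses the PDF straight away may want to wait for it first. Here is a minimal sketch, assuming Chrome keeps its usual .crdownload suffix on partially downloaded files. The function wait_for_pdf and its timeout and poll parameters are hypothetical names of my own, not part of Selenium.

```python
import os
import time


def wait_for_pdf(download_dir, timeout=30.0, poll=0.5):
    """Return the path of the newest finished PDF, or None on timeout.

    Hypothetical helper: polls download_dir until a .pdf file exists
    and no in-progress (.crdownload) file remains.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir) if os.path.isdir(download_dir) else []
        # Assumption: Chrome marks unfinished downloads with .crdownload.
        in_progress = any(name.endswith(".crdownload") for name in names)
        pdfs = [name for name in names if name.lower().endswith(".pdf")]
        if pdfs and not in_progress:
            # Pick the most recently modified PDF in the folder.
            newest = max(
                pdfs,
                key=lambda name: os.path.getmtime(os.path.join(download_dir, name)),
            )
            return os.path.join(download_dir, newest)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Calling wait_for_pdf(path_loc) right after driver.get(...) would then give you the file path to work with, or None if nothing arrived in time.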
If you are planning on moving the file to a different location, you may want to look at how to move files around and how to increment the file names.
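As a rough illustration of that idea, the sketch below moves a downloaded file out of the temporary folder and appends a counter when the target name is already taken. It uses only the standard library. The function name move_with_increment and the _1, _2 suffix scheme are my own assumptions, so adapt them to whatever naming you need.

```python
import os
import shutil


def move_with_increment(src_path, dest_dir, new_name=None):
    """Move src_path into dest_dir without overwriting existing files.

    Hypothetical helper: if the target name exists, a numeric suffix
    (_1, _2, ...) is added before the extension.
    """
    os.makedirs(dest_dir, exist_ok=True)
    name = new_name or os.path.basename(src_path)
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(dest_dir, name)
    counter = 1
    # Keep bumping the counter until an unused file name is found.
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, f"{stem}_{counter}{ext}")
        counter += 1
    shutil.move(src_path, candidate)
    return candidate
```

For example, moving two different downloads with new_name="report.pdf" would leave report.pdf and report_1.pdf in the destination folder.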
Summary
To download PDF files in your Selenium Chrome instance without resorting to printing or special key presses, you can simply change the preferences of the driver (browser) so that it automatically downloads PDF files rather than opening them in the browser window.
Example code that does this is below:
from selenium.webdriver import Chrome, ChromeOptions
# Service and ChromeDriverManager are needed to create the driver below
# (webdriver-manager is a separate package: pip install webdriver-manager)
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import os
path_loc = os.path.join(os.getcwd(), "temp")
options = ChromeOptions()
chrome_prefs = {
    "download.prompt_for_download": False,
    "plugins.always_open_pdf_externally": True,
    "download.open_pdf_in_system_reader": False,
    "profile.default_content_settings.popups": 0,
    "download.default_directory": path_loc
}
options.add_experimental_option("prefs", chrome_prefs)
driver = Chrome(service=Service(ChromeDriverManager().install()), options=options)
# test by inserting a URL that you know will open up a PDF file
driver.get("https://YOUR-PDF-URL")