Web Scraping: Introducing Selenium (Python)

So far I have used Python with the requests and Beautiful Soup libraries. This has allowed me to send GET and POST requests to web servers, and to parse the responses in a friendlier way.

Now I’m going to introduce Selenium. Selenium allows Python to interact with web pages by driving a web browser (e.g. Firefox, Google Chrome, Safari), either with the browser window open on screen or without it (a mode called headless).
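To make the headless idea concrete, here is a minimal sketch. It assumes Firefox is installed along with its geckodriver (or a Selenium release recent enough to fetch the driver itself), because as far as I know Safari's driver does not offer a headless option. The function name open_headless_firefox is my own invention for illustration.

```python
def open_headless_firefox(url):
    """Open url in a Firefox instance that runs without a visible window."""
    # Imported inside the function, the same way the wiki_login script
    # in this post does it.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # ask Firefox not to show a window
    browser = webdriver.Firefox(options=options)
    browser.get(url)
    return browser
```

Calling open_headless_firefox("https://en.wikipedia.org/") should hand back a browser object with the page loaded and no window on screen. Its title attribute holds the page title, and quit() should be called when finished so the hidden Firefox process does not linger.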

The GitHub for Selenium Python can be found at: https://github.com/baijum/selenium-python

I am using Selenium on my Apple MacBook, and I’m going to use it with the Safari browser. For this to work I have enabled the remote automation options within Safari.

Safari - Allow Remote Automation

To enable remote automation, first enable Safari’s Develop menu (Preferences > Advanced > Show Develop menu in menu bar), then choose “Allow Remote Automation” from the Develop menu.

With remote automation on, Python (or other programs) can call on Safari.

Safari - Allow Remote Session

Wikipedia logon script
def wiki_login():
    # import the Selenium webdriver, the By locator helper and time
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time
    # open Safari (needs "Allow Remote Automation" enabled in the Develop menu)
    # webdriver.Safari is the same class as webdriver.safari.webdriver.WebDriver
    browser = webdriver.Safari()
    # load the Wikipedia log in page
    browser.get("https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page")
    # locate the form fields and the button by their element IDs
    username = browser.find_element(By.ID, "wpName1")
    password = browser.find_element(By.ID, "wpPassword1")
    login = browser.find_element(By.ID, "wpLoginAttempt")
    # fill in the log in details and submit the form
    username.send_keys("Geektechstuff")
    password.send_keys("WIKI_PASSWORD_HERE")
    login.click()
    # give the log in 5 seconds to complete
    time.sleep(5)

I also imported the time library and made the program wait 5 seconds at the end, so that the log in has time to complete.
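A fixed 5 second pause works, but it waits the full time even when the page is ready sooner, and it may not be long enough on a slow connection. One alternative is to poll for a condition until it becomes true or a timeout passes. The helper below is a rough sketch of that idea using only the standard library; the name wait_until and its parameters are my own, not part of Selenium. Selenium ships a WebDriverWait class in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that truthy value, or raises TimeoutError if the timeout
    (in seconds) passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

In the log in script, the time.sleep(5) line could then become something like wait_until(lambda: "Special:UserLogin" not in browser.current_url). That condition assumes Wikipedia moves away from the log in page once the log in succeeds, which I have not verified for every case.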

Successful logon!

This post forms part of my learning around web scraping using Python. The previous posts are available at:

Part 1 – https://geektechstuff.com/2019/04/30/web-scraping-interacting-with-web-pages-python/

Part 2 – https://geektechstuff.com/2019/04/16/web-scrapping-part-2-python/

Part 3 – https://geektechstuff.com/2019/04/30/web-scraping-interacting-with-web-pages-python/
