Web Scraping: Introducing Selenium (Python)


So far I have used Python with the requests and Beautiful Soup libraries. This has allowed me to send GET and POST requests to web servers, and to parse the responses into a friendlier format.

Now I’m going to introduce Selenium. Selenium allows Python to interact with web pages by driving a real web browser (e.g. Firefox, Google Chrome, Safari), either with the browser window visible on screen or without a window (a mode called headless).
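To illustrate the headless mode mentioned above, here is a minimal sketch that uses Firefox as the example browser. It assumes Firefox and its geckodriver are installed, which this post does not cover. The function names firefox_arguments and open_firefox are made up for this example.

```python
def firefox_arguments(headless):
    """Return the command-line arguments to pass to Firefox."""
    # "-headless" tells Firefox to run without opening a window
    return ["-headless"] if headless else []


def open_firefox(headless=True):
    """Start Firefox through Selenium, with or without a visible window."""
    # imported here so the helper above works even without Selenium installed
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for argument in firefox_arguments(headless):
        options.add_argument(argument)
    return webdriver.Firefox(options=options)
```

Calling open_firefox(headless=True) should return a browser object that can load pages with browser.get() as usual, just with nothing appearing on screen. Call browser.quit() when finished so the hidden Firefox process does not keep running.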

The GitHub repository for Selenium Python can be found at: https://github.com/baijum/selenium-python

I am using Selenium on my Apple MacBook, and I’m going to use it with the Safari browser. For this to work I have enabled the remote automation options within Safari.

Safari - Allow Remote Automation

To enable remote automation, first enable Safari’s Develop menu (Preferences > Advanced > Show Develop menu in menu bar) and then choose “Allow Remote Automation” from the Develop menu.

With remote automation on, Python (or other programs) can call on Safari.
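Selenium talks to Safari through Apple's safaridriver program, which ships with macOS. As a quick pre-flight check before running a script, the sketch below looks for safaridriver on the PATH. The function names are made up for this example, and the check only confirms the program exists, not that remote automation has been switched on.

```python
import shutil


def find_safaridriver():
    """Return the full path to safaridriver, or None if it is not on the PATH."""
    return shutil.which("safaridriver")


def describe_safaridriver(path):
    """Turn the result of find_safaridriver() into a readable message."""
    if path is None:
        return "safaridriver not found; Selenium will not be able to start Safari"
    return "safaridriver found at " + path
```

Running print(describe_safaridriver(find_safaridriver())) on a Mac should report a path such as /usr/bin/safaridriver.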

Safari - Allow Remote Session

 

 

Wikipedia logon script
# import selenium webdriver, the By locator helper and the time module
from selenium import webdriver
from selenium.webdriver.common.by import By
import time


def wiki_login():
    # open a Safari window (needs "Allow Remote Automation" enabled in Safari)
    browser = webdriver.Safari()
    # get the Wikipedia log in page
    browser.get("https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page")
    # web page IDs that handle log in
    username = browser.find_element(By.ID, "wpName1")
    password = browser.find_element(By.ID, "wpPassword1")
    login = browser.find_element(By.ID, "wpLoginAttempt")
    # send details to log in IDs, then click the log in button
    username.send_keys("Geektechstuff")
    password.send_keys("WIKI_PASSWORD_HERE")
    login.click()
    # wait 5 seconds so the log in has time to complete
    time.sleep(5)

 

I also imported the time module and made the program wait 5 seconds at the end, so that the log in has time to complete.
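A fixed time.sleep(5) always waits the full 5 seconds, and may still be too short on a slow connection. Selenium also provides WebDriverWait, which keeps checking a condition until it is true or a timeout is reached. The sketch below swaps in that technique. It assumes a successful log in moves the browser away from the Special:UserLogin page (the returnto=Main+Page part of the URL suggests this, but I have not confirmed it). The function names are made up for this example.

```python
def has_left_login_page(current_url):
    """Return True once the browser is no longer on the log in page."""
    return "Special:UserLogin" not in current_url


def wait_for_login(browser, timeout=10):
    """Wait up to `timeout` seconds for the log in to finish.

    Returns True if the browser left the log in page, False if it timed out.
    """
    # imported here so has_left_login_page() works without Selenium installed
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        # until() calls the function repeatedly with the browser object
        WebDriverWait(browser, timeout).until(
            lambda driver: has_left_login_page(driver.current_url)
        )
        return True
    except TimeoutException:
        return False
```

In the script, wait_for_login(browser) could take the place of the time.sleep(5) call after login.click(). It returns as soon as the page changes rather than always pausing for the full time.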

Successful logon!

 

This post forms part of my learning around web scraping using Python. The previous posts are available at:

Part 1 – https://geektechstuff.com/2019/04/30/web-scraping-interacting-with-web-pages-python/

Part 2 – https://geektechstuff.com/2019/04/16/web-scrapping-part-2-python/

Part 3 – https://geektechstuff.com/2019/04/30/web-scraping-interacting-with-web-pages-python/