GeekTechStuff

Web Scraping: Introducing Selenium (Python)

Published by Geek_Dude

So far I have used Python with the requests and Beautiful Soup libraries. This has allowed me to send GET and POST requests to web servers, and to parse the responses into a friendlier format.

Now I’m going to introduce Selenium. Selenium allows Python to interact with web pages by driving a web browser (e.g. Firefox, Google Chrome, Safari), either with the browser window visible on screen or without it (a mode called headless).
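To illustrate headless mode, here is a minimal sketch that uses Firefox rather than Safari because, as far as I am aware, Safari's driver does not offer a headless option. It assumes Firefox and its driver (geckodriver) are available on the machine. The helper names firefox_arguments and open_firefox are made up for this example and are not part of Selenium.

```python
def firefox_arguments(headless=True):
    """Return the command-line arguments to pass to Firefox."""
    # "-headless" asks Firefox to run without opening a window
    return ["-headless"] if headless else []


def open_firefox(headless=True):
    """Start Firefox through Selenium, optionally without a window."""
    # imported here so the helper above can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for argument in firefox_arguments(headless):
        options.add_argument(argument)
    return webdriver.Firefox(options=options)
```

Calling open_firefox() should return a browser object that works like any other Selenium browser (browser.get(...), browser.title, browser.quit()), just with no window on screen. Passing headless=False opens the window as normal.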

The GitHub repository for Selenium Python can be found at: https://github.com/baijum/selenium-python

I am using Selenium on my Apple MacBook, and I’m going to use it with the Safari browser. For this to work I have enabled the remote automation options within Safari.

Safari – Allow Remote Automation

To enable remote automation, first enable Safari’s Develop menu (Preferences > Advanced > Show Develop menu in menu bar) and then choose “Allow Remote Automation” from the Develop menu.

With remote automation on, Python (or other programs) can control Safari.

Safari – Allow Remote Session
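Selenium talks to Safari through Apple's safaridriver program, which ships with macOS (normally at /usr/bin/safaridriver). Before running a Selenium script it can be handy to check that it is available. The sketch below uses only the Python standard library. The function name find_safaridriver is made up for this example.

```python
import shutil


def find_safaridriver():
    """Return the path to safaridriver if it is on the PATH, otherwise None."""
    return shutil.which("safaridriver")


path = find_safaridriver()
if path is None:
    print("safaridriver was not found, so Selenium cannot start Safari")
else:
    print(f"safaridriver found at {path}")
```

This only confirms that the driver exists. It does not check whether “Allow Remote Automation” has been switched on in Safari.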

Wikipedia logon script

def wiki_login():
    # import selenium webdriver
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    # set browser / browser options
    browser = webdriver.Safari()

    # get page
    browser.get("https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page")

    # web page IDs that handle log in
    username = browser.find_element(By.ID, "wpName1")
    password = browser.find_element(By.ID, "wpPassword1")
    login = browser.find_element(By.ID, "wpLoginAttempt")

    # send details to log in IDs
    username.send_keys("Geektechstuff")
    password.send_keys("WIKI_PASSWORD_HERE")
    login.click()

    # wait 5 seconds so the log in can complete
    time.sleep(5)


# run the log in
wiki_login()

I also imported the time library so that the program waits 5 seconds at the end, giving the log in time to complete.

Successful logon!
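A fixed 5 second pause works, but it waits the full time even when the log in finishes sooner, and it may not be long enough on a slow connection. Selenium also provides WebDriverWait, which keeps checking a condition until it is true or a timeout is reached. The sketch below is one possible way to use it here. It assumes that a successful log in moves the browser away from the Special:UserLogin page. The function names left_login_page and wait_for_login are made up for this example.

```python
def left_login_page(browser):
    """Return True once the browser is no longer on the log in page."""
    # assumption: after a successful log in the URL no longer mentions Special:UserLogin
    return "Special:UserLogin" not in browser.current_url


def wait_for_login(browser, timeout=10):
    """Wait until the log in page has been left, or raise a TimeoutException."""
    # imported here so left_login_page can be used without Selenium installed
    from selenium.webdriver.support.ui import WebDriverWait

    # WebDriverWait calls left_login_page(browser) repeatedly until it returns True
    WebDriverWait(browser, timeout).until(left_login_page)
```

In the log on script, wait_for_login(browser) could be called in place of time.sleep(5). If the log in fails and the page never changes, Selenium raises a TimeoutException after the timeout instead of carrying on silently.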

This post forms part of my learning around web scraping using Python. The previous posts are available at:

Part 1 – https://geektechstuff.com/2019/04/30/web-scraping-interacting-with-web-pages-python/

Part 2 – https://geektechstuff.com/2019/04/16/web-scrapping-part-2-python/

Part 3 – https://geektechstuff.com/2019/04/30/web-scraping-interacting-with-web-pages-python/
