*This notebook is part of a [three-part series](https://lawyerist.com/124089/hello-world-attorneys-learn-code/) on learning to code. If you haven't already, start by reading the accompanying [blog posts](https://lawyerist.com/?p=127093). This notebook is meant to supplement post number three--[Build a Bot; Banish FOMO](https://lawyerist.com/?p=127093).*

## Install libraries

If you haven't already, you may need to install some dependencies. On the command line, run the following to install/update gspread, oauth2client, PyOpenSSL, and python-twitter.
```
pip install gspread
pip install --upgrade oauth2client
pip install PyOpenSSL
pip install python-twitter
```
Library installs are one and done. So after doing this once, you should be all set. 

## Import modules and set variables

Now we're getting into the bot's code. This is what will run every time your bot is called. To make sure it behaves as expected, replace the placeholder values found below in the `document_key`, `credentials`, `consumer_key`, `consumer_secret`, `access_token_key`, and `access_token_secret` variables with relevant values (e.g., your access credentials). Once you've done that, run the code in the following cell. If everything works, you shouldn't see any errors.

Some of this code should look familiar as you set up Google credentials in the last homework assignment. In fact, when filling in your Google credentials, you should be using the .json file that you used last time. You will, however, need to create a new Google Sheet (same instructions as [last time](https://lawyerist.com/126074/online-forms-meet-local-document-automation-cut-and-paste-coding/)). You **MUST** add a first row with headings. If you don't, the below code won't work. In this example, just make four columns filled with zeros. Also, delete rows 2-999. This is because the code below appends values to the end of your sheet. So if you fail to remove rows 2-999, values will be appended to row 1000. Additionally, it looks at the last row of the sheet for your old values. So if you fail to delete 2-999, instead of seeing your row of zeros, it will look at the blank row 999.
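
To see why the leftover blank rows matter, here is a minimal sketch that uses a plain Python list as a stand-in for your Sheet. It doesn't talk to Google at all. The names `tidy_sheet`, `messy_sheet`, and `last_row`, the header labels, and the sample numbers are all made up for illustration, and it assumes your row of zeros sits directly under the headings.

```python
# A plain-Python stand-in for a Google Sheet: each inner list is one row.
# Nothing here talks to Google; all names, labels, and values are hypothetical.
header = ["time", "site_1", "site_2", "average"]
zeros = [0, 0, 0, 0]

# A tidy sheet: headings, then the row of zeros, and nothing else.
tidy_sheet = [header, zeros]

# A messy sheet: the same two rows followed by leftover blank rows.
blank = ["", "", "", ""]
messy_sheet = [header, zeros] + [blank[:] for _ in range(998)]


def last_row(sheet):
    """Return the final row, which is where the bot looks for old values."""
    return sheet[-1]


print(last_row(tidy_sheet))   # the row of zeros
print(last_row(messy_sheet))  # a blank row, which can't be turned into numbers

# Appending always lands after whatever is already there, blank or not.
tidy_sheet.append(["2016-10-01", 85, 84, 84])
messy_sheet.append(["2016-10-01", 85, 84, 84])
print(len(tidy_sheet), len(messy_sheet))
```

In the tidy version the new row sits right under the zeros. In the messy version it ends up stranded below hundreds of blank rows.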

As for a Twitter account and Twitter credentials, follow the instructions in [this post](https://lawyerist.com/?p=127093). 

*NOTE: You should be reading all of the comments (i.e., text following a #)*

In [None]:
# Load the module for visiting and reading websites.
import urllib.request
# Load the module for running regular expressions (regex).
import re 
# Load the module for date and time stuff.
import datetime
# Define the variable now as equal to the current date and time.
now = datetime.datetime.now()

# Load the module for accessing Google Sheets.
import gspread
# Load the module needed for securely communicating with Google Sheets.
from oauth2client.service_account import ServiceAccountCredentials
# The scope for your access credentials
scope = ['https://spreadsheets.google.com/feeds']

# Your spreadsheet's ID
document_key = "[YOUR DOCUMENT ID/KEY]"
# Your Google project's .json key
credentials = ServiceAccountCredentials.from_json_keyfile_name('[LOCATION OF JSON FILE]', scope)

# Use your credentials to authorize yourself.
gc = gspread.authorize(credentials)
# Open up the Sheet with the defined ID.
wks = gc.open_by_key(document_key)

#########################################
#
#  NOTE: The name of the sheet you are 
#  trying to access should be in the 
#  parenthetical below (e.g., Data). By
#  Default this is probably "Sheet1".
#
#########################################
worksheet = wks.worksheet("Sheet1")

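# Count the number of rows in your Sheet and resize the Sheet to that count.
# NOTE: This keeps every existing row, blank ones included, which is why
# you need to delete rows 2-999 by hand as described above.
worksheet.resize(worksheet.row_count)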

# Import the relevant Twitter libraries so you can use Twitter.
import twitter
from twitter import TwitterError

# Set your Twitter API credentials.
api = twitter.Api(consumer_key='[YOUR KEY]',
                  consumer_secret='[YOUR SECRET]',
                  access_token_key='[TOKEN KEY]',
                  access_token_secret='[TOKEN SECRET]')

# Set the URLs you want to scrape.
url_1 = "http://election.princeton.edu/2012/09/29/the-short-term-presidential-predictor/"
url_2 = "https://www.betfairpredicts.com/politics"

print ("So far, so good.")

## Read the contents of your first webpage

When you run the next cell, your program will visit the first URL you defined above. It will then print out that page's HTML. In this example, we'll be looking at the Princeton Election Consortium's [short term predictor](http://election.princeton.edu/2012/09/29/the-short-term-presidential-predictor/).  

In [None]:
p_1 = urllib.request.build_opener(urllib.request.HTTPCookieProcessor).open(url_1).read()
print(p_1)
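
If you'd like to see what `.read()` hands back without visiting a real site, here's a small sketch. It assumes Python 3.4 or later, where `urllib` can open `data:` URLs. The `sample_url` is made up and simply stands in for a real page.

```python
import urllib.request

# A made-up "page" packed into a data: URL so no network is needed.
# %20 is a URL-encoded space.
sample_url = "data:text/plain,Bayesian%2085"

# Same opener construction as the cell above, pointed at the sample instead.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor)
sample_page = opener.open(sample_url).read()

# read() returns bytes, not a regular string. That's why the regexes in
# this notebook start with a b and why the results get decoded.
print(type(sample_page))
print(sample_page)
print(sample_page.decode("UTF-8"))
```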

## Parse the site's contents

Scan the above HTML for the content you are trying to extract. In this case, I'm looking for the text following "Bayesian " and before "%". Cut and paste the HTML above into the TEST STRING box over at [Regex 101](https://regex101.com/) and craft a regex that captures your desired content. I like to include a little of the HTML as well. So I went with: 

`/the-short-term-presidential-predictor/">Bayesian\s*(.*)\%`

Remember the parenthetical is the group you're pulling out. Once you have a working regex, plug it into the code below, and run the cell. If it worked, you'll see your scraped data as an output. In this example, it should be a number representing the current chance Clinton will win the 2016 election. 
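
Here's a quick sketch of how the group works, run against a made-up scrap of HTML rather than the real page. The `sample_html` value is an assumption about what the relevant line might look like, so expect the real page to differ.

```python
import re

# A made-up snippet standing in for the page's HTML (bytes, like .read() returns).
sample_html = b'<a href="/the-short-term-presidential-predictor/">Bayesian 85%</a>'

# The parentheses mark the group we want. The r in rb'' tells Python to
# leave the backslashes alone so the regex engine sees them.
pattern = rb'/the-short-term-presidential-predictor/">Bayesian\s*(.*)\%'

found = re.search(pattern, sample_html)
print(found.group(0).decode('UTF-8'))  # the whole match, HTML and all
print(found.group(1).decode('UTF-8'))  # just the captured group
```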

In [None]:
# The r before the b keeps Python from trying to interpret the backslashes itself.
res_1 = re.search(rb'/the-short-term-presidential-predictor/">Bayesian\s*(.*)\%',p_1)
print(res_1.group(1).decode('UTF-8'))

## Read the contents of your second webpage

Same deal as above, but now we're looking at your second URL. 

In [None]:
p_2 = urllib.request.build_opener(urllib.request.HTTPCookieProcessor).open(url_2).read()
print(p_2)

## Parse the site's contents

Again, the same as above, but with a new regex on a new page.

In [None]:
# The r before the b keeps Python from trying to interpret the backslashes itself.
res_2 = re.search(rb'107373419", "percent_value": (\d*\.\d*), "slug": "hillary',p_2)
print(res_2.group(1).decode('UTF-8'))

## Average the values and Tweet out updates

Now we're going to take the values you found above and do something with them. In this case, we want to average them. If all you want to do is Tweet out the above values, you can delete that bit. The newest thing you'll be seeing in this code is the If statement. In Python, if you type `if [some evaluation]:` then the code directly below that statement and indented once will run only if that evaluation is true. For example:

In [None]:
# The If statement below says: If both searches above found a match (i.e., res_1 and res_2 aren't empty), do what follows.
if res_1 and res_2: 
    # Make sure res_1 is in a format we can read (that's the "decode" part), make sure it's treated as a number  
    # (that's the "float" part). Round the number (that's the "round" part) and set the new variable 
    # output_1 equal to the rounded number.
    output_1 = round(float(res_1.group(1).decode('UTF-8')))
    # Do the same thing as above but for res_2.
    output_2 = round(float(res_2.group(1).decode('UTF-8')))
    # Average output_1 and output_2. Then store the value in the variable named "average."
    average = round((float(output_1) + float(output_2))/2)
    
    # Grab the last row of your sheet once. Its columns are: time, first value, second value, average.
    old = worksheet.row_values(worksheet.row_count)
    # Print out the old values stored in your sheet and the new values pulled from your pages.
    print("%s, %s, %s | %s"%(old[3],old[1],old[2],old[0]))
    print("%s, %s, %s | %s"%(average,output_1,output_2,now))

Note: The first time you run the above code, the first line of output won't show any real values, as nothing has been stored in your sheet yet.
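
If you want to poke at the If statement and the averaging on their own, here's a minimal sketch with made-up values. The `sample_page` text, the "Frequentist" search, and the second value of 84 are all invented for illustration.

```python
import re

# A made-up page, in bytes like the real ones.
sample_page = b"Bayesian 85.4% chance"

# re.search gives back a match when it finds something and None when it doesn't.
hit = re.search(rb"Bayesian\s*(.*)%", sample_page)
miss = re.search(rb"Frequentist\s*(.*)%", sample_page)

# The indented code under "if" runs only when BOTH searches found something.
if hit and miss:
    outcome = "both found"
else:
    outcome = "at least one search came up empty"
print(outcome)

# Decode, convert to a number, and round, as in the cell above.
value_1 = round(float(hit.group(1).decode("UTF-8")))
value_2 = 84  # a made-up second value
sample_average = round((value_1 + value_2) / 2)
print(value_1, value_2, sample_average)
```

One quirk to be aware of: Python 3 rounds exact halves to the nearest even number, so an average of 84.5 comes out as 84, not 85.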

## Post to Twitter and Save to Google

In [None]:
if res_1 and res_2: 
    # Again, the above tells the program to continue with what follows only if res_1 and res_2 exist
    
    if (float(worksheet.row_values(worksheet.row_count)[1]) != output_1) or (float(worksheet.row_values(worksheet.row_count)[2]) != output_2):
        # The above If statement says to continue only if the old sheet values and 
        # the newly pulled values are not equal (!=) to each other. 
        
        if (float(worksheet.row_values(worksheet.row_count)[1]) != output_1):
            # The above If statement says to continue only if the first value is 
            # different from the last version stored in the sheet.
            
            if float(worksheet.row_values(worksheet.row_count)[1]) > output_1:
                # If the old value is bigger than the new value,
                # set direction equal to "down."
                direction = "down"
            else:
                # If the old value is smaller than the new value,
                # set direction equal to "up."
                direction = "up"
                
            # Go ahead and tweet out the update. Here you need to know about a Twitter API limitation.
            # Twitter will not Tweet the same tweet a second time if it is too close to the first instance.
            # In such cases, it will throw an error. The `try:` and `except TwitterError:` constructions are
            # similar to If statements. However, they will try the first block of code first, and only try 
            # the second block if it runs into a Twitter error. Here, the second try tweaks the language
            # just enough that it isn't a duplicate Tweet.
            try:
                # Post to Twitter including the old and new values and a statement about the direction of change.
                status = api.PostUpdate('.@Princeton Election Consortium puts Clinton\'s chance of winning at %s%% (%s from %s%%). %s'%(output_1,direction,worksheet.row_values(worksheet.row_count)[1],url_1))
                print(status.text)
            except TwitterError:
                # If Twitter rejected the first try, post again with slightly different wording ("pegs" instead of "puts").
                status = api.PostUpdate('.@Princeton Election Consortium pegs Clinton\'s chance of winning at %s%% (%s from %s%%). %s'%(output_1,direction,worksheet.row_values(worksheet.row_count)[1],url_1))
                print(status.text)

        # What follows is effectively the above but for the second value.         
        if (float(worksheet.row_values(worksheet.row_count)[2]) != output_2):

            if float(worksheet.row_values(worksheet.row_count)[2]) > output_2: 
                direction = "down"
            else:
                direction = "up"
            try:
                status = api.PostUpdate('.@BetfairUSA puts Clinton\'s chance of winning at %s%% (%s from %s%%). %s'%(output_2,direction,worksheet.row_values(worksheet.row_count)[2],url_2))
                print(status.text)
            except TwitterError:
                status = api.PostUpdate('.@BetfairUSA pegs Clinton\'s chance of winning at %s%% (%s from %s%%). %s'%(output_2,direction,worksheet.row_values(worksheet.row_count)[2],url_2))
                print(status.text)

        # NOTE: BetfairUSA moves around a lot and in so doing it tends to repeat itself. 
        # So in the actual code behind @meanvoter, I had to add a few more nested try and except blocks
                
        # Below we introduce an If statement with an Else. 
        # If the old average and new average are not the same the code indented below the "if" will run.
        # If that isn't true (i.e., if they are the same), the code below the "else" will run.
        if (float(worksheet.row_values(worksheet.row_count)[3]) != average):
            
            # Again we figure out the direction of change for inclusion in our Tweet.
            if float(worksheet.row_values(worksheet.row_count)[3]) > average: 
                direction = "down"
            else:
                direction = "up"
            try:
                status = api.PostUpdate('The 6-forecast average puts Clinton\'s chance of winning at %s%% (%s from %s%%). http://bit.ly/meanvoter'%(average,direction,worksheet.row_values(worksheet.row_count)[3]))
                print(status.text)
            except TwitterError:
                status = api.PostUpdate('The 6-forecast average pegs Clinton\'s chance of winning at %s%% (%s from %s%%). http://bit.ly/meanvoter'%(average,direction,worksheet.row_values(worksheet.row_count)[3]))
                print(status.text)
        else:
            try:
                status = api.PostUpdate('The 6-forecast average still puts Clinton\'s chance of winning at %s%%. http://bit.ly/meanvoter'%(average))
                print(status.text)
            except TwitterError:
                status = api.PostUpdate('The 6-forecast average still places Clinton\'s chance of winning at %s%%. http://bit.ly/meanvoter'%(average))
                print(status.text)

        # NOTE: In the actual code behind @meanvoter, I had to add a few more nested try and except blocks
        # for the average tweet as well.
                
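        # Save the new values to the end of your sheet. The time is converted to text with str()
        # so it can be sent to Google.
        worksheet.append_row([str(now),output_1,output_2,average])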
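
If you'd like to play with the `try:` and `except` pattern without touching Twitter, here's a minimal sketch. Everything in it is a stand-in: `DuplicateTweetError` plays the role of `TwitterError`, and `fake_post` plays the role of `api.PostUpdate`. Neither is part of the python-twitter library.

```python
class DuplicateTweetError(Exception):
    """Hypothetical stand-in for the error Twitter raises on a repeat tweet."""


# Keeps track of everything we've "tweeted" so far.
already_posted = set()


def fake_post(text):
    """Pretend to tweet: refuse any text that has been posted before."""
    if text in already_posted:
        raise DuplicateTweetError(text)
    already_posted.add(text)
    return text


def post_with_fallback(first_wording, second_wording):
    """Try the first wording; if it is rejected, try the second."""
    try:
        return fake_post(first_wording)
    except DuplicateTweetError:
        return fake_post(second_wording)


# The first call goes through as written. The second call repeats the same
# text, so the first wording is rejected and the fallback is used instead.
print(post_with_fallback("puts Clinton at 85%", "pegs Clinton at 85%"))
print(post_with_fallback("puts Clinton at 85%", "pegs Clinton at 85%"))
```

A third identical call would fail, since both wordings have been used up. That's the situation the extra nested try and except blocks mentioned in the notes above are meant to handle.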

If you're looking just to tweet out single bits of info, just strip away all the excess, cutting lines and running the code until you get it to work. 
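
As a starting point, here's a rough sketch of what the stripped-down logic for a single value might look like. It only builds the text of the tweet; it doesn't post anything or touch your sheet. The function name `build_update` and the sample numbers are made up for illustration.

```python
def build_update(source, new_value, old_value, url):
    """Return the tweet text for one value, or None if nothing changed."""
    # Nothing to report if the value hasn't moved.
    if new_value == old_value:
        return None
    # Work out the direction of change, same as in the cell above.
    if old_value > new_value:
        direction = "down"
    else:
        direction = "up"
    return "%s puts Clinton's chance of winning at %s%% (%s from %s%%). %s" % (
        source, new_value, direction, old_value, url)


# Made-up values: the chance moved from 86 to 84, then stayed at 84.
print(build_update(".@BetfairUSA", 84, 86, "https://www.betfairpredicts.com/politics"))
print(build_update(".@BetfairUSA", 84, 84, "https://www.betfairpredicts.com/politics"))
```

Once the text looks right, you could hand it to `api.PostUpdate` whenever it isn't `None`.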

Good luck!