Quick question
I'm messing with python 2.7, and attempted to run a program to scape a site.
@ line 46 (code below) It stated I needed to create an array, which I think is correct.
Yet I keep getting the following error.
And suggestions?
I'm messing with python 2.7, and attempted to run a program to scape a site.
@ line 46 (code below) It stated I needed to create an array, which I think is correct.
Yet I keep getting the following error.
Code:
Traceback (most recent call last):
File "./bloom1.py", line 51, in <module>
for pg in quote_page:
NameError: name 'quote_page' is not defined
And suggestions?
Code:
# !/usr/local/bin/python
# python 2
#
# Source code
#
# craw a website (http://www.bloomberg.com/quote/SPX:IND.com), list all url under a specific given path
# To use type in "python" and name of file, then run.
#import libraries
from bs4 import BeautifulSoup
import requests
import urllib2
bloombergFile = urllib2.urlopen('http://www.bloomberg.com/quote/CCMP:IND','http://www.bloomberg.com/quote/SPX:IND')
bloombergHtml = bloombergFile.read()
bloombergFile.close()
# parse the html using beautiful soap and store in variable `soup`
soup = BeautifulSoup(bloombergHtml)
bloombergAll = soup.find_all("a")
for links in soup.find_all('a'):
print (links.get('href'))
# query the website and return the html to the variable 'page'
page = urllib2.urlopen('http://www.bloomberg.com/quote/CCMP:IND','http://www.bloomberg.com/quote/SPX:IND')
# Take out the <div> of name and get its value
name_box = soup.find('h1', attrs={'class': 'name'})
# strip() is used to remove starting and trailing
name = name_box.text.strip()
print name
# get the index price
price_box = soup.find('div', attrs={'class':'price'})
price = price_box.text
print price
# Python csv module and the datetime module
import csv
from datetime import datetime
# Extracting multiple indices at the same time. NOT WORKING
bloombergHtml = ['http://www.bloomberg.com/quote/CCMP:IND','http://www.bloomberg.com/quote/SPX:IND']
# for loop
data = []
for pg in quote_page:
# query the website and return the html to the variable 'page'
page = urllib2.urlopen('http://www.bloomberg.com/quote/CCMP:IND','http://www.bloomberg.com/quote/SPX:IND')
# parse the html using beautiful soap and store in variable `soup`
soup = BeautifulSoup(bloombergHtml, 'html.parser')
# Take out the <div> of name and get its value
name_box = soup.find('h1', attrs={'class': 'name'})
name = name_box.text.strip() # strip() is used to remove starting and trailing
# get the index price
price_box = soup.find('div', attrs={'class':'price'})
price = price_box.text
# save the data in tuple
data.append((name, price))
# open a csv file with append, so old data will not be erased
with open('index.csv', 'a') as csv_file:
writer = csv.writer(csv_file)
# The for loop
for name, price in data:
writer.writerow([name, price, datetime.now()])