
Although Spotify Charts provides a handy “download to CSV” link at the top of the page, manually downloading each day's chart for the full year is too time-consuming, and would also leave 365 separate files to import for analysis. Instead, I wrote a Python script to scrape each day’s chart, process the file, and join the data together into a single CSV file for analysis.

The data scraped from the charts does not contain fields for the date or region, so I needed to add them during processing. I also extracted the Track ID from each song’s URL and stored it in a separate column. This will be needed later to get the audio features for each track from the API.

    import datetime
    import io

    import pandas as pd
    import requests

    # start_date and end_date are datetime.date values set beforehand.
    frames = []
    while start_date < end_date:
        # URL pattern left blank here; it takes the date as a %s placeholder.
        url = '' % start_date.strftime('%Y-%m-%d')
        result = requests.get(url)
        if result.status_code != 200:
            print(result.status_code)
            print(result.text)
            raise Exception('Response code not 200')
        fileobj = io.StringIO(result.content.decode('utf8', 'ignore'))
        try:
            df = pd.read_csv(fileobj, header=1)
        except Exception:
            df = None  # skip a day whose file cannot be parsed
        if df is not None:
            # Column names below are assumed.
            df['Date'] = start_date
            df['Region'] = 'Global'
            # The Track ID is the last path segment of the track URL.
            df['Track ID'] = df['URL'].str.split('/').str[-1]
            frames.append(df)
        start_date += datetime.timedelta(1)
    streams = pd.concat(frames, ignore_index=True)
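The per-day processing described here (adding the date and region, then pulling the Track ID out of the URL) can be tried offline. The following is only a minimal sketch: the sample CSV text, the track URLs, the dates, the column names `Date`, `Region` and `Track ID`, and the helper `process_day` are all made up for illustration and are not taken from the real charts.

```python
import datetime
import io

import pandas as pd

# Made-up sample of one day's download: a note line, then the header row.
SAMPLE_CSV = (
    "Note: sample data for illustration only\n"
    "Position,Track Name,Artist,Streams,URL\n"
    "1,Song A,Artist A,1000,https://open.spotify.com/track/aaa111\n"
    "2,Song B,Artist B,900,https://open.spotify.com/track/bbb222\n"
)


def process_day(csv_text, date, region):
    """Parse one day's chart and add Date, Region and Track ID columns."""
    # header=1 skips the note line and reads column names from the second line.
    df = pd.read_csv(io.StringIO(csv_text), header=1)
    df["Date"] = date
    df["Region"] = region
    # The Track ID is whatever follows the last "/" in the URL.
    df["Track ID"] = df["URL"].str.rsplit("/", n=1).str[-1]
    return df


# Placeholder dates; the same sample text stands in for each day's file.
days = [datetime.date(2000, 1, 1), datetime.date(2000, 1, 2)]
streams = pd.concat(
    [process_day(SAMPLE_CSV, day, "Global") for day in days],
    ignore_index=True,
)
```

Collecting the daily frames in a list and calling `pd.concat` once avoids copying the growing table on every iteration, and it also works on recent pandas versions, where `DataFrame.append` is no longer available.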
