Use Python’s pandas library to extract the data from each website for each year. Record the code under the file “fortune_python_project.py”, using Python Spyder. Create one function to extract the data from 1955 to 2005. Create another function to extract data from 2006 to 2012. The tables from 1955 to 2012 are HTML tables. The tables from 2013 to 2022 are javascript tables. So, I had to download the JSON file for each year from 2013 to 2022. Finally, create a function that can extract the data from each JSON file with the help of the json library. Use the three functions to combine all the relevant data under a single dataframe, stored in the variable “fortune_500_final”.
Transfer the final dataframe to a MySQL database with the schema name “fortune_500” and table name “fortune”, using sqlalchemy and pymysql libraries.
RClean the data in the “fortune_500” schema, using MySQL.
Create a CTE labeled “fortune_ranking” to assign a row number to each row as rank and label the column name as “fortune_rank”.
Select only distinct rows to exclude any duplicate rows.
From the top 10 rankings for each year, standardize each company’s name, using CASE statements. For example, every instance of “General Motors (GM)” or “General Motors Company” will become “General Motors”.
Return all rows where “fortune_rank” is less than or equal to 10.
How has the ranking of each company changed over the past 67 years?
Use Tableau Public to visually represent the data.
The main focus of this visualization will be to create an animated bar chart race, depicting how the top 10 ranking has changed over time by revenue.
Use the excel file, “fortune_500_alternate_data_storage” to help support in making the visualization.