The data you receive will rarely be perfect. Large, multi-dimensional datasets in particular are likely to contain systematic problems or anomalies, and you will have to run a number of checks and make a number of choices to deal with these imperfections. Data cleaning is a skill that comes with lots of practice: each dataset you encounter will have different quirks and a different level of “uncleanness.”
For this assignment, narrow down your dataset to the variables you are likely to use in your analysis. Write a one-page (single-spaced) summary of the issues you identify and will have to address when cleaning your data. Be sure to answer the following questions: