Cleaning, exploring, and more cleaning
Part 2 of your data visualization project
Last updated May 2021
Building on Part 1, Part 2 incorporates working with the data used in the visualization project. When you work with data, the objective does not change. It is still a problem to solve, a question to answer, or a hypothesis to test.
How does focus affect cleaning and exploring data?
It is very important to understand the data and how the different fields are interrelated. However, if a field is not relevant to the project, or is outside the scope of the project, that field should be left out of the cleaning and exploration process.
Consider the research question that is the focus of this visualization project.
Note: In this translation, I used Vermont. However, the state that you will use is based on the first name listed in Blackboard. Using the first letter of that name, you can find the state you will use.
A-D Ohio
E-J Illinois
K-N Georgia
O-S Arizona
T-V Washington (state, not D.C.)
W-Z New Jersey
Using the wrong state will cause a loss of no less than 20% of the possible points.
“Is there a relationship between a consumer’s local area population, the consumer’s local median household income, and the days of delay between receiving and forwarding consumer complaints from Vermont in 2020?”
And consider the data dictionary.
Table 1
The Data Dictionary
Field                          Description
date_received                  The date the CFPB received the complaint
product                        The type of product the consumer identified in the complaint
issue                          The issue the consumer identified in the complaint
company                        The complaint is about this company
state                          The state the consumer resides in
zip_code                       The mailing ZIP code provided by the consumer
submitted_via                  How the complaint was submitted to the CFPB
date_sent_to_company           The date the CFPB sent the complaint to the company
company_response_to_consumer   The response from the company about this complaint
timely_response                Indicates whether the company gave a timely response or not
complaint_id                   The unique identification number for a complaint
delay                          The number of days between the date received by the CFPB and the date the complaint was submitted to the company
population                     Population based on the zip code of the consumer
median_household_income        The median household income based on the zip code of the consumer
Note. This is the data dictionary based on the data adapted from CFPB (n.d.) and Rozzi (2021).
Consumer Financial Protection Bureau. (n.d.). Consumer complaint database API docs [data set and code book]. Office of Civil Rights. Retrieved April 28, 2021, from https://cfpb.github.io/api/ccdb/index.html
Rozzi, G. (2021). Data & functions for working with US zip codes. GitHub. https://github.com/gavinrozzi/zipcodeR/
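The data dictionary defines delay as the number of days between date_received and date_sent_to_company, so one possible cleanliness check is to recompute it from the two date fields. The sketch below uses a small, made-up data frame (the rows are hypothetical, not taken from the course data) and assumes the dates are stored as text in YYYY-MM-DD format.

```r
# Hypothetical sample rows; the real data comes from complaints.RData.
complaints <- data.frame(
  complaint_id = c(1, 2, 3),
  date_received = c("2020-01-06", "2020-02-10", "2020-03-02"),
  date_sent_to_company = c("2020-01-06", "2020-02-13", "2020-03-30"),
  delay = c(0, 3, 28),
  stringsAsFactors = FALSE
)

# Convert the text dates to Date objects (assumes "YYYY-MM-DD" text).
received <- as.Date(complaints$date_received)
sent <- as.Date(complaints$date_sent_to_company)

# Subtracting two Date vectors gives a difference in days.
recomputed_delay <- as.numeric(sent - received)

# Keep only the rows where the stored delay disagrees with the dates.
mismatch <- complaints[recomputed_delay != complaints$delay, ]
print(nrow(mismatch))
```

If any rows land in mismatch, that is worth noting in your plan before you rely on the delay field.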
Click here to download the data. (complaints.RData)
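An .RData file can hold one or more named R objects, and the assignment does not say what the object inside complaints.RData is called. As a minimal sketch, the example below writes a tiny stand-in file (complaints_demo is a hypothetical name and the rows are made up) and then shows how load() reports which objects it restored.

```r
# Stand-in for the course file: save a tiny, hypothetical data frame to a
# temporary .RData file so the example runs without the real download.
complaints_demo <- data.frame(state = c("VT", "AZ"), delay = c(0, 4))
demo_path <- file.path(tempdir(), "complaints_demo.RData")
save(complaints_demo, file = demo_path)
rm(complaints_demo)

# load() restores the saved objects and returns their names, which helps
# when you do not know what the file contains.
loaded_names <- load(demo_path)
print(loaded_names)

# Fetch the first restored object by name and look at its structure.
complaints <- get(loaded_names[1])
str(complaints)
```

With the real download, you would point load() at the location where you saved complaints.RData instead of demo_path.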
Knowing the scope of this analysis, what purpose does the field zip_code serve? Could this field be pertinent to providing the proper context or the whole picture? Probably not. This field could potentially be used to validate the data in the state field, but beyond that it is not within the scope of this project. Therefore, the data in this field does not need to be cleaned or manipulated during the exploration of the data.
Investigate the data sample within the scope of this project. Ensure the data is clean by checking for abnormal entries, incorrect field types, and other inconsistencies.
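The three checks named above (field types, abnormal entries, other inconsistencies) might look something like the following in base R. This is only a sketch: the sample rows are invented to contain typical problems, and the two-letter state codes are an assumption about how the state field is stored.

```r
# Hypothetical sample with deliberate problems; not the course data.
complaints <- data.frame(
  state = c("VT", "VT", "vt ", "AZ"),
  date_received = c("2020-01-06", "2020-02-10", "2020-03-02", "2020-04-01"),
  delay = c("0", "3", "-2", "5"),
  population = c(41000, NA, 52000, 8000),
  median_household_income = c(55000, 61000, 48000, 70000),
  stringsAsFactors = FALSE
)

# 1. Field types: see what R assigned, then convert where needed.
print(sapply(complaints, class))
complaints$date_received <- as.Date(complaints$date_received)
complaints$delay <- as.integer(complaints$delay)

# 2. Missing values: count the NA entries in each field.
missing_counts <- colSums(is.na(complaints))
print(missing_counts)

# 3. Abnormal entries: a delay should not be negative.
negative_delay <- complaints[!is.na(complaints$delay) & complaints$delay < 0, ]
print(negative_delay)

# 4. Inconsistencies: tabulate the state codes, then standardize them.
print(table(complaints$state))
complaints$state <- toupper(trimws(complaints$state))
```

Whether to correct, flag, or drop a suspicious row is a judgment call that belongs in your updated plan.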
Before executing the necessary actions in an R script file, think about the plan that you prepared in Part 1. As you work with the data, you will be able to identify things that need to change in your original plan.
Update the plan
For this week’s objective, you will update the plan and add information about the status of the project, after you work with the data.
Consider all of the questions that you considered when you made the original plan. What needs to change? What can be clarified? Don’t write about what is different; write only about the current state of the project. If I want to know what is different, I can reread your Part 1 submission.
Formulating the brief answers these questions:
Why is this interesting or important? What about it is important?
What requires clarification?
What pitfalls could cause the analysis to be incomplete or incorrect?
Who is the audience? What do you think the audience expects?
How much time do you have to complete the project?
What are the project conditions?
What tools do you have access to?
Or, as is the case in this course, what are you limited to, regarding software?
Can the evidence be summarized in one visualization? Two? Several?
Will the results of this analysis be an exhibit (evidence), an explanation (presentation), or an exploration (audience interaction)?
Working with data
When you work with the data, whether cleaning, investigating, or exploring, ask yourself questions as you progress through the process. These questions may include:
Are the right data types assigned?
How many observations are associated with the state I’m assigned?
How do I filter for the complaints specific to the analysis I’m assigned?
Which fields apply to this analysis?
What is the range of the median household income, the population, and the delay between receiving and forwarding customer complaints?
Should observations with no (zero) delay be separated? (Is management more interested in the overall picture, or in how the response times can be improved?)
Are there fields of data that would add to the data story that are outside the scope? Does the scope need to be modified? Perhaps you identified that a specific product or company was associated with all of the delays exceeding 50 days. That could be very useful information to the management team. Another possibility might be that one type of company response to the consumer has the highest delay time. Perhaps focusing on the delays exceeding a certain number of days offers very different insight than focusing on no delay or only a few days of delay. Yet another possibility is that the time of year, like a particular season, coincides with the length of the delay.
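As a rough illustration of the last question, the sketch below checks which products appear among delays exceeding 50 days and summarizes the delay by month received. The rows are invented for the example, and using the calendar month as a stand-in for season is an assumption.

```r
# Hypothetical sample rows; not the course data.
complaints <- data.frame(
  product = c("Mortgage", "Credit card", "Mortgage",
              "Debt collection", "Mortgage"),
  date_received = as.Date(c("2020-01-15", "2020-02-03", "2020-07-21",
                            "2020-07-30", "2020-11-09")),
  delay = c(62, 0, 75, 4, 51)
)

# Which products are tied to delays exceeding 50 days?
long_delays <- complaints[complaints$delay > 50, ]
print(table(long_delays$product))

# Does the time of year coincide with the length of the delay?
# Extract the month number and take the median delay per month.
complaints$month <- format(complaints$date_received, "%m")
median_by_month <- aggregate(delay ~ month, data = complaints, FUN = median)
print(median_by_month)
```

A finding like this does not change the research question by itself, but it may be a reason to revisit the scope in your updated plan.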
It may be helpful to develop these questions in your script file as comments before working with the data, then add the coding applicable to each question as you answer them.
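One way such a script could start is shown below, with each question written as a comment and the code added underneath. It is a sketch under stated assumptions: the sample rows are made up, the state field is assumed to hold two-letter codes, "in 2020" is assumed to refer to date_received, and Vermont is used only because it is the state in the example research question.

```r
# Hypothetical sample rows; the real data comes from complaints.RData.
complaints <- data.frame(
  state = c("VT", "VT", "VT", "AZ", "OH"),
  date_received = as.Date(c("2020-01-06", "2020-05-11", "2019-12-30",
                            "2020-03-02", "2020-06-15")),
  delay = c(0, 12, 3, 1, 0),
  population = c(41000, 52000, 39000, 8000, 120000),
  median_household_income = c(55000, 48000, 61000, 70000, 52000)
)

# Q: How do I filter for the complaints specific to my analysis?
assigned_state <- "VT"  # replace with the code for your assigned state
in_2020 <- format(complaints$date_received, "%Y") == "2020"
assigned <- complaints[complaints$state == assigned_state & in_2020, ]

# Q: How many observations are associated with my state?
n_obs <- nrow(assigned)
print(n_obs)

# Q: Which fields apply to this analysis?
scope_fields <- c("delay", "population", "median_household_income")
assigned <- assigned[, scope_fields]

# Q: What is the range of the income, the population, and the delay?
ranges <- sapply(assigned, range)  # row 1 = minimum, row 2 = maximum
print(ranges)

# Q: Should zero-delay observations be separated?
zero_delay <- assigned[assigned$delay == 0, ]
some_delay <- assigned[assigned$delay > 0, ]
print(c(zero = nrow(zero_delay), delayed = nrow(some_delay)))
```

Keeping the questions as comments makes it easy to see which ones are still unanswered as the script grows.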