Organize and analyze a dataset to determine trends influencing company outcomes
Background
Careers in data science range from cleaning and converting data to analyzing and experimenting with data to drive company strategy. At the heart of data science is the ability to extract information from data to solve problems and answer questions that drive an organization forward. Data scientists rely on statistics and computer programming, but also on curiosity, analytical creativity and strong communication skills. The ways a data scientist interacts with data therefore range from the straightforward and operationalized to the more creative and strategic. In some circumstances the outcome sought from the analysis is known in advance, but in most cases careful listening and productive dialogue are required to determine the main driver of value for the company. Once that determination is made, the data scientist spends time deciding on parameters, trying out different models, seeking out patterns and making connections within a data set. Throughout, remember that communicating what you learned from the data, and how you learned it, is essential to conveying the value of the information to team members and clients.
The Process
Overview of the data science process
The exercise: Using this data set, complete the tasks outlined below that move through Steps 2, 3, 4 and 5 of the data science process. These tasks demonstrate the range of ways data scientists interact with data.
Task 1: Data preparation – Cleaning and Organizing
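As one possible starting point for this task, the sketch below shows a few common cleaning steps: trimming whitespace, dropping exact duplicate rows, converting SALES to a number, and parsing dates. It is a minimal illustration, not the required method. The sample rows, the ORDERNUMBER and ORDERDATE column names, and the date format are assumptions made for the example; only the SALES column is named in the exercise. It uses the Python standard library so it runs without extra packages.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw extract. Only the SALES column name comes from the exercise;
# ORDERNUMBER, ORDERDATE and the m/d/Y date format are assumed for illustration.
RAW = """ORDERNUMBER,ORDERDATE,SALES
10107,2/24/2003,2871.00
10107,2/24/2003,2871.00
10121,5/7/2003, 2765.90
10134,7/1/2003,
10145,8/25/2003,not available
"""


def clean_rows(text):
    """Return de-duplicated rows with SALES as float and ORDERDATE as a date."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Trim stray whitespace around every value.
        row = {key: value.strip() for key, value in row.items()}

        # Skip rows that exactly repeat an earlier row.
        fingerprint = tuple(row.values())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)

        # Drop rows whose SALES value is missing or not numeric.
        try:
            sales = float(row["SALES"])
        except ValueError:
            continue

        cleaned.append({
            "ORDERNUMBER": row["ORDERNUMBER"],
            "ORDERDATE": datetime.strptime(row["ORDERDATE"], "%m/%d/%Y").date(),
            "SALES": sales,
        })
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Whether to drop or impute rows with missing SALES is a judgment call; whichever you choose, record it so the deliverable can state exactly how the data was processed.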
Task 2: You have been tasked by the company to determine which factors influence sales (denoted as the SALES column in the data set). If possible, derive a method to predict sales.
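One simple way to approach this task is to rank candidate numeric columns by the strength of their correlation with SALES and then fit a straight line to the strongest one. The sketch below illustrates that idea under stated assumptions: the four records and the QUANTITYORDERED and PRICEEACH column names are made up for the example, and a one-variable least-squares line stands in for whatever model you eventually choose.

```python
from math import sqrt

# Hypothetical cleaned records. Only SALES is named in the exercise;
# QUANTITYORDERED and PRICEEACH are assumed candidate factors.
records = [
    {"QUANTITYORDERED": 10, "PRICEEACH": 50.0, "SALES": 500.0},
    {"QUANTITYORDERED": 20, "PRICEEACH": 50.0, "SALES": 1000.0},
    {"QUANTITYORDERED": 30, "PRICEEACH": 40.0, "SALES": 1200.0},
    {"QUANTITYORDERED": 40, "PRICEEACH": 45.0, "SALES": 1800.0},
]


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


sales = [r["SALES"] for r in records]
candidates = ["QUANTITYORDERED", "PRICEEACH"]

# Rank candidate factors by the absolute strength of their correlation with SALES.
correlations = {c: pearson([r[c] for r in records], sales) for c in candidates}
ranking = sorted(candidates, key=lambda c: abs(correlations[c]), reverse=True)

# Fit a one-variable line using the most strongly correlated factor.
best = ranking[0]
slope, intercept = fit_line([r[best] for r in records], sales)


def predict_sales(value):
    """Predict SALES from the best single factor using the fitted line."""
    return slope * value + intercept


for name in ranking:
    print(f"{name}: r = {correlations[name]:.2f}")
print(f"Predicted SALES when {best} = 50: {predict_sales(50):.2f}")
```

Correlation alone does not show that a factor causes sales, and a real analysis would hold out part of the data to check how well the predictions generalize. With several factors, a multivariate model from a library such as scikit-learn (listed under Web Resources) would be a natural next step.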
Task 3: Now that you have figured out which variables influence sales, the company wants to know whether there are other interesting trends in the data. Do certain customers tend to buy at certain times? What drives the deal size?
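Both questions in this task can be explored with simple grouping before any modelling. The sketch below counts orders per customer per month to look for timing patterns, and averages a numeric column within each deal-size category to see what separates them. The six orders and the CUSTOMERNAME, MONTH, DEALSIZE and QUANTITYORDERED column names are assumptions for illustration; substitute the columns actually present in the data set.

```python
from collections import Counter, defaultdict

# Hypothetical orders. All column names and values here are assumed for the example.
orders = [
    {"CUSTOMERNAME": "Customer A", "MONTH": 11, "DEALSIZE": "Large", "QUANTITYORDERED": 45},
    {"CUSTOMERNAME": "Customer A", "MONTH": 11, "DEALSIZE": "Medium", "QUANTITYORDERED": 30},
    {"CUSTOMERNAME": "Customer A", "MONTH": 3, "DEALSIZE": "Small", "QUANTITYORDERED": 20},
    {"CUSTOMERNAME": "Customer B", "MONTH": 5, "DEALSIZE": "Small", "QUANTITYORDERED": 24},
    {"CUSTOMERNAME": "Customer B", "MONTH": 5, "DEALSIZE": "Medium", "QUANTITYORDERED": 34},
    {"CUSTOMERNAME": "Customer B", "MONTH": 12, "DEALSIZE": "Large", "QUANTITYORDERED": 49},
]


def orders_per_customer_month(rows):
    """Count how many orders each customer placed in each month."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["CUSTOMERNAME"]][row["MONTH"]] += 1
    return counts


def mean_by_group(rows, group_key, value_key):
    """Average value_key within each distinct value of group_key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row[value_key])
    return {group: sum(values) / len(values) for group, values in grouped.items()}


# Timing: the month in which each customer ordered most often.
month_counts = orders_per_customer_month(orders)
favourite_month = {
    customer: months.most_common(1)[0][0] for customer, months in month_counts.items()
}

# Deal size: compare the average quantity ordered across deal-size categories.
avg_quantity = mean_by_group(orders, "DEALSIZE", "QUANTITYORDERED")

print(favourite_month)
print(avg_quantity)
```

The same `mean_by_group` helper can be pointed at other numeric columns to compare candidate drivers of deal size side by side, and the resulting tables are a natural fit for the graphs requested in the deliverable.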
The Deliverable
At the end of these tasks you should be able to deliver a Word document which clearly outlines the following:
- the question you were trying to answer
- the methods you used to answer the question
- the results of the analysis
The document should include tables and graphs where appropriate to support your findings. Make sure to include the specifics of how the data was processed in each case.
Sample Deliverable 1:
data science sample deliverable
Resources:
Pre-Requisite Knowledge
- Statistics
- General mathematics
- Linear algebra
- Programming languages including R, Python, SAS
- Machine learning
- Optimization
Web Resources:
- https://www.datacamp.com/courses
- https://www.kaggle.com/learn/overview
- https://www.datascienceweekly.org/
- http://scikit-learn.org/stable/
Skills used to perform this task:
- Organization
- Patience
- Statistical Analysis
- Data intuition
- Analytical thinking
Skills used in Data Science:
- Communication
- Programming
- Data visualization
- Creativity
Additional tasks:
- Machine Learning
You are viewing a job simulation. To get started, set up SMART Goals to perform this simulation in a reasonable timeline. If you have completed the task, fill out the Self-Reflection Sheet.
Simulation author – Sarah Peterson, PhD
Simulation vetted by data professionals in the Greater Atlanta area