# Study Guide

Please use this study guide to create your certification self-study plan. We’ve included the objectives you should meet for each assessed competency, with links to relevant skill assessments.
#### Exam DS101: Exploratory Analysis, Statistical Experimentation, and Data Management in R or Python
1.1 Calculate metrics to effectively report characteristics of data and relationships between features
• Calculate the measures of center (e.g. mean, median, mode) for variables using R or Python.
• Calculate the measures of spread (e.g. range, standard deviation, variance) for variables using R or Python.
• Calculate the skewness for variables using R or Python.
• Calculate the missingness for variables and understand its influence on reporting characteristics of data and relationships in R or Python.
• Calculate the correlation between variables using R or Python.
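
The metrics listed under 1.1 can be sketched with Python's standard library alone. The sample values and the `pearson` helper below are hypothetical, chosen only for illustration; in practice a data frame library such as pandas would typically be used instead.

```python
import math
import statistics

# Hypothetical sample: one numeric variable with missing entries (None).
ages = [23, 25, 25, 31, None, 40, 58, None]
observed = [a for a in ages if a is not None]

# Missingness: share of entries that are missing.
missing_rate = (len(ages) - len(observed)) / len(ages)

# Measures of center.
mean_age = statistics.mean(observed)
median_age = statistics.median(observed)
mode_age = statistics.mode(observed)

# Measures of spread (sample versions, n - 1 denominator).
value_range = max(observed) - min(observed)
variance = statistics.variance(observed)
std_dev = statistics.stdev(observed)

# Skewness as the third standardized moment (population form).
pop_sd = statistics.pstdev(observed)
skewness = sum(((a - mean_age) / pop_sd) ** 3 for a in observed) / len(observed)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical second variable, paired with the observed ages.
incomes = [30, 34, 33, 41, 52, 75]
correlation = pearson(observed, incomes)
```

Note that the center and spread figures here are computed on the observed values only, so a high missing rate can bias them if the values are not missing at random.
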
1.2 Create data visualizations in code to demonstrate the characteristics of data
• Create and customize a bar chart using R or Python.
• Create and customize a box plot using R or Python.
• Create and customize a line graph using R or Python.
• Create and customize a histogram using R or Python.
1.3 Create data visualizations in code to represent the relationships between features
• Create and customize a scatterplot using R or Python.
• Create and customize a heatmap using R or Python.
• Create and customize a pivot table using R or Python.
1.4 Identify and reduce the impact of characteristics of data
• Describe when a transformation should be applied to variables and implement suitable transformation methods using R or Python.
• Identify missing data and implement suitable imputation methods to reduce its impact on analysis or modeling using R or Python.
• Identify and remove outliers using R or Python.
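
As one possible illustration of 1.4, the sketch below applies median imputation followed by the 1.5 × IQR rule for outliers. Both are common choices rather than methods prescribed by this guide, and the data is hypothetical.

```python
import statistics

# Hypothetical variable with one missing value and one extreme value.
values = [12, 15, 14, None, 13, 16, 95, 14]

# Median imputation: replace missing entries with the median of observed ones.
observed = [v for v in values if v is not None]
median = statistics.median(observed)
imputed = [median if v is None else v for v in values]

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(imputed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

cleaned = [v for v in imputed if lower <= v <= upper]
outliers = [v for v in imputed if v < lower or v > upper]
```

The median is used for imputation here because, unlike the mean, it is not pulled toward the extreme value.
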

#### Related Assessments

2.1 Apply sampling methods to data
• Distinguish between different types of random sampling techniques and apply the methods using R or Python.
• Sample data from a statistical distribution (e.g. normal, binomial, Poisson, exponential) using R or Python.
• Calculate the probability from a statistical distribution (e.g. normal, binomial, Poisson, exponential) using R or Python.
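
The sampling objectives in 2.1 might look like the following in Python, using only the standard library. The population, strata names, and distribution parameters are assumptions made for the example.

```python
import random
from statistics import NormalDist

rng = random.Random(42)  # seeded so the sketch is reproducible

# Hypothetical population of 100 customer IDs, split into two strata.
population = list(range(100))
strata = {"north": population[:70], "south": population[70:]}

# Simple random sampling: every unit has the same chance of selection.
simple_sample = rng.sample(population, k=10)

# Stratified sampling: draw 10% from each stratum separately.
stratified_sample = {
    name: rng.sample(units, k=len(units) // 10) for name, units in strata.items()
}

# Systematic sampling: every 10th unit after a random start.
start = rng.randrange(10)
systematic_sample = population[start::10]

# Sampling from distributions: normal and exponential draws.
heights = [rng.gauss(170, 10) for _ in range(5)]
waits = [rng.expovariate(1 / 3) for _ in range(5)]  # mean wait of 3

# Probability from a distribution: P(X <= 180) for X ~ Normal(170, 10).
p_below_180 = NormalDist(mu=170, sigma=10).cdf(180)
```
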
2.2 Implement methods for performing statistical tests
• Use different types of graphs to assess the normality of samples using R or Python.
• Run simple statistical tests (e.g. t-test, ANOVA test, chi-square test) using R or Python.
• Run suitable statistical tests in the context of the business question using R or Python.
• Interpret the results of statistical tests run in R or Python.
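
To make the mechanics of a t-test concrete, here is a minimal sketch that computes Welch's t statistic and its approximate degrees of freedom by hand. The `welch_t` helper and the two samples are hypothetical. In practice the p-value would usually come from a statistics library, for example `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t_stat, df


# Hypothetical measurements from two independent groups.
control = [10, 12, 11, 13, 14]
variant = [14, 15, 13, 16, 17]
t_stat, df = welch_t(control, variant)
```

For these samples the statistic is -3 with 8 degrees of freedom. Its magnitude exceeds the two-sided 5% critical value of about 2.31, so the difference in means would be judged significant at that level.
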

#### Related Assessments

1.1 Perform standard data import, joining and aggregation tasks
• Import data from flat files and databases using R or Python.
• Aggregate numeric variables, categorical variables, and dates by group using R or Python.
• Combine multiple tables by rows or columns using R or Python.
• Filter the data based on different criteria using R or Python.
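
The import, join, aggregate, and filter steps above can be sketched with the standard library. The two "files" are inlined as strings so the example runs on its own; their column names and contents are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat files, inlined as strings so the sketch is self-contained.
orders_csv = "order_id,customer_id,amount\n1,a,20.0\n2,b,35.5\n3,a,14.5\n"
customers_csv = "customer_id,region\na,north\nb,south\n"

# Import: each row becomes a dict keyed by the header line.
orders = list(csv.DictReader(io.StringIO(orders_csv)))
customers = list(csv.DictReader(io.StringIO(customers_csv)))

# Combine by columns: attach each customer's region to their orders.
region_by_customer = {c["customer_id"]: c["region"] for c in customers}
joined = [
    {**o, "amount": float(o["amount"]), "region": region_by_customer[o["customer_id"]]}
    for o in orders
]

# Aggregate by group: total amount per region.
total_by_region = defaultdict(float)
for row in joined:
    total_by_region[row["region"]] += row["amount"]

# Filter on a condition: orders above 15.
large_orders = [row for row in joined if row["amount"] > 15]
```
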
1.2 Perform standard cleaning tasks to prepare data for analysis
• Match strings in the dataset against specific patterns using R or Python.
• Identify different data types in R or Python and convert values between types.
• Clean categorical and text data by manipulating strings in R or Python.
• Clean date and time data by manipulating dates and times in R or Python.
• Explain the concept of tidy data and transform messy data into tidy data using R or Python.
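
A minimal sketch of these cleaning tasks follows. The records, the two accepted date formats, and the helper names (`clean_name`, `parse_date`, `to_int`) are all assumptions for the example.

```python
import re
from datetime import datetime

# Hypothetical messy records: untidy names, mixed date formats, text scores.
raw = [
    {"name": "  alice SMITH ", "joined": "2023-01-15", "score": "87"},
    {"name": "Bob  jones", "joined": "15/02/2023", "score": "n/a"},
]


def clean_name(text):
    """Collapse repeated whitespace and apply title case."""
    return re.sub(r"\s+", " ", text).strip().title()


def parse_date(text):
    """Try each known date format in turn; return None if none match."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def to_int(text):
    """Convert a digit-only string to int; treat anything else as missing."""
    return int(text) if re.fullmatch(r"\d+", text) else None


cleaned = [
    {"name": clean_name(r["name"]), "joined": parse_date(r["joined"]), "score": to_int(r["score"])}
    for r in raw
]

# Tidy reshape: one row per (student, subject) instead of one column per subject.
wide = [{"student": "a", "math": 90, "art": 75}]
tidy = [
    {"student": row["student"], "subject": key, "score": value}
    for row in wide
    for key, value in row.items()
    if key != "student"
]
```
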
1.3 Assess data quality and perform validation tasks
• Identify, calculate and replace the missing values using R or Python.
• Identify, calculate and remove the duplicates using R or Python.
• Perform different types of data validation tasks (e.g. constraint validation, data range validation, code validation, data type validation) using R or Python.
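
The quality checks in 1.3 could be sketched as below. The records, the age range of 0 to 120, and the `VALID_COUNTRIES` code list are hypothetical rules invented for the example.

```python
# Hypothetical records with a duplicate, a missing value, and invalid entries.
rows = [
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 2, "age": None, "country": "FR"},
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 3, "age": 212, "country": "XX"},
]

# Duplicates: keep the first occurrence of each full record.
seen, unique_rows = set(), []
for row in rows:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

n_duplicates = len(rows) - len(unique_rows)
n_missing_age = sum(row["age"] is None for row in unique_rows)

VALID_COUNTRIES = {"DE", "FR", "US"}  # assumed code list


def validate(row):
    """Return a list of rule violations for one record."""
    problems = []
    if not isinstance(row["id"], int):  # data type validation
        problems.append("id: wrong type")
    if row["age"] is not None and not 0 <= row["age"] <= 120:  # range validation
        problems.append("age: out of range")
    if row["country"] not in VALID_COUNTRIES:  # code validation
        problems.append("country: unknown code")
    return problems


report = {row["id"]: validate(row) for row in unique_rows}
```
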
1.4 Collect data from non-standard formats by modifying existing code
• Import data from an API using R or Python.
• Identify the structure of HTML and JSON data and parse them into a usable format for data processing and analysis using R or Python.
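
The sketch below shows one way to parse JSON and HTML into row-like structures with the standard library. No network call is made: `payload` and `html` are hypothetical stand-ins for an API response body and a downloaded page, and `CellCollector` is a helper written for this example.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response body (JSON text).
payload = '{"results": [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]}'
data = json.loads(payload)

# Flatten the nested structure into one flat record per result.
flat = [{"id": item["id"], "n_tags": len(item["tags"])} for item in data["results"]]


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data.strip())


# Hypothetical HTML fragment containing a small table.
html = "<table><tr><td>x</td><td>1</td></tr><tr><td>y</td><td>2</td></tr></table>"
parser = CellCollector()
parser.feed(html)
parser.close()
table = parser.rows
```
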

#### Exam DS201: Data Management in SQL; Modeling and Programming in R or Python

1.1 Perform data extraction, joining and aggregation tasks
• Aggregate numeric variables, categorical variables, and dates by group using PostgreSQL.
• Interpret the database schema and combine multiple tables by rows or columns using PostgreSQL.
• Extract the data based on different conditions using PostgreSQL.
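
The query below illustrates extraction, joining, and aggregation in a single statement. It is run here through Python's built-in `sqlite3` module purely so the example is self-contained; SQLite is a stand-in, not PostgreSQL, although the `JOIN`, `WHERE`, and `GROUP BY` syntax used is common to both. The schema and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical two-table schema linked by customer_id.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'north'), (2, 'south');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 2, 35.5), (3, 1, 14.5), (4, 2, 5.0);
""")

# Filter rows, join the tables, then aggregate by region.
query = """
SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
WHERE o.amount > 10
GROUP BY c.region
ORDER BY c.region;
"""
result = conn.execute(query).fetchall()
conn.close()
```
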

#### Related Assessment

2.1 Prepare data for modeling by implementing relevant transformations.
• Create new categories from existing data (e.g. seasons from date, categories from continuous data, combining categories from categorical data) using R or Python.
• Explain the importance of splitting data and split data for training, testing, and validation using R or Python.
• Explain the importance of scaling data and implement the scaling using R or Python.
• Transform categorical data into numerical data using R or Python.
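
The four preparation steps in 2.1 can be sketched in plain Python as follows. The rows, the 75/25 split, the choice of min-max scaling, and the Northern Hemisphere season mapping are all assumptions for the example; a library such as scikit-learn would normally handle these steps.

```python
import random

# Hypothetical rows: (month, colour, income).
rows = [
    (1, "red", 30.0), (4, "blue", 50.0), (7, "red", 70.0), (10, "green", 90.0),
    (12, "blue", 40.0), (6, "green", 60.0), (3, "red", 80.0), (9, "blue", 20.0),
]


def season(month):
    """Map a month number to a season (Northern Hemisphere convention assumed)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


# Split: shuffle with a fixed seed, then hold out 25% for testing.
rng = random.Random(0)
shuffled = rows[:]
rng.shuffle(shuffled)
cut = int(len(shuffled) * 0.75)
train, test = shuffled[:cut], shuffled[cut:]

# Min-max scaling, fitted on the training rows only to avoid leakage.
lo = min(r[2] for r in train)
hi = max(r[2] for r in train)


def scale(x):
    """Rescale income using the training minimum and maximum."""
    return (x - lo) / (hi - lo)


# One-hot encoding: one 0/1 indicator per category, in sorted order.
categories = sorted({r[1] for r in rows})


def one_hot(value):
    return [int(value == c) for c in categories]


train_features = [[scale(r[2])] + one_hot(r[1]) for r in train]
test_features = [[scale(r[2])] + one_hot(r[1]) for r in test]
```

Because the scaler is fitted on the training rows only, scaled test values can fall slightly outside the 0 to 1 range. That is expected and is the price of keeping test data out of the fitting step.
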
2.2 Implement standard modeling approaches for supervised learning problems.
• Identify the types of problems that supervised learning models address.
• Select suitable regression and classification models and implement them using R or Python.
• Select suitable ensemble methods and implement them using R or Python.
2.3 Implement approaches for unsupervised learning problems.
• Identify the types of problems that unsupervised learning models address.
• Select suitable clustering models and implement them using R or Python.
• Explain dimensionality reduction techniques and implement them using R or Python.
2.4 Use suitable methods to assess the performance of a model.
• Select suitable metrics to evaluate regression models and calculate them using R or Python.
• Select suitable metrics to evaluate classification models and calculate them using R or Python.
• Select suitable metrics to evaluate clustering models and calculate them using R or Python.
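
As a sketch of how such metrics are calculated, the functions below implement RMSE for regression and accuracy, precision, and recall for binary classification. Clustering metrics are not covered here. The function names and the example labels are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when nothing is predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical predictions.
regression_error = rmse([3, 5, 7], [2, 5, 9])
scores = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```
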

#### Related Assessments

3.1 Use common programming constructs to write repeatable, production-quality code for analysis.
• Define, write and execute functions in R or Python.
• Write and use control flow statements in R or Python.
• Write and use loops and iteration in R or Python.
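
A short sketch combining the three constructs above: a function with a default argument, a loop, and an `if`/`elif`/`else` branch. The `classify_scores` function and its pass mark are hypothetical.

```python
def classify_scores(scores, pass_mark=50):
    """Label each score as 'pass', 'fail', or 'invalid'."""
    labels = []
    for score in scores:  # loop over every input value
        if not isinstance(score, (int, float)):  # control flow: reject non-numbers
            labels.append("invalid")
        elif score >= pass_mark:
            labels.append("pass")
        else:
            labels.append("fail")
    return labels


labels = classify_scores([70, 30, "x"])
strict_labels = classify_scores([70, 30], pass_mark=80)
```
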
3.2 Demonstrate best practices in production code, including version control, testing, and package development.
• Describe the basic workflow and structure of package development in R or Python.
• Explain how to document code in a package, subpackage, or module in R or Python.
• Explain the importance of testing and write test statements in R or Python.
• Use version control and interpret the changes between versions from history files in R or Python.
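
Documentation and testing can be illustrated together: the sketch below documents a function with a docstring and checks it with Python's built-in `unittest` module. The `normalize` function and its tests are hypothetical examples, not part of any assessed package.

```python
import unittest


def normalize(values):
    """Scale values so that they sum to 1.

    Args:
        values: A non-empty list of numbers.

    Returns:
        A list of floats with the same length as ``values``.

    Raises:
        ValueError: If the values sum to zero.
    """
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


class NormalizeTests(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_keeps_proportions(self):
        self.assertEqual(normalize([1, 3]), [0.25, 0.75])

    def test_zero_total_raises(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])


# Run the tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these guard against regressions: if a later change breaks `normalize`, the failure shows up when the suite runs rather than in a downstream analysis.
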