Certification 2022

Study Guide

Please use this study guide to create your certification self-study plan. We’ve included the objectives you should meet for each assessed competency, with links to relevant skill assessments.
Exam DS101: Exploratory Analysis, Statistical Experimentation, and Data Management in R or Python
1.1 Calculate metrics to effectively report characteristics of data and relationships between features
  • Calculate the measures of center (e.g. mean, median, mode) for variables using R or Python.
  • Calculate the measures of spread (e.g. range, standard deviation, variance) for variables using R or Python.
  • Calculate the skewness for variables using R or Python.
  • Calculate the missingness for variables and understand its influence on reporting characteristics of data and relationships in R or Python.
  • Calculate the correlation between variables using R or Python.
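The metrics in 1.1 can be sketched with Python's standard library alone. This is a minimal illustration, not a prescribed method: the sample `values` is made up, `None` is assumed to mark a missing observation, and `skewness` and `pearson` are hypothetical helper names written out by hand.

```python
import math
import statistics

# Hypothetical sample; None marks a missing observation.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, None, 15.0]

present = [v for v in values if v is not None]
missing_share = (len(values) - len(present)) / len(values)

# Measures of center
mean = statistics.mean(present)
median = statistics.median(present)
mode = statistics.mode(present)

# Measures of spread
value_range = max(present) - min(present)
variance = statistics.variance(present)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(present)      # sample standard deviation


def skewness(xs):
    """Population skewness: third central moment divided by sd cubed."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(mean, median, mode, value_range, std_dev, skewness(present), missing_share)
```

Note that every statistic here is computed on `present` only, so the reported values describe the observed data and can shift if the missing observations are not missing at random.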
1.2 Create data visualizations in a coding language to demonstrate the characteristics of data
  • Create and customize a bar chart using R or Python.
  • Create and customize a box plot using R or Python.
  • Create and customize a line graph using R or Python.
  • Create and customize a histogram using R or Python.
1.3 Create data visualizations in a coding language to represent the relationships between features
  • Create and customize a scatterplot using R or Python.
  • Create and customize a heatmap using R or Python.
  • Create and customize a pivot table using R or Python.
1.4 Identify and reduce the impact of characteristics of data
  • Describe when a transformation should be applied to a variable and implement a suitable transformation method using R or Python.
  • Identify missing data and implement suitable imputation methods to reduce its impact on analysis or modeling using R or Python.
  • Identify and remove outliers using R or Python.
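One possible sketch of the three tasks in 1.4, using only the standard library. The choices here are assumptions for illustration: median imputation, the 1.5 × IQR rule for outliers, and a `log(1 + x)` transformation for right-skewed, non-negative data. The function names are hypothetical, and other methods may suit a given dataset better.

```python
import math
import statistics


def impute_median(values):
    """Replace None with the median of the observed values."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_transform(values):
    """log(1 + x) compresses the long right tail of non-negative data."""
    return [math.log1p(v) for v in values]


print(impute_median([3, None, 5, 7]))
print(remove_outliers_iqr([10, 12, 11, 13, 12, 11, 95]))
```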

Related Assessments

2.1 Apply sampling methods to data
  • Distinguish between different types of random sampling techniques and apply the methods using R or Python.
  • Sample data from a statistical distribution (e.g. normal, binomial, Poisson, exponential, etc.) using R or Python.
  • Calculate the probability from a statistical distribution (e.g. normal, binomial, Poisson, exponential, etc.) using R or Python.
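The sampling objectives in 2.1 might look like this in plain Python. The population, the strata, the seed, and the helper names (`stratified_sample`, `binomial_pmf`, `poisson_pmf`) are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sampling without replacement
population = list(range(100))
simple_sample = rng.sample(population, 10)


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction from every stratum."""
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }


strata = {"a": list(range(0, 60)), "b": list(range(60, 100))}
stratified = stratified_sample(strata, 0.1, rng)

# Sampling from distributions
normal_draws = [rng.gauss(50, 10) for _ in range(1000)]
exponential_draws = [rng.expovariate(0.5) for _ in range(1000)]

# Probabilities from distributions
p_below_60 = NormalDist(mu=50, sigma=10).cdf(60)  # P(X <= 60)


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(exactly k events) for a Poisson rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)
```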
2.2 Implement methods for performing statistical tests
  • Use different types of graphs to analyze the normality of the samples using R or Python.
  • Run simple statistical tests (e.g. t-test, ANOVA test, chi-square test) using R or Python.
  • Run suitable statistical tests in the context of the business question using R or Python.
  • Interpret the results of statistical tests run in R or Python.
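As a rough illustration of running and interpreting a two-sample test, the sketch below swaps in a permutation test for the t-test named above, because the Python standard library has no t distribution (with SciPy installed, `scipy.stats.ttest_ind` would be the more usual choice). The data, the function name `permutation_test`, and the 0.05 threshold are assumptions.

```python
import random


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign observations to groups at random
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical measurements for two groups
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7]
treatment = [13.0, 13.4, 12.9, 13.2, 13.1, 13.5, 12.8, 13.3]

p_value = permutation_test(control, treatment)
if p_value < 0.05:
    print("Reject the null hypothesis of equal means, p =", p_value)
else:
    print("No evidence of a difference in means, p =", p_value)
```

The p-value is the share of random regroupings that produce a difference at least as large as the observed one, so a small value means the observed difference would be unusual if the group labels did not matter.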

Related Assessments

1.1 Perform standard data import, joining and aggregation tasks
  • Import data from flat files and databases using R or Python.
  • Aggregate numeric, categorical variables and dates by groups using R or Python.
  • Combine multiple tables by rows or columns using R or Python.
  • Filter the data based on different criteria using R or Python.
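A small sketch of import, join, aggregation, and filtering with the standard library. The two inline CSV strings stand in for flat files, and the column names are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Inline text stands in for two hypothetical flat files.
orders_csv = """order_id,customer_id,amount
1,c1,20.0
2,c2,35.5
3,c1,14.5
"""
customers_csv = """customer_id,region
c1,North
c2,South
"""

# Import
orders = list(csv.DictReader(io.StringIO(orders_csv)))
customers = {
    row["customer_id"]: row
    for row in csv.DictReader(io.StringIO(customers_csv))
}

# Combine by columns (a join on customer_id), converting amount to float
joined = [
    {
        "order_id": o["order_id"],
        "amount": float(o["amount"]),
        "region": customers[o["customer_id"]]["region"],
    }
    for o in orders
]

# Aggregate by group
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]

# Filter on a condition
large_orders = [row for row in joined if row["amount"] > 15]

print(dict(totals))  # {'North': 34.5, 'South': 35.5}
```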
1.2 Perform standard cleaning tasks to prepare data for analysis
  • Match strings in the dataset against specific patterns using R or Python.
  • Identify different data types in R or Python and convert values between types.
  • Clean categorical and text data by manipulating strings in R or Python.
  • Clean date and time data by manipulating dates and times in R or Python.
  • Explain the concept of tidy data and transform messy data into tidy data using R or Python.
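The cleaning tasks in 1.2 could be sketched as follows. All sample values, the `AB-1234` code format, the `MM/DD/YYYY` input date format, and the helper `to_float` are assumptions for illustration.

```python
import re
from datetime import datetime

# Pattern matching: hypothetical product codes of the form "AB-1234"
codes = ["AB-1234", "ab-12", "ZX-0007"]
valid_codes = [c for c in codes if re.fullmatch(r"[A-Z]{2}-\d{4}", c)]


def to_float(text):
    """Convert text to float, returning None when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


# Type conversion
amounts = [to_float(a) for a in ["42", "3.5", "n/a"]]

# Categorical/text cleaning: trim, unify separators, normalize case
cities = ["  New York ", "new york", "NEW-YORK"]
clean_cities = [re.sub(r"[\s\-]+", " ", c.strip()).title() for c in cities]

# Date cleaning: parse one format and emit ISO 8601
iso_date = datetime.strptime("03/15/2022", "%m/%d/%Y").date().isoformat()

# Tidy data: one row per (country, year) observation instead of year columns
wide = [
    {"country": "A", "2021": 5, "2022": 7},
    {"country": "B", "2021": 3, "2022": 4},
]
tidy = [
    {"country": row["country"], "year": int(year), "value": value}
    for row in wide
    for year, value in row.items()
    if year != "country"
]
```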
1.3 Assess data quality and perform validation tasks
  • Identify, calculate and replace the missing values using R or Python.
  • Identify, calculate and remove the duplicates using R or Python.
  • Perform different types of data validation tasks (e.g. constraint validation, data range validation, code validation, data type validation) using R or Python.
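A minimal sketch of the quality checks in 1.3. The records, the 0 to 120 age range, and the `validate` helper are hypothetical; real constraints depend on the dataset.

```python
# Hypothetical records: one exact duplicate, one missing age, one bad age.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 1, "age": 34},
    {"id": 3, "age": 212},
]

# Identify and remove duplicates, keeping the first occurrence
seen = set()
unique = []
for record in records:
    key = (record["id"], record["age"])
    if key not in seen:
        seen.add(key)
        unique.append(record)
n_duplicates = len(records) - len(unique)

# Count missing values
n_missing = sum(record["age"] is None for record in unique)


def validate(record):
    """Return a list of validation errors for one record."""
    errors = []
    if not isinstance(record["id"], int):   # data type validation
        errors.append("id must be an integer")
    age = record["age"]
    if age is None:                         # constraint validation
        errors.append("age is missing")
    elif not 0 <= age <= 120:               # data range validation
        errors.append("age out of range")
    return errors


invalid = {r["id"]: validate(r) for r in unique if validate(r)}
print(n_duplicates, n_missing, invalid)
```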
1.4 Collect data from non-standard formats by modifying existing code
  • Import data from an API using R or Python.
  • Identify the structure of HTML and JSON data and parse them into a usable format for data processing and analysis using R or Python.
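Parsing JSON and HTML into usable records can be sketched without any network access. The JSON payload below is an invented example of what an API response might contain, the HTML table is likewise made up, and `CellCollector` is a hypothetical helper class.

```python
import json
from html.parser import HTMLParser

# Hypothetical JSON body, as an API might return it
payload = '{"results": [{"name": "a", "score": 1}, {"name": "b", "score": 2}]}'
records = json.loads(payload)["results"]
scores = {record["name"]: record["score"] for record in records}


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())


html = "<table><tr><td>a</td><td>1</td></tr><tr><td>b</td><td>2</td></tr></table>"
parser = CellCollector()
parser.feed(html)

# Regroup the flat cell list into (name, score) rows of two columns each
rows = [tuple(parser.cells[i:i + 2]) for i in range(0, len(parser.cells), 2)]
```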

Related Assessments

Exam DS201: Data Management in SQL; Modeling and Programming in R or Python

1.1 Perform data extraction, joining and aggregation tasks
  • Aggregate numeric, categorical variables and dates by groups using PostgreSQL.
  • Interpret the database schema and combine multiple tables by rows or columns using PostgreSQL.
  • Extract the data based on different conditions using PostgreSQL.
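The query pattern behind these objectives (join, filter, group, aggregate) can be tried out locally. This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in, since no PostgreSQL server is assumed. The tables and columns are invented, and PostgreSQL differs from SQLite in details such as data types and date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'North'), (2, 'South');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 2, 35.5), (3, 1, 14.5), (4, 2, 5.0);
""")

# Join the tables, keep orders of at least 10, then aggregate per region.
rows = conn.execute("""
SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
WHERE o.amount >= 10
GROUP BY c.region
ORDER BY c.region
""").fetchall()

print(rows)  # [('North', 2, 34.5), ('South', 1, 35.5)]
conn.close()
```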

Related Assessment

2.1 Prepare data for modeling by implementing relevant transformations.
  • Create new categories from existing data (e.g. seasons from date, categories from continuous data, combining categories from categorical data) using R or Python.
  • Explain the importance of splitting data and split data for training, testing, and validation using R or Python.
  • Explain the importance of scaling data and implement the scaling using R or Python.
  • Transform categorical data into numerical data using R or Python.
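The preparation steps in 2.1 might be sketched like this. The helper names (`season`, `split_rows`, `fit_standardizer`, `one_hot`), the Northern Hemisphere season mapping, and the 80/20 split are assumptions for the example.

```python
import random


def season(month):
    """Map a month number (1-12) to a Northern Hemisphere season."""
    return ["winter", "spring", "summer", "autumn"][(month % 12) // 3]


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(train_values):
    """Learn mean and standard deviation on training data only."""
    mu = sum(train_values) / len(train_values)
    sigma = (sum((v - mu) ** 2 for v in train_values) / len(train_values)) ** 0.5
    return lambda v: (v - mu) / sigma


def one_hot(labels):
    """Encode each label as a 0/1 vector over the sorted categories."""
    categories = sorted(set(labels))
    return [[int(label == c) for c in categories] for label in labels]


train, test = split_rows(range(10))
scale = fit_standardizer([1.0, 2.0, 3.0])
print(season(7), len(train), len(test), scale(2.0), one_hot(["a", "b", "a"]))
```

Fitting the scaler on the training rows and then applying it to the test rows keeps information from the test set out of the model, which is one reason the split comes before scaling.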
2.2 Implement standard modeling approaches for supervised learning problems.
  • Identify the problems that supervised learning models are targeted at.
  • Select suitable regression and classification models and implement them using R or Python.
  • Select suitable ensemble methods and implement them using R or Python.
2.3 Implement approaches for unsupervised learning problems.
  • Identify the problems that unsupervised learning models are targeted at.
  • Select suitable clustering models and implement them using R or Python.
  • Explain dimensionality reduction techniques and implement them using R or Python.
2.4 Use suitable methods to assess the performance of a model.
  • Select metrics to evaluate regression models and calculate them using R or Python.
  • Select metrics to evaluate classification models and calculate them using R or Python.
  • Select metrics to evaluate clustering models and calculate them using R or Python.
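A few common evaluation metrics written out by hand, as a sketch covering regression and classification only. The function names and the example labels are assumptions, and which metric is appropriate depends on the problem.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for regression."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error for regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


print(rmse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```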

Related Assessments

3.1 Use common programming constructs to write repeatable production quality code for analysis.
  • Define, write and execute functions in R or Python.
  • Use and write control flow statements in R or Python.
  • Use and write loops and iterations in R or Python.
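The three constructs in 3.1 (a function, control flow, and a loop) fit in one short sketch. The function name `classify_scores` and the default pass mark of 50 are invented for the example.

```python
def classify_scores(scores, pass_mark=50):
    """Return 'pass', 'fail', or 'missing' for each score."""
    labels = []
    for score in scores:              # loop over the inputs
        if score is None:             # control flow: three branches
            labels.append("missing")
        elif score >= pass_mark:
            labels.append("pass")
        else:
            labels.append("fail")
    return labels


print(classify_scores([72, 41, None, 50]))  # ['pass', 'fail', 'missing', 'pass']
```

Wrapping the logic in a function with a default argument makes the analysis step repeatable: the same code can be called on new data or with a different threshold without editing it.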
3.2 Demonstrate best practices in production code including version control, testing, and package development.
  • Describe the basic workflow and structure of package development in R or Python.
  • Explain how to document code in a package, subpackage, or module in R or Python.
  • Explain the importance of testing and write test statements in R or Python.
  • Use version control and interpret the changes between versions from history files in R or Python.
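Documentation and testing can be illustrated together: a documented function plus a small test case using Python's built-in `unittest`. The function `min_max_scale` and its tests are hypothetical. In a real package the tests would usually live in a separate test module and be run from the command line rather than inline as done here.

```python
import unittest


def min_max_scale(values):
    """Scale numbers linearly to the range [0, 1].

    Raises ValueError when all values are equal, because the range is zero.
    """
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must not all be equal")
    return [(v - low) / (high - low) for v in values]


class MinMaxScaleTests(unittest.TestCase):
    def test_endpoints_and_midpoint(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_is_rejected(self):
        with self.assertRaises(ValueError):
            min_max_scale([3, 3, 3])


# Run the tests inline so the sketch is self-contained.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MinMaxScaleTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```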

Related Assessments