Data Science with Python

Data Science using Python (Duration: 40 Hours, Fees: INR 22,000 incl. taxes)

“The world’s most valuable resource is no longer oil, but data.” With rapid digitalization, technology has been woven into our everyday lives, and businesses are gathering huge amounts of data about their customers, employees, operations and more.

Data Science is about mining data to extract hidden insights into trends and behaviour, and drawing the interpretations and inferences that enable informed business decisions. The professionals who perform these activities are called Data Scientists. Data Science is one of the most in-demand professions and, as per Harvard, one of the most sought-after professions in the world.

NextGen Education's Data Science with Python course makes you job-ready for roles such as Data Scientist, Business Analyst, Data Analyst, Data Engineer and Analytics Engineer.


Module 1: Introduction

  1. What is Data Science?
  2. Common Terms in Analytics
  3. Analytics vs. Data warehousing, OLAP, MIS Reporting
  4. Relevance in industry and need of the hour
  5. Types of problems and business objectives in various industries
  6. How leading companies are harnessing the power of analytics
  7. Critical success drivers
  8. Overview of analytics tools & their popularity
  9. Analytics Methodology & problem-solving framework
  10. List of steps in Analytics projects
  11. Identify the most appropriate solution design for the given problem statement
  12. Project plan for an analytics project & key milestones based on effort estimates
  13. Build a resource plan for an analytics project
  14. Why Python for data science?

Module 2: Core Python

  1. Overview of Python: Starting with Python
  2. Introduction to installing Python
  3. Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
  4. Python Syntax
  5. Variables & Data Types
  6. Operators
  7. Conditional Statements
  8. Working With Numbers & Strings
  9. Collections API
  10. Lists
  11. Tuples
  12. Date and Time
  13. Functions & Modules
  14. File Handling
  15. Exception Handling
  16. OOP Concepts in Python
  17. Regular Expressions

Module 3: Python Libraries for Data Science

  1. NumPy
  2. SciPy
  3. pandas
  4. scikit-learn
  5. statsmodels
  6. NLTK

Module 4: Python Modules for Access, Import/Export Data

  1. Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
  2. Database Input (Connecting to database)
  3. Viewing Data objects – subsetting, methods
  4. Exporting Data to various formats
  5. Important Python modules: pandas, Beautiful Soup

Module 5: Data Manipulation, Cleansing and Munging

  1. Cleansing Data with Python
  2. Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting etc.)
  3. Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays etc.)
  4. Python Built-in Functions (Text, numeric, date, utility functions)
  5. Python User Defined Functions
  6. Stripping out extraneous information
  7. Normalizing data
  8. Formatting data
  9. Important Python modules for data manipulation (pandas, NumPy, re, math, string, datetime, etc.)

Module 6: Data Analysis and Visualization

  1. Introduction to exploratory data analysis
  2. Descriptive statistics, Frequency Tables and summarization
  3. Univariate Analysis (Distribution of data & Graphical Analysis)
  4. Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
  5. Creating Graphs (Bar, Pie, Line Chart, Histogram, Boxplot, Scatter, Density, etc.)
  6. Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, seaborn, pandas, scipy.stats, etc.)
  7. Data visualization with Tableau

Module 7: Statistics

  1. Basic Statistics – Measures of Central Tendency and Variance
  2. Building Blocks – Probability Distributions – Normal Distribution – Central Limit Theorem
  3. Inferential Statistics – Sampling – Concept of Hypothesis Testing
  4. Statistical Methods – Z/t-tests (one-sample, independent, paired), ANOVA, Correlations and Chi-square
  5. Important modules for statistical methods: NumPy, SciPy, pandas
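As a rough illustration of the hypothesis-testing topics above, the sketch below computes a one-sample t statistic, t = (x̄ - μ0) / (s / √n), using only Python's standard library. The function name one_sample_t and the sample values are hypothetical; in practice a library routine such as scipy.stats.ttest_1samp would also return the p-value.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0.

    Hypothetical helper for illustration only.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements tested against a claimed mean of 5.0
t, df = one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], mu0=5.0)
print(f"t = {t:.4f}, df = {df}")  # t = 1.4142, df = 4
```

The statistic is then compared against the t distribution with n - 1 degrees of freedom. A paired t-test applies the same calculation to the per-pair differences.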

Module 8: Predictive Modelling

  1. Concept of a model in analytics and how it is used
  2. Common terminology used in analytics & modelling process
  3. Popular modelling algorithms
  4. Types of Business problems – Mapping of Techniques
  5. Different Phases of Predictive Modelling

Module 9: Data Exploration for Modelling

  1. Need for structured data exploration
  2. EDA framework for exploring the data and identifying any problems with the data (Data Audit Report)
  3. Identify missing data
  4. Identify outliers
  5. Visualize the data trends and patterns

Module 10: Data Preparation

  1. Need for Data Preparation
  2. Consolidation/Aggregation – Outlier Treatment – Flat Liners – Missing Values – Dummy Creation – Variable Reduction
  3. Variable Reduction Techniques – Factor Analysis & PCA

Module 11: Solving Segmentation Problems

  1. Introduction to Segmentation
  2. Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
  3. Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
  4. Behavioral Segmentation Techniques (K-Means Cluster Analysis)
  5. Cluster evaluation and profiling – Identify cluster characteristics
  6. Interpretation of results – Implementation on new data
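The heuristic RFM segmentation listed above can be sketched in plain Python. This version assumes hypothetical customer fields and scores each of recency, frequency and monetary value from 1 to 3 by rank (terciles), so a segment code such as "333" marks the most recent, most frequent and highest-spending group. Rank-based terciles are only one common convention; fixed business thresholds are equally valid.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields for illustration
    customer_id: str
    days_since_last_purchase: int  # recency: lower is better
    purchase_count: int            # frequency: higher is better
    total_spend: float             # monetary: higher is better


def quantile_scores(values, bins=3, higher_is_better=True):
    """Score each value 1..bins by rank; the best-ranked values get `bins`.

    Ties are broken by input order in this sketch.
    """
    n = len(values)
    # Sort indices from worst to best
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = 1 + rank * bins // n
    return scores


def rfm_segments(customers, bins=3):
    """Return {customer_id: "RFM"} codes, e.g. "312"."""
    r = quantile_scores([c.days_since_last_purchase for c in customers], bins,
                        higher_is_better=False)
    f = quantile_scores([c.purchase_count for c in customers], bins)
    m = quantile_scores([c.total_spend for c in customers], bins)
    return {c.customer_id: f"{ri}{fi}{mi}"
            for c, ri, fi, mi in zip(customers, r, f, m)}


customers = [
    Customer("A", 5, 12, 900.0),
    Customer("B", 40, 4, 300.0),
    Customer("C", 200, 1, 50.0),
]
print(rfm_segments(customers))  # {'A': '333', 'B': '222', 'C': '111'}
```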

Module 12: Linear Regression

  1. Introduction – Applications
  2. Assumptions of Linear Regression
  3. Building Linear Regression Model
  4. Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
  5. Assess the overall effectiveness of the model
  6. Validation of Models (Re-running Vs. Scoring)
  7. Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
  8. Interpretation of Results – Business Validation – Implementation on new data
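The R-square and adjusted R-square metrics listed above can be computed by hand for a one-predictor model. The sketch below fits y = b0 + b1·x by ordinary least squares, then applies R² = 1 - SSE/SST and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The data and function names are hypothetical; libraries such as statsmodels report the same quantities (for example, the rsquared and rsquared_adj attributes of fitted OLS results).

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def r_squared(y, y_hat):
    """1 - SSE / SST: share of the variance in y explained by the model."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Penalise R-square for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
b0, b1 = fit_simple_ols(x, y)  # b0 = 0.5, b1 = 0.8
y_hat = [b0 + b1 * xi for xi in x]
r2 = r_squared(y, y_hat)
print(round(r2, 4), round(adjusted_r_squared(r2, n=len(x), p=1), 4))  # 0.64 0.46
```

Adjusted R-square falls below R-square here because it charges the model for its predictor relative to only four observations.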

Module 13: Logistic Regression

  1. Introduction – Applications
  2. Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
  3. Building Logistic Regression Model (Binary Logistic Model)
  4. Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
  5. Validation of Logistic Regression Models (Re-running Vs. Scoring)
  6. Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc.)
  7. Interpretation of Results – Business Validation – Implementation on new data
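Two of the logistic regression metrics listed above, concordance and Gini, can be computed directly from predicted probabilities. The sketch below compares every (event, non-event) pair: a pair is concordant when the event has the higher predicted probability. It assumes binary 0/1 labels and uses the common relationships AUC = concordant + 0.5 × tied and Gini = 2 × AUC - 1. The function names are hypothetical, and the all-pairs loop is O(n²), chosen for clarity rather than speed.

```python
def concordance(y_true, scores):
    """Return (concordant, discordant, tied) as fractions of all event/non-event pairs."""
    events = [s for y, s in zip(y_true, scores) if y == 1]
    non_events = [s for y, s in zip(y_true, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = discordant = tied = 0
    for p in events:
        for q in non_events:
            if p > q:
                concordant += 1
            elif p < q:
                discordant += 1
            else:
                tied += 1
    total = len(events) * len(non_events)
    return concordant / total, discordant / total, tied / total


def auc_and_gini(y_true, scores):
    """AUC from pairwise concordance (ties count half), and Gini = 2 * AUC - 1."""
    c, _, t = concordance(y_true, scores)
    auc = c + 0.5 * t
    return auc, 2 * auc - 1


labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.2]  # hypothetical model output
print(concordance(labels, probs))   # (0.75, 0.25, 0.0)
print(auc_and_gini(labels, probs))  # (0.75, 0.5)
```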

Module 14: Time Series Forecasting

  1. Introduction – Applications
  2. Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
  3. Classification of Techniques (Pattern-based – Patternless)
  4. Basic Techniques – Averages, Smoothing, etc.
  5. Advanced Techniques – AR Models, ARIMA, etc.
  6. Understanding Forecasting Accuracy – MAPE, MAD, MSE, etc.
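A basic averaging technique and the accuracy measures listed above fit together in a few lines. The sketch below produces one-step-ahead moving-average forecasts and scores them with MAD (mean absolute deviation of the errors), MSE and MAPE. The sales figures, the window of 2 and the function names are hypothetical; MAPE is undefined when any actual value is zero, so the sketch rejects that case.

```python
def moving_average_forecast(series, window):
    """One-step-ahead forecasts: each is the mean of the previous `window` values.

    Returns forecasts for periods window .. len(series) - 1.
    """
    if not 1 <= window < len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


def mad(actual, forecast):
    """Mean absolute deviation of forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error of forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


sales = [100, 120, 110, 130, 125]  # hypothetical monthly sales
window = 2
forecasts = moving_average_forecast(sales, window)  # [110.0, 115.0, 120.0]
actuals = sales[window:]                            # [110, 130, 125]
print(f"MAD={mad(actuals, forecasts):.2f} "
      f"MSE={mse(actuals, forecasts):.2f} "
      f"MAPE={mape(actuals, forecasts):.2f}%")
```

MSE weights large misses more heavily than MAD, while MAPE expresses error relative to the level of the series, which makes it easier to compare across products of different scale.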