From constructing a confidence interval around a point estimate to hypothesis testing, statistics can be a complex discipline to unravel. Luckily, this guide will help you start to understand the broad field of data analysis by walking you through the basics of its two main branches.

## Descriptive Statistics Basics

Whether you’ve built a histogram for a science project or regularly use data visualization tools at work, you’ve already worked with one of the two main branches of statistics: descriptive statistics. This branch deals with data after it has been collected and uses statistical techniques to understand the composition of the data set. Often the first step of any analysis, descriptive statistics reveal vital insights into the qualitative or quantitative data being considered. Even the most experienced data scientists can say very little about their data sets before conducting preliminary descriptive analyses.

Whether the data is ordinal, categorical or numerical, descriptive statistics fall into two categories: measures of central tendency and measures of variability. Measures of central tendency are used when someone wants to understand what a typical value looks like for one or more metrics. They include the sample mean, median and mode. These three, while seemingly similar, are appropriate in different circumstances. The mean, for example, is pulled towards extreme values, so the median is usually the better summary when the data contains large outliers.

Measures of variability, on the other hand, include the range, the variance and the standard deviation, while the covariance describes how two variables vary together. These measures describe the spread of the data, that is, how far the values lie from the centre, or average. This can be extremely helpful when working out what percentage of your data falls within a certain range. In finance, the standard deviation of returns is commonly used as a measure of volatility.

Descriptive statistics are mostly used for univariate analysis, which is the act of analysing one variable at a time. While this is a way of understanding the makeup of things like income or sales, it can also be helpful when comparing the makeup of multiple variables.
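
To see why outliers matter when choosing a measure of central tendency, here is a minimal Python sketch using the standard library's `statistics` module. The income figures are made up for illustration, and the last value is a deliberate outlier.

```python
import statistics

# Hypothetical monthly incomes; the last value is a deliberate outlier.
incomes = [2100, 2300, 2300, 2500, 2700, 2900, 15000]

# Measures of central tendency
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
mode_income = statistics.mode(incomes)

# Measure of variability: the sample standard deviation
spread = statistics.stdev(incomes)

# The same measures with the outlier removed, for comparison
without_outlier = incomes[:-1]
mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)
spread_trimmed = statistics.stdev(without_outlier)

print(f"with outlier:    mean={mean_income:.2f} median={median_income} "
      f"mode={mode_income} stdev={spread:.2f}")
print(f"without outlier: mean={mean_trimmed:.2f} median={median_trimmed} "
      f"stdev={spread_trimmed:.2f}")
```

The single extreme value moves the mean and the standard deviation a great deal, while the median barely changes, which is why the median is often preferred for skewed data such as incomes.
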

For example, if a small business wants to take advantage of the sales data it has for a particular event, it can use descriptive statistics to determine the percentage of its customers who are over or under a certain age. Descriptive statistics make up much of the statistics used by individuals, companies and governments. While forecasting future events is extremely important, many people only need measures of central tendency and variability to extract meaningful information for their decision making. Some of the most powerful tools included in descriptive statistics are:

• Correlation coefficient
• Simple data visualization
• Distributions (binomial, normal, Laplace, etc.)
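
The small-business example and the correlation coefficient can be sketched in a few lines of plain Python. The customer ages and spending figures below are invented for illustration, and `pearson_r` is a hypothetical helper written here from the textbook definition of the Pearson correlation coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical customers at a sales event: age and amount spent
ages = [22, 25, 31, 38, 44, 52, 60]
spend = [18.0, 22.5, 30.0, 34.0, 41.5, 47.0, 58.0]

# Percentage of customers under a chosen age threshold (35 here)
share_under_35 = sum(1 for a in ages if a < 35) / len(ages) * 100

# Strength of the linear relationship between age and spending
r = pearson_r(ages, spend)

print(f"{share_under_35:.1f}% of customers are under 35")
print(f"correlation between age and spend: r = {r:.3f}")
```

A value of `r` close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. It describes the data at hand and does not by itself show that one variable causes the other.
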


## Inferential Statistics

The next branch of the discipline combines probability and statistics in order not only to understand what is inside the data, but also to use that data to draw conclusions about a wider population and make predictions. This type of statistical analysis, called inferential statistics, typically draws on probability theory and probability distributions, and often involves multivariate analysis, that is, analysis of several variables at once. Closely tied to mathematical statistics, this branch also includes non-parametric methods, which can reveal important relationships within the data without assuming a particular probability distribution.

Most inferential analysis, however, relies on parametric models such as linear regression models or analysis of variance (ANOVA) tests. Regardless of whether it’s a parametric or non-parametric test, the mathematician or statistician will have to meet two criteria: have a set of variables they’d like to test and have their data meet certain assumptions. The first criterion is simple: pick one or more independent variables to try to predict one or more dependent variables. The second criterion is where most statisticians have trouble, because most data sets do not strictly satisfy the assumptions that certain models require, such as the data following a normal distribution. The Gauss-Markov assumptions for classical linear models are the most commonly known and are key to understanding inferential statistics.

Inferential statistics is also distinct from descriptive statistics because it involves testing a null hypothesis against an alternative hypothesis. Using the models available, along with statistical software such as R or SPSS, you will be able to derive estimators and predictions of the mean along with their confidence intervals. If you’re just starting to learn about statistics, some of the most common parametric models include:

• General linear models
• Logistic regression models
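
As a rough illustration of the simplest linear model, the sketch below fits a straight line by ordinary least squares in plain Python. The helper `fit_line` and the advertising and sales figures are hypothetical, and a real analysis would normally use statistical software such as R and check the model's assumptions first.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (independent variable)
# and sales (dependent variable), both in thousands
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)

# Use the fitted line to predict sales for a new level of spend
predicted = intercept + slope * 6.0

print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"predicted sales at ad_spend = 6.0: {predicted:.2f}")
```

The independent variable goes in as `xs` and the dependent variable as `ys`. The fitted slope estimates how much the dependent variable changes for each one-unit increase in the independent variable.
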

On the other hand, some of the more common non-parametric models include:

• Cluster analysis
• Factor analysis
• Discriminant analysis

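Along with these models, ANOVA is a common way for statisticians to test whether the means of two or more groups differ, by comparing the variance between the groups with the variance within them. The same idea can be used to judge whether a more complex model explains significantly more variance than a simpler one.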
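
To make the variance comparison behind ANOVA concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python. The function name `one_way_anova_f` and the three groups of scores are made up for illustration. Turning the statistic into a p-value requires the F distribution, which this sketch leaves to statistical software.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)          # total number of observations
    k = len(groups)              # number of groups
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical test scores for three teaching methods
scores = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]

f_stat = one_way_anova_f(scores)
print(f"F = {f_stat:.2f}")
```

A large F value means the group means differ by more than the spread within each group would suggest, which counts as evidence against the null hypothesis that all group means are equal.
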

## Tips and Resources for Statistics

From understanding which statistical methodology to employ for categorical data analysis to comprehending how the concept of a random variable affects least squares and regression analysis, here are some statistics tips and resources to follow if you need any sort of statistics help.

Need help interpreting the statistical significance of your results or knowing which parametric test to employ on your observational data? Heading over to Cross Validated, the statistics forum on Stack Exchange, will most likely give you the answer to your question. If you’re interested in getting tutored in statistics, browse through Superprof’s community of almost 150,000 maths teachers in the UK. From chi-square tests to drawing inferences from data sets, a maths teacher can guide you through the field.

### Programming

Stack Overflow is another great online forum that can help you with everything coding related. From filtering outliers out of your data set to running a regression analysis, its community can help you troubleshoot your coding problems.
