What is Data Science and why use Python

What is Data Science?

Data science is the art and science of acquiring knowledge through data. It is all about how we take data, use it to acquire knowledge, and then use that knowledge to do the following:

  • Make decisions
  • Predict the future
  • Understand the past/present
  • Create new industries/products

The data science Venn diagram

The basic areas of Data Science are:

  • Math/statistics: This is the use of equations and formulas to perform analysis
  • Computer programming: This is the ability to use code to create outcomes on the computer
  • Domain knowledge: This refers to understanding the problem domain (medicine, finance, social science, and so on)

The following Venn diagram provides a visual representation of how the three areas of data science intersect:

[Figure: Venn diagram showing hacking skills, math and statistics knowledge, and substantive expertise, with data science at their intersection]

Hacking skills allow you to conceptualize and program complicated algorithms using computer languages. A math and statistics knowledge base allows you to theorize about and evaluate algorithms, and to tweak existing procedures to fit specific situations. Substantive expertise (domain expertise) allows you to apply concepts and results in a meaningful and effective way.

Data science is the intersection of these three key areas. In order to gain knowledge from data, we must be able to use computer programming to access the data, understand the mathematics behind the models we derive, and, above all, understand where our analyses fit within the domain we work in, which includes how the data are presented.

Why Python for Data Science?

  • Python is an extremely simple language to read and write, even if you’ve never coded before
  • It is one of the most commonly used languages, both in data science and in software development at large
  • The language’s online community is vast and friendly
  • Python has prebuilt data science modules that both the novice and the veteran data scientist can utilize

The last point is probably the biggest reason why we should focus on Python. Some of these modules are as follows (a short sketch showing a few of them in action appears after the list):

  • pandas
  • scikit-learn
  • seaborn
  • numpy/scipy
  • requests (to mine data from the web)
  • BeautifulSoup (for parsing the HTML of web pages)
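To give a feel for how a few of these modules fit together, here is a minimal sketch (not a definitive recipe) that fetches a page with requests, parses its links with BeautifulSoup, and summarizes them with pandas and numpy. The URL is just a placeholder chosen for illustration.

    # A minimal sketch of a few of these modules working together.
    # The URL is a placeholder; substitute any page you are allowed to scrape.
    import requests
    from bs4 import BeautifulSoup
    import numpy as np
    import pandas as pd

    # requests: pull raw HTML from the web
    response = requests.get("https://example.com")

    # BeautifulSoup: parse the HTML and extract the text of every link
    soup = BeautifulSoup(response.text, "html.parser")
    link_texts = [a.get_text(strip=True) for a in soup.find_all("a")]

    # pandas: organize the scraped text into a tabular structure
    df = pd.DataFrame({"link_text": link_texts})
    df["text_length"] = df["link_text"].str.len()

    # numpy/pandas: quick numerical summaries of what was scraped
    print(df.describe())
    print(np.mean(df["text_length"]))

Each module does one job well, and chaining them together with a few lines of readable code is a big part of why this stack is so popular.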

Some more terminology

  • Machine learning: This refers to giving computers the ability to learn from data without explicit “rules” being given by a programmer.
  • Probabilistic model: This refers to using probability to find a relationship between elements that includes a degree of randomness.
  • Statistical model: This refers to taking advantage of statistical theorems to formalize relationships between data elements in a (usually) simple mathematical formula.
  • Exploratory data analysis (EDA): This refers to preparing data in order to standardize results and gain quick insights.
  • Data mining: This refers to the process of finding relationships between elements of data; it is the part of data science where we try to find relationships between variables (see the sketch after this list).
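To tie a few of these terms to code, the sketch below uses scikit-learn’s bundled diabetes dataset: a quick EDA pass, followed by a simple linear regression, which serves here as both a statistical model and a machine-learned one in the sense above. The dataset and model are chosen purely for illustration.

    # A minimal sketch connecting the terminology above to code.
    # scikit-learn's bundled diabetes dataset is used purely for illustration.
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression

    # Load a small, well-known dataset as a pandas DataFrame
    data = load_diabetes(as_frame=True)
    df = data.frame

    # Exploratory data analysis (EDA): a quick look at shape and summary statistics
    print(df.shape)
    print(df.describe())

    # Statistical model: a simple linear regression formalizing the relationship
    # between the features and the disease-progression target
    model = LinearRegression()
    model.fit(data.data, data.target)

    # Machine learning: the coefficients below were learned from the data,
    # not written as explicit rules by a programmer
    print(model.coef_)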

 
