Data Analysis: Exploratory Data Analysis (EDA) & Data Mining

Data Analysis: Exploratory Data Analysis (EDA)

In a data analysis workflow, one of the most important steps is Exploratory Data Analysis (EDA). This step comes right after you decide the purpose of your analysis and collect your data.

There doesn’t seem to be a short, single definition that covers everything EDA involves, but we can say that EDA refers to examining and preparing data in order to standardize results and gain quick insights.

We can consider it an approach rather than a fixed set of techniques: an attitude or philosophy about how a data analysis should be carried out. EDA alone may not be enough to reach completely precise conclusions, but it will certainly help to support them, and it aids decision making and planning.
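As a rough illustration of the "quick insights" idea, here is a minimal first pass over a single column using only the Python standard library. The `orders` values are made up for this example, and the 1.5 × IQR outlier rule is one common convention, not the only way to do it.

```python
import statistics

# Hypothetical sample (an assumption for this sketch): daily order counts,
# with one missing value (None) and one suspiciously large value.
orders = [12, 15, 14, None, 13, 16, 95, 14]

# Step 1: separate missing values from usable ones.
clean = [x for x in orders if x is not None]
missing = len(orders) - len(clean)

# Step 2: basic summary statistics.
summary = {
    "count": len(clean),
    "missing": missing,
    "mean": statistics.mean(clean),
    "median": statistics.median(clean),
    "stdev": statistics.stdev(clean),
    "min": min(clean),
    "max": max(clean),
}

# Step 3: flag possible outliers with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(clean, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in clean if x < low or x > high]

for name, value in summary.items():
    print(f"{name}: {value}")
print("possible outliers:", outliers)
```

Even this small summary hints at why the step matters: the mean and the median disagree noticeably, which is a signal to look at the flagged value before drawing any conclusion.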

Data mining is the part of data science where we try to find relationships between elements of data, that is, between variables.
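One simple way to quantify a relationship between two numeric variables is the Pearson correlation coefficient. The sketch below is only an illustration: the `pearson` helper and the `hours` and `scores` values are assumptions made for this example, and correlation captures linear relationships only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term and the two spread terms (unnormalised; n cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Hypothetical data: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson(hours, scores)
print(f"correlation between hours and scores: {r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. It says nothing about cause and effect.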

Data mining should be used to solve high-priority, high-value problems. Data can be analyzed at multiple levels of granularity, which can lead to a very large number of interesting combinations and patterns.

Here are brief descriptions of some of the most important data mining techniques used to generate insights from data.

  • Decision Trees: They help classify populations into classes. Decision trees are among the most popular and important data mining techniques, and there are many well-known algorithms for building them.
  • Regression: This is a well-understood technique from the field of statistics. The goal is to find the best-fitting curve through the data points. The best-fitting curve is the one that minimizes the error (the distance) between the actual data points and the values predicted by the curve.
  • Artificial Neural Networks: Originating in the field of artificial intelligence and machine learning, ANNs are multi-layer non-linear information processing models that learn from past data and predict future values.
  • Cluster analysis: This is an important data mining technique for dividing and conquering large data sets. The data set is divided into a certain number of clusters, by discerning similarities and dissimilarities within the data.
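
To make the regression idea from the list above concrete, here is a minimal sketch of fitting a straight line by ordinary least squares, which minimizes the sum of squared vertical distances between the points and the line. The helper names `fit_line` and `sum_squared_error` and the sample data are assumptions for this example. A straight line is only the simplest kind of "curve".

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_error(xs, ys, slope, intercept):
    """Total squared distance between actual and predicted y values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best_error = sum_squared_error(xs, ys, slope, intercept)

# Any other line should have a larger error than the fitted one.
other_error = sum_squared_error(xs, ys, slope + 0.1, intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"error of fitted line: {best_error:.4f}")
print(f"error of a slightly different line: {other_error:.4f}")
```

Comparing the two error values shows what "best fitting" means here: nudging the slope away from the fitted value makes the total squared error grow.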

I will explain all of these data mining techniques in detail in upcoming articles on my blog. If you are interested, stay tuned, and if you want to suggest a topic for me to focus on, leave a comment below.

Also, if you want, you can make a small donation via PayPal to help support my research.
