Data Mining Techniques

What are Data Mining Techniques?

Data mining involves exploring and analyzing large datasets to identify patterns and relationships. There are many techniques that businesses can use to gain insight from the data they have collected¹:

Clustering. The clustering technique involves grouping data points by their characteristics. The clustered data is organized into subsets, which provide insight into each group's behavior. Clustering methods include partitioning, hierarchical, density-based, grid-based, and model-based methods.
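
As an illustration of a partitioning method, the following is a minimal k-means sketch in plain Python. The customer data, the function name k_means, and the choice of the first k points as starting centroids are assumptions made for this example, not part of the source.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Partition points into k clusters by repeatedly refining centroids."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical customer data: (visits per month, average spend).
customers = [(1, 10), (9, 95), (2, 12), (10, 100), (1, 8), (8, 90)]
centroids, clusters = k_means(customers, k=2)
```

Here the low-spend and high-spend customers end up in separate subsets. Real data would usually need scaled features and a more careful choice of starting centroids.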

Association. Association techniques are used to find correlations between points in datasets. The two primary association methods are single-dimensional association, which looks for repeated instances of one attribute, and multi-dimensional association, which looks for relationships across multiple attributes in a dataset.
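
One common way to quantify an association is with support and confidence. These measures are not named in the text above; they are brought in here as a hedged illustration, and the basket data is hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]
bread_butter_support = support(baskets, {"bread", "butter"})
bread_to_butter = confidence(baskets, {"bread"}, {"butter"})
```

In this sample, bread and butter appear together in half of the baskets, and two of the three baskets that contain bread also contain butter.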

Data cleaning. Data cleaning prepares data for analysis by eliminating duplicates, corrupted entries, and missing values. Data cleaning methods include verification, conversion, removal of irrelevant data, duplication elimination, error removal, and addressing missing values.
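
Several of these methods can be sketched in a few lines. The record layout (name and age fields) and the rules for what counts as a duplicate or an unusable record are assumptions chosen for this example.

```python
def clean(records):
    """Drop nameless records, convert ages to int, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        if not name:
            continue  # removal: a record without a name is unusable here
        try:
            age = int(rec.get("age"))  # conversion: "36" becomes 36
        except (TypeError, ValueError):
            age = None  # a missing or corrupt age is kept as an explicit None
        key = (name.lower(), age)
        if key in seen:
            continue  # duplication elimination
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw records with a duplicate, a blank name, and a bad age.
raw = [
    {"name": "Ada", "age": "36"},
    {"name": " ada ", "age": 36},
    {"name": "", "age": "50"},
    {"name": "Grace", "age": "n/a"},
]
cleaned = clean(raw)
```

Only two records survive: one for Ada, and one for Grace with the age marked as missing.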

Data visualization. Data visualization is the translation of data into graphic forms to illustrate meaning to stakeholders. Methods for data visualization include comparison charts, maps, density plots, heat maps, word clouds, histograms, network diagrams, and scatterplots.
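
As a rough illustration of a histogram, the sketch below bins values and draws one text bar per bin. The ages, the bin width, and the function name text_histogram are assumptions for this example; in practice a plotting library would render the chart.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per value in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"{start}-{start + bin_width - 1}: {'#' * counts[start]}"
        for start in sorted(counts)
    ]


# Hypothetical customer ages grouped into ten-year bins.
ages = [21, 25, 34, 38, 39, 47]
lines = text_histogram(ages, bin_width=10)
print("\n".join(lines))
```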

Classification. Classification is a key technique in data mining in which data points from large datasets are assigned to predefined categories based on their characteristics. Methods used for classification include logistic regression, K-nearest neighbors (KNN), decision trees, support vector machines (SVM), and naive Bayes.
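
Of the methods listed, K-nearest neighbors is the simplest to sketch: a new point takes the majority label of the k closest labeled points. The training data, the labels, and k=3 are assumptions for this example.

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (hours of use per week, support tickets) -> segment.
training = [
    ((1, 0), "casual"), ((2, 1), "casual"), ((1, 1), "casual"),
    ((9, 4), "power"), ((10, 5), "power"), ((8, 6), "power"),
]
prediction = knn_predict(training, (9, 5))
```

The query point sits among the heavy users, so its three nearest neighbors all carry the "power" label.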

Machine learning. Machine learning involves the use of computer algorithms that improve at a task by learning from the data they are given. Machine learning methods include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Outlier detection. Outlier detection involves looking for unusual data points that diverge from the overall sample. Methods for outlier detection include numeric outlier tests, Z-score, isolation forest, and DBSCAN.
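
The Z-score method can be sketched as follows: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the sample readings are assumptions for this example; the right threshold depends on the data and the sample size.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values whose absolute Z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing diverges
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
outliers = z_score_outliers(readings)
```

Only the reading of 50 is flagged. Because an extreme value also inflates the mean and the standard deviation, Z-scores work best when outliers are rare.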

Neural networks. Neural networks connect many simple processing units (nodes) to process data, make decisions, and learn, loosely as a human would. Their nodes are arranged in three types of layers: an input layer, one or more hidden layers, and an output layer.
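
The layered structure can be sketched as a single forward pass from input to hidden to output. The weights below are hand-picked, hypothetical values and the sigmoid activation is an assumption; the sketch does not show training, which is how a real network would learn its weights.

```python
from math import exp


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_w = [[0.5, -0.5], [0.25, 0.75]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]
output = forward([0.0, 0.0], hidden_w, hidden_b, output_w, output_b)
```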

Predictive modeling. Predictive modeling techniques involve examining datasets to find patterns and trends and then calculating the probabilities of future outcomes. Predictive modeling techniques include classification modeling, forecast modeling, cluster modeling, and time series modeling.
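
A very simple form of forecast modeling fits a straight-line trend to past values and extends it one step ahead. The sales figures and function names are assumptions for this example, and the sketch produces a point forecast rather than a probability.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept).
    """
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    return slope, y_mean - slope * x_mean


def forecast(ys, steps_ahead=1):
    """Extend the fitted trend `steps_ahead` periods past the last value."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


# Hypothetical monthly sales that grow by 10 units each month.
monthly_sales = [100, 110, 120, 130]
next_month = forecast(monthly_sales)
```

For this perfectly linear series the forecast simply continues the pattern to 140. Real series usually call for models that handle seasonality and noise.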

Data warehousing. Data warehousing is the process by which data is collected and stored prior to evaluation, so it occurs before data mining. The process involves ETL: extracting the data, transforming it, and finally loading it into the data warehouse.
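
The three ETL steps can be sketched with Python's standard library, using an in-memory SQLite database as a stand-in for a warehouse. The CSV content, the orders table, and the transformation rules are assumptions for this example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the third row is missing its amount.
RAW_CSV = """order_id,amount,country
1,19.99,us
2,5.00,de
3,,us
"""


def extract(text):
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows without an amount, fix types, uppercase country."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["country"].upper())
        for r in rows
        if r["amount"]
    ]


def load(conn, rows):
    """Load: write the transformed rows into the orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
```

After the run, the table holds the two complete orders and is ready to be queried by later mining steps.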

¹ Georgia Tech, 2022, “10 Key Data Mining Techniques and How Businesses Use Them”