277 Data Science Key Terms, Explained

This is a collection of 277 data science key terms, explained with a no-nonsense, concise approach. Read on to find terminology related to Big Data, machine learning, natural language processing, descriptive statistics, and much more.
This post presents a collection of data science-related key terms with concise, no-nonsense definitions, organized into 12 distinct topics. Starting with Big Data and progressing through to natural language processing, this definition train has stops at machine learning, databases, Apache Hadoop, and several more. It may take some time, but once you get through the terminology presented herein, you should have a good grasp of the most important terms in data science. And don’t worry if the definitions are too slim for you; links abound for expanded related reading where appropriate.
Big Data. If you have somehow made it to this website without having heard the term, which began gaining momentum as a popular buzzword at least a decade and a half ago, I really don’t know what to say.
But having heard the term, or having taken part in (or opposed) its flippant usage, doesn’t mean one knows what it actually means or what it fully encompasses. Indeed, trying to exhaustively describe Big Data in a single post would be nonsensical, not least because there is no agreed-upon exhaustive description, nor should there be. Collecting some key terms associated with Big Data is not a bad idea, however, as it lays a common foundation from which to work forward.
This is the first in a series of such posts on KDnuggets, which will offer concise explanations of a related set of terms (machine learning, in this case), taking a no-frills approach for readers who want terms isolated and defined. After some thought, it was determined that these foundational-yet-informative posts had not been given enough exposure in the past.
So, let’s start with a look at machine learning and related topics.
Clustering is a method of data analysis that groups data points together based on the principle of “maximizing the intraclass similarity and minimizing the interclass similarity” (Han, Kamber & Pei), without using predefined labels for the points (i.e., it is an unsupervised learning technique). This post introduces key terms for common techniques in cluster analysis.
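As one common example of the grouping described above, the sketch below implements Lloyd's k-means algorithm in pure Python: each point is assigned to its nearest centroid (keeping similar points together), and each centroid is then moved to the mean of its assigned points, repeating until nothing changes. The function names `kmeans` and `farthest_first_init` are hypothetical, and the deterministic farthest-first seeding is a simplification chosen here for reproducibility; randomized seeding such as k-means++ is more typical in practice.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        candidate = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(candidate)
    return centroids


def kmeans(points, k, iterations=100):
    """Group points into k clusters with Lloyd's k-means algorithm (no labels needed)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of 2-D points (made-up data for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Note that k-means needs the number of clusters `k` up front and can settle into a poor local optimum depending on its starting centroids, which is why the seeding strategy matters.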
Deep learning is a relatively new term, although it predates the recent dramatic uptick in online searches for it. Enjoying a surge in research and industry interest, due mainly to its incredible successes in a number of different areas, deep learning is the process of applying deep neural network technologies (that is, neural network architectures with multiple hidden layers) to solve problems. Like data mining, deep learning is a process; it employs deep neural network architectures, which are particular types of machine learning algorithms.
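To make "multiple hidden layers" concrete, here is a minimal pure-Python sketch of a forward pass through a small fully connected network with two hidden ReLU layers and one output unit. The weights are arbitrary hand-picked values, not learned ones; training (e.g., by backpropagation) is omitted, and the helper names `dense`, `relu`, and `forward` are hypothetical rather than any library's API.

```python
def dense(x, weights, biases):
    """Fully connected layer: weights[j] holds the input weights of output unit j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in x]


def forward(x, layers):
    """Run x through every hidden layer (dense + ReLU), then a linear output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    weights, biases = output
    return dense(x, weights, biases)


# Hypothetical weights: 2 inputs -> 2 hidden units -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # hidden layer 1
    ([[1.0, 1.0], [-1.0, 2.0]], [0.0, -1.0]),  # hidden layer 2
    ([[2.0, -1.0]], [0.5]),                     # output layer
]
print(forward([3.0, 1.0], layers))  # → [7.5]
```

Stacking more `(weights, biases)` pairs in `layers` makes the network deeper; the nonlinearity between layers is what lets depth add expressive power rather than collapsing into a single linear map.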
Data needs to be curated, coddled, and cared for. It needs to be stored and processed, so that it may be transformed into information and further refined into knowledge. The mechanism that stores data and subsequently facilitates these transformations is, clearly, the database.
This post presents 16 key database concepts and their corresponding concise, straightforward definitions.
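As a small illustration of storing data and then processing it into information, the sketch below uses Python's built-in `sqlite3` module with an in-memory database: raw rows go in, and an aggregate query turns them into per-group summaries. The table name `readings`, its columns, and its rows are hypothetical; a real deployment would typically use a persistent database file or a server-based database system.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT NOT NULL, value REAL NOT NULL)")

# Store raw data points (hypothetical sensor readings).
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0), ("b", 12.0)],
)
conn.commit()

# Process the stored data into information: count and average value per sensor.
summary = conn.execute(
    "SELECT sensor, COUNT(*), AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(summary)  # → [('a', 2, 2.0), ('b', 3, 12.0)]
conn.close()
```

The `?` placeholders let the database driver bind values safely instead of splicing them into the SQL string.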


