Successful companies capitalize on their organizational data assets by understanding how best to leverage the similar, but notably different, practices and concepts of Data Management vs. Data Governance. According to Dr. Peter Aiken, Facebook has an estimated worth of more than five times that of United Airlines, $200 billion vs. $34 billion. The […]
What Is Data Lineage?
An apt metaphor for data lineage is that of complex bookkeeping. But instead of financial transactions recorded in a ledger, data asset values populate a given platform or document. Data lineage describes data origins, movements, characteristics, and quality across the data lifecycle. Typically, data lineage has been thought of as a map of tables and joins, […]
Data Curation 101: The What, Why, and How
Humans have an imperative to practice Data Curation. People have always gathered, maintained, and archived data, and they continue to do so at ever greater volumes. They are driven to get useful data for today and tomorrow. As Mike Schmoker elegantly states, “Things get done only if the data we gather can inform and inspire those […]
What Is a Data Scientist?
Data scientists emphasize rigor and performance when obtaining, scrubbing, exploring, modeling, and interpreting data. Data scientists bring a different context to their work than data analysts do, through high-powered math. Josh Wills, a software engineer, once described a data scientist as a “person who is better at statistics than any software engineer and better at software […]
Taxonomy vs Ontology: Machine Learning Breakthroughs
The difference between Taxonomy and Ontology is a topic that often perplexes even the most seasoned data professionals, Data Scientists, Data Analysts, and many a technology writer. Yet, taxonomies and ontologies form the underpinnings of how machines learn and understand, a group of technologies that are quickly improving in perception and cognition. Cognitive Computing technologies […]
What Is Extract, Transform, and Load (ETL)?
Extract, Transform, and Load (ETL) describes the process of integrating raw data from various data sources into a repository such as a data warehouse, with the main purpose of maintaining Data Quality and trust. ETL requires three operations, as described below by Paul Varley: Extract: “Getting a copy of data from a source, which could be […]
What Is The Internet of Things (IoT)?
The Internet of Things (IoT) is a: “System using multiple technologies, ranging from the Internet to wireless communication and from micro-electromechanical systems (MEMS) to embedded systems. The traditional fields of automation (including the automation of buildings and homes), wireless sensor networks, GPS, control systems, and others.” Other Definitions of IoT include: “A more comprehensive view […]
What Is Machine Learning?
Machine Learning (ML) “…explores the construction and study of learning algorithms.” Furthermore, Machine Learning: “…is about building programs with adaptable parameters that automatically adjust based on the data the programs receive. By adapting to previously seen data, the programs are able to improve their behavior. They also generalize data, meaning that the programs can perform […]
What Is Metadata?
Metadata is information about the data collected. According to the Data Management Body of Knowledge (DMBoK), metadata “includes information about technical and business processes, data rules and constraints, and logical and physical data structures.” Think of it as a wrapper around data that describes it, like how packaging tells what food is in a box or […]
What Is Ontology?
Ontology is often considered a subset of taxonomy. An ontology: Is a domain; contains more information about the behavior of entities and the relationships between them; includes formal names, definitions, and attributes of entities; and may be constructed using OWL, the Web Ontology Language from the W3C. Other Definitions of Ontology Include: “A data model […]