Click to learn more about author Pete Aven. Data Wrangling is bad. Yet we all do it, every single day. In a nutshell, Data Wrangling, also known by the more technical term “data munging,” is the process of transforming data from one shape into another to prepare it for analysis and deliver some unified […]
Five Ways Data Engineers Are Leveraging Self-Service Data Prep Solutions
Click to learn more about author Farnaz Erfan. With the growth of Big Data Analytics, data engineers are now gaining a lot of popularity. And, while the majority of them have coding and technical skills, have ETL knowledge, or can program in MapReduce, many have found that applying a Self-Service Data Prep solution can help them […]
Three Questions You Aren’t Asking That Will Make Your Data Strategy Hum
Click to learn more about author Joe deBuzna. In a world where IoT, AI, blockchain and Cloud-connected devices are redefining everything from energy and finance to supply chains and services, this is the new reality: Companies across all industries are becoming data companies. As a result, tools for managing data workloads, like modern Data Lakes, have […]
Cloud Platforms for Analytics: The House Brand Ain’t Always Enough – Conclusion
Click to learn more about author Andrew Brust. In the first part of this column we looked at industry trends that have contributed to the creation of market demand for Public Cloud solutions, what’s in the Cloud Analytics stack and Amazon integration pairs. If you missed it, check out part one first. A Difference of […]
Cloud Platforms for Analytics: The House Brand Ain’t Always Enough
Click to learn more about author Andrew Brust. Today’s leading Cloud Platforms include numerous components for storing, processing and analyzing large volumes of data. All the basics are there: storage, analysis and processing, streaming data processing, data pipelining, data warehousing, BI and even AI. But while it’s great to have all those raw components, how […]
Computational Social Science
Click to learn more about author Steve Miller. It seems I write something on the nature of Data Science every year. Early on, my take on DS was in motion, but now it’s much more grounded. My points of Data Science departure are the iconic “what is DS” pronouncements from Drew Conway and David Donoho. […]
Data Quality and Data Governance: A Resurgence of Interest and Future Maturity
Data Governance and Data Quality have been around for quite a long time, but there has recently been a renewed focus on these essential Data Management practices. In a recent DATAVERSITY® interview, Harald Smith, Director of Product Management at Syncsort, gave his perspective on this resurgence and where the future is heading for Data Governance […]
Self-Serve Data Preparation Doesn’t Mean Traditional ETL is Dead
Click to learn more about author Kartik Patel. Extract, Transform and Load (ETL) refers to a process of connecting to data sources, integrating data from those sources, improving Data Quality, aggregating the data and then storing it in a staging data source, Data Marts or Data Warehouses for consumption by various business applications including BI, Analytics […]
Alation Delivers Governance for Insight in Data Lakes, Both On-premises and in the Cloud
by Angela Guess A new press release states, “Alation Inc., the collaborative data company, today announced deeper support for cataloging data lakes deployed both on-premises and in the cloud, including data lakes built with Amazon Simple Storage Service (S3) and the Hadoop Distributed File System (HDFS). The Alation Data Catalog is the first data catalog […]
Amazon Web Services Makes AWS Glue Available To All Customers
by Angela Guess A recent press release reports, “Today, Amazon Web Services, Inc. (AWS), an Amazon.com company, launched AWS Glue, a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data into Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Relational Database Service (Amazon […]