Click to learn more about author Evelyn Johnson. Why do big data projects fail? They do; that’s for sure. Gartner estimated that 60 percent of big data projects fail to achieve their desired objectives. A year later, they revised this figure to 85 percent, admitting they were “too conservative” with the original estimate. So, going […]
From Modeling to Scoring: Finding an Optimal Classification Threshold based on Cost and Profit
Click to learn more about co-author Maarit Widmann. Click to learn more about co-author Alfredo Roccato. Wheeling like a hamster in the Data Science cycle? Don’t know when to stop training your model? Model evaluation is an important part of a Data Science project and it’s exactly this part that quantifies how good your model is, […]
Enterprise Data Literacy: Understanding Data Management
Truly understanding data as an asset requires Enterprise Data Literacy, an organizational capability to take, analyze, and use data to remain secure and competitive. But achieving high Enterprise Data Literacy can remain daunting when business and IT interact. All too often in the middle of a project sprint, IT gets stuck on a minor problem, […]
Demystifying the New Normal with Data and Analytics
Click to learn more about author Devansh Sharma. Analytics has been a key enabler for businesses in the past few years. We have seen the entire world shift from process-driven to data-driven decision-making. And now that we are standing at the brink of a new normal amidst these unprecedented times, the usage of data and […]
Metadata Repository Essential Use Cases
“Even though we’re working with very sophisticated Data Science resources, we still hear over and over, ‘I don’t even know what data is available,’” said Susan Swanson, the Senior Manager of Data Modeling and Architecture at Health Care Service Corporation (HCSC), an organization that offers regional health care coverage and services. Speaking at the DATAVERSITY® […]
Guided Labeling Episode 4: From Exploration to Exploitation
Click to learn more about author Paolo Tamagnini. One of the key challenges in using supervised machine learning for real-world use cases is that most algorithms and models require a sample of data that is large enough to represent the actual reality your model needs to learn. These data need to be labeled. These […]
What is the KMeans Clustering Algorithm and How is it Used to Analyze Data?
Click to learn more about author Kartik Patel. This article provides a brief explanation of the KMeans Clustering algorithm. What is the KMeans Clustering algorithm? The KMeans Clustering algorithm is a process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another, and […]
Mise en Place for Data Science
Click to learn more about author Curt Bergmann. When guests arrive at a great restaurant, the chef and all the cooks have already planned and assembled everything they need to quickly deliver excellence on a plate. Their process, called mise en place, is used by chefs all over the world. Emerging after the introduction of […]
Fundamentals of Machine Learning Enabled Analytics
The famous theoretical physicist Stephen Hawking said, “It’s tempting to dismiss the notion of highly intelligent machines as mere science fiction.” Artificial intelligence (AI), the game-changing technology of the global business world, comprises three distinct sub-disciplines: machine learning (ML), natural language processing (NLP), and cognitive computing. Automated solutions in business analytics use all these sub-technologies, […]
K-Anonymization: An Introduction for First Graders
Click to learn more about author John Murray. Now that privacy-enhancing technologies (PETs) have become a subject of dinner table conversations, our research team continues to field questions on these complex topics which can be difficult to explain. As part of this series, I will attempt to explain another PET, K-anonymization, in a first-grade context: […]