By Kartik Patel. This article discusses Hierarchical Clustering as an analytical method and how organizations can apply it. What is Hierarchical Clustering? Hierarchical Clustering is a process by which objects are classified into a number of groups so that they are as much dissimilar as […]
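As a rough illustration of the bottom-up (agglomerative) form of hierarchical clustering, here is a minimal, stdlib-only Python sketch. It assumes single-linkage distance (the closest pair of points between two clusters) and stops once `k` clusters remain, which corresponds to cutting the dendrogram at that level. The function names and the stopping rule are hypothetical choices for illustration, not the article's method.

```python
import math
from itertools import combinations

# Hypothetical helper names; a simplified sketch, not the article's implementation.

def single_linkage(a, b):
    """Single-linkage distance: the closest pair of points across two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, k):
    """Bottom-up hierarchical clustering.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until only k remain (equivalent to cutting the dendrogram at k).
    Returns the final clusters and the merge history.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > k:
        # Find the pair of clusters (i < j) with the smallest linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((list(clusters[i]), list(clusters[j])))
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters, merges

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    groups, history = agglomerative(pts, 3)
    print(groups)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

The merge history is what a dendrogram plots; other linkage rules (complete, average, Ward) would only change `single_linkage`.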
Good, Clean Data: How to Get Your Organization’s Data in Tip-Top Shape
By Paul Barba. As the types and volume of data we deal with increase, so does the challenge of managing and cleaning it. Unstructured data, data from multiple sources, and data whose values continually change all create “messiness”: datasets riddled with noise, inaccuracies, and duplicates. This is a big […]
Are Black Swans in Your Data Lakes Obstructing AI Progress?
By Kim Kaluba. It seems every company is anxious to jump on the boat and launch an AI program to improve organizational performance. But do they know what’s swimming in the data lake alongside their boat? According to a recent Infosys survey, half of organizations reported that they will not be […]
How Culture, Data, and Skills Serve as Barometers for Organizational Data Science Preparedness
By Steve MacLauchlan. Over my career, I’ve had the good fortune of working with many organizations across many industries. Often, I’ve been in an advisory role, helping organizations get a handle on their data assets and find value in the insights those assets can provide. When data is […]
Data Science in 90 Seconds: Artificial Neural Networks
Video blog by Laura Kahn. This is Lesson 15 in the Data Science in 90 Seconds video blog series from host Laura Kahn. The series covers some of the most prominent questions in Data Science, such as Supervised and Unsupervised Learning, K-Means Clustering, Naive Bayes, Decision Trees and Random Forests, Ridge Regression, […]
Top Programming Languages for Data Science and Machine Learning
By Manan Ghadawala. Software developers love arguing about which programming language is the best. However, the criteria for what counts as “best” are unclear. When we discuss software development for the machine learning and data science fields, the question is timeless and will never lose its relevance. Most useful programming languages […]
Data Delivery vs. Amazon Delivery: Making Data Scientists More Efficient
By Dipti Borkar. If Amazon can deliver an order in one hour, why does it take days or weeks for data scientists to access their datasets? Embrace chaos, embrace data silos, orchestrate, and deliver. As I head to AWS Summit today, I’ve been thinking about the beginnings: how it began for […]
dotData Updates its dotData Enterprise and dotDataPy Data Science Acceleration Platforms
According to a recent press release, “dotData, the first and only company focused on delivering full-cycle data science automation and operationalization for the enterprise today announced the availability of Version 1.6 of dotData Data Enterprise and Version 1.2 of dotDataPy. The new updates add significant enhancements to both versions of its data science automation platform, […]
GreenSky Launches Data Science Division to Strengthen its Consumer Protections
According to a recent press release, “GreenSky, Inc., a leading financial technology company Powering Commerce at the Point of Sale®, today announced the launch of a newly established Data Science division to boost the Company’s consumer protection program. The company is investing in artificial intelligence and machine learning to more accurately identify suspicious activity, alert […]
From a Single Decision Tree to a Random Forest
By Rosaria Silipo, with co-author Kathrin Melcher. Decision trees are a family of very popular supervised classification algorithms, for a few reasons: they perform quite well on classification problems, the decisional path is relatively easy to interpret, and the algorithm to build […]
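To make the step from a single decision tree to a random forest concrete, here is a minimal, stdlib-only Python sketch. It assumes depth-1 trees (decision stumps) whose splits are chosen by Gini impurity; each tree is trained on a bootstrap sample of the rows and a random subset of the features, and the forest predicts by majority vote. The function names and the parameters `n_trees`, `n_features`, and `seed` are hypothetical; this is a simplified illustration, not the column's implementation.

```python
import random
from collections import Counter

# Hypothetical helper names; a simplified sketch, not the column's code.

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Depth-1 decision tree: pick the (feature, threshold) split with the
    lowest weighted Gini impurity, predicting the majority class on each side."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # No valid split (e.g. all values identical): predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_cols = len(X[0])
    trees = []
    for _ in range(n_trees):
        # Bootstrap: sample rows with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Random subspace: this tree only considers n_features of the columns
        feats = rng.sample(range(n_cols), n_features)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """Majority vote over the individual trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

Because a stump has only one split, drawing features once per tree is the same as drawing them per split here; a full random forest grows deeper trees and re-samples the candidate features at every split.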