
Should You Do Data-Driven Analysis From Scratch?

By: Fero Labs
• November 2021

At Fero we often get asked the question: Why can't I do this data-driven analysis myself?

You can—but it will be expensive and inefficient. And it might not work.

Below, we highlight some of the main issues you'll run into. These issues broadly fall into three categories: data, collaboration, and broader issues around trust.

1. Data issues

Without quality data, you won't get quality insights. That means that before doing any analysis, you'll need to make sure the data coming from the factory floor is clean and free of erroneous outliers. This brings us to our first issue:

Data is hard to clean in real time.

It's easy to export static data into Excel and clean it by hand, deleting rows that don't make sense and removing outliers. But cleaning live data is more challenging, especially if you don't know anything about the factory or process. As a result, data scientists can spend months cleaning data for a single process.
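To make the gap between static and live cleaning concrete, here is a minimal sketch of one generic way to flag outliers as readings arrive, using a rolling median and median absolute deviation. It is an illustration only, not a description of how Fero's software works; the class name, window size, threshold, and sample readings are all assumptions chosen for the example.

```python
from collections import deque
from statistics import median


class RollingOutlierFilter:
    """Flags readings that sit far from the median of recent accepted readings.

    Hypothetical helper for illustration; parameter defaults are arbitrary.
    """

    def __init__(self, window=20, threshold=5.0, min_history=5, min_scale=1e-6):
        self.history = deque(maxlen=window)  # recent accepted readings
        self.threshold = threshold           # allowed distance, in MAD units
        self.min_history = min_history       # readings needed before judging
        self.min_scale = min_scale           # floor so a zero MAD cannot divide

    def is_outlier(self, value):
        # Accept everything until there is enough history to compare against.
        if len(self.history) < self.min_history:
            self.history.append(value)
            return False
        med = median(self.history)
        mad = median(abs(x - med) for x in self.history)
        scale = max(mad, self.min_scale)
        outlier = abs(value - med) / scale > self.threshold
        # Only accepted readings update the history.
        if not outlier:
            self.history.append(value)
        return outlier


# Made-up sensor readings with one obvious spike.
readings = [100.1, 99.8, 100.3, 100.0, 99.9, 100.2, 250.0, 100.1]
sensor_filter = RollingOutlierFilter()
flags = [sensor_filter.is_outlier(r) for r in readings]
```

Even this small sketch shows why live cleaning needs process knowledge: because rejected readings never enter the history, a genuine shift in the process (a new raw material, a changed setpoint) would be flagged as an outlier indefinitely unless someone who understands the process adjusts the rules.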

Fero software has process and data engines that auto-clean data in real time. Once your static data has been cleaned and you want to start analyzing live data, you don't need to do any extra work—just enable live optimization mode and Fero will automatically clean and analyze data as it's generated on the factory floor.

Reproducing analyses often requires starting again from scratch.

Say you start a project to find the root cause of an unusually high scrap rate. Later, you want to study emissions generated during the same process and figure out how to reduce them.

If a data scientist wrote custom code for the first project, it likely won't carry over to the second. Instead, you'll need to start from scratch, because your new files and data sources will have many small but significant differences.

Fero makes it easy to ask multiple questions about the same process. You don't need to do any re-work. Rather, just create a new analysis and adjust the factors you want to examine, and the software will get you answers in minutes.

2. Collaboration issues

Once the models have been built, they need to be put into practice. This leads to another set of practical challenges:

Splitting work between the data scientist and the domain expert is inefficient.

The person preparing and cleaning the data is typically the data scientist, but the person who actually knows the data is the domain expert. This requires a lot of back and forth: the data scientist talks to the domain expert, writes some code, and shares the results; the domain expert then looks at the graphs to see if they make sense; and the cycle repeats.

We cut this back and forth out. When you set up a process in Fero, the domain expert adds information about the factory, such as the number of stages in the process and how long items spend in each stage. This information is then used to clean and prepare the data and build accurate models—so the data scientists can spend their time analyzing and discussing the results with their domain expert colleagues.
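As a rough illustration of why stage counts and durations matter for data preparation, the sketch below works backwards from the time an item leaves the line to the time window it spent in each stage, so that sensor readings from each stage can be matched to that item. The stage names, durations, and function name are hypothetical, and this is a simplified sketch rather than Fero's actual method.

```python
from datetime import datetime, timedelta

# Hypothetical process description a domain expert might supply:
# (stage name, minutes an item spends in that stage), in process order.
STAGES = [("melting", 45), ("casting", 20), ("rolling", 30)]


def stage_windows(exit_time, stages):
    """Return {stage: (start, end)} for an item that left the line at exit_time."""
    windows = {}
    end = exit_time
    # Walk the stages from last to first, stepping back by each duration.
    for name, minutes in reversed(stages):
        start = end - timedelta(minutes=minutes)
        windows[name] = (start, end)
        end = start
    return windows


# Example: an item that left the line at noon.
windows = stage_windows(datetime(2021, 11, 1, 12, 0), STAGES)
```

With these windows, a quality measurement taken at the end of the line can be joined to the sensor readings that were recorded while the item was actually in each stage, rather than to readings taken at the same clock time.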

The data scientist leaves.

They wrote custom code, but now you don't know how to maintain it or what to do with it. You need to find another data scientist to help deal with those issues. (This issue is compounded when it's a student intern doing the data science work!)

With Fero, you no longer have to worry about orphaned projects like this. Anyone can use Fero easily—no code required.

3. Trust issues

Building a model is only part of the challenge. A model with X% accuracy doesn't automatically translate into ROI. If this model is sitting on some engineer's workstation, it doesn't bring value.

Trust is the main obstacle hindering the adoption of machine learning in the industrial space. If you're planning to build your own models, you'll likely use off-the-shelf models. These typically aren't built for industry and often aren't explainable: users can't see how they work or use them to gain a deep understanding of their processes. That means those who actually work on the factory floor may not buy into them, wasting your investment.

If you don't already employ data scientists, hiring contractors will add significant cost to your budget. And if you already have your own data scientists, great! You should leverage them to address as many data-driven analyses as possible by providing them with good software to cover the more routine parts of their jobs.

As we've seen in a plethora of recent surveys, many manufacturers are eager to invest in AI and machine learning, yet few companies actually see value from their AI investments. To buck the trend, you'll need a solution that works, and that engineers and operators can be confident will make their work easier.