Automate Hazardous Data Detection in Safety Data Sheets

Some data consists of customer numbers; other data shows when a bill is due. In this article, we’ll focus on «hazardous» data and how companies retrieve this type of information without introducing errors.

Article content:

Safety data sheets (SDSs): what are they?
What hazardous data do SDSs contain?
How do companies extract data from them?

Manual data extraction

Automated data extraction

How does it work?

So what are Safety Data Sheets (SDSs)?

Safety data sheets, aka SDSs, typically contain chemical properties, health & environmental hazards, and safety precautions for storing, handling, and transporting chemicals.

They are standardised documents through which chemical manufacturers and importers communicate hazard information to the people who handle their chemicals.
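Because the GHS-aligned format used under OSHA’s Hazard Communication Standard fixes 16 numbered sections, one early step an extraction tool might take is splitting an SDS into those sections, so each field can be looked for in the right place. The Python sketch below assumes the SDS has already been converted to plain text and that every section begins on its own line with "SECTION <n>"; `split_sections` and the sample text are hypothetical, and real SDS layouts vary far more than this.

```python
import re
from typing import Dict

# Short titles of the 16-section SDS format in OSHA's Hazard Communication
# Standard (29 CFR 1910.1200, Appendix D), which follows the GHS.
SDS_SECTIONS: Dict[int, str] = {
    1: "Identification",
    2: "Hazard(s) identification",
    3: "Composition/information on ingredients",
    4: "First-aid measures",
    5: "Fire-fighting measures",
    6: "Accidental release measures",
    7: "Handling and storage",
    8: "Exposure controls/personal protection",
    9: "Physical and chemical properties",
    10: "Stability and reactivity",
    11: "Toxicological information",
    12: "Ecological information",
    13: "Disposal considerations",
    14: "Transport information",
    15: "Regulatory information",
    16: "Other information",
}

# Assumption: each section starts on its own line with "SECTION <n>",
# e.g. "SECTION 2: Hazard(s) identification". The whole heading line is matched.
HEADING = re.compile(r"^[ \t]*SECTION[ \t]+(\d{1,2})\b[^\n]*$", re.IGNORECASE | re.MULTILINE)


def split_sections(text: str) -> Dict[int, str]:
    """Map each section number (1-16) to the body text that follows its heading."""
    headings = [m for m in HEADING.finditer(text) if int(m.group(1)) in SDS_SECTIONS]
    sections: Dict[int, str] = {}
    for i, m in enumerate(headings):
        # A section's body runs until the next heading, or to the end of the text.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        sections[int(m.group(1))] = text[m.end():end].strip()
    return sections


sample = """SECTION 1: Identification
Product name: Example solvent
SECTION 2: Hazard(s) identification
Signal word: Danger
H225 Highly flammable liquid and vapour
SECTION 3: Composition/information on ingredients
Ethanol  CAS 64-17-5
"""

for number, body in split_sections(sample).items():
    first_line = (body.splitlines() or [""])[0]
    print(number, SDS_SECTIONS[number], "->", first_line)
```

A heuristic like this breaks on sheets that label their sections differently, or that start a body line with a cross-reference such as "Section 8 lists...". Handling those cases needs something more robust than a single regular expression.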

What hazard data do SDSs contain?

Typically, the hazard information in an SDS sits in Section 2, «Hazard(s) identification», and includes:

The hazard classification of the chemical (e.g., flammable liquid, category 1)
Signal word
Hazard statement(s)
Pictograms (may be presented as black-and-white graphical reproductions of the symbols, or as the name of the symbol, e.g., skull and crossbones, flame)
Precautionary statement(s)
Description of any hazards not otherwise classified
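To make the list above concrete, here is one hypothetical way to hold these Section 2 fields as a structured record, with a few format checks a system could run before accepting the data. It is a minimal Python sketch, not a schema from any particular product: the class and field names are illustrative, and the checks rely only on GHS conventions (the two signal words «Danger» and «Warning», H-codes for hazard statements, P-codes for precautionary statements, and pictogram codes GHS01 to GHS09).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# GHS uses exactly two signal words.
SIGNAL_WORDS = {"Danger", "Warning"}
# Simplified code patterns: hazard statements are "H" plus three digits, sometimes
# with letter suffixes (H360FD) or combined with "+" (H300+H310); precautionary
# statements are "P" codes, also combinable (P301+P310). EU-only "EUH" codes are
# not handled here.
H_CODE = re.compile(r"H\d{3}[A-Za-z]*(?:\+H\d{3}[A-Za-z]*)*")
P_CODE = re.compile(r"P\d{3}(?:\+P\d{3})*")
PICTOGRAM_CODE = re.compile(r"GHS0[1-9]")


def leading_code(statement: str) -> str:
    """Return the first token of a statement, e.g. 'H225' from 'H225 Highly ...'."""
    parts = statement.split(maxsplit=1)
    return parts[0] if parts else ""


@dataclass
class HazardRecord:
    """Hypothetical container for the hazard fields listed above."""
    classification: List[str]
    signal_word: Optional[str]  # some classifications carry no signal word
    hazard_statements: List[str] = field(default_factory=list)
    pictograms: List[str] = field(default_factory=list)  # stored as GHS codes
    precautionary_statements: List[str] = field(default_factory=list)
    hazards_not_otherwise_classified: str = ""

    def validate(self) -> List[str]:
        """Return a list of format problems; an empty list means none were found."""
        problems = []
        if self.signal_word is not None and self.signal_word not in SIGNAL_WORDS:
            problems.append(f"unknown signal word: {self.signal_word!r}")
        for statement in self.hazard_statements:
            if not H_CODE.fullmatch(leading_code(statement)):
                problems.append(f"hazard statement without a valid H-code: {statement!r}")
        for statement in self.precautionary_statements:
            if not P_CODE.fullmatch(leading_code(statement)):
                problems.append(f"precautionary statement without a valid P-code: {statement!r}")
        for code in self.pictograms:
            if not PICTOGRAM_CODE.fullmatch(code):
                problems.append(f"unknown pictogram code: {code!r}")
        return problems


record = HazardRecord(
    classification=["Flammable liquid, category 2"],
    signal_word="Danger",
    hazard_statements=["H225 Highly flammable liquid and vapour"],
    pictograms=["GHS02"],
    precautionary_statements=[
        "P210 Keep away from heat, hot surfaces, sparks, open flames and other ignition sources. No smoking."
    ],
)
print(record.validate())  # → []
```

Checks like these only confirm that values are well formed. They cannot tell whether «Danger» is the right signal word for that chemical, which still depends on the source document.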

These are only a handful of the many data types an SDS contains.

How do companies extract data from SDSs?

Before elaborating on how companies extract data from these complex documents, it’s worth understanding why accurate, accessible data matters so much. The Occupational Safety and Health Administration (OSHA) requires employers to keep an SDS for each hazardous chemical in the workplace and to ensure that these sheets are readily accessible to employees in their work areas during each work shift.

And considering that SDSs often have complex layouts and can span many pages, it’s no wonder companies look for alternative ways to extract data from them.

Here is one way companies are doing so...

Manual data extraction

A person reads each document, works out where the relevant data sits on each page, and then manually enters each piece of information into an application.

However…

While this may seem like the simplest way to extract data from SDSs, it always carries a risk of error, especially when the person doing the work is tired.

When the data being processed is hazard information from SDSs, errors can lead to:

Misidentifying harmful chemicals
Incorrectly calculating quantities/volumes
Falsely labelling flammable items as non-flammable

In short, a typo in SDS data can be far more dangerous than a typo in an email to your boss.
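One way software can catch some of these typos is by checking identifiers that carry their own error detection. CAS Registry Numbers, which SDSs list for ingredients in Section 3, end in a check digit: multiply each preceding digit by its position counted from the right (starting at 1), sum the products, and the last digit of that sum must equal the check digit. The Python sketch below applies that rule; the function name is our own.

```python
import re

# CAS Registry Number format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_FORMAT = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit is correct."""
    match = CAS_FORMAT.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)  # all digits except the check digit
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(position * int(digit) for position, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


for entry in ["7732-18-5", "64-17-5", "64-71-5", "7732-18"]:
    print(entry, is_valid_cas(entry))
```

For water (7732-18-5) the weighted sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, whose last digit matches the check digit 5; ethanol (64-17-5) passes the same way. Swapping two neighbouring digits, as in 64-71-5, breaks the match. A valid check digit only shows the number is well formed, though; it cannot confirm that the number belongs to the chemical actually in the container.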

So this raises the question: how do companies avoid these outcomes?

Automated data extraction

Automated data extraction uses software, rather than people, to pull data out of documents, which makes it a faster, more efficient way to process them. The automation typically comes from platforms that use machine learning (ML) and artificial intelligence (AI) to learn how to process document data effectively.

Many of these platforms are described as «intelligent» because they aim to understand data the way a human would, rather than simply transcribing it into a system.

Example: non-intelligent vs intelligent data extraction

A non-intelligent system, such as basic OCR (optical character recognition), converts the text on a page into machine-readable characters. It may register that a page contains five icons, but it will not understand what those icons mean.

An intelligent system, such as an Intelligent Document Processing (IDP) platform, would instead identify the meaning of each icon, for example that a flame pictogram signals a flammability hazard.

Intelligent solutions can also pick out specific data points within complex structures such as figures and tables, which is particularly useful for SDSs, as they are often full of them.
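The difference between registering an icon and understanding it can be sketched as a lookup step. Assuming some upstream component (not shown, and the genuinely hard part) has already recognised each pictogram image as a GHS code or a symbol name, a table like the one below turns that label into the hazards it signals and flags anything it cannot place for human review. The hazard descriptions are abridged from OSHA’s HCS pictogram guidance; `interpret_pictograms` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# GHS pictogram codes -> (symbol name, abridged hazards the pictogram signals).
GHS_PICTOGRAMS: Dict[str, Tuple[str, str]] = {
    "GHS01": ("exploding bomb", "explosives, self-reactives, organic peroxides"),
    "GHS02": ("flame", "flammables, pyrophorics, self-heating substances, "
                       "substances that emit flammable gas, self-reactives, organic peroxides"),
    "GHS03": ("flame over circle", "oxidizers"),
    "GHS04": ("gas cylinder", "gases under pressure"),
    "GHS05": ("corrosion", "skin corrosion/burns, serious eye damage, corrosive to metals"),
    "GHS06": ("skull and crossbones", "acute toxicity (fatal or toxic)"),
    "GHS07": ("exclamation mark", "irritant (skin and eye), skin sensitizer, acute toxicity "
                                  "(harmful), narcotic effects, respiratory tract irritant"),
    "GHS08": ("health hazard", "carcinogen, mutagenicity, reproductive toxicity, respiratory "
                               "sensitizer, target organ toxicity, aspiration toxicity"),
    "GHS09": ("environment", "aquatic toxicity"),
}
# Reverse index so labels given as symbol names also resolve to a code.
NAME_TO_CODE = {name: code for code, (name, _) in GHS_PICTOGRAMS.items()}


def interpret_pictograms(labels: List[str]) -> List[str]:
    """Turn recognised labels (codes or names) into readable hazard descriptions."""
    meanings = []
    for label in labels:
        key = label.strip()
        code = key.upper() if key.upper() in GHS_PICTOGRAMS else NAME_TO_CODE.get(key.lower())
        if code is None:
            # Unknown labels are never guessed; they go to a person instead.
            meanings.append(f"{label}: unrecognised pictogram, flag for human review")
        else:
            name, hazards = GHS_PICTOGRAMS[code]
            meanings.append(f"{code} ({name}): {hazards}")
    return meanings


for line in interpret_pictograms(["GHS02", "Skull and Crossbones", "star"]):
    print(line)
```

Routing unknown labels to a person, rather than picking the closest match, reflects the stakes described earlier: a wrong guess about a pictogram is worse than a delay.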

How does automated data extraction work?

A team member would upload the safety data sheets to an Intelligent Document Processing platform and tell the system which data points they need to process.
The system would then automatically extract the requested data, whether it appears in tables, figures, or pictograms.
Thanks to «human in the loop» features, team members can review the results and correct any errors before the data is used. The system can then learn from those corrections, making it less likely to repeat the same mistakes.
Data can be automatically exported to internal/external systems in a variety of formats.
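The steps above can be sketched as a small pipeline. This Python example assumes a hypothetical extraction model that returns each field with a confidence score: fields below an assumed threshold are routed to a human reviewer, reviewer corrections are stored so they could later feed retraining (the retraining itself is not shown), and the final results are exported as JSON or CSV. All names and the 0.90 threshold are illustrative.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class ExtractedField:
    """One data point returned by a hypothetical extraction model."""
    document: str
    field_name: str
    value: str
    confidence: float  # model confidence between 0 and 1


# Assumed cut-off: anything less confident goes to a human reviewer.
REVIEW_THRESHOLD = 0.90

# (model output, reviewer's value) pairs kept as potential training data.
CORRECTIONS: List[Tuple[ExtractedField, str]] = []


def route(fields: List[ExtractedField]) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Split fields into those accepted automatically and those needing review."""
    accepted, needs_review = [], []
    for f in fields:
        if f.confidence >= REVIEW_THRESHOLD:
            accepted.append(f)
        else:
            needs_review.append(f)
    return accepted, needs_review


def apply_correction(f: ExtractedField, corrected_value: str) -> ExtractedField:
    """Record a reviewer's fix and return the corrected field."""
    CORRECTIONS.append((f, corrected_value))
    # A human-verified value is treated as fully confident.
    return ExtractedField(f.document, f.field_name, corrected_value, 1.0)


def export_json(fields: List[ExtractedField]) -> str:
    return json.dumps([asdict(f) for f in fields], indent=2)


def export_csv(fields: List[ExtractedField]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["document", "field_name", "value", "confidence"])
    writer.writeheader()
    for f in fields:
        writer.writerow(asdict(f))
    return buffer.getvalue()


# Usage with made-up model output for one SDS.
results = [
    ExtractedField("sds_001.pdf", "signal_word", "Danger", 0.98),
    ExtractedField("sds_001.pdf", "pictogram", "GHS0?", 0.41),
]
accepted, needs_review = route(results)
fixed = [apply_correction(needs_review[0], "GHS02")]  # reviewer supplies the right code
print(export_csv(accepted + fixed))
```

Whether a single fixed threshold is appropriate depends on the field: a misread pictogram is far more serious than a misread revision date, so safety-critical fields might warrant a stricter cut-off or mandatory review.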

The bottom line

Retrieving accurate information from SDSs is a high priority, and so is retrieving it quickly.

Ultimately, automating data extraction, including the recognition of hazard pictograms, can shorten the time it takes to process SDSs and improve the accuracy of the extracted data.