
With over 170 consumer health brands in its innovative global portfolio, Bayer Consumer Health empowers people to manage their health needs in the areas of dermatology, nutritional supplements, pain management, cardiovascular risk prevention, digestive health...

Capgemini Engineering is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. It is a responsible and diverse organization of nearly 350,000 team members in more than 50 countries.

Customer Success Story
Bayer Consumer Health and Capgemini Engineering with Acodis Case Study

Extract the raw data from the clinical study reports of historic studies and map the studies to a common data model to enable meta-analysis across them.


Extraction of over 200 tables of varying format and length from more than 3500 pages of documents, plus data validation for a finished common data model, all in 3 weeks.

Key Benefits
Scalability, flexibility, speed, and accuracy. Ease of use: No coding is required, data extraction is done by a subject-matter expert.

By extracting raw data from scanned PDF copies of historic study reports, we were able to re-use the clinical data and conduct further exploration and analysis by combining datasets from various studies. Without the new technology, this would not have been possible within reasonable costs and timelines.

Gregor Bieri
Head Clinical Data Sciences, Analytics & QM at Bayer Consumer Health

The Challenge

  • Conducting clinical trials is costly for pharmaceutical companies, requiring extensive effort, time, and resources. Patient recruitment, screening and retention, research and development, and regulatory requirements all contribute to the high expense.


  • Over-the-counter (OTC) consumer health products are often based on data generated years or even decades ago. In many cases, the raw data from these older studies in electronic format has been lost, and the clinical data are only accessible through scanned copies of the listings in the appendices of integrated study reports.


  • Bayer Consumer Health and Capgemini Engineering aimed to extract the raw data from the clinical study reports of historic studies. Although the clinical trials were conducted many years earlier, they held valuable insights, such as patient demographics, drug exposures, and outcomes, that could be used for analytical purposes. In addition, mapping multiple studies to a common data model made it possible to pool the studies and perform meta-analysis across them.


  • The team faced a challenge, as much of the clinical trial data was stored in unstructured formats. The data was not readily accessible for analysis and had to be extracted from old, scanned copies of PDF documents. The team required a solution that allowed for high-quality extraction from many different tabular formats. Some tables had differing (or missing) borders and row structure, and/or a high level of skew introduced by scanning. Furthermore, some scanned PDF files were of poor quality, with noisy data.


  • Although the team identified several available options for extracting data, none of them could extract the data from the unstructured PDF formats in this case study. The team also faced limitations in scalability and flexibility: it needed a solution that allowed for consistent extraction of over 200 tables, which ranged in formatting and length.


  • The team partnered with Acodis to use its data extraction solution, which allowed for AI-supported identification of the data in each table. This solution provided flexibility across the varying data formats and ensured that the data extracted from the PDF files was of the best available quality. Because the tool identified rows and columns without the need for hard coding, the extraction could be driven by a subject-matter expert, with no reliance on technical resources. This provided an additional benefit: more flexibility in how the tool was used and fewer architectural limitations.


  • Furthermore, the Acodis table recognition solution improved the scalability and speed of extracting the high volume of data by recognizing tables that had a similar format, which reduced the level of fine-tuning required. The solution became familiar with each format, recognizing its columns, rows, and cells and reducing the manual effort required.

Complex Table Recognition and Extraction Demo with AI Data Extraction



Using Acodis within the project made it possible to extract good-quality data that was otherwise trapped in unusable formats. With the Acodis solution, the team extracted data from over 200 complex tables spread across more than 3500 pages of documents. The fast, easy-to-use solution meant that the entire project was completed in 3 weeks.



> 200 Complex Tables


> 3500 Pages of Documents



This resulted in the ability to clean, map, and standardize raw clinical data. As a result, the team could analyze a high volume of data that was previously inaccessible and provide additional business insights, helping to reduce the need for conducting additional clinical trials.  
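To illustrate what cleaning, mapping, and standardizing extracted tables can look like, here is a minimal Python sketch. It is not the pipeline used in the project. The common data model fields, the source column names, and the sample rows are all hypothetical, chosen only to show how differently labeled tables from two studies might be pooled into one structure, with unreadable values flagged for review by a subject-matter expert.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical common data model: these field names are illustrative only.
COMMON_FIELDS = ("study_id", "subject_id", "age", "sex")


@dataclass(frozen=True)
class StudyMapping:
    """Maps one study's extracted column headers to common-model fields."""
    study_id: str
    columns: dict  # source column header -> common field name


def parse_age(text: str) -> Optional[int]:
    """Return the age as an int, or None if the scanned value is unreadable."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def normalize_sex(text: str) -> Optional[str]:
    """Standardize free-text sex values to 'M' or 'F', else None."""
    codes = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}
    return codes.get(text.strip().upper())


def to_common(rows: list, mapping: StudyMapping) -> list:
    """Convert one study's extracted rows into common-model records."""
    records = []
    for row in rows:
        record = dict.fromkeys(COMMON_FIELDS)
        record["study_id"] = mapping.study_id
        for source_col, field in mapping.columns.items():
            record[field] = row.get(source_col, "").strip() or None
        record["age"] = parse_age(record["age"] or "")
        record["sex"] = normalize_sex(record["sex"] or "")
        records.append(record)
    return records


# Made-up rows standing in for tables extracted from two scanned reports.
study_a = [{"Pat. No.": "001", "Age (yrs)": "34", "Sex": "Male"}]
study_b = [{"SUBJ": "17", "AGE": "4l", "GENDER": "f"}]  # "4l" mimics an OCR misread

pooled = (
    to_common(study_a, StudyMapping("A", {"Pat. No.": "subject_id", "Age (yrs)": "age", "Sex": "sex"}))
    + to_common(study_b, StudyMapping("B", {"SUBJ": "subject_id", "AGE": "age", "GENDER": "sex"}))
)

# Records with any missing field are set aside for manual validation.
needs_review = [r for r in pooled if None in r.values()]
```

Keeping the per-study mapping as plain data, separate from the conversion logic, means a new table format only needs a new mapping rather than new code. That is one possible way to approach the scalability described above.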


This webinar offers actionable insights from a real case study with Bayer Consumer Health and Capgemini Engineering.

Learn how to build a data pipeline that efficiently extracts and transforms data from clinical trial studies, all without requiring coding expertise.

How does it apply to your documents? Book a discovery call.

During the discovery call:


  • Discuss your specific business challenges and needs
  • See how the AI data extraction platform works
  • Test the platform on example documents