Data parsing

We helped a law consulting company build a custom tool that collects and stores data from millions of pages across five court websites. The scraped material included PDF, Word, JPG, and other files. The scripts ran automatically, so the collected files were updated whenever the source information changed.

14.8 mln pages processed daily

43 sec updates checking

About the client

A law consulting company based in South America. The client automates, manages, classifies, and stores files of court cases, documents, and contracts of all kinds using AI algorithms.

Tech stack

Python
Pandas
PostgreSQL
Elasticsearch
GCP

The client's needs

Challenges & solutions

Challenge

Scrape data from millions of pages and documents on five separate judicial websites without overloading them.

Solution

Created a distributed system architecture with Linux nodes and a dynamic pipeline, which makes it possible to handle peak loads and set priorities.

Created an algorithm that scrapes new files immediately during the daytime, based on traffic, and runs massive updates at night.
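The day and night split described above can be illustrated with a small priority queue. This is only a sketch under stated assumptions, not the production system: the priority levels, the 22:00 to 06:00 night window, and the names `CrawlQueue` and `next_batch` are hypothetical and do not come from the case study.

```python
import heapq
from dataclasses import dataclass, field
from datetime import time
from itertools import count

# Hypothetical values: the case study does not state the real priorities or window.
PRIORITY_NEW = 0      # newly published files: fetch as soon as possible
PRIORITY_REFRESH = 1  # re-checks of known files: deferred to the night window
NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)


def is_night(now: time) -> bool:
    """True when `now` falls inside the window that wraps past midnight."""
    return now >= NIGHT_START or now < NIGHT_END


@dataclass(order=True)
class Task:
    priority: int
    seq: int                      # insertion order breaks ties between equal priorities
    url: str = field(compare=False)


class CrawlQueue:
    def __init__(self) -> None:
        self._heap = []
        self._seq = count()

    def add(self, url: str, priority: int) -> None:
        heapq.heappush(self._heap, Task(priority, next(self._seq), url))

    def next_batch(self, now: time, limit: int) -> list:
        """Pop up to `limit` URLs; refresh tasks are released only at night."""
        batch = []
        while self._heap and len(batch) < limit:
            if self._heap[0].priority == PRIORITY_REFRESH and not is_night(now):
                break  # heap is ordered, so only refresh work is left: wait for night
            batch.append(heapq.heappop(self._heap).url)
        return batch
```

During the day, `next_batch(time(14, 0), 10)` returns only URLs queued with `PRIORITY_NEW`; the same call at `time(23, 30)` also drains the refresh backlog. The `limit` argument is one simple way to cap how much work is released at once.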

Challenge

Collect updated data from judicial case files daily: not only the structured data but also embedded PDF, Word, JPG, and other files.

Solution

Created a cloud SQL database with daily dumps to Elasticsearch to keep the data we use; files are uploaded directly to Elasticsearch.
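A daily dump from a SQL database to Elasticsearch could look roughly like the sketch below. It covers only the structured rows, not the file uploads. The assumptions are ours: an in-memory `sqlite3` database stands in for the cloud SQL database, the `cases` table, its columns, and the `court-cases` index name are hypothetical, and the code only builds the newline-delimited JSON body that the Elasticsearch bulk endpoint accepts, without sending it.

```python
import json
import sqlite3

INDEX_NAME = "court-cases"  # hypothetical index name


def rows_updated_since(conn, since):
    """Return rows changed on or after `since` (ISO date string) as dicts."""
    cur = conn.execute(
        "SELECT id, court, title, updated_at FROM cases "
        "WHERE updated_at >= ? ORDER BY id",
        (since,),
    )
    columns = [c[0] for c in cur.description]
    return [dict(zip(columns, row)) for row in cur]


def to_bulk_ndjson(rows, index=INDEX_NAME):
    """Build a bulk request body: one action line, then one document line, per row."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Stand-in data so the sketch runs without any external service.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, court TEXT, title TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?, ?)",
    [
        (1, "court-a", "Case 1", "2024-01-01"),
        (2, "court-b", "Case 2", "2024-01-02"),
        (3, "court-a", "Case 3", "2024-01-03"),
    ],
)
payload = to_bulk_ndjson(rows_updated_since(conn, "2024-01-02"))
```

Using the database primary key as the document `_id` means a row that is dumped again overwrites its earlier copy instead of creating a duplicate.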

Challenge

Keep the scripts running continuously, adding new case files and updating old ones when something has changed.

Solution

Implemented proxies and AI technologies to overcome bot protection and process 14.8 million pages daily. We download about 14 GB of important data every day.
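One common way to decide whether a case file is new, updated, or unchanged is to compare a hash of its content with the hash stored from the previous run. The case study does not say how changes were detected, so the sketch below is an assumption; `classify` and the in-memory `known` dictionary are hypothetical names, and a real system would keep the digests in a database.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """SHA-256 hex digest of the downloaded file or page body."""
    return hashlib.sha256(data).hexdigest()


def classify(url: str, data: bytes, known: dict) -> str:
    """Return 'new', 'updated' or 'unchanged' and record the latest digest."""
    digest = content_digest(data)
    previous = known.get(url)
    known[url] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"
```

Only files classified as `new` or `updated` need to be stored again, which keeps the repeated checks cheap.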


These guys are fully dedicated to their client's success and go the extra mile to ensure things are done right.

Sebastian Torrealba

CEO, Co-Founder DeepIA, Software for the Digital Transformation

Steps of providing data scraping services

Step 1 of 5

Free consultation

This is a good time to get to know each other, share values, and discuss your project in detail. We will advise you on a solution and help you understand whether we are a good match for you.

Step 2 of 5

Discovering and feasibility analysis

One of our core values is flexibility, so we can work from either a one-page set of high-level requirements or a full pack of technical documentation. At this stage, we need to make sure we understand the full scope of the project. We either receive the information from you or conduct a set of interviews, and then prepare the following documents: a list of features with detailed descriptions and acceptance criteria, a list of fields that need to be scraped, and the solution architecture. Finally, we draw up a project plan that we follow strictly. We are a results-oriented company, which is another of our core values.

Step 3 of 5

Solution development

At this stage, we develop the core logic of the scraping engine. We run multiple tests to ensure that the solution works properly. We map the fields and run the scraping. While scraping, we keep the full raw data so the final model can be extended easily. Finally, we store the data in any database and run quality assurance tests.
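The idea of mapping fields while keeping the full raw data can be sketched as follows. The field names in `FIELD_MAP` and the `map_record` helper are hypothetical examples, not the fields of a real project, and the sketch assumes all scraped values are strings.

```python
# Hypothetical mapping from scraped field labels to model field names.
FIELD_MAP = {
    "Case No.": "case_number",
    "Filed": "filed_on",
    "Court": "court",
}


def map_record(raw: dict) -> dict:
    """Map known fields and keep an untouched copy of the raw record."""
    mapped = {
        target: raw[source].strip()
        for source, target in FIELD_MAP.items()
        if source in raw
    }
    # Keeping the raw copy lets new model fields be filled later without re-scraping.
    mapped["raw"] = dict(raw)
    return mapped
```

If the model later needs another field, for example a judge's name, adding one entry to `FIELD_MAP` and re-running the mapping over the stored raw records is enough.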

Step 4 of 5

Data delivery

After quality assurance tests are completed, we deliver the data and solutions to the client. Although we have over 15 years of expertise in data engineering, we expect the client's participation in the project. While developing and crawling data, we provide interim results so you can always see where we are and give us feedback. A high level of communication is also one of our core values.

Step 5 of 5

Support and continuous improvement

We understand how crucial the solutions we build are for our clients. Our goal is to build long-term relationships, so we provide guarantees and support agreements. We are also always happy to assist with further development: our statistics show that 97% of our clients return to us with new projects.

Success stories

Check out a few case studies that show why DATAFOREST will meet your business needs.

E-commerce scraping

The dropshipping company needed a way to automatically monitor prices and stock availability for over 100,000 products from over 1,500 stores. We created a system using custom scripts and a web interface that could check 60 million pages daily. This reduced manual work and errors, improved customer experience, and increased monthly profits by $50-70k.

1000h+ manual work reduced

60 mln pages processed daily

Jonathan Lien

CEO Advanced Clear Path, Inc., E-commerce Company

They always find cutting-edge solutions, and they help bring our ideas to life.

Stock relocation solution

The client needed to create an optimal assortment list for more than 2,000 drugstores located in 30 different regions and turned to us for a solution. We used a mathematical model and AI algorithms that considered location, housing density, and proximity to key locations to determine an optimal assortment list for each store. By integrating with POS terminals, we were able to improve sales and help the client streamline its product offerings.

10% productivity boost

7% increase in sales

Mark S.

Partner Pharmacy network

The team reliably achieves what they promise and does so at a competitive price. Another impressive trait is their ability to prioritize features more critical to the core solution.

Bank Data Analytics Platform

The Bank Data Analytics Platform project aims to develop a web-based application for the client and its customers to query and analyze data relating to various banks and other financial institutions. The project involves building an interactive B2B web app with custom dashboards and analytics features, as well as using AI functionality to strengthen the application's development and analytics capabilities.

30+ system integrations

19% CX boost

Stuart Theobald

Chairman Intellidex, Financial Services and Consulting company

They've had a quick grasp of what we are trying to do and delivered to our spec without a fuss.

Mailing Platform

We created a mailing platform that helps a multi-faceted union send information about events, opportunities, and grants to all its member companies. The platform stores information about all member companies and makes it easy to send messages to the right people at the right time. This helps member companies stay informed and improves the union's ability to support the country's businesses.

15% CX boost

20%

Gonzalo Ramos

Partner Trade Union

DATAFOREST is reliable in delivering quality solutions. Besides, they demonstrate effective project management and communication despite working remotely.

Latest publications

February 11, 2026
33 min

An Executive's Review: Top 15 ETL Tools for Data Transformation in 2026

February 10, 2026
13 min

Best Data Engineering Companies for Enterprises in 2026

February 9, 2026
14 min

13 Best AI Tools for Audio Editing in 2026

We’d love to hear from you

Share project details, like scope or challenges. We'll review and follow up with next steps.
