
Data parsing

We helped a law consulting company build a custom tool to collect and store data from millions of pages on five different court sites. The scraped information included PDF, Word, JPG, and other files. The scripts were automated, so the collected files were updated whenever the source information changed.

14.8 mln

pages processed daily

43 sec

update checking

About the client

A law consulting company based in South America. The client automates, manages, classifies, and stores court case files, documents, and contracts of all kinds using AI algorithms.

Tech stack

Python
Pandas
PostgreSQL
Elasticsearch
GCP

The client's needs

Challenges & solutions

Challenge

Scrape data from millions of pages and documents on five separate judicial websites without overloading them.

Solution

Created a distributed system architecture with Linux nodes and a dynamic pipeline, which makes it possible to handle peak loads and set priorities.
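The case study does not describe how the pipeline orders its work. As a minimal sketch, assuming a priority queue combined with a per-site minimum delay (the class name, the delay parameter, and the example URLs are all hypothetical), prioritization without overloading a site could look like this:

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Task:
    priority: int                          # lower number = more urgent
    seq: int                               # tie-breaker: FIFO within one priority
    url: str = field(compare=False)
    site: str = field(compare=False)


class CrawlQueue:
    """Priority queue that enforces a minimum delay between hits to one site."""

    def __init__(self, min_delay_s: float):
        self._heap = []
        self._seq = itertools.count()
        self._next_allowed = {}            # site -> earliest time of next request
        self._min_delay_s = min_delay_s    # assumed politeness setting

    def push(self, url: str, site: str, priority: int) -> None:
        heapq.heappush(self._heap, Task(priority, next(self._seq), url, site))

    def pop_ready(self, now: float) -> Optional[Task]:
        """Return the most urgent task whose site may be hit at `now`, else None."""
        deferred = []
        chosen = None
        while self._heap:
            task = heapq.heappop(self._heap)
            if self._next_allowed.get(task.site, 0.0) <= now:
                chosen = task
                self._next_allowed[task.site] = now + self._min_delay_s
                break
            deferred.append(task)          # site is throttled, try the next task
        for task in deferred:              # put throttled tasks back untouched
            heapq.heappush(self._heap, task)
        return chosen


queue = CrawlQueue(min_delay_s=1.0)
queue.push("https://court-a.example/old", "court-a", priority=5)
queue.push("https://court-a.example/new", "court-a", priority=1)
queue.push("https://court-b.example/old", "court-b", priority=5)
```

In a distributed setup, each worker node would call `pop_ready` with the current time and sleep briefly when it returns `None`.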

Created an algorithm that scrapes new files immediately during the daytime, depending on traffic, and runs bulk updates at night.
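The day and night behaviour can be illustrated with a small scheduling policy. This is only a sketch under stated assumptions: the night window (22:00 to 06:00), the batch budgets, and the `site_load` estimate are hypothetical values, not figures from the project.

```python
from datetime import time

# Hypothetical off-peak window; real values would come from observed traffic.
NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)


def is_night(t: time) -> bool:
    """True inside the overnight window, which wraps past midnight."""
    return t >= NIGHT_START or t < NIGHT_END


def plan_batch(t: time, new_files: list, stale_files: list, site_load: float,
               day_budget: int = 50, night_budget: int = 5000) -> list:
    """Pick what to fetch now.

    site_load is an assumed 0..1 estimate of how busy the target site is.
    """
    if is_night(t):
        # Off-peak: new files first, then fill the batch with bulk re-checks.
        return (new_files + stale_files)[:night_budget]
    # Daytime: only new files, and fewer of them the busier the site is.
    allowed = int(day_budget * (1.0 - site_load))
    return new_files[:allowed]
```

The policy keeps daytime requests proportional to the spare capacity of the target site and defers re-checks of existing files to the night window.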

Challenge

Collect updated data from the judicial case files every day: not only the structured data but also embedded PDF, Word, JPG, and other files.

Solution

Created a cloud SQL database with daily dumps to Elasticsearch for the data in active use; files are uploaded directly to Elasticsearch.
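A daily dump from SQL to Elasticsearch typically goes through the bulk API, which accepts newline-delimited JSON. The sketch below only builds that request body. It uses an in-memory SQLite database as a stand-in for the cloud SQL instance, and the `cases` table, its columns, and the `court-cases` index name are assumptions made for illustration.

```python
import json
import sqlite3


def rows_to_bulk(conn: sqlite3.Connection, index: str) -> str:
    """Serialize every row of the hypothetical `cases` table as a bulk request body."""
    conn.row_factory = sqlite3.Row
    lines = []
    query = "SELECT id, court, title, updated_at FROM cases ORDER BY id"
    for row in conn.execute(query):
        doc = dict(row)
        doc_id = doc.pop("id")
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Stand-in data; the real source would be the cloud SQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, court TEXT, title TEXT, updated_at TEXT)"
)
conn.execute("INSERT INTO cases VALUES (1, 'court-a', 'Case 1', '2024-01-01')")
body = rows_to_bulk(conn, index="court-cases")
```

The resulting `body` string would be sent to the `_bulk` endpoint of the cluster. Reusing the SQL primary key as the document `_id` means a repeated dump overwrites existing documents instead of duplicating them.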

Challenge

Keep the scripts running continuously, adding new case files and updating old ones when something has changed.

Solution

Implemented proxies and AI technologies to overcome bot protection and process 14.8 million pages daily. We download about 14 GB of important data every day.
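The challenge above asks for old case files to be updated when something has changed, but the case study does not say how changes are detected. One common approach, shown here purely as an assumption, is to compare a content hash of each downloaded file with the hash stored from the previous run. The function names and the in-memory `seen` dictionary are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Stable digest of a downloaded file's bytes."""
    return hashlib.sha256(content).hexdigest()


def classify(file_id: str, content: bytes, seen: dict) -> str:
    """Return 'new', 'updated', or 'unchanged' and record the latest fingerprint."""
    digest = fingerprint(content)
    previous = seen.get(file_id)
    seen[file_id] = digest
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"


seen = {}  # in production this would live in the database, not in memory
first = classify("case-1.pdf", b"original text", seen)
second = classify("case-1.pdf", b"original text", seen)
third = classify("case-1.pdf", b"amended text", seen)
```

Only files classified as `new` or `updated` need to be stored again, which keeps the daily download volume down.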


Results


These guys are fully dedicated to their client's success and go the extra mile to ensure things are done right.


Sebastian Torrealba

CEO, Co-Founder DeepIA, Software for the Digital Transformation

Steps of providing data scraping services


Step 1 of 5

Free consultation

This is a good time to get to know each other, share values, and discuss your project in detail. We will advise you on a solution and help you understand whether we are a good match for you.

Step 2 of 5

Discovering and feasibility analysis

One of our core values is flexibility, so we can work from either a one-page list of high-level requirements or a full package of technical documentation. At this stage, we need to ensure that we understand the full scope of the project. We either receive the requirements from you or conduct a set of interviews, and then prepare the following documents: a list of features with detailed descriptions and acceptance criteria; a list of fields that need to be scraped; and the solution architecture. Finally, we make a project plan, which we strictly follow. We are a result-oriented company, and that is one of our core values as well.

Step 3 of 5

Solution development

At this stage, we develop the core logic of the scraping engine and run multiple tests to ensure that the solution works properly. We map the fields and run the scraping. While scraping, we keep the full raw data so that the final data model can be extended easily. Finally, we store the data in any database and run quality assurance tests.
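Keeping the raw data next to the mapped fields can be sketched as follows. The field names in `FIELD_MAP` and the sample record are hypothetical; the point is only that unmapped values stay available when the model needs a new field later.

```python
# Hypothetical mapping: label on the source page -> field in the data model.
FIELD_MAP = {
    "Case No.": "case_number",
    "Filed": "filed_on",
}


def to_record(raw: dict) -> dict:
    """Map known fields and keep the complete scraped payload under 'raw'."""
    record = {target: raw.get(source) for source, target in FIELD_MAP.items()}
    record["raw"] = dict(raw)  # full copy, including fields not mapped yet
    return record


scraped = {"Case No.": "A-1", "Filed": "2024-01-02", "Judge": "J. Doe"}
record = to_record(scraped)
```

Adding a `"Judge": "judge"` entry to `FIELD_MAP` later would extend the model from the stored `raw` payloads without scraping the pages again.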

Step 4 of 5

Data delivery

After quality assurance tests are completed, we deliver the data and solutions to the client. Although we have over 15 years of expertise in data engineering, we expect the client to participate in the project. While developing and crawling data, we provide interim results so you can always see where we are and give us feedback. A high level of communication is also one of our core values.

Step 5 of 5

Support and continuous improvement

We understand how crucial the solutions we build are for our clients. Our goal is to build long-term relationships, so we provide guarantees and support agreements. We are also always happy to assist with further development, and our statistics show that 97% of our clients return to us with new projects.

How we provide data integration solutions


Step 1 of 5

Free consultation

This is a good time to get to know each other, share values, and discuss your project in detail. We will advise you on a solution and help you understand whether we are a good match for you.

Step 2 of 5

Discovering and feasibility analysis

One of our core values is flexibility, so we can work from either a one-page list of high-level requirements or a full package of technical documentation.

At this stage, we need to ensure that we understand the full scope of the project. We either receive the requirements from you or conduct a set of interviews, and then prepare the following documents: the integration pipeline (which data we should get and where to upload it); the process logic (how the system should work); use cases and acceptance criteria; and the solution architecture. Finally, we make a project plan, which we strictly follow.

Step 3 of 5

Solution development

At this stage, we build ETL pipelines and the necessary APIs to automate the process. We involve our DevOps team to build the most efficient and scalable solution. We finish with unit tests and quality assurance tests to ensure that the solution works properly. A focus on results is one of our core values as well.

Step 4 of 5

Solution delivery

After quality assurance tests are completed, we deliver the solutions to the client. Although we have over 15 years of expertise in data engineering, we expect the client to participate in the project. While developing the integration system, we provide interim results so you can always see where we are and give us feedback. A high level of communication is also one of our core values.

Step 5 of 5

Support and continuous improvement

We understand how crucial the solutions we build are for our clients. Our goal is to build long-term relationships, so we provide guarantees and support agreements. We are also always happy to assist with further development, and our statistics show that 97% of our clients return to us with new projects.

Steps of providing web applications services


Step 1 of 7

Web development discovery

In the initial stage of the web-based development project, professional business analysts make detailed documentation of the project requirements and the approximate structure of the future web application. DATAFOREST is a custom web application development agency, guided by extensive experience in multiple industries. We give you detailed project documentation and then assemble the team according to your time and budget.

Step 2 of 7

UX and UI design

Based on your wishes, the needs of your target audience, and the best web application design and development practices, our UX and UI experts create an aesthetically pleasing and user-friendly interface for your app to satisfy even the most demanding users.

Step 3 of 7

Web-based application development

At DATAFOREST, we follow the best programming design principles and approaches. As a data engineering company, we build high-load platforms with a significant level of flexibility and result orientation. We keep our deadlines and follow SOC 2 compliance requirements.

Step 4 of 7

Integration

With DATAFOREST, integrating the application into your business won’t stop your processes for a minute. We provide seamless integration with your software infrastructure and ensure smooth operation in no time.

Step 5 of 7

Quality assurance

We use a multi-level quality assurance system to avoid any unforeseen issues. Working with DATAFOREST, you can be confident that your web application arrives on the market polished and in full compliance with all certification requirements.

Step 6 of 7

24/7 support

Once a product is released to the market, it’s crucial to keep it running smoothly. That’s why our experts provide several models of post-release support to ensure application uptime and stable workflows, increasing user satisfaction.

Step 7 of 7

Web app continuous improvement

Every truly high-quality software product has to constantly evolve to keep up with the times. We understand this, and therefore we provide services for updating and refining our software, as well as introducing new features to meet the growing needs of your business and target audience.

The way we deal with your task and help achieve results


Step 1 of 5

Free consultation

This is a good time to get to know each other, share values, and discuss your project in detail. We will advise you on a solution and help you understand whether we are a good match for you.

Step 2 of 5

Discovering and feasibility analysis

One of our core values is flexibility, so we can work from either a one-page list of high-level requirements or a full package of technical documentation.

In Data Science, there are numerous models and approaches, so at this stage we conduct a set of interviews to define the project objectives. We elaborate and discuss a set of hypotheses and assumptions. We create the solution architecture, a project plan, and a list of insights or features that we have to deliver.

Step 3 of 5

Solution development

The work starts with data gathering, data cleaning, and analysis. Feature engineering helps to determine your target variable and build several models for the initial review. Further modeling requires validating the results and selecting models for further development. Finally, we interpret the results. Data modeling is a process that requires many back-and-forth iterations. We are result-focused, as that is one of our core values as well.

Step 4 of 5

Solution delivery

Data Science solutions can be a list of insights or a variety of models that consume data and return results. Although we have over 15 years of expertise in data engineering, we expect the client to participate in the project. While modeling, we provide interim results so you can always see where we are and give us feedback. A high level of communication is also one of our core values.

Step 5 of 5

Support and continuous improvement

We understand how crucial the solutions we build are for our clients. Our goal is to build long-term relationships, so we provide guarantees and support agreements. We are also always happy to assist with further development, and our statistics show that 97% of our clients return to us with new projects.

The way we deal with your issue and achieve results


Step 1 of 5

Free consultation

This is a good time to get to know each other, share values, and discuss your project in detail. We will advise you on a solution and help you understand whether we are a good match for you.

Step 2 of 5

Discovering and feasibility analysis

One of our core values is flexibility, so we can work from either a one-page list of high-level requirements or a full package of technical documentation.

Depending on the project objectives, DevOps work requires auditing the current approach, measuring metrics, monitoring, and checking logs. Through a set of interviews, we ensure that we understand the full scope of the project. Finally, we make a project plan, which we strictly follow. We are a result-oriented DevOps service provider, and that is one of our core values as well.

Step 3 of 5

Solution development

At this stage, our certified DevOps engineers refine the product backlog. We deliver great results in digital transformation, cost optimization, CI/CD setup, containerization, and, last but not least, monitoring and logging. We are a result-focused company; it is one of our core values.

Step 4 of 5

Solution delivery

After quality assurance tests are completed, we deliver the solutions to the client. Although we have over 15 years of expertise in data engineering, we expect the client to participate in the project. A high level of communication is also one of our core values.

Step 5 of 5

Support and continuous improvement

We understand how crucial the solutions we build are for our clients. Our goal is to build long-term relationships, so we provide guarantees and support agreements. We are also always happy to assist with further development, and our statistics show that 97% of our clients return to us with new projects.

Success stories

Check out a few case studies that show why DATAFOREST will meet your business needs.

E-commerce scraping

The dropshipping company needed a way to automatically monitor prices and stock availability for over 100,000 products from over 1,500 stores. We created a system using custom scripts and a web interface that could check 60 million pages daily. This reduced manual work and errors, improved customer experience, and increased monthly profits by $50-70k.
1000h+

manual work reduced

60 mln

pages processed daily


Jonathan Lien

CEO Advanced Clear Path, Inc., E-commerce Company

They always find cutting-edge solutions, and they help bring our ideas to life.

Stock relocation solution

The client was faced with the challenge of creating an optimal assortment list for more than 2,000 drugstores located in 30 different regions. They turned to us for a solution. We used a mathematical model and AI algorithms that considered location, housing density and proximity to key locations to determine an optimal assortment list for each store. By integrating with POS terminals, we were able to improve sales and help the client to streamline its product offerings.
10%

productivity boost

7%

increase in sales


Mark S.

Partner Pharmacy network

The team reliably achieves what they promise and does so at a competitive price. Another impressive trait is their ability to prioritize features more critical to the core solution.

Bank Data Analytics Platform

The Bank Data Analytics Platform project aims to develop a web-based application for the Client and its customers to query and analyze data relating to various banks and other financial institutions. The project involves building an interactive B2B web app with custom dashboards and analytics features, as well as using AI functionality to empower the application's development and analytics capabilities.
30+

system integrations

19%

CX boost


Stuart Theobald

Chairman Intellidex, Financial Services and Consulting company

They've had a quick grasp of what we are trying to do and delivered to our spec without a fuss.

Mailing Platform

We created a mailing platform that helps a multi-faceted union to send information about events, opportunities, and grants to all its member companies. The platform stores information about all member companies and makes it easy to send messages to the right people at the right time. This helps member companies to stay informed and improves the union's ability to support the country's businesses.
15%

CX boost

20%

cost optimization


Gonzalo Ramos

Partner Trade Union

DATAFOREST is reliable in delivering quality solutions. Besides, they demonstrate effective project management and communication despite working remotely.
