We helped a law consulting company build a custom tool that collects and stores data from millions of pages across five court websites. The scraped material includes PDF, Word, JPG, and other files. The scripts run automatically, so collected files are updated whenever the source information changes.
The client is a law consulting company based in South America. It automates, manages, classifies, and stores court case files, documents, and contracts of all kinds using AI algorithms.
Scrape data from millions of pages and documents on five separate judicial websites without overloading them.
Created a distributed system architecture with Linux nodes and a dynamic pipeline, which makes it possible to handle load peaks and set priorities.
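To make the idea of priorities without overloading a site more concrete, here is a minimal single-process sketch. It is not the production pipeline: the class name `PoliteScheduler`, the `min_delay_s` interval, and the example hosts are all hypothetical. It combines a priority queue with a per-host minimum delay, so urgent URLs go first while no single court site is hit too often.

```python
import heapq
import itertools
from urllib.parse import urlparse


class PoliteScheduler:
    """Priority queue of URLs with a minimum delay between hits to one host."""

    def __init__(self, min_delay_s=1.0):
        self.min_delay_s = min_delay_s   # hypothetical politeness interval
        self._heap = []                  # entries: (priority, seq, url)
        self._seq = itertools.count()    # tie-breaker keeps insertion order
        self._next_allowed = {}          # host -> earliest next fetch time

    def add(self, url, priority):
        """Queue a URL; a lower priority number means more urgent."""
        heapq.heappush(self._heap, (priority, next(self._seq), url))

    def pop_ready(self, now):
        """Return the most urgent URL whose host may be fetched at `now`, else None."""
        deferred = []
        chosen = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            host = urlparse(entry[2]).netloc
            if self._next_allowed.get(host, 0.0) <= now:
                self._next_allowed[host] = now + self.min_delay_s
                chosen = entry[2]
                break
            deferred.append(entry)       # host is still throttled
        for entry in deferred:           # put throttled entries back
            heapq.heappush(self._heap, entry)
        return chosen
```

In a distributed setup, the queue and the per-host timestamps would live in a shared store rather than in process memory; the sketch only shows the selection logic.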
Created an algorithm that scrapes new files immediately during the daytime, depending on traffic, and runs massive updates at night.
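The day and night split can be sketched as a small decision function. The window boundaries, the load threshold, and the mode names below are assumptions chosen for illustration, and "traffic" is interpreted here as an estimated load on the target site, which the source does not specify.

```python
from datetime import time

NIGHT_START = time(22, 0)   # assumed start of the bulk-update window
NIGHT_END = time(6, 0)      # assumed end of the bulk-update window
BUSY_THRESHOLD = 0.8        # assumed fraction of a site's tolerable load


def in_night_window(now):
    """True if `now` (a datetime.time) falls in a window that wraps past midnight."""
    return now >= NIGHT_START or now < NIGHT_END


def choose_mode(now, site_load):
    """Pick a crawl mode from local time and an estimated site load in [0, 1]."""
    if in_night_window(now):
        return "bulk_update"      # re-check existing cases in large batches
    if site_load >= BUSY_THRESHOLD:
        return "pause"            # back off while the site is busy
    return "new_files_only"       # daytime: fetch only newly published files
```

For example, `choose_mode(time(14, 0), 0.2)` selects the daytime mode, while any time inside the night window selects the bulk update regardless of load.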
Collect updated data from the judicial case files daily: not only the structured data but also embedded PDF, Word, JPG, and other files.
Created a cloud SQL database, with daily dumps to Elasticsearch, to keep the data we use; the files are uploaded directly to Elasticsearch.
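A daily dump from SQL to Elasticsearch could look roughly like the sketch below. It uses the standard-library `sqlite3` module as a stand-in for the cloud SQL database, and the `cases` table, its columns, and the index name are hypothetical. The function only builds the newline-delimited JSON body that the Elasticsearch Bulk API expects (an action line followed by the document); sending it is left out.

```python
import json
import sqlite3


def build_bulk_payload(conn, index_name):
    """Serialize every row of the `cases` table as Elasticsearch Bulk API NDJSON."""
    conn.row_factory = sqlite3.Row
    lines = []
    for row in conn.execute("SELECT id, court, title FROM cases ORDER BY id"):
        # Action line first, then the document source, one JSON object per line.
        action = {"index": {"_index": index_name, "_id": str(row["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"court": row["court"], "title": row["title"]}))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Reusing the SQL primary key as the document `_id` means that re-running the dump overwrites existing documents instead of duplicating them.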
Keep the scripts running continuously, adding new case files and updating old ones whenever something changes.
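One common way to decide whether an already collected file has changed is to compare content hashes. The sketch below assumes this approach; the source does not say how changes were detected, and the function names and the in-memory `known_digests` dictionary are hypothetical.

```python
import hashlib


def content_digest(data):
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def classify(file_id, data, known_digests):
    """Return 'new', 'updated', or 'unchanged' and record the latest digest."""
    digest = content_digest(data)
    previous = known_digests.get(file_id)
    known_digests[file_id] = digest      # remember what we saw this time
    if previous is None:
        return "new"
    return "unchanged" if previous == digest else "updated"
```

Only files classified as `new` or `updated` would need to be stored again, which keeps the nightly update from rewriting files that did not change.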
Implemented proxies and AI technologies to overcome bot protection and process 14.8 million pages daily. Each day we download about 14 GB of important data.
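For scale, 14.8 million pages a day averages out to roughly 171 pages per second, which is why requests need to be spread across many proxies. Below is a minimal sketch of round-robin proxy rotation that retires a proxy after repeated failures. The class name `ProxyPool`, the `max_failures` limit, and the proxy addresses in the usage note are hypothetical, and the AI-based parts of the bot-protection handling are not covered.

```python
class ProxyPool:
    """Round-robin proxy rotation that drops proxies after repeated failures."""

    def __init__(self, proxies, max_failures=3):
        self._proxies = list(proxies)
        self._failures = {p: 0 for p in self._proxies}
        self._max_failures = max_failures    # assumed retirement threshold
        self._index = 0

    def next_proxy(self):
        """Return the next healthy proxy in rotation."""
        if not self._proxies:
            raise RuntimeError("no healthy proxies left")
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return proxy

    def report_failure(self, proxy):
        """Count a failure and retire the proxy once it reaches the limit."""
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._proxies.remove(proxy)
            del self._failures[proxy]

    def report_success(self, proxy):
        """Reset the failure count after a successful request."""
        if proxy in self._failures:
            self._failures[proxy] = 0
```

A worker would call `next_proxy()` before each request, for example on a pool created with `ProxyPool(["http://proxy-1.example:8080", "http://proxy-2.example:8080"])`, and report the outcome afterwards. The rotation order can shift by one position after a proxy is retired, which is acceptable for a sketch.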
Sebastian Torrealba
Jonathan Lien
They always find cutting-edge solutions, and they help bring our ideas to life.
Mark S.
The team reliably achieves what they promise and does so at a competitive price. Another impressive trait is their ability to prioritize features more critical to the core solution.
Stuart Theobald
They've had a quick grasp of what we are trying to do and delivered to our spec without a fuss.
Gonzalo Ramos
DATAFOREST is reliable in delivering quality solutions. Besides, they demonstrate effective project management and communication despite working remotely.