This is the final project for the topics course MS902, offered by IMECC - UNICAMP in the first semester of 2021.
The code was implemented on Databricks, using the student version. The documentation, including the tables, is in Portuguese.
Credits to:
Matheus Souza
Vinicius Jameli
Bryan Prado
Raul Teixeira
Gabriel Borin
Pedro Pietrafeza
Trademap is a mobile and desktop platform that brings together tools and content that let investors monitor the market in real time, manage their investments, and trade quickly, easily, and safely.
One of these tools is its "News" interface, as shown in the image below.
In light of this, the project aims to build an alternative version of the web crawler that is already implemented in the app.
More specifically, Trademap draws its news from 24 different websites. Our goal is to access those websites, extract only the relevant news, and show it to the user.
To accomplish that, we used a table of all companies listed on B3, the Brazilian stock exchange, plus some important international ones.
That is, for each row of the table, the crawler checks every website for news that mentions the company's name or its name on the stock exchange.
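The row-by-row matching described above can be sketched as follows. This is only an illustration under stated assumptions: the project does not specify its matching rule, so a whole-word, case-insensitive comparison is assumed here, the stock-exchange name is assumed to be a ticker-like code, and the names `build_pattern`, `find_matches`, and the sample rows and headlines are hypothetical.

```python
import re

def build_pattern(term):
    # Assumed rule: whole-word, case-insensitive match, so that
    # "Vale" is not found inside an unrelated word such as "equivalente".
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def find_matches(companies, headlines):
    """Return (ticker, headline) pairs for headlines that mention
    the company's name or its ticker."""
    matches = []
    for company in companies:
        patterns = [build_pattern(company["name"]),
                    build_pattern(company["ticker"])]
        for headline in headlines:
            if any(p.search(headline) for p in patterns):
                matches.append((company["ticker"], headline))
    return matches

# Hypothetical sample data standing in for the B3 table and scraped headlines.
companies = [
    {"name": "Petrobras", "ticker": "PETR4"},
    {"name": "Vale", "ticker": "VALE3"},
]
headlines = [
    "Petrobras anuncia novo plano de investimentos",
    "Ibovespa fecha em alta; VALE3 sobe 2%",
    "Resultado fica em patamar equivalente ao do trimestre anterior",
]
matches = find_matches(companies, headlines)
```

Short company names are the main risk with this kind of matching; the word-boundary check reduces false positives but does not remove them (a headline using "vale" as an ordinary word would still match).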
Note that the crawler searches only the front page of each website.
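As a rough sketch of what reading only the front page could look like, the example below collects the text and target of every link in a single HTML document using Python's standard `html.parser`. It is not the project's actual code: the class `HeadlineParser`, the function `extract_links`, and the sample HTML are hypothetical, and downloading the page (for example with `urllib.request.urlopen`) is left out so the example runs offline.

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect (text, href) for every link with visible text on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces inside the link text.
            text = " ".join("".join(self._text).split())
            if text:
                self.links.append((text, self._href))
            self._href = None

def extract_links(html):
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.links

# Hypothetical front-page fragment; real pages differ from site to site.
sample_html = """
<html><body>
  <a href="/mercados/petrobras-plano">Petrobras anuncia
     novo plano</a>
  <a href="/sobre"></a>
  <h2><a href="/mercados/vale3"><span>VALE3</span> sobe 2%</a></h2>
</body></html>
"""
links = extract_links(sample_html)
```

Because only one page per site is parsed, news that has already scrolled off the front page is not seen by this approach.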
Here's an example of how news is laid out on the front page, using the website https://www.infomoney.com.br/.
Every match is saved in the table below:
Finally, when the user searches for news about a company, the table above is queried and the output is the news in the dataframe for that company.
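The lookup step might look like the sketch below. To keep it dependency-free, a list of dictionaries stands in for the dataframe; the column names (`company`, `ticker`, `site`, `headline`), the function `search_news`, and the sample rows are all hypothetical, and exact case-insensitive comparison is an assumption about how queries are resolved.

```python
def search_news(table, query):
    """Return the saved rows whose company name or ticker equals the query."""
    q = query.strip().lower()
    return [
        row for row in table
        if q in (row["company"].lower(), row["ticker"].lower())
    ]

# Hypothetical rows standing in for the saved matches table.
news_table = [
    {"company": "Petrobras", "ticker": "PETR4",
     "site": "https://www.infomoney.com.br/",
     "headline": "Petrobras anuncia novo plano"},
    {"company": "Vale", "ticker": "VALE3",
     "site": "https://www.infomoney.com.br/",
     "headline": "VALE3 sobe 2%"},
]
results = search_news(news_table, "petr4")
```

On Databricks the same filter would normally be expressed directly on the dataframe rather than in a Python loop.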
Number of companies: 517
Number of cryptocurrencies: 10
Number of sites: 11 of the 24 sites used by Trademap
Average execution time: 36 minutes