forked from Evoluty/tdle

PageRank of Wikipedia France and a search engine built with ElasticSearch


Obiwan1995/tdle

 
 


TDLE

Project for the TDLE class at ENSIIE. The project is expected to use Elasticsearch, Hadoop, and MapReduce.

Useful links

Elastic Search

Download it here: https://www.elastic.co/downloads/elasticsearch

Then, use ParsePageRank.java to convert the pagerank file into the newline-delimited JSON format that the ElasticSearch bulk API expects.
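As an illustration of the target format, here is a minimal sketch of turning one PageRank line into a bulk entry. It is not the repository's ParsePageRank.java: the tab-separated `title<TAB>rank` input, the index name `pagerank`, the field names `title` and `pagerank`, and the class and method names are all assumptions made for this example. Each document takes two lines, an action line followed by the document itself. Older ElasticSearch versions may also require a `_type` in the action line.

```java
import java.util.List;

public class BulkLines {
    // Escapes the characters that must not appear raw inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    // Remaining control characters become 4-digit hex escapes.
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    // Converts one "title<TAB>rank" line (assumed input format) into
    // an action line plus a document line, each terminated by a newline.
    static String toBulkEntry(String line, String index) {
        String[] parts = line.split("\t", 2);
        String title = parts[0];
        double rank = Double.parseDouble(parts[1].trim());
        return "{\"index\":{\"_index\":\"" + index + "\"}}\n"
             + "{\"title\":\"" + escapeJson(title) + "\",\"pagerank\":" + rank + "}\n";
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; real input would be read from the pagerank file.
        for (String line : List.of("Paris\t0.5", "Tour \"Eiffel\"\t0.25")) {
            System.out.print(toBulkEntry(line, "pagerank"));
        }
    }
}
```

Because every entry ends with a newline, the resulting file also ends with one, which the bulk API requires.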

To insert everything in a single request, raise the maximum HTTP request size in the file config/elasticsearch.yml (the default is 100mb):

http.max_content_length: 500mb

You also need to increase the ElasticSearch heap size in the file config/jvm.options:

-Xms4G
-Xmx4G

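Finally, run the following command in a shell from the directory that contains pagerank.json:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@pagerank.json"

The import may take a while, so be patient. When it finishes, curl prints the bulk response, which contains one result entry per document, so expect a lot of output in the shell.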
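For readers who prefer to drive the import from Java, the same request can be sketched with the JDK's built-in HTTP client (Java 11 or later). This is an illustration rather than part of the project: the class name `BulkUpload` and its helper methods are hypothetical, and the check for failures is a deliberately crude string match on the top-level `errors` flag of the bulk response, not a real JSON parse.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class BulkUpload {
    // Builds the same request as the curl command: POST to /_bulk with an NDJSON body.
    static HttpRequest buildRequest(String baseUrl, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_bulk"))
                .header("Content-Type", "application/x-ndjson")
                .POST(body)
                .build();
    }

    // Crude check (assumption: a string match is good enough for a quick look):
    // the bulk response carries "errors": true when at least one item failed.
    static boolean reportsErrors(String responseBody) {
        return responseBody.replaceAll("\\s", "").contains("\"errors\":true");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes ElasticSearch listens on localhost:9200 and pagerank.json
        // is in the current directory, as in the curl command.
        HttpRequest request = buildRequest("http://localhost:9200",
                HttpRequest.BodyPublishers.ofFile(Path.of("pagerank.json")));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(reportsErrors(response.body())
                ? "Some items failed to index"
                : "All items indexed");
    }
}
```

Printing a one-line summary instead of the full response avoids scrolling through an entry per document. Note that this sketch still reads the whole response into memory, which may be large for a full Wikipedia import.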

Languages

  • Java 88.0%
  • HTML 7.1%
  • Python 3.5%
  • Other 1.4%