scrapinghub · chekunkov · Jul 31, 2013
diff --git a/.gitignore b/.gitignore
@@ -24,6 +24,7 @@ pip-log.txt
 # Unit test / coverage reports
 .coverage
 .tox
+cover
 nosetests.xml
 
 # Translations
@@ -35,5 +36,7 @@ nosetests.xml
 .pydevproject
 
 # Other
+.idea
 webstruct_data/datastore
-
+.ipynb_checkpoints
+docs/_build
diff --git a/README.rst b/README.rst
@@ -0,0 +1,27 @@
+Webstruct
+=========
+
+Webstruct is a library for creating statistical NER_ systems that work
+on HTML data, i.e. a library for building tools that extract named
+entities (addresses, organization names, open hours, etc.) from webpages.
+
+Unlike most NER systems, webstruct works on HTML data, not only
+on text data. This makes it possible to define features that use HTML
+structure and to embed annotation results back into HTML.
+
+Read the docs_ for more info.
+
+License is MIT.
+
+.. _docs: http://webstruct.readthedocs.org/en/latest/
+.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition
+
+Contributing
+------------
+
+* Source code: https://github.com/scrapinghub/webstruct
+* Bug tracker: https://github.com/scrapinghub/webstruct/issues
+
+To run tests, make sure nose_ is installed, then run the ``runtests.sh`` script.
+
+.. _nose: https://github.com/nose-devs/nose
diff --git a/block_model/README.md b/block_model/README.md
diff --git a/block_model/convert_html.py b/block_model/convert_html.py
diff --git a/block_model/convert_labeled_data.py b/block_model/convert_labeled_data.py
diff --git a/block_model/data/1.html b/block_model/data/1.html
diff --git a/block_model/data/1.txt b/block_model/data/1.txt