Welcome to Datahub Cloud!
[!info] You can publish your dataset or your data story quickly and easily with Datahub Cloud. The current version of Datahub Cloud works only with GitHub repositories.
There are two ways you can start publishing quickly with Datahub Cloud:
- Starting off from our template and customizing it to fit your needs
- Or if you want to live dangerously, you can start from scratch
[!info] We recommend starting with the template first to better understand how it works and what's possible.
- Go to https://github.com/datahubio/datahub-cloud-template
- Click "Use this template" at the top right and create the repository (can be public or private - works with both)
- Go to the app and create a new site by selecting the repository you just created (leave the "Root Dir" field empty)
- That's it! It is now published. Hit "Visit" at the top right to see what it looks like.
Now that you have the template deployed in your repository, you can add more features to it, or tidy it up in case you don't need everything.
TODO
If you want to eg.
[!info] This currently requires some basic technical background.
You can create a new repository in GitHub. Currently, any GitHub repository with the following is supported:
- Any markdown file with a `datapackage` frontmatter field (Frictionless data package spec)
- Any `index.md`/`README.md` with a `datapackage.{json/yaml/yml}` file at the same level
Note: Data files that are not explicitly listed in the data package are ignored.
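Because unlisted files are ignored, it can help to check a repository before publishing. The sketch below is a minimal illustration, not part of Datahub Cloud: it assumes a data package whose `resources` entries each have a `path` (a string or a list of strings, as in the Frictionless spec) and reports which candidate files are not listed. The function names, the package contents, and the file names are all hypothetical.

```python
import json


def listed_paths(package):
    """Collect every resource path declared in a data package descriptor."""
    paths = set()
    for resource in package.get("resources", []):
        path = resource.get("path")
        if isinstance(path, str):
            paths.add(path)
        elif isinstance(path, list):
            # A resource may spread its data over several files.
            paths.update(path)
    return paths


def unlisted_files(package, candidate_files):
    """Return the candidate files that the data package does not list."""
    listed = listed_paths(package)
    return sorted(f for f in candidate_files if f not in listed)


# Hypothetical descriptor; in a real repository this would be datapackage.json.
package = json.loads("""
{
  "name": "example-dataset",
  "resources": [
    {"name": "temperature", "path": "data/temperature.csv"}
  ]
}
""")

print(unlisted_files(package, ["data/temperature.csv", "data/draft.csv"]))
# prints ['data/draft.csv']
```

Any file reported here would need its own entry under `resources` before it is picked up.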
Refer to the full list and API of available data visualization components here: https://storybook.portaljs.org
- Click on the desired component you want to add from the sidebar list.
- Look at the right side of your screen to see what this component would look like
- Navigate to the bottom right of the component and click on "Show code"
- Copy-paste the code into your markdown file in your GitHub repository (if you haven't added one while creating the repo, do it now)
- TODO
- Catalog
- Excel
- Map
- Data Table
- Line Charts and Bar Charts
- Plotly Charts
- VegaLite Charts
You can use a number of data preview and data visualization components in order to make your dataset more insightful. Continue reading to explore other currently supported features.
If your dataset is part of a larger dataset collection, you may want to start by listing the related datasets here in a catalog-like component with search and facets like this one:
<Catalog datasets={[ { _id: '07026b22d49916754df1dc8ffb9ccd1c31878aae', file_path: 'content/dataset-4/index.md', metadata: { 'details-of-task': 'Detect and categorise abusive language in social media data', collection: 'Climate data', 'level-of-annotation': [ 'Posts' ], 'link-to-data': 'https://doi.org/10.6084/m9.figshare.19333298.v1', 'link-to-publication': 'https://arxiv.org/abs/2107.13592', medium: [ 'Text' ], 'percentage-abusive': 13.2, platform: [ 'Instagram', 'Youtube' ], reference: 'Nurce, E., Keci, J., Derczynski, L., 2021. Detecting Abusive Albanian. arXiv:2107.13592', 'size-of-dataset': 11874, 'task-description': 'Hierarchical (offensive/not; untargeted/targeted; person/group/other)', title: 'Climate Change' }, url_path: 'dataset-4' }, { _id: '42c86cf3c4fbbab11d91c2a7d6dcb8f750bc4e19', file_path: 'content/dataset-1/index.md', metadata: { 'details-of-task': 'Enriched versions of the OffensEval/OLID dataset with the distinction of explicit/implicit offensive messages and the new dimension for abusive messages. Labels for offensive language: EXPLICIT, IMPLICT, NOT; Labels for abusive language: EXPLICIT, IMPLICT, NOTABU', collection: 'Economy', 'level-of-annotation': [ 'Tweets' ], 'link-to-data': 'https://github.com/tommasoc80/AbuseEval', 'link-to-publication': 'http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.760.pdf', medium: [ 'Text' ], 'percentage-abusive': 20.75, platform: [ 'Twitter' ], reference: 'Caselli, T., Basile, V., Jelena, M., Inga, K., and Michael, G. 2020. "I feel offended, don’t be abusive! implicit/explicit messages in offensive and abusive language". The 12th Language Resources and Evaluation Conference (pp. 6193-6202). European Language Resources Association.', 'size-of-dataset': 14100, 'task-description': 'Explicitness annotation of offensive and abusive content', title: 'Economic Data and Indicators' }, url_path: 'dataset-1' }, { _id: '80001dd32a752421fdcc64e91fbd237dc31d6bb3', file_path: 'content/dataset-2/index.md', metadata: { 'details-of-task': 'Incivility', collection: 'Stock Market', 'level-of-annotation': [ 'Posts' ], 'link-to-data': 'http://alt.qcri.org/~hmubarak/offensive/AJCommentsClassification-CF.xlsx', 'link-to-publication': 'https://www.aclweb.org/anthology/W17-3008', medium: [ 'Text' ], 'percentage-abusive': 0.81, platform: [ 'AlJazeera' ], reference: 'Mubarak, H., Darwish, K. and Magdy, W., 2017. Abusive Language Detection on Arabic Social Media. In: Proceedings of the First Workshop on Abusive Language Online. Vancouver, Canada: Association for Computational Linguistics, pp.52-56.', 'size-of-dataset': 32000, 'task-description': 'Ternary (Obscene, Offensive but not obscene, Clean)', title: 'Stock Market Data' }, url_path: 'dataset-2' }, { _id: '96649d05d8193f4333b10015af76c6562971bd8c', file_path: 'content/dataset-3/index.md', metadata: { 'details-of-task': 'Detectioning CDC from abusive comments', collection: 'Climate data', 'level-of-annotation': [ 'Posts' ], 'link-to-data': 'https://github.com/shekharRavi/CoRAL-dataset-Findings-of-the-ACL-AACL-IJCNLP-2022', 'link-to-publication': 'https://aclanthology.org/2022.findings-aacl.21/', medium: [ 'Newspaper Comments' ], 'percentage-abusive': 100, platform: [ 'Posts' ], reference: 'Ravi Shekhar, Mladen Karan and Matthew Purver (2022). CoRAL: a Context-aware Croatian Abusive Language Dataset. Findings of the ACL: AACL-IJCNLP.', 'size-of-dataset': 2240, 'task-description': 'Multi-class based on context dependency categories (CDC)', title: 'Air pollution data' }, url_path: 'dataset-3' } ]} facets={[ 'collection', 'platform' ]} />
This makes it easy to navigate and quickly find or filter down the data you're looking for. You can add as many facets as you'd like.
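To make the idea of facets concrete, here is a conceptual sketch of faceted filtering over records shaped like the `datasets` entries above. It is not the Catalog component's implementation; the helper names are hypothetical, and the sample records keep only the `collection` and `platform` fields from the example. It assumes a facet value may be either a single string or a list of strings, as both appear in that example.

```python
from collections import Counter


def facet_values(dataset, facet):
    """Return a dataset's values for one facet as a list (scalars are wrapped)."""
    value = dataset["metadata"].get(facet)
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def facet_counts(datasets, facet):
    """Count how many datasets carry each value of a facet."""
    counts = Counter()
    for dataset in datasets:
        counts.update(facet_values(dataset, facet))
    return dict(counts)


def filter_by_facets(datasets, selected):
    """Keep only the datasets that match every selected facet value."""
    return [
        d for d in datasets
        if all(value in facet_values(d, facet) for facet, value in selected.items())
    ]


# Trimmed-down records based on the Catalog example.
datasets = [
    {"url_path": "dataset-4", "metadata": {"collection": "Climate data", "platform": ["Instagram", "Youtube"]}},
    {"url_path": "dataset-1", "metadata": {"collection": "Economy", "platform": ["Twitter"]}},
    {"url_path": "dataset-2", "metadata": {"collection": "Stock Market", "platform": ["AlJazeera"]}},
    {"url_path": "dataset-3", "metadata": {"collection": "Climate data", "platform": ["Posts"]}},
]

print(facet_counts(datasets, "collection"))
print([d["url_path"] for d in filter_by_facets(datasets, {"collection": "Climate data"})])
```

Each extra facet narrows the result further, which is why metadata fields with a small set of repeated values tend to make the most useful facets.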
==Excel==
If you're working a lot with Excel files, you can embed a preview of your file such as this one:
Simply upload your Excel file to your GitHub repository and replace the URL with the relative path of the uploaded file.
Note: This component lets you present all tabs in your Excel file. You can sort the rows by clicking on a column name, and you can filter each column by clicking on the triple bar symbol next to the column name.
If you're dealing with geo data, you can visualize it on a map with GeoJSON polygon and point layers, with auto zoom set to the points layer:
<Map autoZoomConfiguration={{ layerName: 'Points' }} center={{ latitude: 45, longitude: 0 }} layers={[ { data: 'https://opendata.arcgis.com/datasets/9c58741995174fbcb017cf46c8a42f4b_25.geojson', name: 'Points', tooltip: true }, { colorScale: { ending: '#00ff00', starting: '#ff0000' }, data: 'https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_10m_geography_marine_polys.geojson', name: 'Polygons', tooltip: true } ]} title="Polygons and points" zoom={2} />
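If your point data lives in a table rather than a GeoJSON file, it has to be converted before a layer can use it. The sketch below shows one way to do that, assuming rows with `name`, `latitude` and `longitude` keys (hypothetical column names). It relies only on the GeoJSON convention that a point's coordinates are ordered longitude first, then latitude; how you host the resulting file for the layer's `data` field is left open.

```python
import json


def points_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from latitude/longitude rows."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [row["longitude"], row["latitude"]],
            },
            # Remaining attributes travel along as feature properties.
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample rows.
rows = [
    {"name": "Paris", "latitude": 48.8566, "longitude": 2.3522},
    {"name": "Madrid", "latitude": 40.4168, "longitude": -3.7038},
]

collection = points_to_geojson(rows)
print(json.dumps(collection, indent=2))
```

Saving this output as a `.geojson` file gives you something with the same structure as the point data used in the example above.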
Let's continue by adding a table of your data like the one below:
There are also other types of charts and graphs you can use to enhance your dataset. For example, you can create charts with Plotly. Here is how they render:
[!info] Bar Chart
<PlotlyBarChart data={[ { temperature: -0.41765878, year: '1850' }, { temperature: -0.2333498, year: '1851' }, { temperature: -0.22939907, year: '1852' }, { temperature: -0.27035445, year: '1853' }, { temperature: -0.29163003, year: '1854' } ]} xAxis="year" yAxis="temperature" />
[!info] Line Chart
<PlotlyLineChart data={[ { temperature: -0.41765878, year: '1850' }, { temperature: -0.2333498, year: '1851' }, { temperature: -0.22939907, year: '1852' }, { temperature: -0.27035445, year: '1853' }, { temperature: -0.29163003, year: '1854' } ]} xAxis="year" yAxis="temperature" />
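Typing the `data` array by hand gets tedious for anything beyond a few rows. The following sketch turns CSV text into a snippet in the same shape as the chart components above. It assumes a CSV with `year` and `temperature` columns and relies on a JSON array being usable as the value of the `data` prop; the helper names are hypothetical, and only the `data`, `xAxis` and `yAxis` props shown above are used.

```python
import csv
import io
import json


def rows_to_chart_data(csv_text, x_field, y_field):
    """Read CSV text and keep the x column as text and the y column as a number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{x_field: row[x_field], y_field: float(row[y_field])} for row in reader]


def plotly_line_chart_snippet(data, x_axis, y_axis):
    """Render a <PlotlyLineChart /> tag with the data embedded as a JSON array."""
    return f'<PlotlyLineChart data={{{json.dumps(data)}}} xAxis="{x_axis}" yAxis="{y_axis}" />'


# Sample values taken from the chart example above.
csv_text = "year,temperature\n1850,-0.41765878\n1851,-0.2333498\n"

data = rows_to_chart_data(csv_text, "year", "temperature")
print(plotly_line_chart_snippet(data, "year", "temperature"))
```

The printed tag can be pasted into a markdown file; swapping the tag name for `PlotlyBarChart` would give the bar chart variant, since both examples above take the same props.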
<VegaLite data={{ table: [ { x: 1850, y: -0.418 }, { x: 2020, y: 0.923 } ] }} spec={{ $schema: 'https://vega.github.io/schema/vega-lite/v4.json', data: { name: 'table' }, encoding: { x: { field: 'x', type: 'ordinal' }, y: { field: 'y', type: 'quantitative' } }, mark: 'bar' }} />
For a full list and API of available data visualization components, visit https://storybook.portaljs.org