A vision:
We argue for commensurable knowledge and invest substantial time and effort into research cartography to that end, yet aspects of this work are highly specialized and reify fundamental challenges to commensurability.
The knowledge atlas aims to unify our cartography efforts and to make research cartography an accessible and extensible activity for all researchers.
At the outset, there are 3 conceptual pieces:
- #3
- #4
- #5
The data is a common store of all cartographic results. An approximate abstraction is a large table with a row for each paper (ever) and a column for each property we might use to differentiate papers. In essence, this is a system for disaggregating pieces of research in a reliable way, through a systematic process of adding columns.
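The table abstraction above can be sketched in a few lines of Python. This is purely illustrative; names like `atlas` and `add_column`, and the example DOIs, are placeholders rather than any real implementation:

```python
# A minimal sketch of the atlas-as-table abstraction: one row per paper,
# keyed by DOI, and a column for each differentiating property.
atlas = {
    "10.1000/example.1": {"title": "Paper A", "year": 2019},
    "10.1000/example.2": {"title": "Paper B", "year": 2021},
}

def add_column(table, name, compute=None, default=None):
    """Adding a column is the systematic act of disaggregation:
    every row gains the new property, even if only as a default
    value to be filled in later."""
    for doi, row in table.items():
        row[name] = compute(doi, row) if compute else default

# A new column that any reader could fill in later.
add_column(atlas, "n_participants")
```

The key property is that a column, once added, exists for every paper in the atlas, which is what makes cross-paper comparison possible.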
Views are mechanisms that allow people to contribute new columns or to view the results of queries against the data. Conceptually this is similar to the survey → CSV workflow we might have with Qualtrics or some of our other existing mapping technologies, but the view system would be designed specifically around supporting cartographic activities.
Automations are avenues for this valuable data product to get better without human intervention. For instance, products like Elicit add new algorithmically driven columns, and we might want to sample from some of those in our mapping work. Another example of an automation might be a process that extracts all of a paper's references and the papers that cite it.
A critical concern in our mapping work has been navigating multiple levels of units of analysis. With this system I hope we can simplify to 3 main conceptual units, with the flexibility to extend beyond those as needed. The units I propose are driven by the possibility of defining them relatively consistently across a very large range of research activity:
- **publication**, the most common high-level unit of research that we deal with
- **claim**, a single instance of expressing a potential piece of knowledge in a paper, based on data, an argument, or some other type of evidence or assertion
- **observation**, a single instance of something that was measured in a research process
GraphQL provides nice flexibility around specialized grouping factors, so we could add custom units as needed, e.g., studies and books. Studies, for example, are used differently across fields and styles of papers, so instead of defaulting to study-level analysis, leaning on claim-level analysis is likely more robust.
Of course, a core value of providing different units of analysis is allowing for views that utilize per-unit features. For example, if we wanted to document each task in a set of papers that had team tasks, we might generate a unit called **task**, with a set of properties such as the mapping dimensions we have ended up with in task mapping. Our view system would handle situations where a paper has multiple tasks (or multiples of any other unit).
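A hypothetical custom `task` unit might be stored as property dictionaries attached to a publication, with several allowed per paper. The dimension names here are placeholders, not our actual task-mapping dimensions:

```python
# Hypothetical custom unit: a paper can contain several team tasks,
# each described by the mapping dimensions chosen for that effort.
tasks = []

def add_task(publication_doi, **dimensions):
    """Attach one task record to a publication; any number may exist."""
    task = {"publication_doi": publication_doi, **dimensions}
    tasks.append(task)
    return task

def tasks_for(doi):
    """A view over the task unit: all tasks belonging to one paper."""
    return [t for t in tasks if t["publication_doi"] == doi]

# One paper, two distinct team tasks.
add_task("10.1000/example.1", interdependence="high", duration="short")
add_task("10.1000/example.1", interdependence="low", duration="long")
```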
A system like this might be used in several different ways. Whenever we read a paper, we might be encouraged to add an entry to the atlas with a workflow as simple as entering a DOI (which can be used to look up the remaining bibliographic information) and jotting down any notes the reader has. If something about the paper emerges as interesting for a particular type of column, a process can be launched to establish its value. In most cases, though, the new column would need no particular process automation, such as "the number of participants in this work".
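That lightweight entry workflow might look like the following sketch. `lookup_bibliographic` stands in for whatever DOI resolver is eventually used (e.g., a service like Crossref); it is a placeholder, not a real API:

```python
def lookup_bibliographic(doi):
    # Placeholder: in practice this would call a DOI resolver to
    # fetch the title, authors, year, and venue for the paper.
    return {"doi": doi, "title": None, "authors": [], "year": None}

def add_entry(atlas, doi, notes=""):
    """Add a paper with one action: a DOI plus optional reader notes."""
    row = lookup_bibliographic(doi)
    row["notes"] = notes
    atlas[doi] = row
    return row

# The whole workflow is one call per paper read.
atlas = {}
add_entry(atlas, "10.1000/example.1", notes="relevant to team tasks")
```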
In a more sophisticated use, say if we were mapping a particular domain such as team tasks, we would generate a view specific to that mapping effort, constructed from existing columns' processes (read: survey questions, though these could be other things) and new column processes. As we do this, we would run a validation process to establish and refine each column's procedure, something we have done with various levels of repeatability in our attempts so far.
As a user trying to draw inferences from the map, queries could target specific areas, e.g., the team task map, or could aggregate across a broader focus, e.g., to study effect sizes across all the social science papers in the atlas.
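A query over the table abstraction might be sketched as below. The column names (`field`, `effect_size`) and the example values are illustrative, not part of any real schema:

```python
def query(table, where, select):
    """Return the selected columns for every row matching the predicate."""
    return [
        {col: row.get(col) for col in select}
        for row in table.values()
        if where(row)
    ]

atlas = {
    "10.1000/example.1": {"field": "social science", "effect_size": 0.31},
    "10.1000/example.2": {"field": "hci", "effect_size": 0.52},
}

# e.g., effect sizes across all the social science papers in the atlas
results = query(atlas,
                lambda r: r.get("field") == "social science",
                ["effect_size"])
```

Scoping to a specific map, such as the team task map, would be the same operation with a narrower predicate.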
Another interesting use case is people outside our research team contributing, extending, or using the data. This seems like an extremely valuable goal for the broader community and one I intend to design around throughout this process.
This issue aims to serve as an initial vision, one that will be refined through feedback and revised as we build parts of this and try them out. I intend to get a simple version off the ground quickly through a combination of gradual layer 🥞 and slice 🍰 expansion. In other words, not all features will work on day one, and not all domains will either, but the groundwork is intended to make both axes of expansion entirely plausible.
A key design point will be an incredibly strong focus on generating repeatable processes and self-documenting scientific outcomes.
A few initial questions:
- Is there something obviously wrong with this idea that we appear to be overlooking?
- #6
- #7
- How do you think this data would be most useful? For example, many of the mapping dimensions will apply to only a small number of papers; will there still be value in thinking about those more broadly? Is there a limit to our goal of commensurable science driven by conceptual foci?