Analyzing real-world evidence, especially patient data, is a complex task demanding accuracy and reproducibility. Currently, research teams often re-implement the same statistical methods and data processing pipelines, leading to inefficient codebases, faulty implementations and technical debt.
MedModels addresses these challenges by providing a standardized, reliable, and efficient framework for handling, processing, and analyzing electronic health records (EHR) and claims data.
Target Audience:
MedModels is designed for a wide range of users working with real-world data and electronic health records, including:
- (Pharmaco-)Epidemiologists
- Real-World Data Analysts
- Health Economists
- Clinicians
- Data Scientists
- Software Developers
Key Features:
- Rust-Based Data Class: Facilitates the efficient transformation of patient data into adaptable and scalable network graph structures.
- High-Performance Computing: Handles large datasets in memory while maintaining fast processing speeds due to the underlying Rust implementation.
- Standardized Workflows: Streamlines common tasks in real-world evidence analysis, reducing the need for custom code.
- Interoperability: Supports collaboration and data sharing through a unified data structure and analysis framework.
MedRecord Data Structure:
- Graph-Based Representation: Organizes medical data as nodes (e.g., patients, medications, diagnoses) connected by edges whose attributes (e.g., date, dosage, duration) capture the interactions and dependencies between them.
- Efficient Querying: Retrieves information from the graph structure quickly, supporting various analytical tasks.
- Dynamic Management: Provides methods to add, remove, and modify nodes and edges, as well as their associated attributes, allowing for flexible data manipulation.
- Effortless Creation: Easily create a MedRecord from various data sources:
  - Pandas DataFrames: Seamlessly convert your existing Pandas DataFrames into a MedRecord.
  - Polars DataFrames: Alternatively, use Polars DataFrames as input for efficient data handling.
  - Standard Python Structures: Create a MedRecord directly from standard Python data structures like dictionaries and lists, offering flexibility for different data formats.
- Grouping and Filtering: Allows grouping of nodes and edges for simplified management and targeted analysis of specific subsets of data.
- High-Performance Backend: Built on a Rust backend for optimal performance and efficient handling of large-scale medical datasets.
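To make the graph-based representation above concrete, here is a minimal sketch in plain Python. It is not MedModels' implementation and does not use its API: the `nodes`, `edges`, and `groups` containers, the `patients_on` helper, and the sample data are all hypothetical, chosen only to illustrate how nodes, edges with attributes, and groups fit together.

```python
from collections import defaultdict

# Nodes: identifier -> attribute dictionary (hypothetical sample data).
nodes = {
    "Patient 01": {"Age": 72, "Sex": "M"},
    "Patient 02": {"Age": 74, "Sex": "M"},
    "Patient 03": {"Age": 64, "Sex": "F"},
    "Med 01": {"Name": "Insulin"},
    "Med 02": {"Name": "Warfarin"},
}

# Edges: (source, target, attributes) triples linking patients to medications.
edges = [
    ("Patient 02", "Med 01", {"Date": "2020-06-07"}),
    ("Patient 02", "Med 02", {"Date": "2018-02-02"}),
    ("Patient 03", "Med 02", {"Date": "2019-03-02"}),
]

# Groups: group name -> set of node identifiers.
groups = defaultdict(set)
groups["Patients"].update(["Patient 01", "Patient 02", "Patient 03"])
groups["Medications"].update(["Med 01", "Med 02"])


def patients_on(medication_name):
    """Return the sorted patients that have an edge to the named medication."""
    return sorted(
        source
        for source, target, _ in edges
        if source in groups["Patients"]
        and nodes[target].get("Name") == medication_name
    )


warfarin_patients = patients_on("Warfarin")
print(warfarin_patients)  # ['Patient 02', 'Patient 03']
```

A query such as "which patients received Warfarin" becomes a walk over edges filtered by group membership and node attributes, which is the kind of operation a MedRecord is meant to handle at scale.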
Treatment Effect Analysis:
- Estimating Treatment Effects: Provides a range of methods for estimating treatment effects from observational data, including:
  - Continuous Outcomes: Analyze treatment effects on continuous outcomes.
  - Binary Outcomes: Estimate odds ratios, risk ratios, and other metrics for binary outcomes.
  - Time-to-Event Outcomes: Perform survival analysis and estimate hazard ratios for time-to-event outcomes.
  - Effect Size Metrics: Calculate standardized effect size metrics like Cohen's d and Hedges' g.
- Matching:
  - (High-Dimensional) Propensity Score Matching: Reduce confounding bias by matching treated and untreated individuals based on their propensity scores.
  - Nearest Neighbor Matching: Match individuals based on similarity in their observed characteristics.
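The effect measures named above follow standard textbook formulas. The sketch below computes them with the Python standard library only; it is an illustration of the definitions, not MedModels' own API, and the function names and sample numbers are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


def hedges_g(treated, control):
    """Cohen's d multiplied by the common approximate small-sample correction."""
    n = len(treated) + len(control)
    return cohens_d(treated, control) * (1 - 3 / (4 * n - 9))


def risk_ratio(a, b, c, d):
    """a, b: treated with/without the event; c, d: controls with/without it."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the event among treated divided by the odds among controls."""
    return (a * d) / (b * c)


# Hypothetical continuous outcomes for two small groups.
treated, control = [2, 4, 6], [1, 3, 5]
d_value = cohens_d(treated, control)
g_value = hedges_g(treated, control)
print(round(d_value, 3))  # 0.5
print(round(g_value, 3))  # 0.4

# Hypothetical 2x2 table: 10 of 100 treated and 5 of 100 controls had the event.
rr = risk_ratio(10, 90, 5, 95)
odds = odds_ratio(10, 90, 5, 95)
print(round(rr, 3))    # 2.0
print(round(odds, 3))  # 2.111
```

Hedges' g shrinks Cohen's d slightly, which matters mostly for small samples like the one shown here.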
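As an illustration of the matching idea described above, the following sketch performs greedy 1:1 nearest neighbor matching without replacement on precomputed propensity scores. It assumes the scores already exist (for example from a logistic regression) and is not the algorithm MedModels necessarily uses; the function name, the optional `caliper` parameter, and the sample scores are hypothetical.

```python
def greedy_nearest_neighbor_match(treated_scores, control_scores, caliper=None):
    """Greedy 1:1 matching without replacement on a scalar score.

    Both arguments map an identifier to a propensity score. Returns a dict
    mapping each matched treated identifier to its control identifier.
    """
    available = dict(control_scores)  # controls not yet used in a match
    matches = {}
    for treated_id, score in sorted(treated_scores.items()):
        if not available:
            break
        # Pick the unused control whose score is closest to this treated unit.
        best_id = min(available, key=lambda cid: abs(available[cid] - score))
        if caliper is not None and abs(available[best_id] - score) > caliper:
            continue  # closest control is too far away; leave unmatched
        matches[treated_id] = best_id
        del available[best_id]
    return matches


# Hypothetical propensity scores.
treated = {"T1": 0.30, "T2": 0.62}
control = {"C1": 0.28, "C2": 0.65, "C3": 0.90}

all_matches = greedy_nearest_neighbor_match(treated, control)
print(all_matches)  # {'T1': 'C1', 'T2': 'C2'}

strict_matches = greedy_nearest_neighbor_match(treated, control, caliper=0.025)
print(strict_matches)  # {'T1': 'C1'}
```

Greedy matching is simple but order-dependent; a caliper trades sample size for closer matches by discarding treated units without a sufficiently similar control.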
Installation:
MedModels can be installed from PyPI using the pip command:
pip install medmodels
Quick Start:
Here's a quick start guide showing how to use MedModels to create a MedRecord object, add nodes and edges, and perform basic operations.
import pandas as pd
import medmodels as mm

# Patients DataFrame (Nodes)
patients = pd.DataFrame(
    [
        ["Patient 01", 72, "M", "USA"],
        ["Patient 02", 74, "M", "USA"],
        ["Patient 03", 64, "F", "GER"],
    ],
    columns=["ID", "Age", "Sex", "Loc"],
)

# Medications DataFrame (Nodes)
medications = pd.DataFrame(
    [["Med 01", "Insulin"], ["Med 02", "Warfarin"]], columns=["ID", "Name"]
)

# Patients-Medication Relation (Edges)
patient_medication = pd.DataFrame(
    [
        ["Patient 02", "Med 01", pd.Timestamp("20200607")],
        ["Patient 02", "Med 02", pd.Timestamp("20180202")],
        ["Patient 03", "Med 02", pd.Timestamp("20190302")],
    ],
    columns=["Pat_ID", "Med_ID", "Date"],
)

# Create a MedRecord object using the builder pattern
record = (
    mm.MedRecord.builder()
    .add_nodes((patients, "ID"), group="Patients")
    .add_nodes((medications, "ID"), group="Medications")
    .add_edges((patient_medication, "Pat_ID", "Med_ID"))
    .add_group("US-Patients", nodes=["Patient 01", "Patient 02"])
    .build()
)

# Print a combined overview of the nodes and edges in the MedRecord
print(record)
# You can also print the node and edge overviews separately
print(record.overview_nodes())
print(record.overview_edges())
# Accessing all available nodes
print(record.nodes)
# Output: ['Patient 03', 'Med 01', 'Med 02', 'Patient 01', 'Patient 02']
# Accessing a certain node and its attributes
print(record.node["Patient 01"])
# Output: {'Age': 72, 'Loc': 'USA', 'Sex': 'M'}
# Getting all available groups
print(record.groups)
# Output: ['Medications', 'Patients', 'US-Patients']
# Getting the nodes that are within a certain group
print(record.nodes_in_group("Medications"))
# Output: ['Med 02', 'Med 01']
# Save the MedRecord to a file in RON format
record.to_ron("record.ron")
# Load the MedRecord from the RON file
new_record = mm.MedRecord.from_ron("record.ron")