Skip to content

Latest commit

 

History

History
162 lines (126 loc) · 7.06 KB

README.md

File metadata and controls

162 lines (126 loc) · 7.06 KB
MedModels Logo

MedModels: A Rust-Powered Python Framework for Modern Healthcare Research

Motivation

Analyzing real-world evidence, especially patient data, is a complex task demanding accuracy and reproducibility. Currently, research teams often re-implement the same statistical methods and data processing pipelines, leading to inefficient codebases, faulty implementations and technical debt.

MedModels addresses these challenges by providing a standardized, reliable, and efficient framework for handling, processing, and analyzing electronic health records (EHR) and claims data.

Target Audience:

MedModels is designed for a wide range of users working with real-world data and electronic health records, including:

  • (Pharmaco-)Epidemiologists
  • Real-World Data Analysts
  • Health Economists
  • Clinicians
  • Data Scientists
  • Software Developers

Key Features

  • Rust-Based Data Class: Facilitates the efficient transformation of patient data into adaptable and scalable network graph structures.
  • High-Performance Computing: Handles large datasets in memory while maintaining fast processing speeds due to the underlying Rust implementation.
  • Standardized Workflows: Streamlines common tasks in real-world evidence analysis, reducing the need for custom code.
  • Interoperability: Supports collaboration and data sharing through a unified data structure and analysis framework.

Key Components

  • MedRecord Data Structure:

    • Graph-Based Representation: Organizes medical data using nodes (e.g., patients, medications, diagnoses) and edges (e.g., date, dosage, duration) to capture complex interactions and dependencies.
    • Efficient Querying: Enables efficient querying and retrieval of information from the graph structure, supporting various analytical tasks.
    • Dynamic Management: Provides methods to add, remove, and modify nodes and edges, as well as their associated attributes, allowing for flexible data manipulation.
    • Effortless Creation: Easily create a MedRecord from various data sources:
      • Pandas DataFrames: Seamlessly convert your existing Pandas DataFrames into a MedRecord.
      • Polars DataFrames: Alternatively, use Polars DataFrames as input for efficient data handling.
      • Standard Python Structures: Create a MedRecord directly from standard Python data structures like dictionaries and lists, offering flexibility for different data formats.
    • Grouping and Filtering: Allows grouping of nodes and edges for simplified management and targeted analysis of specific subsets of data.
    • High-Performance Backend: Built on a Rust backend for optimal performance and efficient handling of large-scale medical datasets.
  • Treatment Effect Analysis:

    • Estimating Treatment Effects: Provides a range of methods for estimating treatment effects from observational data, including:

      • Continuous Outcomes: Analyze treatment effects on continuous outcomes.
      • Binary Outcomes: Estimate odds ratios, risk ratios, and other metrics for binary outcomes.
      • Time-to-Event Outcomes: Perform survival analysis and estimate hazard ratios for time-to-event outcomes.
      • Effect Size Metrics: Calculate standardized effect size metrics like Cohen's d and Hedges' g.
    • Matching:

      • (High Dimensional) Propensity Score Matching: Reduce confounding bias by matching treated and untreated individuals based on their propensity scores.
      • Nearest Neighbor Matching: Match individuals based on similarity in their observed characteristics.

Getting Started

Installation:

MedModels can be installed from PyPI using the pip command:

pip install medmodels

Quick Start:

Here's a quick start guide showing an example of how to use MedModels to create a MedRecord object, add nodes and edges, and perform basic operations.

import pandas as pd
import medmodels as mm

# Patients DataFrame (Nodes)
patients = pd.DataFrame(
    [
        ["Patient 01", 72, "M", "USA"],
        ["Patient 02", 74, "M", "USA"],
        ["Patient 03", 64, "F", "GER"],
    ],
    columns=["ID", "Age", "Sex", "Loc"],
)

# Medications DataFrame (Nodes)
medications = pd.DataFrame(
    [["Med 01", "Insulin"], ["Med 02", "Warfarin"]], columns=["ID", "Name"]
)

# Patients-Medication Relation (Edges)
patient_medication = pd.DataFrame(
    [
        ["Patient 02", "Med 01", pd.Timestamp("20200607")],
        ["Patient 02", "Med 02", pd.Timestamp("20180202")],
        ["Patient 03", "Med 02", pd.Timestamp("20190302")],
    ],
    columns=["Pat_ID", "Med_ID", "Date"],
)

# Create a MedRecord object using the builder pattern
record = (
    mm.MedRecord.builder()
    .add_nodes((patients, "ID"), group="Patients")
    .add_nodes((medications, "ID"), group="Medications")
    .add_edges((patient_medication, "Pat_ID", "Med_ID"))
    .add_group("US-Patients", nodes=["Patient 01", "Patient 02"])
    .build()
)

# Print an combined overview of the nodes and edges in the MedRecord
print(record)

# You can also print only nodes and edges respectively
print(record.overview_nodes())
print(record.overview_edges())

# Accessing all available nodes
print(record.nodes)
# Output: ['Patient 03', 'Med 01', 'Med 02', 'Patient 01', 'Patient 02']

# Accessing a certain node and its attributes
print(record.node["Patient 01"])
# Output: {'Age': 72, 'Loc': 'USA', 'Sex': 'M'}

# Getting all available groups
print(record.groups)
# Output: ['Medications', 'Patients', 'US-Patients']

# Getting the nodes that are within a certain group
print(record.nodes_in_group("Medications"))
# Output: ['Med 02', 'Med 01']

# Save the MedRecord to a file in RON format
record.to_ron("record.ron")

# Load the MedRecord from the RON file
new_record = mm.MedRecord.from_ron("record.ron")