Merge pull request #566 from mitchstockdale/refactor-and-stability-improvements

Refactor and stability improvements
mitchelllisle authored Jan 5, 2025
2 parents 3cf2c98 + ca88851 commit e5c1e94
Showing 15 changed files with 621 additions and 543 deletions.
13 changes: 7 additions & 6 deletions COVERAGE.txt
@@ -1,6 +1,7 @@
-Name                          Stmts   Miss  Cover
--------------------------------------------------
-src/sparkdantic/__init__.py       4      0   100%
-src/sparkdantic/model.py        131      1    99%
--------------------------------------------------
-TOTAL                           135      1    99%
+Name                            Stmts   Miss  Cover
+---------------------------------------------------
+src/sparkdantic/__init__.py         4      0   100%
+src/sparkdantic/exceptions.py       1      0   100%
+src/sparkdantic/model.py          122      0   100%
+---------------------------------------------------
+TOTAL                             127      0   100%
24 changes: 22 additions & 2 deletions README.md
@@ -14,7 +14,8 @@ named `SparkModel` that extends Pydantic's `BaseModel`.

## Features

-- Conversion from Pydantic model to PySpark schema.
+- Conversion from Pydantic model to PySpark schema
+- Type coercion

## Usage

@@ -47,6 +48,8 @@ class MyEnumModel(SparkModel):

### Generating a PySpark Schema

#### Using `SparkModel`

Pydantic has existing models for generating json schemas (with `model_json_schema`). With a `SparkModel` you can
generate a PySpark schema from the model fields using the `model_spark_schema()` method:

@@ -63,8 +66,25 @@ StructType([
StructField('hobbies', ArrayType(StringType(), False), False)
])
```
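
The diff collapses the model definition that produces this output, so here is a rough sketch for context — only the `hobbies` field is visible above; `PersonModel`, `name`, and `age` are illustrative assumptions:

```python
from typing import List

from sparkdantic import SparkModel


class PersonModel(SparkModel):
    name: str           # illustrative field
    age: int            # illustrative field
    hobbies: List[str]  # matches the StructField visible in the output above


spark_schema = PersonModel.model_spark_schema()
```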

#### Using Pydantic `BaseModel`

You can also generate a PySpark schema for existing Pydantic models using the `create_spark_schema` function:

```python
from pydantic import BaseModel

from sparkdantic import create_spark_schema

class EmployeeModel(BaseModel):
id: int
first_name: str
last_name: str
department_code: str

spark_schema = create_spark_schema(EmployeeModel)
```
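
For reference, assuming sparkdantic's usual mappings (`int` to `IntegerType` and `str` to `StringType` — an assumption, not shown in this diff — with required fields non-nullable), the resulting schema would look roughly like:

```python
StructType([
    StructField('id', IntegerType(), False),
    StructField('first_name', StringType(), False),
    StructField('last_name', StringType(), False),
    StructField('department_code', StringType(), False)
])
```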

> ℹ️ In addition to the automatic type conversion, you can also explicitly coerce data types to Spark native types by
-> setting the `spark_type` attribute in the `Field` function from Pydantic, like so: `Field(spark_type=DataType)`.
+> setting the `spark_type` attribute in the `SparkField` function (which extends the Pydantic `Field` function), like so: `SparkField(spark_type=DataType)`.
> Please replace DataType with the actual Spark data type you want to use.
> This is useful when you want to use a data type other than the one that Sparkdantic infers by default.
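
As a minimal sketch of that override (assuming `SparkField` is importable from the package root like `SparkModel`; the model and field names are illustrative):

```python
from pyspark.sql.types import DateType

from sparkdantic import SparkField, SparkModel


class EventModel(SparkModel):
    # Without the override, this str field would map to StringType;
    # spark_type coerces it to DateType in the generated schema.
    event_date: str = SparkField(spark_type=DateType)


spark_schema = EventModel.model_spark_schema()
```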
2 changes: 2 additions & 0 deletions src/sparkdantic/exceptions.py
@@ -0,0 +1,2 @@
class TypeConversionError(Exception):
    """Error converting a model field type to a PySpark type"""
