Merge pull request #566 from mitchstockdale/refactor-and-stability-improvements

Refactor and stability improvements
mitchelllisle authored Jan 5, 2025
2 parents 3cf2c98 + ca88851 commit e5c1e94
Showing 15 changed files with 621 additions and 543 deletions.
13 changes: 7 additions & 6 deletions COVERAGE.txt
@@ -1,6 +1,7 @@
-Name                          Stmts   Miss  Cover
--------------------------------------------------
-src/sparkdantic/__init__.py       4      0   100%
-src/sparkdantic/model.py        131      1    99%
--------------------------------------------------
-TOTAL                           135      1    99%
+Name                            Stmts   Miss  Cover
+---------------------------------------------------
+src/sparkdantic/__init__.py         4      0   100%
+src/sparkdantic/exceptions.py       1      0   100%
+src/sparkdantic/model.py          122      0   100%
+---------------------------------------------------
+TOTAL                             127      0   100%
24 changes: 22 additions & 2 deletions README.md
@@ -14,7 +14,8 @@ named `SparkModel` that extends Pydantic's `BaseModel`.

## Features

-- Conversion from Pydantic model to PySpark schema.
+- Conversion from Pydantic model to PySpark schema
+- Type coercion

## Usage

@@ -47,6 +48,8 @@ class MyEnumModel(SparkModel):

### Generating a PySpark Schema

#### Using `SparkModel`

Pydantic has existing models for generating json schemas (with `model_json_schema`). With a `SparkModel` you can
generate a PySpark schema from the model fields using the `model_spark_schema()` method:

@@ -63,8 +66,25 @@ StructType([
StructField('hobbies', ArrayType(StringType(), False), False)
])
```
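
The diff collapses the model definition that produces this output, so here is a rough sketch for context — only the `hobbies` field is visible above; `PersonModel`, `name`, and `age` are illustrative assumptions:

```python
from typing import List

from sparkdantic import SparkModel


class PersonModel(SparkModel):
    name: str           # illustrative field
    age: int            # illustrative field
    hobbies: List[str]  # matches the StructField visible in the output above


spark_schema = PersonModel.model_spark_schema()
```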

#### Using Pydantic `BaseModel`

You can also generate a PySpark schema for existing Pydantic models using the `create_spark_schema` function:

```python
from pydantic import BaseModel

from sparkdantic import create_spark_schema

class EmployeeModel(BaseModel):
id: int
first_name: str
last_name: str
department_code: str

spark_schema = create_spark_schema(EmployeeModel)
```
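
For reference, assuming sparkdantic's usual mappings (`int` to `IntegerType` and `str` to `StringType` — an assumption, not shown in this diff — with required fields non-nullable), the resulting schema would look roughly like:

```python
StructType([
    StructField('id', IntegerType(), False),
    StructField('first_name', StringType(), False),
    StructField('last_name', StringType(), False),
    StructField('department_code', StringType(), False)
])
```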

> ℹ️ In addition to the automatic type conversion, you can also explicitly coerce data types to Spark native types by
-> setting the `spark_type` attribute in the `Field` function from Pydantic, like so: `Field(spark_type=DataType)`.
+> setting the `spark_type` attribute in the `SparkField` function (which extends the Pydantic `Field` function), like so: `SparkField(spark_type=DataType)`.
> Please replace DataType with the actual Spark data type you want to use.
> This is useful when you want to use a data type other than the one that Sparkdantic infers by default.
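
As a minimal sketch of that override (assuming `SparkField` is importable from the package root like `SparkModel`; the model and field names are illustrative):

```python
from pyspark.sql.types import DateType

from sparkdantic import SparkField, SparkModel


class EventModel(SparkModel):
    # Without the override, this str field would map to StringType;
    # spark_type coerces it to DateType in the generated schema.
    event_date: str = SparkField(spark_type=DateType)


spark_schema = EventModel.model_spark_schema()
```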
2 changes: 2 additions & 0 deletions src/sparkdantic/exceptions.py
@@ -0,0 +1,2 @@
class TypeConversionError(Exception):
    """Error converting a model field type to a PySpark type"""
