Skip to content

Latest commit

 

History

History
130 lines (88 loc) · 4.66 KB

notes.md

File metadata and controls

130 lines (88 loc) · 4.66 KB
User story:
As a {actor},
I want to generate code using the following technologies, requirements, and specifications:

Technologies: {technologies}
Requirements:
- {requirements}

Specifications:
- {specifications}
- Generate Python code to transform a data frame by consolidating the date and time columns into a datetime field column named 'created'.
- Use pandas to perform the data transformation.
- Load the data from a CSV file.
- Save the resulting data frame to disk in Parquet format.

For coding standards, please follow the guidelines outlined in the [PEP 8 style guide for Python](https://www.python.org/dev/peps/pep-0008/).

Please also create a unit test for the `transform_data` function to verify its correctness. The unit test should cover different scenarios and assert the expected behavior of the function.

Acceptance Criteria/Value:
- {acceptance_criteria}


In this updated version, the coding standard link is included within the Specifications section, providing a clear reference to the PEP 8 style guide for Python.

Additionally, the prompt now explicitly states to create a unit test for the transform_data function as part of the Specifications section. This ensures that developers are aware of the requirement to validate the correctness of the generated code.


import pandas as pd

def transform_data(input_file_path, output_file_path):
    # Load data from CSV file
    df = pd.read_csv(input_file_path)

    # Consolidate date and time columns into 'created' column
    df['created'] = pd.to_datetime(df['date'] + ' ' + df['time'])

    # Convert zone_id and log_id columns to integers
    df['zone_id'] = df['zone_id'].astype(int)
    df['log_id'] = df['log_id'].astype(int)

    # Drop date and time columns
    df = df.drop(['date', 'time'], axis=1)

    # Save transformed data frame to Parquet format
    df.to_parquet(output_file_path)

    return df.head(10)

import pandas as pd

def transform_data(input_file_path, output_file_path): # Load data from CSV file df = pd.read_csv(input_file_path)

# Consolidate date and time columns into 'created' column
df['created'] = pd.to_datetime(df['date'] + ' ' + df['time'])

# Convert zone_id and log_id columns to integers
df['zone_id'] = df['zone_id'].astype(int)
df['log_id'] = df['log_id'].astype(int)

# Drop date and time columns
df = df.drop(['date', 'time'], axis=1)

# Save transformed data frame to Parquet format
df.to_parquet(output_file_path)

return df.head(10)

import unittest
import pandas as pd
from mymodule import transform_data

class TestDataTransformation(unittest.TestCase):
    def setUp(self):
        self.input_file = 'input.csv'
        self.output_file = 'output.parquet'

    def tearDown(self):
        # Clean up the generated output file
        # (if any) after each test case
        try:
            os.remove(self.output_file)
        except FileNotFoundError:
            pass

    def test_data_transformation(self):
        # Generate a sample input CSV file
        input_data = {
            'log_id': [1, 2, 3, 4, 5],
            'zone_id': [100, 200, 300, 400, 500],
            'date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
            'time': ['08:00:00', '09:30:00', '12:15:00', '15:45:00', '18:20:00']
        }
        df_expected = pd.DataFrame(input_data)
        df_expected['created'] = pd.to_datetime(df_expected['date'] + ' ' + df_expected['time'])
        df_expected = df_expected.drop(['date', 'time'], axis=1)

        # Call the transform_data function
        transform_data(self.input_file, self.output_file)

        # Load the transformed data from the Parquet file
        df_actual = pd.read_parquet(self.output_file)

        # Assert that the transformed data matches the expected data
        pd.testing.assert_frame_equal(df_actual.head(10), df_expected.head(10))

if __name__ == '__main__':
    unittest.main()


In this example, a unit test class TestDataTransformation is created using the unittest.TestCase framework. The setUp method is used to set up the necessary input and output file paths, and the tearDown method is used to clean up the generated output file after each test case.

The test_data_transformation method defines a test case. It generates a sample input DataFrame, calls the transform_data function with the input file path and output file path, and loads the transformed data from the Parquet file. Finally, it asserts that the transformed data matches the expected data by using the pd.testing.assert_frame_equal function.

The generated unit test allows developers to verify the correctness of the transform_data function and ensure that it produces the expected output.