Add validation for exported ActivityPub tarballs #7

Merged
merged 8 commits into from
Jan 15, 2025
Binary file modified out/test-export-2024-01-01.tar
Binary file not shown.
89 changes: 89 additions & 0 deletions src/verify.ts
@@ -0,0 +1,89 @@
import * as tar from 'tar-stream'
import { Readable } from 'stream'
import YAML from 'yaml'

/**
 * Validates the structure and content of an exported ActivityPub tarball.
 * @param tarBuffer - A Buffer containing the .tar archive.
 * @returns A promise that resolves to an object with `valid` (boolean) and `errors` (string[]).
 */
export async function validateExportStream(
Member

Will you please make it so tarBuffer can be a ReadableStream? That way, if the export is really big and the tar is really big, it doesn't have to be buffered in memory all at once.

I think you should be able to have tar-stream parse the stream, async iterate through the tar entries, and ensure each entry is valid, all without ever buffering all the entries in memory.

Collaborator Author

You're absolutely right, I should consider that. Thanks.
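
A minimal sketch of the streaming approach suggested in this thread, not the code merged in this PR. It assumes the tar entries can be exposed as an async iterable of `{ name, body }` objects; `ExportEntry`, `validateEntries`, and `readText` are hypothetical names. Recent tar-stream releases document async iteration over `tar.extract()`, which could feed an adapter like this, but that wiring is left out so the sketch has no third-party dependencies. The `manifest.yaml` field checks are omitted for the same reason.

```typescript
import { Buffer } from 'node:buffer'

// Hypothetical entry shape: a file name plus a chunked body.
// Entries from a streaming tar parser could be adapted to this.
interface ExportEntry {
  name: string
  body: AsyncIterable<Uint8Array>
}

interface ValidationResult {
  valid: boolean
  errors: string[]
}

const REQUIRED_FILES = [
  'manifest.yaml',
  'activitypub/actor.json',
  'activitypub/outbox.json'
]

// Collects one entry body and decodes it once, so multi-byte
// characters split across chunks are not corrupted.
async function readText(body: AsyncIterable<Uint8Array>): Promise<string> {
  const chunks: Uint8Array[] = []
  for await (const chunk of body) {
    chunks.push(chunk)
  }
  return Buffer.concat(chunks).toString('utf8')
}

export async function validateEntries(
  entries: AsyncIterable<ExportEntry>
): Promise<ValidationResult> {
  const errors: string[] = []
  const found = new Set<string>()

  for await (const entry of entries) {
    found.add(entry.name)
    if (entry.name.endsWith('.json')) {
      try {
        JSON.parse(await readText(entry.body))
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error)
        errors.push(`Error processing file ${entry.name}: ${message}`)
      }
    } else {
      // Drain bodies that are not inspected so the source can advance.
      for await (const _chunk of entry.body) {
        // discard
      }
    }
  }

  for (const file of REQUIRED_FILES) {
    if (!found.has(file)) {
      errors.push(`Missing required file: ${file}`)
    }
  }
  return { valid: errors.length === 0, errors }
}
```

At most one entry body is held in memory at a time; bodies that need no inspection are drained and discarded.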

  tarBuffer: Buffer
): Promise<{ valid: boolean; errors: string[] }> {
  const extract = tar.extract()
  const errors: string[] = []
  const requiredFiles = [
    'manifest.yaml',
    'activitypub/actor.json',
    'activitypub/outbox.json'
  ]
  const foundFiles = new Set()

  return await new Promise((resolve) => {
    extract.on('entry', (header, stream, next) => {
      const fileName = header.name
      foundFiles.add(fileName)

      let content = ''
      stream.on('data', (chunk) => {
        content += chunk.toString()
      })

      stream.on('end', () => {
        try {
          // Validate JSON files
          if (fileName.endsWith('.json')) {
            JSON.parse(content) // Throws an error if content is not valid JSON
          }

          // Validate manifest file
          if (fileName === 'manifest.yaml') {
            const manifest = YAML.parse(content)
            if (!manifest['ubc-version']) {
              errors.push('Manifest is missing required field: ubc-version')
            }
            if (!manifest.contents?.activitypub) {
              errors.push(
                'Manifest is missing required field: contents.activitypub'
              )
            }
          }
        } catch (error: any) {
          errors.push(`Error processing file ${fileName}: ${error.message}`)
        }
        next()
      })

      stream.on('error', (error) => {
        errors.push(`Stream error on file ${fileName}: ${error.message}`)
        next()
      })
    })

    extract.on('finish', () => {
      // Check if all required files are present
      for (const file of requiredFiles) {
        if (!foundFiles.has(file)) {
          errors.push(`Missing required file: ${file}`)
        }
      }

      resolve({
        valid: errors.length === 0,
        errors
      })
    })

    extract.on('error', (error) => {
      errors.push(`Error during extraction: ${error.message}`)
      resolve({
        valid: false,
        errors
      })
    })

    // Convert Buffer to a Readable stream and pipe it to the extractor
    const stream = Readable.from(tarBuffer)
    stream.pipe(extract)
  })
}
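
As a hedged sketch of how the entry point could accept either a Buffer or a stream, as the review on this file requests: `toReadable` and `countBytes` are hypothetical helpers, not part of this PR. The sketch relies on Node's `Readable.from`, which emits a Buffer as a single chunk rather than iterating it byte by byte.

```typescript
import { Buffer } from 'node:buffer'
import { Readable } from 'node:stream'

// Normalizes the input so the rest of the validator only deals with streams.
export function toReadable(input: Buffer | Readable): Readable {
  return Buffer.isBuffer(input) ? Readable.from(input) : input
}

// Demonstration consumer: reads chunk by chunk without collecting them.
export async function countBytes(input: Buffer | Readable): Promise<number> {
  let total = 0
  for await (const chunk of toReadable(input)) {
    total += (chunk as Buffer).length
  }
  return total
}
```

With a helper like this, existing callers that pass a Buffer keep working, while large exports can be passed as a file stream instead.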
Binary file removed test/fixtures/account2.tar
Binary file not shown.
Binary file added test/fixtures/tarball-samples/missing-actor.tar
Binary file not shown.
Binary file not shown.
Binary file added test/fixtures/tarball-samples/valid-export.tar
Binary file not shown.
6 changes: 4 additions & 2 deletions test/index.spec.ts
@@ -35,13 +35,15 @@ describe('exportActorProfile', () => {
 describe('importActorProfile', () => {
   it('extracts and verifies contents from account2.tar', async () => {
     // Load the tar file as a buffer
-    const tarBuffer = fs.readFileSync('test/fixtures/account2.tar')
+    const tarBuffer = fs.readFileSync(
+      'test/fixtures/tarball-samples/valid-export.tar'
+    )

     // Use the importActorProfile function to parse the tar contents
     const importedData = await importActorProfile(tarBuffer)

     // Log or inspect the imported data structure
-    console.log('Imported Data:', importedData)
+    // console.log('Imported Data:', importedData)

     // Example assertions to check specific files and content
     expect(importedData).to.have.property('activitypub/actor.json')
76 changes: 76 additions & 0 deletions test/verify.spec.ts
@@ -0,0 +1,76 @@
import { expect } from 'chai'
import { readFileSync } from 'fs'
import { validateExportStream } from '../src/verify'

describe('validateExportStream', () => {
  it('should validate a valid tarball', async () => {
    // Load a valid tarball (e.g., exported-profile-valid.tar)
    const tarBuffer = readFileSync(
      'test/fixtures/tarball-samples/valid-export.tar'
    )
    const result = await validateExportStream(tarBuffer)

    expect(result.valid).to.be.true
    expect(result.errors).to.be.an('array').that.is.empty
  })

  it('should fail if manifest.yaml is missing', async () => {
    // Load a tarball with missing manifest.yaml
    const tarBuffer = readFileSync(
      'test/fixtures/tarball-samples/missing-manifest.tar'
    )
    const result = await validateExportStream(tarBuffer)

    expect(result.valid).to.be.false
  })

  it('should fail if actor.json is missing', async () => {
    // Load a tarball with missing actor.json
    const tarBuffer = readFileSync(
      'test/fixtures/tarball-samples/missing-actor.tar'
    )
    const result = await validateExportStream(tarBuffer)

    expect(result.valid).to.be.false
    console.log(JSON.stringify(result.errors))
  })

  // it('should fail if outbox.json is missing', async () => {
Member

These tests look useful. Can we include them uncommented?

Collaborator Author

Yeah, I will, but it will add more tar files to the source tree. Do you have a better way than keeping them in the source code?
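
One possible answer, offered as a sketch rather than what this PR does: build the fixture archives in memory inside the tests instead of committing `.tar` files. `buildTar` and `tarHeader` below are hypothetical helpers that write a minimal ustar archive using only Node's `Buffer`; the field offsets follow the POSIX ustar header layout. tar-stream's own `pack()` could likely serve the same purpose, but it is not used here so the sketch stays dependency-free.

```typescript
import { Buffer } from 'node:buffer'

const BLOCK = 512

// Octal number padded with zeros to (width - 1) digits plus a trailing NUL.
function octal(value: number, width: number): string {
  return value.toString(8).padStart(width - 1, '0') + '\0'
}

// Builds one 512-byte ustar header for a regular file.
function tarHeader(name: string, size: number): Buffer {
  if (Buffer.byteLength(name, 'utf8') > 100) {
    throw new Error(`File name too long for this sketch: ${name}`)
  }
  const header = Buffer.alloc(BLOCK)
  header.write(name, 0, 100, 'utf8') // name
  header.write(octal(0o644, 8), 100, 8, 'ascii') // mode
  header.write(octal(0, 8), 108, 8, 'ascii') // uid
  header.write(octal(0, 8), 116, 8, 'ascii') // gid
  header.write(octal(size, 12), 124, 12, 'ascii') // size
  header.write(octal(0, 12), 136, 12, 'ascii') // mtime
  header.fill(0x20, 148, 156) // checksum field counts as spaces
  header.write('0', 156, 1, 'ascii') // typeflag: regular file
  header.write('ustar\0', 257, 6, 'ascii') // magic
  header.write('00', 263, 2, 'ascii') // version

  let sum = 0
  for (let i = 0; i < BLOCK; i++) {
    sum += header[i]
  }
  // Six octal digits, a NUL, then a space.
  header.write(sum.toString(8).padStart(6, '0') + '\0 ', 148, 8, 'ascii')
  return header
}

// Maps file names to text contents and returns a complete tar archive.
export function buildTar(files: Record<string, string>): Buffer {
  const blocks: Buffer[] = []
  for (const [name, text] of Object.entries(files)) {
    const body = Buffer.from(text, 'utf8')
    const padding = (BLOCK - (body.length % BLOCK)) % BLOCK
    blocks.push(tarHeader(name, body.length), body, Buffer.alloc(padding))
  }
  blocks.push(Buffer.alloc(BLOCK * 2)) // end-of-archive marker
  return Buffer.concat(blocks)
}
```

A test for a missing outbox could then call `buildTar` with only `manifest.yaml` and `activitypub/actor.json` and pass the result to the validator, with no extra fixture file in the repository.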

  //   // Load a tarball with missing outbox.json
  //   const tarBuffer = readFileSync(
  //     'test/fixtures/exported-profile-missing-outbox.tar'
  //   )
  //   const result = await validateExportStream(tarBuffer)

  //   expect(result.valid).to.be.false
  //   expect(result.errors).to.include(
  //     'Missing required file: activitypub/outbox.json'
  //   )
  // })

  // it('should fail if actor.json contains invalid JSON', async () => {
  //   // Load a tarball with invalid JSON in actor.json
  //   const tarBuffer = readFileSync(
  //     'test/fixtures/exported-profile-invalid-actor-json.tar'
  //   )
  //   const result = await validateExportStream(tarBuffer)

  //   expect(result.valid).to.be.false
  //   expect(result.errors).to.include(
  //     'Error processing file activitypub/actor.json: Unexpected token } in JSON at position 42'
  //   )
  // })

  // it('should fail if manifest.yaml is invalid', async () => {
  //   // Load a tarball with invalid manifest.yaml
  //   const tarBuffer = readFileSync(
  //     'test/fixtures/exported-profile-invalid-manifest.tar'
  //   )
  //   const result = await validateExportStream(tarBuffer)

  //   expect(result.valid).to.be.false
  //   expect(result.errors).to.include(
  //     'Manifest is missing required field: ubc-version'
  //   )
  // })
})