Skip to content

Commit

Permalink
[Node.js] Fix incomplete rows handling in streaming transformations (#…
Browse files Browse the repository at this point in the history
…204)

Prepare 0.2.4, update contribution guide
  • Loading branch information
slvrtrn authored Oct 17, 2023
1 parent 5cb9d52 commit 2552dbb
Show file tree
Hide file tree
Showing 10 changed files with 257 additions and 66 deletions.
14 changes: 7 additions & 7 deletions .build/build_and_prepare.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,21 @@ import { execSync } from 'child_process'
import fs from 'fs'
import * as process from 'process'
;(() => {
const [tag] = process.argv.slice(2)
if (!tag) {
console.error(`Expected a tag as an argument`)
const [pkg] = process.argv.slice(2)
if (!pkg) {
console.error(`Expected package name as an argument`)
process.exit(1)
}

let packageName = ''
if (tag.endsWith('-web')) {
if (pkg.endsWith('web')) {
packageName = 'client-web'
} else if (tag.endsWith('-node')) {
} else if (pkg.endsWith('node')) {
packageName = 'client-node'
} else if (tag.endsWith('-common')) {
} else if (pkg.endsWith('common')) {
packageName = 'client-common'
} else {
console.error(`Provided tag ${tag} does not match any packages`)
console.error(`Provided tag ${pkg} does not match any packages`)
process.exit(1)
}

Expand Down
3 changes: 2 additions & 1 deletion .docker/nginx/local.conf
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@ upstream clickhouse_cluster {

server {
listen 8123;
client_max_body_size 100M;
location / {
proxy_pass http://clickhouse_cluster;
}
}
}
8 changes: 8 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
## 0.2.4 (Node.js only)

No changes in web/common modules.

### Bug fixes

- (Node.js only) Fixed an issue where streaming large datasets could provide corrupted results. See [#171](https://github.com/ClickHouse/clickhouse-js/issues/171) (issue) and [#204](https://github.com/ClickHouse/clickhouse-js/pull/204) (PR) for more details.

## 0.2.3 (Node.js only)

No changes in web/common modules.
Expand Down
116 changes: 79 additions & 37 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ and help us maintain up-to-date documentation.

You have installed:

- a compatible LTS version of nodejs: `v14.x`, `v16.x` or `v18.x`
- NPM >= `6.x`
- a compatible LTS version of Node.js: `v16.x`, `v18.x` or `v20.x`
- NPM >= `9.x`

### Create a fork of the repository and clone it

Expand All @@ -28,11 +28,20 @@ npm i
### Add /etc/hosts entry

Required for TLS tests.
The generated certificates assume TLS requests use `server.clickhouseconnect.test` as the hostname.
See [tls.test.ts](packages/client-node/__tests__/tls/tls.test.ts) for more details.

```bash
sudo -- sh -c "echo 127.0.0.1 server.clickhouseconnect.test >> /etc/hosts"
```

## Style Guide

We use an automatic code formatting with `prettier` and `eslint`, both should be installed after running `npm i`.

Additionally, every commit should trigger a [Husky](https://typicode.github.io/husky/) Git hook that applies `prettier`
and checks the code with `eslint` via `lint-staged` automatically.

## Testing

Whenever you add a new feature to the package or fix a bug,
Expand All @@ -41,17 +50,22 @@ everyone in the community can safely benefit from your contribution.

### Tooling

We use [Jasmine](https://jasmine.github.io/index.html) as a test runner.
We use [Jasmine](https://jasmine.github.io/index.html) as a test runner,
as it is compatible with both Node.js and Web (Karma) clients tests.

### Type check and linting
Karma is to be replaced, see [#183](https://github.com/ClickHouse/clickhouse-js/issues/183),
which might allow us to use different test framework.

### Type checking and linting

Both checks can be run manually:

```bash
npm run typecheck
npm run lint:fix
```

We use [Husky](https://typicode.github.io/husky) for pre-commit hooks,
so it will be executed before every commit.
However, usually, it is enough to rely on Husky Git hooks.

### Running unit tests

Expand Down Expand Up @@ -89,10 +103,16 @@ Start a single ClickHouse server using Docker compose:
docker-compose up -d
```

and then run the tests:
Run the tests (Node.js):

```bash
npm run test:integration
npm run test:node:integration
```

Run the tests (Web, Karma):

```bash
npm run test:web:integration
```

#### Running TLS integration tests
Expand All @@ -105,10 +125,10 @@ Start the containers first:
docker-compose up -d
```

and then run the tests:
and then run the tests (Node.js only):

```bash
npm run test:tls
npm run test:node:tls
```

#### Local two nodes cluster integration tests
Expand All @@ -121,10 +141,16 @@ Start a ClickHouse cluster using Docker compose:
docker compose -f docker-compose.cluster.yml up -d
```

and then run the tests:
Run the tests (Node.js):

```bash
npm run test:integration:local_cluster
npm run test:node:integration:local_cluster
```

Run the tests (Web, Karma):

```bash
npm run test:web:integration:local_cluster
```

#### Cloud integration tests
Expand All @@ -139,50 +165,66 @@ CLICKHOUSE_CLOUD_HOST=<host>
CLICKHOUSE_CLOUD_PASSWORD=<password>;
```

Given these environment variables are set, you can run the tests:
With these environment variables set, you can run the tests.

Node.js:

```bash
npm run test:integration:cloud
npm run test:node:integration:cloud
```

Web + Karma:

```bash
npm run test:web:integration:cloud
```

## CI

GitHub Actions should execute integration test jobs in parallel
after we complete the TypeScript type check, lint check, and unit tests.
GitHub Actions should execute integration test jobs for both Node.js and Web versions in parallel
after we complete the TypeScript type check, lint check, and Node.js unit tests.

```
Build + Unit tests
├─ Integration tests (single local node in Docker)
├─ Integration tests (a cluster of local two nodes in Docker)
└─ Integration tests (Cloud)
Typecheck + Lint + Node.js client unit tests
├─ Node.js client integration + TLS tests (single local node in Docker)
├─ Node.js client integration tests (a cluster of local two nodes in Docker)
├─ Node.js client integration tests (Cloud)
├─ Web client integration tests (single local node in Docker)
├─ Web client integration tests (a cluster of local two nodes in Docker)
└─ Web client integration tests (Cloud)
```

## Style Guide

We use an automatic code formatting with `prettier` and `eslint`.

## Test Coverage

We try to aim for at least 90% tests coverage.
Prior to switching from Jest to Jasmine with multiple workspaces and client flavours,
the reported test coverage was above 90%. We generally aim towards that threshold, if it deems reasonable.

Coverage is collected and pushed to the repo automatically
in the end of each main branch CI run.
Currently, automatic coverage reports are disabled.
See [#177](https://github.com/ClickHouse/clickhouse-js/issues/177), as it should be restored in the scope of that issue.

See [tests.yml](./.github/workflows/tests.yml)
`upload-coverage-and-badge` job for more details.
## Release process

You can collect and check the coverage locally by running all tests
(unit, integration, TLS):
Don't forget to change the package version in `packages/**/src/version.ts` before the release.
We prefer to keep versions the same across the packages, and release all at once, even if there were no changes in some.

Common package manual release:

```bash
docker-compose up -d
npm t -- --coverage
ts-node .build/build_and_prepare.ts common && npm pack && npm publish
```

Please don't commit the coverage reports manually.
Node.js client manual release:

## Update package version
```bash
ts-node .build/build_and_prepare.ts node && npm pack && npm publish
```

Don't forget to change the package version in `src/version.ts` before the release.
Web client manual release:

```bash
ts-node .build/build_and_prepare.ts web && npm pack && npm publish
```

`release` GitHub action will pick it up and replace `package.json` version automatically.
For simplicity, `build_and_prepare.ts` just overrides the root `package.json`,
which allows to use `npm pack` and `npm publish` as usual despite having multiple workspaces.
Don't commit the generated `package.json` after the manual release.
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
"prepare": "husky install"
},
"devDependencies": {
"@faker-js/faker": "^8.2.0",
"@types/jasmine": "^4.3.2",
"@types/node": "^18.11.18",
"@types/sinon": "^10.0.15",
Expand Down
2 changes: 1 addition & 1 deletion packages/client-common/src/version.ts
Original file line number Diff line number Diff line change
@@ -1 +1 @@
export default '0.2.3'
export default '0.2.4'
112 changes: 112 additions & 0 deletions packages/client-node/__tests__/integration/node_streaming_e2e.test.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
import type { Row } from '@clickhouse/client-common'
import { type ClickHouseClient } from '@clickhouse/client-common'
import { fakerRU } from '@faker-js/faker'
import { createSimpleTable } from '@test/fixtures/simple_table'
import { createTableWithFields } from '@test/fixtures/table_with_fields'
import { createTestClient, guid } from '@test/utils'
import Fs from 'fs'
import split from 'split2'
Expand Down Expand Up @@ -73,4 +75,114 @@ describe('[Node.js] streaming e2e', () => {
}
expect(actual).toEqual(expected)
})

// See https://github.com/ClickHouse/clickhouse-js/issues/171 for more details
// Here we generate a large enough dataset to break into multiple chunks while streaming,
// effectively testing the implementation of incomplete rows handling
describe('should correctly process multiple chunks', () => {
async function generateData({
rows,
words,
}: {
rows: number
words: number
}): Promise<{
table: string
values: { id: number; sentence: string; timestamp: string }[]
}> {
const table = await createTableWithFields(
client as ClickHouseClient,
`sentence String, timestamp String`
)
const values = [...new Array(rows)].map((_, id) => ({
id,
// it seems that it is easier to trigger an incorrect behavior with non-ASCII symbols
sentence: fakerRU.lorem.sentence(words),
timestamp: new Date().toISOString(),
}))
await client.insert({
table,
values,
format: 'JSONEachRow',
})
return {
table,
values,
}
}

describe('large amount of rows', () => {
it('should work with .json()', async () => {
const { table, values } = await generateData({
rows: 10000,
words: 10,
})
const result = await client
.query({
query: `SELECT * FROM ${table} ORDER BY id ASC`,
format: 'JSONEachRow',
})
.then((r) => r.json())
expect(result).toEqual(values)
})

it('should work with .stream()', async () => {
const { table, values } = await generateData({
rows: 10000,
words: 10,
})
const stream = await client
.query({
query: `SELECT * FROM ${table} ORDER BY id ASC`,
format: 'JSONEachRow',
})
.then((r) => r.stream())

const result = []
for await (const rows of stream) {
for (const row of rows) {
result.push(await row.json())
}
}
expect(result).toEqual(values)
})
})

describe("rows that don't fit into a single chunk", () => {
it('should work with .json()', async () => {
const { table, values } = await generateData({
rows: 5,
words: 10000,
})
const result = await client
.query({
query: `SELECT * FROM ${table} ORDER BY id ASC`,
format: 'JSONEachRow',
})
.then((r) => r.json())
expect(result).toEqual(values)
})

it('should work with .stream()', async () => {
const { table, values } = await generateData({
rows: 5,
words: 10000,
})
const stream = await client
.query({
query: `SELECT * FROM ${table} ORDER BY id ASC`,
format: 'JSONEachRow',
})
.then((r) => r.stream())

const result = []
for await (const rows of stream) {
for (const row of rows) {
result.push(await row.json())
}
}
expect(result).toEqual(values)
})
})
})
})
Loading

0 comments on commit 2552dbb

Please sign in to comment.