feat(slb-495): fix format
dspachos committed Jan 13, 2025
1 parent 6c6bccd commit 00dbee4
Showing 1 changed file with 23 additions and 8 deletions.
31 changes: 23 additions & 8 deletions apps/converter/README.md
@@ -1,15 +1,20 @@
# Silverback Converter

The converter is a Node.js application designed to convert documents from
various formats (DocX, PDF, and HTML) into Markdown.

This tool is particularly useful for developers and content creators who need to
transform documents into a format suitable for further processing, analysis, or
integration with other systems.

## Features

- **DocX to Markdown**: Convert Word documents (`.docx`) to Markdown.
- **PDF to Markdown**: Convert PDF files to Markdown.
- **HTML to Markdown**: Extract main content from web pages and convert it to
Markdown.
- **Jina AI Integration**: Fetch and convert content using the Jina AI API.
(ATTENTION: EXPERIMENTAL, DO NOT USE THIS)

## Setup and Installation

@@ -21,11 +26,12 @@ This tool is particularly useful for developers and content creators who need to
### Installation

1. **Install dependencies**:

```bash
npm i
```

2. **Set up environment variables** (optional):
- Create a `.env` file in the root directory.
- Add your Jina AI API key if you plan to use the Jina AI integration:
```env
@@ -35,6 +41,7 @@ This tool is particularly useful for developers and content creators who need to
### Running the Application

To start the application, run the following command:

```bash
npm start
```
@@ -46,16 +53,19 @@ The server will start on `http://localhost:3000`.
### Endpoints

- **Convert DocX to Markdown**:

```
GET /convert?path=/path/to/your/document.docx
```

- **Convert PDF to Markdown**:

```
GET /pdf-convert?path=/path/to/your/document.pdf
```

- **Convert HTML to Markdown**:

```
GET /html-convert?path=https://example.com
```
@@ -68,16 +78,21 @@ The server will start on `http://localhost:3000`.
### Example

To convert a Word document to Markdown, make a GET request to:

```
http://localhost:3000/convert?path=/path/to/your/document.docx
```

The response will include the converted Markdown content, the output directory,
and any warnings generated during the conversion process.
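
As a rough sketch of how a client might call these endpoints from Node.js, the helpers below build the request URL for each documented route and fetch the result. The names `ENDPOINTS`, `buildConvertUrl`, and `convert` are illustrative and not part of the converter. The base URL assumes the default `http://localhost:3000`, and the body is returned as raw text because the exact response shape is not spelled out here.

```javascript
// Hypothetical client-side helpers; none of these names come from the converter itself.
const BASE_URL = "http://localhost:3000"; // default address from this README

// Routes as documented in the Endpoints section.
const ENDPOINTS = {
  docx: "/convert",
  pdf: "/pdf-convert",
  html: "/html-convert",
};

// Build the request URL, percent-encoding the `path` query parameter.
function buildConvertUrl(kind, path) {
  const route = ENDPOINTS[kind];
  if (!route) {
    throw new Error(`Unknown conversion kind: ${kind}`);
  }
  const url = new URL(route, BASE_URL);
  url.searchParams.set("path", path);
  return url.toString();
}

// Send the GET request (global fetch requires Node.js 18+) and return the raw body.
async function convert(kind, path) {
  const response = await fetch(buildConvertUrl(kind, path));
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP ${response.status}`);
  }
  return response.text();
}
```

With the server running, `convert("docx", "/path/to/your/document.docx").then(console.log)` would print whatever the endpoint returns. Encoding the `path` value keeps URLs such as `https://example.com` intact when they are passed as a query parameter.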

## Configuration

- **Output Directory**: By default, converted files are saved in a directory
named after the input file's hash. You can customize the output directory by
modifying the `outputDir` variable in the respective conversion scripts.
- **Image Handling**: Images extracted from documents are saved in an `images`
subdirectory within the output directory.
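
The layout described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the converter's actual code: the README does not say which hash algorithm is used or whether it covers the file contents or the path, so the example assumes a SHA-256 digest of the contents. The function name `outputPathsFor` and the `baseDir` parameter are hypothetical.

```javascript
const crypto = require("node:crypto");
const path = require("node:path");

// Assumption: the directory name is a SHA-256 hex digest of the file contents.
// The README only says "the input file's hash"; the real algorithm may differ.
function outputPathsFor(fileContents, baseDir = ".") {
  const hash = crypto.createHash("sha256").update(fileContents).digest("hex");
  const outputDir = path.join(baseDir, hash);
  return {
    outputDir,
    // Extracted images go into an `images` subdirectory of the output directory.
    imagesDir: path.join(outputDir, "images"),
  };
}
```

Naming the directory after a content hash means converting the same file twice resolves to the same output location, while different files get separate directories.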

## Dependencies
