Prerequisites

  • A DataExtractorAI account (sign up here if you don’t have one)
  • An API key (get yours from the Dashboard)
  • Tooling for your chosen integration method:
    • Direct API access with cURL (no additional requirements)
    • Node.js 14+ (for the Node.js SDK)
    • Any HTTP client library in your preferred language

Step 1: Choose Your Integration Method

You can either use our API directly with cURL/HTTP requests or install our SDK:

# Using npm
npm install dataextractorai

# Using yarn
yarn add dataextractorai

# Using pnpm
pnpm add dataextractorai

# Or use cURL directly with our API
# No installation needed!

For other platforms, see our SDK Examples, or call the API directly with cURL as described in the API Reference.

Step 2: Basic Usage with SDK

Basic Usage Example
import { DataExtractorAI } from 'dataextractorai';
import fs from 'fs';

// Initialize the client with your API key
const extractor = new DataExtractorAI({
  apiKey: 'YOUR_API_KEY'
});

// Define extraction schema
const invoiceSchema = {
  type: 'object',
  properties: {
    invoice_number: { type: 'string' },
    date: { type: 'string', format: 'date' },
    total: { type: 'number' },
    vendor: { type: 'string' }
  }
};

// Extract data from a file
async function extractInvoiceData() {
  try {
    const result = await extractor.extract({
      file: fs.createReadStream('invoice.pdf'),
      schema: invoiceSchema
    });

    console.log('Extracted data:', result.data);
  } catch (error) {
    console.error('Extraction failed:', error.message);
  }
}

extractInvoiceData();

Important: Always keep your API key secure and never expose it in client-side code.
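
One way to follow this advice on the server side is to read the key from an environment variable instead of hard-coding it. The sketch below assumes a variable named DATAEXTRACTORAI_API_KEY; that name and the loadApiKey helper are illustrative choices, not something the SDK defines.

```javascript
// Hypothetical helper: read the API key from the environment.
// DATAEXTRACTORAI_API_KEY is an assumed variable name, not an SDK convention.
function loadApiKey(env = process.env) {
  const key = (env.DATAEXTRACTORAI_API_KEY || '').trim();
  if (!key) {
    throw new Error('DATAEXTRACTORAI_API_KEY is not set');
  }
  return key;
}

// Usage (server-side only):
// const extractor = new DataExtractorAI({ apiKey: loadApiKey() });
```

Failing fast when the variable is missing tends to give a clearer error than letting the first API call be rejected.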

Alternative: Using cURL

If you prefer to use the API directly, here are examples using cURL:

cURL Examples
# Basic extraction with cURL
curl -X POST https://dataextractorai.com/api/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@invoice.pdf" \
  -F 'schema={
    "type": "object",
    "properties": {
      "invoice_number": { "type": "string" },
      "date": { "type": "string", "format": "date" },
      "total": { "type": "number" },
      "vendor": { "type": "string" }
    }
  }'

# Using a template
curl -X POST https://dataextractorai.com/api/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@invoice.pdf" \
  -F "templateId=invoice"

# With webhook for large documents
curl -X POST https://dataextractorai.com/api/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@large_document.pdf" \
  -F "webhook_url=https://your-server.com/webhook" \
  -F "webhook_events[]=completed" \
  -F "webhook_events[]=failed"
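
The same request can be made from any HTTP client. As a rough sketch, the basic extraction call above might look like this with the built-in fetch, FormData, and Blob globals available in Node.js 18+ and in browsers (newer than the Node.js 14 minimum listed for the SDK). The endpoint, the file and schema field names, and the Bearer header are taken from the cURL examples; buildExtractRequest and extractWithFetch are hypothetical helper names, and a JSON response body is an assumption.

```javascript
// Endpoint as used in the cURL examples above.
const EXTRACT_URL = 'https://dataextractorai.com/api/v1/extract';

// Hypothetical helper: builds the multipart request without sending it.
// fileBlob is a Blob or File; schema is a plain object.
function buildExtractRequest(apiKey, fileBlob, fileName, schema) {
  const form = new FormData();
  form.append('file', fileBlob, fileName);
  form.append('schema', JSON.stringify(schema));
  return {
    url: EXTRACT_URL,
    init: {
      method: 'POST',
      // No Content-Type header here: fetch derives the multipart
      // boundary from the FormData body on its own.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Sends the request; assumes (not confirmed by this guide) a JSON response.
async function extractWithFetch(apiKey, fileBlob, fileName, schema) {
  const { url, init } = buildExtractRequest(apiKey, fileBlob, fileName, schema);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Extraction failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction separate from sending makes the request easy to inspect or unit-test without touching the network.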

Tip: For large documents or batch processing, we recommend using webhooks to avoid timeout issues. The webhook example above shows how to set this up.
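
On the receiving side, a webhook handler mostly needs to branch on the event type. The sketch below is a minimal dispatcher under stated assumptions: the completed and failed event names come from the example above, but the payload fields (event, data, error) are guesses for illustration, so check the API Reference for the actual payload shape.

```javascript
// Hypothetical dispatcher for webhook deliveries.
// Assumed payload shape: { event: 'completed' | 'failed', data?, error? }.
function handleWebhookPayload(payload) {
  switch (payload && payload.event) {
    case 'completed':
      // Extraction finished; hand the extracted data to the application.
      return { status: 'ok', data: payload.data };
    case 'failed':
      // Extraction failed; surface the reason if one was provided.
      return { status: 'error', message: payload.error || 'Unknown error' };
    default:
      // Ignore missing payloads and event types this handler does not know.
      return { status: 'ignored' };
  }
}
```

In a Node.js server, this function would be called with the parsed JSON body of the POST request delivered to your webhook_url.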

Step 3: Web Integration

You can also use DataExtractorAI in browser environments:

Browser Integration Example

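// For browser environments
// Caution: anything placed in this file is visible to every visitor, so do
// not ship a production API key in browser code (see the note in Step 2).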
import { DataExtractorAI } from 'dataextractorai';

const extractor = new DataExtractorAI({
    apiKey: 'YOUR_API_KEY'
});

document.getElementById('extract-form').addEventListener('submit', async (event) => {
    event.preventDefault();

    const fileInput = document.getElementById('document-file');
    if (!fileInput.files || fileInput.files.length === 0) {
        alert('Please select a file');
        return;
    }

    const file = fileInput.files[0];

    try {
        // Show loading state
        document.getElementById('result').textContent = 'Processing...';

        const result = await extractor.extract({
            file: file,
            schema: {
                type: 'object',
                properties: {
                    invoice_number: { type: 'string' },
                    date: { type: 'string' },
                    total: { type: 'number' }
                }
            }
        });

        // Display results
        document.getElementById('result').textContent = JSON.stringify(result.data, null, 2);
    } catch (error) {
        document.getElementById('result').textContent = 'Error: ' + error.message;
    }
});

HTML Form Example
<form id="extract-form">
  <input type="file" id="document-file" accept=".pdf,.jpg,.png,.jpeg">
  <button type="submit">Extract Data</button>
  <pre id="result"></pre>
</form>
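
Because the API key should not ship in client-side code, a common pattern is to have the browser post the file to your own backend, which holds the key and calls DataExtractorAI. The following sketch shows only the browser half, assuming a hypothetical /api/extract route on your server that returns the extracted data as JSON; neither the route nor the helper names are part of the SDK.

```javascript
// Hypothetical route on your own server; it is not a DataExtractorAI endpoint.
const BACKEND_EXTRACT_URL = '/api/extract';

// Builds the upload request for the selected file (no API key involved).
function buildBackendUpload(file) {
  const form = new FormData();
  form.append('file', file);
  return { url: BACKEND_EXTRACT_URL, init: { method: 'POST', body: form } };
}

// Sends the file to the backend and returns the parsed JSON response.
async function extractViaBackend(file) {
  const { url, init } = buildBackendUpload(file);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Backend returned HTTP ${response.status}`);
  }
  return response.json();
}
```

In the submit handler above, the extractor.extract call could then be swapped for extractViaBackend(file), with the schema defined on the server.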

Next Steps

Now that you have the basics down, here are some next steps to get the most out of DataExtractorAI:

Choose Your Integration Method

Use our API directly with cURL/HTTP requests or integrate with our SDK, whichever works best for your workflow.

Learn about schema definition

Create custom schemas to extract exactly the data you need in the format you want. Learn more (/schema/basic)

Explore API Reference

Check out our complete API reference for all available endpoints and options. View API Reference