Frequently Asked Questions

Common questions about Pauhu LDS data offerings.


General Questions

What is the Language Data Space (LDS)?

The Language Data Space is a European initiative creating a secure marketplace for language data and technology. It enables organizations to share and monetize language resources while maintaining data sovereignty.

Key features:


What data does Pauhu offer through LDS?

We offer enriched parallel corpora from EUR-Lex (Official Journal of the EU):

Asset | Coverage | Languages
Parallel corpora | 21 EuroVoc domains | 24 EU languages
Enrichment layers | E1-E5 annotations | All segments
Terminology | IATE + EuroVoc links | All languages

What are the enrichment layers (E1-E5)?

Our corpora include five layers of linguistic and semantic annotation:

Layer | Content
E1 Linguistic | POS tagging, lemmatization, named entity recognition
E2 Semantic | Entity linking, word sense disambiguation
E3 Domain | EuroVoc classification, IATE terminology
E4 Quality | Alignment scores, fluency metrics
E5 Metadata | CELEX numbers, publication dates, document types

Licensing Questions

What's the difference between Research and Commercial licenses?

Aspect | Research (ACA) | Commercial (RES)
Who | Universities, researchers, non-profits | Companies, government
Commercial use | Not permitted | Permitted
Price | €3,000-18,000 | €9,000-54,000
Redistribution | No | Internal only
Verification | ORCID + institutional email | Company registration

Can I use Research license data in a commercial product?

No. Research licenses (CLARIN ACA) are strictly non-commercial. If your research leads to commercialization, you must upgrade to a Commercial license before commercial use.


Can I share data with collaborators?

License | Sharing Policy
Research | Within your research group only
Commercial | Within your organization only

For broader sharing, contact us about consortium or redistribution licenses.


Can I train AI models with this data?

Research license: Yes, for research purposes only. Models cannot be deployed commercially.

Commercial license: Yes, for both internal and commercial deployment.


Pricing Questions

Are there volume discounts?

Yes:

Bundle Size | Discount
3 domains | 15%
5 domains | 20%
10 domains | 25%
All 21 domains | 30%

Multi-year agreements qualify for additional discounts.
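To make the tier arithmetic concrete, here is a minimal Python sketch of the discount table above. It rests on two assumptions that this FAQ does not confirm: bundle sizes that fall between the listed tiers (for example 4 or 7 domains) are treated as earning the highest tier they reach, and `price_per_domain` is a hypothetical input, since per-domain prices are not listed here. Multi-year discounts are not modeled.

```python
# Discount tiers from the table above: (minimum number of domains, discount rate).
DISCOUNT_TIERS = [(21, 0.30), (10, 0.25), (5, 0.20), (3, 0.15)]


def bundle_discount(num_domains: int) -> float:
    """Return the discount rate for a bundle of `num_domains` domains.

    Assumption: sizes between listed tiers get the highest tier they reach.
    """
    if not 1 <= num_domains <= 21:
        raise ValueError("num_domains must be between 1 and 21")
    for min_domains, rate in DISCOUNT_TIERS:
        if num_domains >= min_domains:
            return rate
    return 0.0  # fewer than 3 domains: no bundle discount


def bundle_price(num_domains: int, price_per_domain: float) -> float:
    """Total price after discount; `price_per_domain` is a hypothetical input."""
    gross = num_domains * price_per_domain
    return round(gross * (1 - bundle_discount(num_domains)), 2)


if __name__ == "__main__":
    for n in (1, 3, 5, 10, 21):
        print(n, bundle_discount(n))
```

For example, `bundle_price(5, 1000.0)` returns `4000.0` under these assumptions. Treat the result as an estimate and confirm the actual quote with sales.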


What payment methods do you accept?

Method | Research | Commercial
Credit/Debit card | Yes | Yes
Bank transfer (SEPA) | Yes | Yes
Invoice (30 days) | On request | Standard
Purchase order | No | Yes

Technical Questions

What formats are available?

Format | Use Case
JSONL | Machine learning, with all enrichment
TMX | Translation Memory tools
Moses | Statistical MT training
XLIFF 2.1 | Localization workflows
JSON-LD | Linked data applications
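As an illustration of how two of these formats relate, the sketch below reads translation units from a TMX document and turns them into Moses-style line-aligned text (one sentence per line, with the same line number in the source and target files). It uses only the Python standard library and the generic TMX structure (`tu`, `tuv`, `seg`, `xml:lang`). The sample sentences are made up, and the assumption that language codes are plain two-letter codes such as `en` and `fi` may not match the delivered files.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample data, not taken from the actual corpus.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>This Regulation shall enter into force.</seg></tuv>
      <tuv xml:lang="fi"><seg>Tämä asetus tulee voimaan.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Article 1</seg></tuv>
      <tuv xml:lang="fi"><seg>1 artikla</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_pairs(tmx_text, src, tgt):
    """Yield (source, target) segment pairs; `src` and `tgt` are lowercase codes."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segs[lang] = "".join(seg.itertext()).strip()
        # Skip units that lack either language.
        if src in segs and tgt in segs:
            yield segs[src], segs[tgt]


def to_moses_lines(pairs):
    """Return two line-aligned lists; internal whitespace is collapsed to one space."""
    src_lines, tgt_lines = [], []
    for source, target in pairs:
        src_lines.append(" ".join(source.split()))
        tgt_lines.append(" ".join(target.split()))
    return src_lines, tgt_lines


if __name__ == "__main__":
    en_lines, fi_lines = to_moses_lines(tmx_pairs(SAMPLE_TMX, "en", "fi"))
    for en, fi in zip(en_lines, fi_lines):
        print(en, "|||", fi)
```

Writing each list to its own file, one entry per line, gives the usual Moses training layout. For full-size TMX files, a streaming parser such as `xml.etree.ElementTree.iterparse` would be a better fit than loading the whole document into memory.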

How large are the downloads?

Package | Approximate Size
Single domain (bilingual) | 50-500 MB
Single domain (24 languages) | 1-5 GB
Full corpus (24 languages) | 20-40 GB

Packages are compressed (gzip or zip) for download.
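Because the larger packages run to several gigabytes, it can help to read them as a stream instead of decompressing everything first. The following is a minimal sketch for a gzip-compressed JSONL file using only the Python standard library. The file name in the usage note is hypothetical, and no record field names are assumed, since the actual schema is described in the documentation delivered with the license.

```python
import gzip
import json
from typing import Iterator


def iter_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def count_records(path: str) -> int:
    """Count records without holding the whole file in memory."""
    return sum(1 for _ in iter_jsonl_gz(path))
```

A call such as `count_records("domain.jsonl.gz")` (hypothetical file name) walks the file one line at a time, so memory use stays roughly constant regardless of package size.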


Can I access data via API?

Yes, after purchase:

API documentation is provided with your license.


Account Questions

Can students apply for Research licenses?

Yes, if:

Undergraduate students need supervisor approval.


Can government agencies use Research licenses?

No. Government agencies should apply for Commercial licenses, which may include public sector discounts.


Support Questions

How do I get help?

Type | Contact | Response
Pre-sales | lds@pauhu.ai | 1-2 business days
Technical | support@pauhu.ai | Same business day
Legal | legal@pauhu.ai | 2-3 business days
Enterprise | enterprise@pauhu.ai | Priority

Still Have Questions?

Email: lds@pauhu.ai
Subject: LDS FAQ Question

We respond within 2 business days.

