Data Quality
Parquet Data Profiler
Inspect Apache Parquet files for schema details, row counts, and column statistics.
Analyze schema and statistics locally.
Rows
0
Columns
0
Upload a Parquet file to see profiling results.
Pattern notation
Each character is classified: digits → N, letters → L, whitespace → ·.
Coverage
% of rows matching the dominant pattern.
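The pattern notation and coverage metric above can be sketched in plain JavaScript. This is a minimal illustrative sketch, not the profiler's actual implementation; the function names `patternOf` and `dominantCoverage` are assumptions.

```javascript
// Illustrative sketch of the pattern notation (assumption: not the tool's real code).
// digits → "N", letters → "L", whitespace → "·"; other characters pass through.
function patternOf(value) {
  return [...String(value)]
    .map((ch) => {
      if (/[0-9]/.test(ch)) return "N";
      if (/[a-zA-Z]/.test(ch)) return "L";
      if (/\s/.test(ch)) return "·";
      return ch; // punctuation etc. is kept literally
    })
    .join("");
}

// Coverage: the share of rows whose pattern equals the most common one.
function dominantCoverage(values) {
  const counts = new Map();
  for (const v of values) {
    const p = patternOf(v);
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  let best = null;
  for (const [pattern, count] of counts) {
    if (!best || count > best.count) best = { pattern, count };
  }
  return { pattern: best.pattern, coverage: best.count / values.length };
}
```

For example, `patternOf("ab 12")` yields `"LL·NN"`, and for the column `["2021-01-01", "2021-02-15", "n/a"]` the dominant pattern is `"NNNN-NN-NN"` with a coverage of 2/3.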
Results will appear here after uploading a Parquet file.
About the Parquet Data Profiler
This Parquet Profiler lets you inspect the metadata and statistics of big data files without installing specialized desktop software. It reads the file footer to visualize the schema, column types, compression codecs, and row counts directly in the browser.
Ensure your data lake files are valid and conform to the expected schema. The tool is aimed at data engineers in the Arrow or Spark ecosystems who need to check data quality and file integrity quickly, and it provides visibility into a binary format that is otherwise unreadable without tooling.
Upload a .parquet file to the drop zone. The application parses the footer metadata and displays a structural overview, including nested fields and statistical summaries for each column chunk.
Under the Hood
This tool uses `apache-arrow` (via `parquet-wasm` or a pure-JS implementation, depending on the build) to read Parquet footers and row groups. It decodes the Thrift-encoded footer metadata to display schema types and compression codecs without reading the file body. Statistical profiling uses the column-chunk statistics embedded in that metadata (min, max, null count) to provide instant insights without a full table scan.
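Locating that footer metadata is straightforward because the Parquet format specifies a fixed trailer: the file ends with the Thrift-serialized `FileMetaData`, followed by its length as a 4-byte little-endian integer, followed by the magic bytes `PAR1`. The sketch below extracts the raw metadata bytes from a file buffer; it is an assumption for illustration (the function name `readFooterMetadata` is hypothetical, and real decoding of the Thrift payload is left to a library).

```javascript
// Sketch: slice the Thrift-encoded FileMetaData out of a Parquet file's trailer.
// Trailer layout per the Parquet spec: [FileMetaData][uint32 LE length]["PAR1"].
function readFooterMetadata(bytes) {
  // Validate the trailing 4-byte magic "PAR1".
  const tail = new TextDecoder().decode(bytes.slice(bytes.length - 4));
  if (tail !== "PAR1") throw new Error("Not a Parquet file: missing trailing magic");

  // The metadata length sits in the 4 bytes just before the magic.
  const view = new DataView(bytes.buffer, bytes.byteOffset + bytes.length - 8, 4);
  const metaLen = view.getUint32(0, true); // little-endian

  const metaStart = bytes.length - 8 - metaLen;
  // A valid file also starts with "PAR1", so metadata cannot begin before byte 4.
  if (metaStart < 4) throw new Error("Corrupt footer: metadata length exceeds file size");

  // Raw Thrift bytes; a library such as apache-arrow decodes these further.
  return bytes.slice(metaStart, metaStart + metaLen);
}
```

Because only the trailer and the metadata slice are read, a profiler built this way can fetch a few kilobytes from the end of a multi-gigabyte file and still report the full schema and per-column statistics.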