Data Quality
Parquet Data Profiler
Inspect Apache Parquet files for schema details, row counts, and column statistics.
Analyze schema and statistics locally.
Rows
0
Columns
0
Upload a Parquet file to see profiling results.
Pattern notation
Each character is classified: digits → N, letters → L, whitespace → ·.
Coverage
% of rows matching the dominant pattern.
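The pattern notation and coverage metric above can be sketched in plain JavaScript. This is a minimal illustrative sketch, not the profiler's actual implementation; the function names `patternOf` and `dominantCoverage` are assumptions.

```javascript
// Illustrative sketch of the pattern notation (assumption: not the tool's real code).
// digits → "N", letters → "L", whitespace → "·"; other characters pass through.
function patternOf(value) {
  return [...String(value)]
    .map((ch) => {
      if (/[0-9]/.test(ch)) return "N";
      if (/[a-zA-Z]/.test(ch)) return "L";
      if (/\s/.test(ch)) return "·";
      return ch; // punctuation etc. is kept literally
    })
    .join("");
}

// Coverage: the share of rows whose pattern equals the most common one.
function dominantCoverage(values) {
  const counts = new Map();
  for (const v of values) {
    const p = patternOf(v);
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  let best = null;
  for (const [pattern, count] of counts) {
    if (!best || count > best.count) best = { pattern, count };
  }
  return { pattern: best.pattern, coverage: best.count / values.length };
}
```

For example, `patternOf("ab 12")` yields `"LL·NN"`, and for the column `["2021-01-01", "2021-02-15", "n/a"]` the dominant pattern is `"NNNN-NN-NN"` with a coverage of 2/3.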
Results will appear here after uploading a Parquet file.
About the Parquet Data Profiler
This Parquet Profiler lets you inspect the metadata and statistics of big data files without installing specialized desktop software. It reads the file footer to visualize the schema, column types, compression codecs, and row counts directly in the browser.
Ensure your data lake files are valid and conform to the expected schema. The tool is aimed at data engineers in the Arrow or Spark ecosystems who need to check data quality and file integrity quickly, and it provides visibility into a binary format that is otherwise unreadable without tooling.
Upload a .parquet file to the drop zone. The application parses the footer metadata and displays a structural overview, including nested fields and statistical summaries for each column chunk.
Under the Hood
This tool uses `apache-arrow` (via `parquet-wasm` or a pure-JS implementation, depending on the build) to read Parquet footers and row groups. It decodes the Thrift-encoded footer metadata to display schema types and compression codecs without reading the file body. Statistical profiling uses the column-chunk statistics embedded in that metadata (min, max, null count) to provide instant insights without a full table scan.
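Locating that footer metadata is straightforward because the Parquet format specifies a fixed trailer: the file ends with the Thrift-serialized `FileMetaData`, followed by its length as a 4-byte little-endian integer, followed by the magic bytes `PAR1`. The sketch below extracts the raw metadata bytes from a file buffer; it is an assumption for illustration (the function name `readFooterMetadata` is hypothetical, and real decoding of the Thrift payload is left to a library).

```javascript
// Sketch: slice the Thrift-encoded FileMetaData out of a Parquet file's trailer.
// Trailer layout per the Parquet spec: [FileMetaData][uint32 LE length]["PAR1"].
function readFooterMetadata(bytes) {
  // Validate the trailing 4-byte magic "PAR1".
  const tail = new TextDecoder().decode(bytes.slice(bytes.length - 4));
  if (tail !== "PAR1") throw new Error("Not a Parquet file: missing trailing magic");

  // The metadata length sits in the 4 bytes just before the magic.
  const view = new DataView(bytes.buffer, bytes.byteOffset + bytes.length - 8, 4);
  const metaLen = view.getUint32(0, true); // little-endian

  const metaStart = bytes.length - 8 - metaLen;
  // A valid file also starts with "PAR1", so metadata cannot begin before byte 4.
  if (metaStart < 4) throw new Error("Corrupt footer: metadata length exceeds file size");

  // Raw Thrift bytes; a library such as apache-arrow decodes these further.
  return bytes.slice(metaStart, metaStart + metaLen);
}
```

Because only the trailer and the metadata slice are read, a profiler built this way can fetch a few kilobytes from the end of a multi-gigabyte file and still report the full schema and per-column statistics.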