  1. What are the pros and cons of the Apache Parquet format compared …

    Apr 24, 2016 · Parquet files are most commonly compressed with the Snappy compression algorithm. Snappy compressed files are splittable and quick to inflate. Big data systems want to …

  2. How to read a Parquet file into Pandas DataFrame?

    How to read a modestly sized Parquet data-set into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data …

  3. How to view Apache Parquet file in Windows? - Stack Overflow

    Jun 19, 2018 · What is Apache Parquet? Apache Parquet is a binary file format that stores data in a columnar fashion. Data inside a Parquet file is similar to an RDBMS style table where you have …

  4. Python: save pandas data frame to parquet file - Stack Overflow

    Dec 9, 2016 · Is it possible to save a pandas data frame directly to a parquet file? If not, what would be the suggested process? The aim is to be able to send the parquet file to another team, which they …

  5. Spark parquet partitioning : Large number of files

    Jun 28, 2017 · I am trying to leverage Spark partitioning. I was trying to do something like data.write.partitionBy("key").parquet("/location"). The issue here is that each partition creates a huge number …
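The file count can be reasoned about with a simplified model of the write. In this model, each write task emits a separate file for every partition value present in its rows, and file-size limits are ignored. Under that model, output grows roughly as tasks × keys. Repartitioning the DataFrame by the same column before `partitionBy` sends all rows for a key to one task, which leaves about one file per key. The plain-Python sketch below only counts files under that model and does not call Spark. The helper names are hypothetical.

```python
import zlib


def files_written(tasks: list[list[str]]) -> int:
    """Model of partitionBy output: each task writes one file per distinct key it holds."""
    return sum(len(set(keys)) for keys in tasks)


def repartition_by_key(tasks: list[list[str]], num_tasks: int) -> list[list[str]]:
    """Hash-partition rows so all rows sharing a key end up in the same task."""
    out: list[list[str]] = [[] for _ in range(num_tasks)]
    for keys in tasks:
        for key in keys:
            out[zlib.crc32(key.encode()) % num_tasks].append(key)
    return out


# 4 tasks, each holding rows for keys a, b and c (each row represented by its key).
before = [["a", "b", "c", "a"], ["b", "c", "a"], ["c", "a", "b"], ["a", "b", "c"]]
after = repartition_by_key(before, num_tasks=4)

print(files_written(before))  # 4 tasks x 3 keys
print(files_written(after))   # one file per key
```

Several keys may still hash to the same task after repartitioning. Each key still lands in exactly one task, so the modelled count stays at one file per key.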

  6. scala - Parquet column cannot be converted in file (...) Expected ...

    Aug 16, 2024 · From Databricks documentation: The vectorized Parquet reader enables native record-level filtering using push-down filters, improving memory locality, and cache utilization. If you disable …

  7. Read multiple parquet files in a folder and write to single csv file ...

    Aug 5, 2018 · I need to read these parquet files starting from file1 in order and write them to a single csv file. After writing the contents of file1, the contents of file2 should be appended to the same csv without a header.

  8. Reading / Fixing a corrupt parquet file - Stack Overflow

    Sep 3, 2024
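A reasonable first check on a suspect file is its byte-level envelope. The Parquet format starts and ends a file with the 4-byte magic `PAR1` and places a 4-byte little-endian footer length just before the trailing magic. A truncated copy usually fails this check. The sketch below is a minimal, standard-library-only check. It does not parse the Thrift footer itself and does not handle files with an encrypted footer. The function name is hypothetical.

```python
import struct

MAGIC = b"PAR1"


def check_parquet_envelope(data: bytes) -> list[str]:
    """Return a list of problems found in the file's outer layout (empty if none)."""
    # Smallest possible envelope: magic + footer length + magic.
    if len(data) < 12:
        return ["file is shorter than 12 bytes"]
    problems = []
    if data[:4] != MAGIC:
        problems.append("leading PAR1 magic missing")
    if data[-4:] != MAGIC:
        problems.append("trailing PAR1 magic missing (file may be truncated)")
        return problems  # the footer length cannot be trusted without the trailer
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        problems.append(f"footer length {footer_len} exceeds the bytes available")
    return problems


# Synthetic bytes standing in for a real file: header, fake data, fake footer, trailer.
footer = b"fake-thrift-metadata"
good = MAGIC + b"\x00" * 32 + footer + struct.pack("<I", len(footer)) + MAGIC
truncated = good[:-10]
print(check_parquet_envelope(good))
print(check_parquet_envelope(truncated))

# With a real file: check_parquet_envelope(open("suspect.parquet", "rb").read())
```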

  9. How to read partitioned parquet file into polars? - Stack Overflow

    Apr 24, 2023 · Note that the polars native scan_parquet now directly supports reading hive partitioned data from cloud providers, and it will use the available statistics/metadata to optimise which …

  10. python - pandas.to_parquet pyarrow.lib.ArrowInvalid: Could not …

    Mar 20, 2024 · The "value" column of your data frame has mixed data types (floats and Timedelta). PyArrow and Parquet won't support it. I think you need to unpivot the table so that the floats are in …