
DataSEA: Data Science, Engineering, Analytics

# DataSEA: The Gamified Mobile App to Learn Data Engineering, SQL, and Analytics in 10 Minutes a Day

> **TL;DR:** DataSEA is a free Android app that turns Data Engineering, Analytics, and Data Science into bite-size, gamified lessons. 88+ modules, 500+ lessons,...

Benefits of Apache Parquet Format in Big Data

Benefits of Parquet Format

- **Columnar Storage:** Efficient for analytics and read-heavy workloads. Only required columns are read into memory.
- **Highly Compressed:** Supports efficient compression algorithms (Snappy, GZIP, Brotli). Smaller file size compared to row-based formats like CSV/JSON.
- **Splittable & Scalable:** Files can be split and read in parallel, improving speed in distributed systems like Hadoop/Spark.
- **Schema Evolution:** Supports adding new columns without breaking existing data pipelines.
- **Efficient for Queries:** Works well with SQL engines like Hive, Presto, Trino, Athena, BigQuery.
- **Better IO Performance:** Reduces disk and network IO by avoiding unnecessary data reads.
- **Interoperable:** Supported across multiple languages and platforms (Python, Java, Spark, Hive, AWS, GCP, etc.).
- **Self-describing Format:** Stores schema as metadata within the file itself; no need for external schema definitions.
- **Great with Partitioning:** When used wi...
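
The columnar-storage and compression points above can be illustrated with a toy sketch that uses only the Python standard library. This is not the Parquet format or any Parquet library. It only mimics the idea of storing each column as its own compressed chunk, so that one column can be read without touching the rest. The sample data, the column names (`user_id`, `country`, `amount`), and the helper functions are all hypothetical.

```python
import json
import zlib

# Hypothetical sample data: 1,000 rows with three columns.
rows = [
    {"user_id": i, "country": "IN" if i % 2 else "US", "amount": (i * 7) % 500}
    for i in range(1000)
]

# Row-oriented layout: one JSON record per line, compressed as a single blob.
row_blob = zlib.compress("\n".join(json.dumps(r) for r in rows).encode())

# Column-oriented layout: each column is serialized and compressed on its own.
column_chunks = {
    name: zlib.compress(json.dumps([r[name] for r in rows]).encode())
    for name in rows[0]
}


def read_column(name):
    """Columnar layout: decompress only the chunk for the requested column."""
    return json.loads(zlib.decompress(column_chunks[name]))


def read_column_from_rows(name):
    """Row layout: the whole blob is decompressed and parsed to get one column."""
    text = zlib.decompress(row_blob).decode()
    return [json.loads(line)[name] for line in text.splitlines()]


amounts = read_column("amount")
bytes_touched_columnar = len(column_chunks["amount"])
bytes_touched_row = len(row_blob)
print(f"columnar read: {bytes_touched_columnar} bytes")
print(f"row read:      {bytes_touched_row} bytes")
```

Both functions return the same values, but the columnar read only decompresses the `amount` chunk, while the row-oriented read has to decompress and parse every record. Real Parquet files add much more on top of this idea (row groups, typed encodings, embedded schema metadata), so treat the byte counts here as an illustration rather than a benchmark.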