parquetize 0.5.7

This release includes:

parquetize 0.5.6.1

This release includes:

fst_to_parquet function

Other

parquetize 0.5.6

This release includes:

Possibility to use an RDBMS as source

You can convert the result of any query on any DBI-compatible RDBMS to parquet:

dbi_connection <- DBI::dbConnect(
  RSQLite::SQLite(),
  system.file("extdata", "iris.sqlite", package = "parquetize")
)

# Read the iris table from a local SQLite database
# and convert it to a single parquet file:
dbi_to_parquet(
  conn = dbi_connection,
  sql_query = "SELECT * FROM iris",
  path_to_parquet = tempdir(),
  parquetname = "iris"
)

You can find more information in the dbi_to_parquet documentation.

check_parquet function

Deprecations

Two arguments are deprecated to avoid confusion with arrow concepts and to keep consistency

Other

parquetize 0.5.5

This release includes:

A very important new contributor to parquetize!

Due to these numerous contributions, @nbc is now officially part of the project authors!

Three arguments deprecation

After a big refactoring, three arguments are deprecated:

They will raise a deprecation warning for the moment.

Chunking by memory size

table_to_parquet() can now chunk its output by memory size. It takes a chunk_memory_size argument and converts an input file into parquet files of roughly chunk_memory_size MB each, measured when the data are loaded in memory.

Argument by_chunk is deprecated (see above).

Example of use of the argument chunk_memory_size:

table_to_parquet(
  path_to_table = system.file("examples","iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  chunk_memory_size = 5000 # this will create files of around 5 GB when loaded in memory
)

Passing arguments like compression to write_parquet when chunking

Users can now pass additional arguments to write_parquet() (through the ellipsis) when chunking. This can be used, for example, to set compression and compression_level.

Example:

table_to_parquet(
  path_to_table = system.file("examples","iris.sas7bdat", package = "haven"),
  path_to_parquet = tempdir(),
  compression = "zstd",
  compression_level = 10,
  chunk_memory_size = 5000
)

A new function download_extract

This function is added to … download and unzip a file if needed.

file_path <- download_extract(
  "https://www.nomisweb.co.uk/output/census/2021/census2021-ts007.zip",
  filename_in_zip = "census2021-ts007-ctry.csv"
)
csv_to_parquet(
  file_path,
  path_to_parquet = tempdir()
)

Other

Under the hood, this release hardens the tests.

parquetize 0.5.4

This release fixes an error when converting a SAS file by chunk.

parquetize 0.5.3

This release includes:

parquetize 0.5.2

This release includes:

parquetize 0.5.1

This release removes the duckdb_to_parquet() function on the advice of Brian Ripley from CRAN.
The storage format of DuckDB is not yet stable; it will be stabilized when version 1.0 is released.

parquetize 0.5.0

This release includes corrections for CRAN submission.

parquetize 0.4.0

This release includes an important feature:

The table_to_parquet() function can now convert tables to parquet format with less memory consumption. This is useful for huge tables and for computers with little RAM. (#15) A vignette has been written about it.
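
To illustrate the general idea behind a lower-memory conversion, here is a minimal sketch that reads a SAS file in fixed-size row chunks and writes each chunk to its own parquet file. It is an assumption about the technique, not a copy of how parquetize implements it: the helper convert_by_chunk(), its chunk_size argument and the "part-NNN.parquet" file names are hypothetical and are not part of the package. Only haven::read_sas() (with its skip and n_max arguments) and arrow::write_parquet() are real functions.

```r
library(haven)
library(arrow)

# Hypothetical helper (not part of parquetize): convert a SAS file to several
# parquet files, holding at most `chunk_size` rows in memory at a time.
convert_by_chunk <- function(path_to_table, out_dir, chunk_size = 50) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  skip <- 0
  part <- 1
  repeat {
    # Read only the next slice of rows from the input file
    chunk <- read_sas(path_to_table, skip = skip, n_max = chunk_size)
    if (nrow(chunk) == 0) break
    write_parquet(chunk, file.path(out_dir, sprintf("part-%03d.parquet", part)))
    # A short chunk means the end of the file has been reached
    if (nrow(chunk) < chunk_size) break
    skip <- skip + nrow(chunk)
    part <- part + 1
  }
  invisible(out_dir)
}

out_dir <- file.path(tempdir(), "iris_chunks")
convert_by_chunk(
  system.file("examples", "iris.sas7bdat", package = "haven"),
  out_dir
)

# The resulting files can be opened together as one arrow dataset
ds <- open_dataset(out_dir)
```

Peak memory is then bounded by the size of one chunk rather than by the size of the whole table, at the cost of producing several output files.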

parquetize 0.3.0

parquetize 0.2.0

parquetize 0.1.0