Setup

If you are using a Docker image (PostGIS/PostgreSQL), you need to set up the environment as follows, because the images do not ship with these build tools:

CMD: Log into the container and install the packages needed to install cargo and build the extension:

podman exec -it spatialytics-postgis bash
apt-get update && apt-get install -y curl gcc libssl-dev pkg-config git postgresql-17 clang-16 postgresql-server-dev-17

CMD: Add the postgres user to the sudo group:

usermod -aG sudo postgres

Notes: postgresql-17 is needed (the latest version) so that cargo-pgrx runs; clang-16 and postgresql-server-dev-17 are needed to build pg_parquet.

CMD: Switch to the postgres user and enter its home directory:

su - postgres

CMD: Install cargo, following the pg_parquet installation from source (-s -- -y answers yes to all defaults):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

You will see:

> curl https://sh.rustup.rs -sSf | sh -s -- -y
...
Rust is installed now. Great!
 
To get started you may need to restart your current shell.
This would reload your PATH environment variable to include
Cargo's bin directory ($HOME/.cargo/bin).
 
To configure your current shell, you need to source
the corresponding env file under $HOME/.cargo.
 
This is usually done by running one of the following (note the leading DOT):
. "$HOME/.cargo/env"            # For sh/bash/zsh/ash/dash/pdksh
source "$HOME/.cargo/env.fish"  # For fish
source "$HOME/.cargo/env.nu"    # For nushell

CMD: Source the env file so that cargo is on the PATH of the current shell:

source "$HOME/.cargo/env"

CMD (should be done before installing cargo; the apt-get command at the top already covers it): install gcc, the OpenSSL headers, and pkg-config:

apt-get install -y gcc libssl-dev pkg-config

NOTE: To install cargo-pgrx, cargo will try to find openssl via: PKG_CONFIG_ALLOW_SYSTEM_CFLAGS=1 pkg-config --libs --cflags openssl
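The lookup in the note above can be run as a pre-flight check before starting the fairly long cargo-pgrx build. This is a minimal sketch; check_openssl is a hypothetical helper name (not part of cargo or pgrx), and it only reports what pkg-config finds:

```shell
# Hypothetical pre-flight helper (not part of cargo, pgrx or pg_parquet):
# checks that pkg-config can locate OpenSSL the same way the build will.
check_openssl() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config not found: install it with apt-get install -y pkg-config" >&2
        return 1
    fi
    # Same lookup as in the note; prints the compiler/linker flags on success.
    if ! PKG_CONFIG_ALLOW_SYSTEM_CFLAGS=1 pkg-config --libs --cflags openssl; then
        echo "OpenSSL not found by pkg-config: install libssl-dev" >&2
        return 1
    fi
}
```

For example: check_openssl && cargo install cargo-pgrx --version "0.13.1" --locked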

CMD: Now install cargo-pgrx, pinned to version 0.13.1 and using the crate's bundled Cargo.lock (--locked):

cargo install cargo-pgrx --version "0.13.1" --locked

From the pg_parquet docs:

# install cargo-pgrx
> cargo install cargo-pgrx
# install this way until the issue is patched
# https://github.com/pgcentralfoundation/pgrx/issues/2009
# https://github.com/pgcentralfoundation/pgrx/issues/2016

CMD: Initialize pgrx against the installed PostgreSQL 17 (via its pg_config):

cargo pgrx init --pg17 $(which pg_config)

Note: this should not be an issue because we are running as the postgres user. As root it does not work as intended: initdb cannot run as the root user, so pgrx skips it:

root@1d951fd1d999:/# cargo pgrx init --pg17 $(which pg_config)
     Creating PGRX_HOME at `/root/.pgrx`
   Validating /usr/bin/pg_config
   Skipping initdb as current user is root user
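Because running as root silently skips initdb (so no postgresql.conf is generated under ~/.pgrx/data-17), a small guard can stop the setup early. This is a minimal sketch; require_non_root is a hypothetical helper name, not something pgrx provides:

```shell
# Hypothetical guard (not part of pgrx): refuse to continue as root, because
# pgrx skips initdb for the root user and the pgrx data directory is not initialized.
require_non_root() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "Running as root: switch to the postgres user (su - postgres) first" >&2
        return 1
    fi
}
```

For example: require_non_root && cargo pgrx init --pg17 "$(which pg_config)"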

CMD: Add pg_parquet to shared_preload_libraries in the pgrx data directory:

echo "shared_preload_libraries = 'pg_parquet'" >> ~/.pgrx/data-17/postgresql.conf

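However, in the Debian-based PostGIS Docker image the running server reads its own config file, so look up its location and append the line there instead (https://stackoverflow.com/a/3603162). Changes to shared_preload_libraries only take effect after a server restart.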

> psql -U postgres -c 'SHOW config_file'
               config_file
------------------------------------------
 /var/lib/postgresql/data/postgresql.conf
(1 row)
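Appending with echo adds a duplicate line every time the step is re-run. The sketch below, built around a hypothetical add_pg_parquet_preload function (not part of pg_parquet), appends only when no active shared_preload_libraries line already mentions pg_parquet. It assumes the conf path is passed in, for example the one returned by SHOW config_file, and as a simplification it does not merge pg_parquet into an existing list of preloaded libraries:

```shell
# Hypothetical helper (not part of pg_parquet): idempotently appends
#   shared_preload_libraries = 'pg_parquet'
# to the given postgresql.conf. Commented-out lines are ignored.
# Caveat: it does not merge with other preloaded libraries, and in
# postgresql.conf a later shared_preload_libraries line overrides earlier ones.
add_pg_parquet_preload() {
    conf="$1"
    if [ ! -w "$conf" ]; then
        echo "cannot write to $conf" >&2
        return 1
    fi
    # An uncommented setting that already lists pg_parquet: nothing to do.
    if grep -Eq "^[[:space:]]*shared_preload_libraries[[:space:]]*=.*pg_parquet" "$conf"; then
        echo "pg_parquet is already preloaded in $conf"
        return 0
    fi
    echo "shared_preload_libraries = 'pg_parquet'" >> "$conf"
}
```

For example, as a user that can write the file: add_pg_parquet_preload "$(psql -U postgres -Atc 'SHOW config_file')", then restart the server.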

CMD (optional, probably not needed since everything is already installed): install sudo:

> apt-get install -y sudo

Clone the repo:

git clone https://github.com/CrunchyData/pg_parquet.git && cd pg_parquet
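Because cargo-pgrx was pinned to 0.13.1 above, it may be worth checking that the pgrx crate version pg_parquet depends on is the same, since pgrx expects cargo-pgrx and the pgrx dependency to match. A minimal sketch with a hypothetical pgrx_dep_version helper, assuming the dependency is written on one line as pgrx = "..." (optionally with a leading = pin) in Cargo.toml:

```shell
# Hypothetical helper: print the pgrx version pinned in a Cargo.toml.
# Assumes a one-line entry such as  pgrx = "=0.13.1"; the table form
# pgrx = { version = "..." } is NOT handled. Entries like pgrx-tests are ignored.
pgrx_dep_version() {
    sed -n -E 's/^pgrx[[:space:]]*=[[:space:]]*"=?([^"]+)".*/\1/p' "$1" | head -n 1
}
```

Run pgrx_dep_version Cargo.toml inside the pg_parquet checkout and compare the result with cargo pgrx --version; if they differ, reinstall cargo-pgrx with the matching --version.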

Then:

# initialize a data directory, build and install the extension (to the locations given by the configured pg_config), then connect to a session
> cargo pgrx run
# alternatively you can only build and install the extension (pass --release flag for production binary)
> cargo pgrx install --release

# create the extension in the database
psql> CREATE EXTENSION pg_parquet;

Installing

Enable pg_parquet in your database, then import a Parquet file:

CREATE EXTENSION IF NOT EXISTS pg_parquet;
 
-- Import the Parquet file directly
COPY my_table FROM '/path/to/file.parquet' WITH (FORMAT 'parquet');

References

https://github.com/CrunchyData/pg_parquet/