ByteHouse is a derivative of ClickHouse. It is based on a very old ClickHouse version (20.4.54418), and many features are unsupported.

Status

ByteHouse's international cloud (bytehouse.cloud) is no longer reachable from outside the China region. The service still operates within China via Volcengine. All existing results in this directory were collected against the international cloud and have been re-tagged as "historical". Future submissions run against a self-managed ByteHouse instance (or via Volcengine) should not be tagged "historical".

https://bytehouse.cloud/signup

Sign up. Only the AWS Asia-Pacific South-East 1 region is available. Verify your email.

Create a virtual warehouse of size L.

Go to "Databases" and create a database named "test".

Go to "SQL Worksheet", paste the query from create.sql, and run it.
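As an untested alternative to the SQL Worksheet, the schema can probably also be created through the CLI that is installed further down, reusing the connection flags from the INSERT in the load step. This is a minimal sketch under assumptions: the wrapper name bh, the BYTEHOUSE_CLI override variable, and the idea that create.sql holds a single statement are ours, not part of this benchmark's scripts, and it expects user, password, account and warehouse to be exported first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs one SQL statement through the ByteHouse CLI with
# the same flags as the load step. BYTEHOUSE_CLI is an assumed override so the
# binary path can be changed; it is not a feature of the CLI itself.
BYTEHOUSE_CLI="${BYTEHOUSE_CLI:-./bytehouse-cli}"

bh() {
    "$BYTEHOUSE_CLI" --user "$user" --account "$account" --password "$password" \
        --region ap-southeast-1 --secure --warehouse "$warehouse" --query "$1"
}

create_schema() {
    bh "CREATE DATABASE IF NOT EXISTS test"
    # Assumes create.sql contains exactly one statement (the table definition).
    bh "$(cat create.sql)"
}
```

Usage, after the install step: `create_schema`. If create.sql does not qualify the table with the test database, it may need adjusting first.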

Note: the S3 import does not support public buckets, and it requires pasting a secret access key, which we are not going to do. So we use the CLI instead.

Create a machine in the ap-southeast-1 region and install the ByteHouse CLI:

wget --continue --progress=dot:giga https://github.com/bytehouse-cloud/cli/releases/download/v1.5.34/bytehouse-cli_1.5.34_Linux_x86_64.tar.gz
tar xvf bytehouse-cli_1.5.34_Linux_x86_64.tar.gz
export user='...'
export password='...'
export account='AWS...'
export warehouse='test'
wget --continue --progress=dot:giga 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
gzip -d -f hits.csv.gz
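Because the load below takes over 12 minutes, it can help to fail fast if one of the exported connection variables is empty or the dataset did not decompress. A minimal sketch; the function names require_vars and preflight are hypothetical, not part of this directory's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report every named variable that is unset or
# empty and return non-zero if any is missing.
require_vars() {
    local name missing=0
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named $name.
        if [ -z "${!name:-}" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

preflight() {
    require_vars user password account warehouse || return 1
    # hits.csv must exist and be non-empty after gzip -d.
    [ -s hits.csv ] || { echo "hits.csv is missing or empty" >&2; return 1; }
}
```

Usage: `preflight || exit 1` before running the INSERT.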

Load the data:

echo -n "Load time: "
command time -f '%e' ./bytehouse-cli --user "$user" --account "$account" --password "$password" --region ap-southeast-1 --secure --warehouse "$warehouse" --query "INSERT INTO test.hits FORMAT CSV" < hits.csv
99,997,497 total rows sent, 0 rows/s (81.14 GB, 0.00 B/s)
total rows sent: 99,997,497, average speed = 134,320 rows/s
Elapsed: 12m24.754608947s. 81.14 GB (108.94 MB/s).
─── End of Execution ───

real    12m25.310s
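The CLI prints its elapsed time as a duration such as 12m24.754608947s, while a load time is usually recorded in plain seconds. A small sketch of a converter, assuming the CLI only emits the forms XhYmZs (hours and minutes optional) and Nms; the function name duration_to_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a duration like "12m24.754608947s", "1.5s" or
# "850ms" into seconds, printed with three decimal places.
duration_to_seconds() {
    local d="$1"
    # Milliseconds must be checked first so "850ms" is not read as minutes.
    if [[ "$d" =~ ^([0-9.]+)ms$ ]]; then
        awk -v ms="${BASH_REMATCH[1]}" 'BEGIN { printf "%.3f\n", ms / 1000 }'
        return 0
    fi
    # Optional hours, optional minutes, then (fractional) seconds.
    if [[ "$d" =~ ^(([0-9]+)h)?(([0-9]+)m)?([0-9.]+)s$ ]]; then
        awk -v h="${BASH_REMATCH[2]:-0}" -v m="${BASH_REMATCH[4]:-0}" \
            -v s="${BASH_REMATCH[5]}" 'BEGIN { printf "%.3f\n", h * 3600 + m * 60 + s }'
        return 0
    fi
    echo "unrecognised duration: $d" >&2
    return 1
}
```

For the run above, `duration_to_seconds 12m24.754608947s` prints 744.755.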

Run the benchmark:

./run.sh 2>&1 | tee log.txt 
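run.sh itself is not reproduced here. As a rough, hypothetical sketch of the kind of loop such a script runs: each query is executed three times, which matches the groups of three that the parsing pipeline below expects. The file name queries.sql, the one-query-per-line layout, and the function names are assumptions; the real run.sh may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; assumes user, password, account and warehouse are
# exported as in the install step.
TRIES=3

run_query() {
    # Same connection flags as the load step. stdin comes from /dev/null so
    # the CLI cannot consume the remaining lines of the queries file.
    ./bytehouse-cli --user "$user" --account "$account" --password "$password" \
        --region ap-southeast-1 --secure --warehouse "$warehouse" \
        --query "$1" < /dev/null
}

run_all() {
    local queries_file="$1" query i
    # "|| [ -n ... ]" also handles a final line without a trailing newline.
    while IFS= read -r query || [ -n "$query" ]; do
        [ -z "$query" ] && continue   # skip blank lines
        for ((i = 1; i <= TRIES; i++)); do
            run_query "$query"
        done
    done < "$queries_file"
}
```

Usage: `run_all queries.sql 2>&1 | tee log.txt`.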

grep --text -F 'Elapsed' log.txt |
    grep --text -oP 'Elapsed: [\d\.]+(ms|s)\. Processed: \d+ row' |
    sed -r -e 's/Elapsed: ([0-9\.]+)(ms|s)\. Processed: ([0-9]+) row/\1 \2 \3/' |
    awk '{ if ($3 == 0) { print "null" } else if ($2 == "ms") { print $1 / 1000 } else { print $1 } }' |
    awk '{ if (i % 3 == 0) { printf "[" }; printf "%s", $1; if (i % 3 != 2) { printf "," } else { print "]," }; ++i; }'
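The pipeline above prints one line per query, such as [0.25,1.5,null], with a trailing comma on every line. To paste the whole set as a single JSON array (for example into the "result" field of a results file, a field name we assume from other ClickBench entries), the last comma has to go and the lines need wrapping in brackets. A minimal sketch; the function name to_result_array is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "[a,b,c]," lines on stdin, drop the comma after
# the last line, and wrap everything in one JSON array.
to_result_array() {
    sed '$ s/,$//' | { echo '['; cat; echo ']'; }
}
```

Usage: append `| to_result_array` to the end of the pipeline above.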

Note: cluster size L is the largest that can be created; an attempt to create XL fails with "Failed AWAITING RESOURCES".