BigQuery Bulk Insert using Python


To insert multiple rows at once using DML in BigQuery, you can do something like this:

import uuid

from google.cloud import bigquery
from google.cloud.bigquery import dbapi
from faker import Faker

...  # argparse setup elided; args supplies count, start_date, end_date, etc.
fake = Faker()

# Build one dict per row, keyed by the parameter names used in the query
rows_to_insert = []
for i in range(args.count):
    id = str(uuid.uuid4())
    createdAt = fake.date_time_between(args.start_date, args.end_date).isoformat()
    rows_to_insert.append({
        "id": id,
        "createdAt": createdAt
    })

bqclient = bigquery.Client(project=args.storage_project_id)
dataset = args.dataset_id  # the dataset that contains the target table
table = 'table'
query_text = f"""
INSERT INTO {dataset}.{table} (id, createdAt) VALUES (%(id:STRING)s, %(createdAt:TIMESTAMP)s)
"""

# Open a DB API connection over the BigQuery client and run the
# INSERT once per dict in rows_to_insert
conn = dbapi.connect(bqclient)
cur = conn.cursor()
cur.executemany(query_text, rows_to_insert)
cur.close()
conn.close()

The above snippet inserts multiple rows in one go into a table named table, with columns id and createdAt.
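If the table does not exist yet, you can create it up front with the client library. Here is a minimal sketch, assuming the bqclient, dataset, and table variables from the snippet above:

schema = [
    bigquery.SchemaField("id", "STRING"),
    bigquery.SchemaField("createdAt", "TIMESTAMP"),
]
# exists_ok=True makes this a no-op if the table already exists
bqclient.create_table(
    bigquery.Table(f"{bqclient.project}.{dataset}.{table}", schema=schema),
    exists_ok=True,
)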

You can also see that value types are being supplied (id:STRING and createdAt:TIMESTAMP). Binding the values this way (using query parameters) helps prevent SQL injection while also ensuring data integrity.
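Under the hood, the DB API turns each %(name:TYPE)s placeholder into a BigQuery query parameter. For reference, a rough equivalent using the client library directly would look like this (this variant is my own illustration, reusing the variables from the snippet; @name is BigQuery's native named-parameter syntax):

row = rows_to_insert[0]
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("id", "STRING", row["id"]),
        bigquery.ScalarQueryParameter("createdAt", "TIMESTAMP", row["createdAt"]),
    ]
)
sql = f"INSERT INTO {dataset}.{table} (id, createdAt) VALUES (@id, @createdAt)"
bqclient.query(sql, job_config=job_config).result()  # runs one INSERT job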

I am using the DB API instead of the Streaming API here because rows inserted via the Streaming API sit in the streaming buffer and are not available for UPDATE or DELETE for up to 90 minutes.
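For contrast, the streaming version would be a single call like the sketch below (reusing rows_to_insert from above); it is typically simpler, but the rows land in the streaming buffer with the 90-minute caveat just mentioned:

errors = bqclient.insert_rows_json(f"{dataset}.{table}", rows_to_insert)
if errors:
    print(f"Streaming insert failed: {errors}")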
