Analyzing database queries
You may sometimes find yourself needing to either debug or optimize specific queries against the PostgreSQL database made by the API handlers. This section outlines some tips for going about analyzing and fixing these queries.
As described in the Database Schema section, SkyPortal communicates with the
PostgreSQL database via the SQLAlchemy ORM. More specifically, most
of this is done through the SQLAlchemy
Query object API, found here.
Database queries are built via the methods in this
Query API, and compiled into SQL queries
to be sent to the underlying database - in our case PostgreSQL.
For example, a query written like this on the Python side
DBSession().query(Instrument.id).filter(Instrument.id == 1) # new version for SQLAlchemy 2.0 with DBSession() as session: session.execute(sa.select(Instrument.id).where(Instrument.id == 1))
would yield the following Postgres-dialect SQL query
SELECT instruments.id FROM instruments WHERE instruments.id = 1;
A good first step to debugging or looking for inefficiencies in a query made by the SkyPortal backend is to look at the actual SQL generated by SQLAlchemy for the query. To do so, you can do something like this:
from sqlalchemy.dialects import postgresql query = DBSession().query(Instrument.id).filter(Instrument.id == 1) print(query.statement.compile(dialect=postgresql.dialect()))
Note the specification of the PostgreSQL dialect when printing the compiled SQL. This ensures that any Postgres-specific conventions or syntax are respected in the conversion to SQL, as described in the SQLAlchemy documentation here.
For example, the following expression prints out two different versions when the equivalent SQL is asked for, based on whether the PostgreSQL dialect is specified or not:
from sqlalchemy.dialects import postgresql from sqlalchemy import column expression = column('x') == 'some value' print(expression) print(expression.compile(dialect=postgresql.dialect()))
x = :x_1 x = %(x_1)s
Finally, you should note that in the SkyPortal API handlers, most of the SQLAlchemy
that you would print out in this manner originate from our permissions framework
query_accessible_rows() method. Thus, if you’re ever lost looking for those
DBSession().query()... lines to do this sort of print-debugging, you may want to look for the
query_accessible_rows() lines instead. Similarly, if you’re ever wondering where certain parts of
the printed out SQL are coming from (especially a series of joins), the access control rules
skyportal/models.py for the database model being queried can be a good place to start.
Once you have the full SQL query, it can be easier to spot errors/inefficiencies and to experiment with different approaches. It is easier, after all, to directly edit the SQL and try different ways of joining, moving around subqueries, etc. without also trying to figure out how to get SQLAlchemy to produce the exact queries you’re envisioning.
Some quick tips with regards to debugging/optimizing raw SQL:
Use a pretty-printer like this one to get nicely indented versions of longer queries to more easily spot mistakes and understand what a query is doing.
A missing index is often both the biggest bottleneck and the quickest fix to a slow query. You can easily check which columns are indexed on a given table by running
Another common inefficiency in queries is a cross-join, where each row in a table is joined against every row in another table and is rarely what you really want. Cross-joins will show up in the SQL generated by SQLAlchemy with the
table_a , table_bsyntax. It can be hard to spot those commas, so be on the lookout for them.
Finally, the most helpful (and in-depth) way of analyzing the performance of a SQL query is through
EXPLAIN ANALYZE statement. You can connect to the SkyPortal database using
psql and then run
EXPLAIN ANALYZE [your SQL query here] in order to get a report of the steps PostgreSQL took to
carry out your query, and how long each step took. This can often point you toward the changes
necessary to speed up your query. For example, you may notice that PostgreSQL is looping through an
entire table (a
SEQ SCAN) to look for certain rows instead of using an index. More detailed guidance
on using the
EXPLAIN outputs can be found in the PostgreSQL documentation.