SQLAlchemy Expression Language, Advanced Usage

Expression Language

One of the core components of SQLAlchemy is the Expression Language. It is allows the programmer to specify SQL statements in Python constructs and use the constructs directly in more complex queries. Since the expression language is backend-neutral and comprehensively covers every aspect of raw SQL, it is closer to raw SQL than any other component in SQLAlchemy. In this article, we are going to illustrate the power of the expression language using a three table database.

Database Models

Suppose we want to model multiple shopping carts, each of which is created by a user and stores multiple products. From the specification, we can deduce that a user owns multiple shopping carts, a shopping cart includes multiple products, and a product can be included in multiple shopping carts. Therefore, we want to establish a many-to-many relationship between ShoppingCart and Product, and another one-to-many between User and ShoppingCart. Let's create the database models:

[python]
from sqlalchemy import Column, DateTime, String, Integer, ForeignKey, Float
from sqlalchemy.orm import relationship, backref
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
name = Column(String)

class ShoppingCart(Base):
__tablename__ = 'shopping_cart'
id = Column(Integer, primary_key=True)
owner_id = Column(Integer, ForeignKey(User.id))
owner = relationship(
User, backref=backref('shopping_carts', uselist=True)
)
products = relationship(
'Product',
secondary='shopping_cart_product_link'
)
def __repr__(self):
return '( {0}:{1.owner.name}:{1.products!r} )'.format(ShoppingCart, self)

class Product(Base):
__tablename__ = 'product'
id = Column(Integer, primary_key=True)
name = Column(String)
# Using a Float is not the right way of modeling a currency value.
# We will investigate that topic in a different article.
price = Column(Float)
shopping_carts = relationship(
'ShoppingCart',
secondary='shopping_cart_product_link'
)
def __repr__(self):
return '( {0}:{1.name!r}:{1.price!r} )'.format(Product, self)

class ShoppingCartProductLink(Base):
__tablename__ = 'shopping_cart_product_link'
shopping_cart_id = Column(Integer, ForeignKey('shopping_cart.id'), primary_key=True)
product_id = Column(Integer, ForeignKey('product.id'), primary_key=True)

from sqlalchemy import create_engine
engine = create_engine('sqlite:///')

from sqlalchemy.orm import sessionmaker
DBSession = sessionmaker()
DBSession.configure(bind=engine)
Base.metadata.create_all(engine)
[/python]

Create Users, Products and Shopping Carts

Now let's create one user and several products.

[python]
>>> session = DBSession()
>>> cpu = Product(name='CPU', price=300.00)
>>> motherboard = Product(name='Motherboard', price=150.00)
>>> coffee_machine = Product(name='Coffee Machine', price=30.00)
>>> john = User(name='John')
>>> session.add(cpu)
>>> session.add(motherboard)
>>> session.add(coffee_machine)
>>> session.add(john)
>>> session.commit()
>>> session.close()
[/python]

Before progressing further, let's verify that we have one user and three products in the database now.

Now we can create two shopping carts for user John.

[python]
>>> session = DBSession()
>>> cpu = session.query(Product).filter(Product.name == 'CPU').one()
>>> motherboard = session.query(Product).filter(Product.name == 'Motherboard').one()
>>> coffee_machine = session.query(Product).filter(Product.name == 'Coffee Machine').one()
>>> john = session.query(User).filter(User.name == 'John').one()
>>> john_shopping_cart_computer = ShoppingCart(owner=john)
>>> john_shopping_cart_kitchen = ShoppingCart(owner=john)
>>> john_shopping_cart_computer.products.append(cpu)
>>> john_shopping_cart_computer.products.append(motherboard)
>>> john_shopping_cart_kitchen.products.append(coffee_machine)
>>> session.add(john_shopping_cart_computer)
>>> session.add(john_shopping_cart_kitchen)
>>> session.commit()
>>> session.close()
[/python]

Use the Expression Language to Query the Database

Now we have one user, three products and two shopping carts in the database, we can start playing with the expression language. First, let's write a query to answer the question: which products' prices are higher than $100.00?

[python]
>>> from sqlalchemy import select
>>> product_higher_than_one_hundred = select([Product.id]).where(Product.price > 100.00)
>>>
>>> session = DBSession()
>>> session.query(Product).filter(Product.id.in_(product_higher_than_one_hundred)).all()
[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )]
>>> session.close()
[/python]

Then, let's write a query to answer a more complicated question: which shopping carts contain at least one product whose price is higher than $100.00?

[python]
>>> shopping_carts_with_products_higher_than_one_hundred = select([ShoppingCart.id]).where(
... ShoppingCart.products.any(Product.id.in_(product_higher_than_one_hundred))
... )
>>> session = DBSession()
>>> session.query(ShoppingCart).filter(ShoppingCart.id.in_(shopping_carts_with_products_higher_than_one_hundred)).one()
( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )
>>> session.close()
[/python]

Then, let's write a query to answer a slightly different question: which shopping carts contain no product whose price is lower than $100.00?

[python]
>>> products_lower_than_one_hundred = select([Product.id]).where(Product.price < 100.00) >>> from sqlalchemy import not_
>>> shopping_carts_with_no_products_lower_than_one_hundred = select([ShoppingCart.id]).where(
... not_(ShoppingCart.products.any(Product.id.in_(products_lower_than_one_hundred)))
... )
>>> session = DBSession()
>>> session.query(ShoppingCart).filter(ShoppingCart.id.in_(
... shopping_carts_with_no_products_lower_than_one_hundred)
... ).all()
[( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )]
>>> session.close()
[/python]

Or the previous question can be formed in a different way: how can we find the shopping carts all of whose products have a price higher than $100.00?

[python]
>>> from sqlalchemy import and_
>>> shopping_carts_with_all_products_higher_than_one_hundred = select([ShoppingCart.id]).where(
... and_(
... ShoppingCartProductLink.product_id.in_(product_higher_than_one_hundred),
... ShoppingCartProductLink.shopping_cart_id == ShoppingCart.id
... )
... )
>>> session = DBSession()
>>> session.query(ShoppingCart).filter(ShoppingCart.id.in_(
... shopping_carts_with_all_products_higher_than_one_hundred)
... ).all()
[( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )]
>>> session.close()
[/python]

Now, we can ask a different kind of questions regarding aggregation over Product.price. For example, we can ask: which shopping carts' total price of the products is higher than $200.00?

[python]
>>> from sqlalchemy import func
>>> total_price_of_shopping_carts = select([
... ShoppingCart.id.label('shopping_cart_id'),
... func.sum(Product.price).label('product_price_sum')
... ]).where(
... and_(
... ShoppingCartProductLink.product_id == Product.id,
... ShoppingCartProductLink.shopping_cart_id == ShoppingCart.id,
... )
... ).group_by(ShoppingCart.id)
>>> session = DBSession()
>>> session.query(total_price_of_shopping_carts).all()
[(1, 450.0), (2, 30.0)]
>>> session.query(ShoppingCart).filter(
... ShoppingCart.id == total_price_of_shopping_carts.c.shopping_cart_id,
... total_price_of_shopping_carts.c.product_price_sum > 200.00
... ).all()
[( :John:[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )] )]
>>> session.query(ShoppingCart).filter(
... ShoppingCart.id == total_price_of_shopping_carts.c.shopping_cart_id,
... total_price_of_shopping_carts.c.product_price_sum < 200.00 ... ).all() [( :John:[( :u'Coffee Machine':30.0 )] )] >>> session.close()
[/python]

In the previous example, we begin with constructing an SQLAlchemy selectable total_price_of_shopping_carts whose 'columns' are ShoppingCart.id of each shopping cart and the sum of all the products' prices in each corresponding shopping cart. Once we have such a selectable, it's easy to write the query that finds all the shopping carts whose sum of products' prices are higher than $200.00.

Potential Gotchas

So far, our example program seems to be working pretty well. But if we accidentally break the program by writing and using the constructs in unintended ways? Will SQLAlchemy inform us about what's wrong with our program so we can debug it?

For example, the column Product.price is defined as a Float. What if we create a Product object with a price that's a string? Will SQLAlchemy break because the data type of the input for price is different from the definition? Let's give it a try.

[python]
>>> session = DBSession()
>>> cpu = Product(name='CPU', price='0.15')
>>> session.add(cpu)
>>> session.commit()
>>> cpu = session.query(Product).filter(Product.name == 'CPU').one()
>>> cpu.price
0.15
[/python]

So, the product CPU with a string price was inserted successfully into the database. How about using a string for the price that is not a number at all?

[python]

>>> cpu_two = Product(name='CPU Two', price='asdf')
>>> session.add(cpu_two)
>>> session.commit()
...
sqlalchemy.exc.StatementError: could not convert string to float: asdf (original cause: ValueError: could not convert string to float: asdf) u'INSERT INTO product (name, price) VALUES (?, ?)' [{'price': 'asdf', 'name': 'CPU Two'}]

[/python]

Oops. Now SQLAlchemy raises a StatementError because "asdf" cannot be converted into a Float. This is a good feature since it eliminates potential programming errors caused by carelessness.

You might also notice that the filter() method in our example use expressions such as Product.name == 'CPU' and Product.price > 100.0. Aren't these expressions being evaluated first and then the resulting Boolean values are passed into the filter() function to get actual filter results? Let's use several examples to verify the behaviour of filter().

[python]
>>> session.query(Product).filter(True).all()
[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 ), ( :u'Coffee Machine':30.0 )]
>>> session.query(Product).filter(Product.name='CPU').all()
File "", line 1
SyntaxError: keyword can't be an expression
>>> session.query(Product).filter(Product.price > '100.0').all()
[( :u'CPU':300.0 ), ( :u'Motherboard':150.0 )]
[/python]

From the examples above, we see that filter() does accept straightforward Boolean values like True, which returns all products in the database. However, it does not accept expression like Product.name = 'CPU' whose meaning is unclear in the context of a filter. Like the Product constructor, it also converts a String value '100.0' into a float and filter the product table based on the final criteria.

Now let's examine several cases where the SQLAlchemy API might look a bit less than intuitive. First, the select() statement seems to only accept a list of columns as the first argument like select([Product.id]). Wouldn't it be nice if we can write something like select(Product.id)?

[python]
>>> products_lower_than_one_hundred = select(Product.id).where(Product.price < 100.00)
...
NotImplementedError: Operator 'getitem' is not supported on this expression
[/python]

Oops. SQLAlchemy does not like a single element as the first argument of select(). Remember, always pass in a list.

Second, the syntax in some of the where clauses do not look Pythonic: ShoppingCart.products.any(Product.id.in_(product_higher_than_one_hundred)). Wouldn't it be nice if we can write something like ShoppingCart.products.any(Product.id in product_higher_than_one_hundred))?

[python]
>>> shopping_carts_with_products_higher_than_one_hundred = select([ShoppingCart.id]).where(
... ShoppingCart.products.any(Product.id in product_higher_than_one_hundred)
... )
...
TypeError: argument of type 'Select' is not iterable
[/python]

Since SQLAlchemy's 'Select' objects are not iterable, using it in a in context does not work. It may look like a downside, but it makes sense since a 'Select' object is very flexible in SQLAlchemy. As what's shown in the examples, a 'Select' object can be passed around into any filter() or where() to become part of a more complex query or event another 'Select' object. Supporting iterable on such an object would require lots of changes to the underlying implementation.

Third, the results of query() seem to be returning objects that are formatted nicely, such as ( :u'CPU':300.0 ) as the display for one Product object. It looks different from a typical object like:

[python]
>>> class C:
... pass
...
>>> c = C()
>>> c

[/python]

Why? It's because we overrode the __repr__() method of Product and the print() command from the Python interpreter is calling repr() on the resulting array of Product and ShoppingCart objects which call __repr__() of each corresponding class's implementation.

Finally, why does SQLAlchemy implement their own Float column type? Why can't they re-use Python's internal float type?

Well, the short answer is that SQLAlchemy is an ORM and an ORM maps Python constructs into SQL constructs using a defined type system, and the type system has to be database-agnostic, which means it has to handle different database backends with the same column type. And the long answer is that every column defined in a model has to be defined by SQLAlchemy and the definition / column type implements custom methods that are called by the SQLAlchemy's lower level API to convert a Python construct into a corresponding SQL construct.

Tips and Summary

In this article, we used a three-table database to illustrate how to use SQLAlchemy's Expression Language. One thing to keep in mind is that we use mathematical sets to guide us on writing SQL queries. Whenever we encounter a question that's not trivial, especially a question involving multiple tables, we should divide and conquer the question by answering parts of the question first. For example, the question 'how can we find all the shopping carts whose products' sum of prices are higher than $200.00' can be divided into the following ones: 1. how can we calculate the sum of products' prices? (func.sum()) 2. how can we list all the tuples (ShoppingCart.id, func.sum(Product.price)) in a selectable? 3. how can we use the selectable to write the actual query?

Best Practices for SQLAlchemy

Poor organization when using SQLAlchemy always results in data management complications. The following best practices ensure your SQLAlchemy applications are easy to maintain and scale:

Avoid conflicts by coordinating with your team: When working with a team, everyone must be aware of schema changes, if any. It is best practice to review database migrations during team meetings and code reviews. This way, everyone on the team is in the loop.
Clean your migration history regularly: Writing separate scripts for individual, focused changes will ensure your migration history remains organized. Never add unrelated schema changes to your application.
Manage sessions effectively: Efficient session management is a crucial factor in database interactions impacting both performance and data integrity. The sessionmaker and scoped_session functions allow you to create and manage session objects consistently. Coupling this approach with a context manager helps ensure that sessions are handled, committed, and closed optimally.
Test your migrations: Writing test cases for migration scripts is essential – you must verify that they perform the data transformations and schema changes. More importantly, your scripts shouldn't introduce issues into your application. Using Docker or other tools to create isolated testing environments is also a good choice.
Use advanced filtering techniques: SQLAlchemy features methods such as join, group_by, and having. These allow you to reduce database load by retrieving specific data subsets more efficiently.
Use Alembic for version control: With Alembic, you can manage schema changes in a structured and version-controlled manner. Besides allowing you to track the schema's evolution over time, Alembic facilitates collaborating with other developers.
Use indices: Creating indices on frequently accessed columns is an excellent way to improve query performance. You can then analyze your application's database usage patterns and add or remove indices to optimize the schema further. Remember that you can also use SQLAlchemy's built-in performance-tuning tools for further database optimization.
Use modular classes: Organizing your schema into distinct, modular classes enhances code comprehensibility and makes it easier to maintain. Remember to assign descriptive names to relationships, tables, and columns for optimal code readability. These steps simplify schema management, making future modifications easier.
Use the declarative mapping system: SQLAlchemy's declarative mapping system lets you use Python classes to define table schemas and relationships. This method offers improved code clarity and maintainability compared to raw SQL. Using declarative mapping is one of the best ways to facilitate better code organization and streamlined schema updates.

SQLAlchemy Expression Language, Advanced Usage

Expression Language

Database Models

Create Users, Products and Shopping Carts

Use the Expression Language to Query the Database

Potential Gotchas

Tips and Summary

Best Practices for SQLAlchemy

About The Author

Xiaonuo Gantan

Latest Articles

Tags