We often encounter data as Relational Databases. To work with them we generally would need to write raw SQL queries, pass them to the database engine and parse the returned results as a normal array of records.
SQLAlchemy provides a nice “Pythonic” way of interacting with databases. So rather than dealing with the differences between specific dialects of traditional SQL such as MySQL or PostgreSQL or Oracle, you can leverage the Pythonic framework of SQLAlchemy to streamline your workflow and more efficiently query your data.
Other stories on datascience can be found here
Installing The Package
pip install sqlalchemy
Connecting to a database
To start interacting with the database we first we need to establish a connection.
import sqlalchemy as db engine = db.create_engine('dialect+driver://user:pass@host:port/db')
Some examples of connecting to various databases can be found here
Viewing Table Details
SQLAlchemy can be used to automatically load tables from a database using something called reflection. Reflection is the process of reading the database and building the metadata based on that information.
MetaData have already been imported. The metadata is available as
ResultProxy: The object returned by the
.execute() method. It can be used in a variety of ways to get the data returned by the query.
ResultSet: The actual data asked for in the query when using a fetch method such as
.fetchall() on a ResultProxy.
Dealing with Large ResultSet
.fetchmany() to load optimal no of rows and overcome memory issues in case of large datasets
while flag: partial_results = ResultProxy.fetchmany(50) if(partial_results == ): flag = False // code // ResultProxy.close()
Convert to dataframe
df = pd.DataFrame(ResultSet) df.columns = ResultSet.keys()
Lets see some examples of raw SQLite Queries and queries using SQLAlchemy.
SQL: SELECT * FROM census WHERE sex = F SQLAlchemy: db.select([census]).where(census.columns.sex == 'F')
SQL: SELECT state, sex FROM census WHERE state IN (Texas, New York) SQLAlchemy: db.select([census.columns.state, census.columns.sex]).where(census.columns.state.in_(['Texas', 'New York']))
and, or, not
SQL: SELECT * FROM census WHERE state = 'California' AND NOT sex = 'M' SQLAlchemy: db.select([census]).where(db.and_(census.columns.state == 'California', census.columns.sex != 'M'))
SQL: SELECT * FROM census ORDER BY State DESC, pop2000 SQLAlchemy: db.select([census]).order_by(db.desc(census.columns.state), census.columns.pop2000)
SQL: SELECT SUM(pop2008) FROM census SQLAlchemy: db.select([db.func.sum(census.columns.pop2008)])
other functions include avg, count, min, max…
SQL: SELECT SUM(pop2008) as pop2008, sex FROM census SQLAlchemy: db.select([db.func.sum(census.columns.pop2008).label('pop2008'), census.columns.sex]).group_by(census.columns.sex)
SQL: SELECT DISTINCT state FROM census SQLAlchemy: db.select([census.columns.state.distinct()])
case & cast
case() expression accepts a list of conditions to match and the column to return if the condition matches, followed by an
else_ if none of the conditions match.
cast() function to convert an expression to a particular type
.scalar to the result when the result contains only single value
If you have two tables that already have an established relationship, you can automatically use that relationship by just adding the columns we want from each table to the select statement.
Creating and Inserting Data into Tables
By passing the database which is not present, to the engine then sqlalchemy automatically creates a new database.
Updating data in Databases
Dropping a Table
table_name.drop(engine) #drops a single table metadata.drop_all(engine) #drops all the tables in the database
IPython Notebook’s and other assets of this story can be found here
See ya again 😃
Originally posted: https://towardsdatascience.com/sqlalchemy-python-tutorial-79a577141a91