Querying a complex unknown database with Python

In my job, I sometimes have to deal with web applications pointing to half a dozen databases with more than 160 tables in total.

Not that it has to be complex, but sometimes it is :)

So I made a small script using two of my favourite technologies to deal with that: graphviz and sqlsoup.

Sqlsoup is a lesser-known child of the great software that Mike Bayer (zzzeek) made: sqlalchemy.

I have been told that when you know an ORM you don't need to know SQL. I strongly disagree (especially when it comes to performance issues caused by the wrong types of indexes, or the lack of them). However, when you use the declarative sqlalchemy syntax, sqlalchemy does a lot of things right that help a lot: it creates foreign keys when you use backrefs (unless you use MyISAM). And sometimes your model gets out of sync with the database (oh! someone ran an ALTER TABLE in raw SQL), you need to do stuff on the data, and your model does not help.

And these foreign keys help a lot to build an entity relationship diagram of the python objects that may be used... to navigate easily in a flow of data. The model may be a nice map, but sometimes you need to rebuild it when the map is not the territory anymore. And time is pressing.


The gist is here: https://gist.github.com/jul/e255d76590930545d383
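The core of the idea can be sketched like this (a minimal sketch, not the gist itself: it uses SQLAlchemy's `inspect()` reflection API directly instead of sqlsoup, and the `write_dot` helper and its bare-bones out.dot are mine):

```python
import sys

from sqlalchemy import create_engine, inspect


def write_dot(db_url, out_path="out.dot"):
    """Reflect every table of the database and draw one edge per foreign key."""
    insp = inspect(create_engine(db_url))
    lines = ["digraph g {"]
    for table in insp.get_table_names():
        lines.append('    "%s";' % table)
        for fk in insp.get_foreign_keys(table):
            # each foreign key becomes an arrow: referring -> referred
            lines.append('    "%s" -> "%s";' % (table, fk["referred_table"]))
    lines.append("}")
    with open(out_path, "w") as f:
        f.write("\n".join(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python generate_diagram.py postgresql://tg@localhost/tg
    write_dot(sys.argv[1])
```

Then graphviz turns it into a picture: dot -Tpng out.dot -o out.png.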


Here, as a test case, I used a turbogears quickstart; turbogears is a framework I consider under-rated. It is very well written and correct, and it has a lot of production features I like: the possibility to easily swap out the technologies you don't like in the stack (mako or genshi, your choice), database versioning...

The only thing a quickstart builds is the data for authorization/authentication.

Here is the result of this script (of course, installing graphviz is mandatory to turn the out.dot file into a picture, and as you can see the Windows version has some glitches with the fonts):

turbogears quickstart database construction

And what is nice is that, using the interactive option, you can go straight into the database with your diagram under your nose and directly get your data:

ipython -i generate_diagram.py postgresql://tg@localhost/tg
 

#postgresql://tg@localhost/tg problem with u'migrate_version'
#(...)
#SQLSoupError("table 'migrate_version' does not have a primary key defined",) 
#nb col = 17 
#nb fk = 4 
In [1]: db.tg_permission.join(db.tg_group_permission)\
   ...:     .join(db.tg_group).join(db.tg_user_group).join(db.tg_user)\
   ...:     .filter(db.tg_user.user_name == 'editor').all()
Out[1]:
[MappedTg_permission(permission_id=2, permission_name=u'read', description=u'editor can read'),
 MappedTg_permission(permission_id=3, permission_name=u'see stats', description=u'can see stats')]
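For the curious, the same hop through the association tables can be sketched with plain SQLAlchemy Core. This is a toy reduction of the quickstart schema (the table and column names mimic the turbogears ones, but the columns and the sample rows here are made up for the example):

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, create_engine, select)

meta = MetaData()
tg_user = Table("tg_user", meta,
                Column("user_id", Integer, primary_key=True),
                Column("user_name", String))
tg_group = Table("tg_group", meta,
                 Column("group_id", Integer, primary_key=True))
tg_user_group = Table("tg_user_group", meta,
                      Column("user_id", ForeignKey("tg_user.user_id")),
                      Column("group_id", ForeignKey("tg_group.group_id")))
tg_permission = Table("tg_permission", meta,
                      Column("permission_id", Integer, primary_key=True),
                      Column("permission_name", String))
tg_group_permission = Table("tg_group_permission", meta,
                            Column("group_id", ForeignKey("tg_group.group_id")),
                            Column("permission_id",
                                   ForeignKey("tg_permission.permission_id")))

engine = create_engine("sqlite://")
meta.create_all(engine)
with engine.begin() as conn:
    # one user in one group holding one permission
    conn.execute(tg_user.insert(), [{"user_id": 1, "user_name": "editor"}])
    conn.execute(tg_group.insert(), [{"group_id": 1}])
    conn.execute(tg_user_group.insert(), [{"user_id": 1, "group_id": 1}])
    conn.execute(tg_permission.insert(),
                 [{"permission_id": 2, "permission_name": "read"}])
    conn.execute(tg_group_permission.insert(),
                 [{"group_id": 1, "permission_id": 2}])

# permissions of user 'editor': the joins are inferred from the foreign keys
query = (select(tg_permission.c.permission_name)
         .join_from(tg_permission, tg_group_permission)
         .join(tg_group).join(tg_user_group).join(tg_user)
         .where(tg_user.c.user_name == "editor"))
with engine.connect() as conn:
    print([row.permission_name for row in conn.execute(query)])
```

The nice part is the same as with sqlsoup: the foreign keys carry enough information that the join conditions never have to be spelled out.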

I thought of making a module out of it. But sometimes, in the turmoil of production, you don't have time for that. A single short script with very few requirements, one that can be tweaked fast to adapt, is sometimes exactly what you need.

Note: Alembic (also from Mike Bayer), which is nicely included in turbogears, can of course generate sqlalchemy models from a db, and even diff a database against an sqlalchemy declarative model. I don't know the guy, but I must admit that even if sqlalchemy is a tad heavy, when used correctly his software is great. And even if it is heavy, pip install works fairly well given the stability and complexity of the API.

Mister Mike Bayer, if you read me: I am a fanboy of yours, and I like how you answer people on the mailing list.

PS: yes, I know about the graphviz module. But I use graphviz so often that it is faster for me to just write the dot file directly.
