X-Git-Url: https://git.jsancho.org/?p=datasette-pytables.git;a=blobdiff_plain;f=DEVELOPERS.md;h=762b9da202b4dc895590ebf0e4c29aeca172d803;hp=e0f8db1eea6e084b1dcc6fdeae3f1337b31bcf1a;hb=ab111327f9822300ab1d8605ff86cfcebb78aeee;hpb=3b69fc3150e21922485816efbeed22cb561f2af9

diff --git a/DEVELOPERS.md b/DEVELOPERS.md
index e0f8db1..762b9da 100644
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -1,71 +1,61 @@
-# How is datasette-pytables made?
+# How is datasette-pytables implemented?

-Datasette-PyTables is an external connector for [Datasette](https://github.com/simonw/datasette). Datasette publish data in SQLite files to the Internet with a JSON API, and this connector provides a way to do the same with PyTables files.
+Datasette-PyTables is an external connector for [Datasette](https://github.com/simonw/datasette). Datasette publishes data in SQLite files to the Internet with a JSON API, and this connector provides a way to do the same thing with PyTables files. By using [Datasette-Connectors](https://github.com/PyTables/datasette-connectors), we can load external connectors that allow us to access any data container. For this, the connectors need the interface that is described here. By following this interface, you can make connectors for other data sources too.

-Using a modified version of Datasette, [Datasette-Core](https://github.com/PyTables/datasette-core), we can load external connectors that allow us to access to any data container. For this, the connectors need a certain structure.
+## Starting from scratch

-Reviewing datasette-pytables code, you will see how to make other connectors for your needs.
+To make a Datasette connector for your favorite database files, you need to inherit from `datasette_connectors.Connector`. Then you can specify your connector type in the class property `connector_type` and, very importantly, you should set the `connection_class` property to a class that inherits from `datasette_connectors.Connection` and implements a method for opening your database files.

-## Tables inspection
-
-First of all, we need to export a special method called `inspect` that receives the path of the file as an argument and returns a tuple formed by a dictionary with tables info, a list with views name and a string identifying the connector.
-
-Each entry in the dictionary for tables info has the next structure:
-
-    tables['table_name'] = {
-        'name': 'table_name',
-        'columns': ['c1', 'c2'],
-        'primary_keys': [],
-        'count': 100,
-        'label_column': None,
-        'hidden': False,
-        'fts_table': None,
-        'foreign_keys': {'incoming': [], 'outgoing': []}
-
-This structure is used for PyTables. Maybe, in your case, you will need things like primary keys or foreign keys.
-
-## Returning results
-
-Datasette runs through SQL queries, so your connector has to accept these queries and execute them. The next class and methods are needed:
+For example, for PyTables the following class definitions are used:

-    class Connection:
-        def __init__(self, path):
-            ...
+    import tables
+    import datasette_connectors as dc

-        def execute(self, sql, params=None, truncate=False, page_size=None, max_returned_rows=None):
-            ...
+    class PyTablesConnection(dc.Connection):
+        def __init__(self, path, connector):
+            super().__init__(path, connector)
+            self.h5file = tables.open_file(path)

-The `execute` method receives:
+    class PyTablesConnector(dc.Connector):
+        connector_type = 'pytables'
+        connection_class = PyTablesConnection

-* sql: the query
-* params: a dictionary with the params used in the query
-* truncate: a boolean saying if the returned data can be separated in pages or not
-* page_size: the number of rows a page can contain
-* max_returned_rows: the maximum number of rows Datasette expects
+## Tables inspection

-We need to parse the query because PyTables has his own style for queries, but other databases could work with the SQL queries without requiring any parsing.
+Datasette needs some data about your database, so you have to provide it by overriding some methods in your custom connector. For that, the connector stores an instance of the class set in `connection_class` in the property `conn`, so you can use `self.conn` to access your database in order to retrieve that data.

-Sometimes, Datasette make queries to `sqlite_master`; you need to keep it in mind.
+The methods that must be overridden are:

-The `execute` method has to return a tuple with:
+* **table_names(self)**: a list of table names
+* **hidden_table_names(self)**: a list of hidden table names
+* **detect_spatialite(self)**: a boolean indicating if geometry_columns exists
+* **view_names(self)**: a list of view names
+* **table_count(self, table_name)**: an integer with the row count of the table
+* **table_info(self, table_name)**: a list of dictionaries with columns description
+* **foreign_keys(self, table_name)**: a list of dictionaries with foreign keys description
+* **table_exists(self, table_name)**: a boolean indicating if table exists in the database
+* **table_definition(self, table_type, table_name)**: a string with a 'CREATE TABLE' sql definition
+* **indices_definition(self, table_name)**: a list of strings with 'CREATE INDEX' sql definitions

-* a list of rows (Datasette expects something like SQLite rows)
-* a boolean saying if the data is truncated, i.e., if we return all the rows or there are more rows than the maximum indicated in max_returned_rows
-* a tuple with the description of the columns in the form (('c1',), ('c2',), ...)
+## Returning results

-## Rows format
+Datasette uses SQL for specifying the queries, so your connector has to accept SQL and execute it. By overriding the `execute` method, you can receive the query in SQL format and return some results.

-Datasette receives the results from the queries with SQLite row instances, so we need to return our rows in a similar way.
+The `Connector.execute()` method receives:

-For example, if we have the next query:
+* **sql**: the query
+* **params**: a dictionary with the params used in the query
+* **truncate**: a boolean saying if the returned data can be separated in pages or not
+* **custom_time_limit**: an integer with a time limit for the execution of the query in seconds
+* **page_size**: the number of rows a page can contain
+* **log_sql_errors**: a boolean saying if errors have to be logged

-    SELECT name FROM persons
+In our case, we need to parse the SQL query because PyTables has its own style for queries, but other databases could work with the SQL queries without requiring any parsing.

-we need to return an object that allows to do things that:
+Note: Sometimes, Datasette makes queries to `sqlite_master`; you need to keep this in mind.

-    row[0] == 'Susan'
-    row['name'] == 'Susan'
-    [c for c in row] == ['Susan']
-    json.dumps(row)
+The `Connector.execute()` method has to return a tuple with:

-We extend `list` class to get it, but if you respect the requirements for rows, you can develop your own implementation.
+* a list of rows; each row is a dictionary with the field name as key and the field value as value
+* a boolean saying if the data is truncated, i.e., if we return all the rows or there are more rows than the maximum indicated in max_returned_rows
+* a tuple with the description of the columns in the form `(('c1',), ('c2',), ...)`
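
The return value described for `Connector.execute()` can be illustrated with a small standalone sketch. It does not import `datasette_connectors`. The helper name `build_execute_result` and its arguments are hypothetical and not part of the connector API; passing `max_returned_rows` as a plain argument is also an assumption. The sketch only shows the shape of the three-element tuple: rows as dictionaries, a truncation flag, and the column description.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_execute_result(
    column_names: Sequence[str],
    raw_rows: Sequence[Sequence[Any]],
    max_returned_rows: int,
) -> Tuple[List[Dict[str, Any]], bool, Tuple[Tuple[str], ...]]:
    """Hypothetical helper: package raw rows as (rows, truncated, description)."""
    # The result counts as truncated when more rows are available than the limit.
    truncated = len(raw_rows) > max_returned_rows
    kept = raw_rows[:max_returned_rows]
    # Each row is a dictionary with the field name as key and the field value as value.
    rows = [dict(zip(column_names, values)) for values in kept]
    # Column description in the form (('c1',), ('c2',), ...).
    description = tuple((name,) for name in column_names)
    return rows, truncated, description


rows, truncated, description = build_execute_result(
    ["name", "age"],
    [("Susan", 34), ("Tom", 27), ("Ana", 41)],
    max_returned_rows=2,
)
```

In a real connector, the body of an overridden `execute` would run the query against `self.conn` first and then package the rows it obtained in a similar way.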