This document is intended to help users to understand the STN database. It explains the logic that lies behind its basic components, and the strategic decisions taken by its designers.

The STN database is what is known as a ‘relational database’ because each piece of data exists in relation to several others (and therefore changing one piece of data may have knock-on effects elsewhere). But it is also an ‘interpretational database’, because much of the data it contains had to be interpreted during the process of data entry or reinterpreted thereafter. The data therefore contains multiple layers of uncertainty.

Because of this uncertainty, the complexity of the STN database and the difficulties inherent in locating, defining and capturing books and ‘events’ in time and space from multiple sources, care is required when interpreting individual pieces of data. It is hoped that this document will help users with this vital task.

Nevertheless, users may find it helpful to familiarise themselves with the data interface before proceeding further.

Archival Sources

The archival sources from which the STN database was compiled are the Fonds de la Société typographique de Neuchâtel (STN) which are housed in the Bibliothèque publique et universitaire de Neuchâtel (BPUN) in Switzerland.

This rich collection has been described in detail by Jacques Rychner, whose study helped confirm our theories about how to construct the STN database. The archive comprises many types of documents relating to the day-to-day business of the publishing house, but for our purposes none of the documents survives in a complete run. For example, none of the different types of accounting records used in this project survives for the period 2 June 1787 to 22 May 1790. We have been able to gain partial data for the first part of this period from other sources (e.g. the records of a sales tour by the STN agent Durand; miscellaneous correspondence), but for 1788-1789 even the STN’s in-letters mysteriously dry up. The business continued trading for another five years. Whatever letters it received during this time ended up elsewhere and do not appear to have survived.

As a result, the project has had to create a single record by compiling fragmentary data from disparate sources. The good news is that the records needed to do this survive for almost the entire period from the STN’s foundation in 1769 until 1 June 1787, and again from 23 May 1790 to 1794.

For information about how the sources have been used, see the sections on ‘Philosophy and Methodology’ and ‘Research Methods’ below or the Options menus in the STN database.

Philosophy and Methodology

Approach

The STN database records and collates what we have called ‘solid accounting information’ drawn from the STN’s surviving accounting records. We set out to capture only that evidence which recorded the actual physical presence or movement of books by the STN and its clients. For this project, the mere ordering of a book is not enough. There needs to be clear, recorded evidence that a transfer of physical volumes took place. This is not to deny that accounts can be erroneous, falsified, or mistranscribed. Or the wrong title or quantity of books might be sent by mistake. Nevertheless, we believe our approach offers a more concrete approximation of reality than many other studies.

This is a very different approach to most previous statistical studies of the STN’s trade. They have tended to look at the STN’s in-correspondence to find orders placed. The most famous such statistical studies are found in Robert Darnton’s seminal Forbidden Bestsellers of Pre-Revolutionary France (1996) and its companion volume of appendices, The Corpus of Clandestine Literature in France, 1769-1789 (1995). Using the correspondence of a sample group of French booksellers, Darnton recorded orders for 28,212 copies of 457 clandestine works, which he then collated into illegal bestseller lists and categorised by subject, in order to provide a statistical underpinning to his theories on the influence of illegal literature.

Why a database?

Robert Darnton’s work directly inspired the French Book Trade in Enlightenment Europe database project. It showed both the potential of the archive and the limitations of the traditional methods and means of representing this sort of data. Despite his best efforts to present his data transparently, it is impossible for Darnton’s readers to use his appendices to track changes in demand over time. Using them to trace the geographical spread of orders is possible, but requires skill and laborious effort. In addition, though Darnton spent years of toil sifting through booksellers’ correspondence, his survey only sampled one aspect of one sector of the STN’s trade with a single country – sales of illegal works to France. His conclusions about the revolutionary impact of scandalous works could not therefore be validated by comparative analysis. Finally, Darnton’s work measures demand: there remains a question mark over whether all the works ordered were supplied. He himself admitted that not all works requested reached the STN’s customers. How great was this discrepancy?

It was the realisation that these limitations, methodological problems and questions could be resolved by a database that led to the French Book Trade in Enlightenment Europe project. The aim was to look at the STN’s entire trade with all Europe – including the supply origins of its books as well as sales. The result, we hoped, would be a far more rounded, comparative, chronologically and geographically nuanced view of the pan-European trade in (mostly) French books in the late Enlightenment. We think we have succeeded in doing this – and in developing transferable tools and models for other projects.

Sources and Research Methods

The basic spine of data from which the database is built came from the accounting records of the STN. It was entered into the database using a custom-built data editor. The first data fed into the database concerned clients, their places of residence and their professions. It came from the BPUN’s MS 1000A, card index and out-letter book indices (see Clients below for more details). Information on further clients was added as we encountered them in the STN’s accounting data. Bibliographic data on individual editions traded was compiled initially from the STN’s own printed and manuscript catalogues and stock books, as well as the catalogue of an allied bookseller, Louis Fauche-Borel. This meant we had data on the majority of clients and on the books that the STN traded before the accounting data was added.

The STN records gave us data about the volume of books traded; whence they came or where they went; the routes they took; who bought or supplied them; and the dates on which they were accounted for as printed, received, despatched, or counted in stocktakes. However, not all of our sources could provide all of this information, though much could be gleaned by cross-referencing.

Our favourite sources, which we call ‘day books’, were those the STN labelled variously Brouillards, Journals and Mains Courantes. These are arranged by order and tend to supply all the types of information given above for any given transaction. They also offered information about agents and middlemen and the crate numbers in which books were despatched. For all periods for which these books survive, they were our preferred source.

The second richest sources are the STN’s ‘stock books’ (‘rencontres’) which served as rolling inventories. As a result each entry tends to begin and end with a dated stocktake. The ‘rencontres’ are arranged roughly alphabetically by title and edition, with a column for the STN’s sales and another for their purchases. They give only the names of vendors and purchasers of books and the dates of any transaction. They do not say where clients were located and they do not give supplementary data about agents and middlemen, or trade routes.

For the brief periods where no ‘day book’ or ‘stock book’ survives, we have turned instead, wherever possible, to the STN’s ‘Order Books’ (Livres de Commissions) as our main source for sales. However, even for sales data they are not comprehensive. They only give us the books that were sent to the STN’s mail order clients. They do not contain some local customers who appear to have had separate accounts, nor do they record job-printings for clients, retail counter sales or any other non-standard business. They also do not contain any information about the STN’s purchases. As a result, we have supplemented the data from the ‘order books’ with a variety of other sources, where they survive, notably:

1) Booksellers’ correspondence, from which we have only taken solid accounting evidence for the exchange of books, e.g. written accounts or lists of books accompanying the letter. Due to the size of the archive, we limited our research to letters from book trade customers who were in correspondence with the STN during the year before or after any gap in our data. Such people were more likely to be trading with the STN during the periods where other data was lacking; they were also more likely to provide accounts which met our criteria of ‘solid accounting data’. Occasionally letters were also consulted in order to resolve an ambiguity in our other data.

2) Miscellaneous other accounting sources were also used. Wherever they survived, we also consulted petty cashbooks and accounts of minor creditors, for example. Most of these types of source survive for only a short period. All such sources were checked, as they sometimes provided extra details. The cashbooks might, for example, reveal the identity of someone who bought a book over the counter but who had not been named in other sources.

3) A final and rather anomalous source used is a dossier recording the sales tour of the STN’s agent Durand to the south of France, northern Italy, Austria, Germany and back into Switzerland in 1787-1788. The data the dossier contains is less solid than for other periods, and does not cover all the STN’s trade in the period. The data is recorded in a numerical code (based on the catalogues Durand carried with him), which was cracked during the course of the project (I leave Mark Curran to explain how he did this elsewhere). Durand instructed the STN what to do and which books to send to whom and how. However, he was transacting business at a distance from Neuchâtel, so could not have had a clear and up-to-date idea of stock. Thus much of the data from this source is a half-way house between correspondence data and solid accounting data. We have used it because it is the best evidence we have for the period. Users who are troubled by this can use the ‘Data source’ option in the ‘Options’ menu to filter out such data.

The manuscript source of all data concerning the distribution and acquisition of books by the STN is carefully recorded in the database. (In the interface, it can be found by clicking ‘show details’ for any individual event). At a macro-level we have indicated which sources were providing data at any given period in a calendar available through the ‘Visualisation gallery’ under the ‘Help and Resources’ menu. It has a box for each day of the STN’s existence, and each day for which one or more book ‘orders’ are recorded is colour coded according to the source of their data. Darker shading indicates more ‘orders’ taking place.

More information about how we have used our various sources is available through the drop down ‘Options’ menu under the ‘Data source’ option. For a more detailed discussion of these sources, consult Jacques Rychner, ‘Les Archives de la Société typographique de Neuchâtel’ in L'Edition neuchâteloise au siècle des Lumières: la Société typographique de Neuchâtel (1769-1789), ed. Michel Schlup (Neuchâtel, 2002), 179-210.

Records from the STN’s own archives have been enriched throughout the project. Further bibliographic data was gathered from collective library database catalogues (notably Worldcat; the Karlsruhe portal; RERO and the Catalogue collectif de France) and from online editions provided by digital repositories, above all by Google Books, the Hathi Trust, and Gallica. These bibliographic sources also provided considerable amounts of data for the process of categorisation (more will be said about these sources and how we have used them later in this document). In addition, the place data was enriched by mapping it against geo-political definitions of space; client data was linked to groups of professions; and book data was enriched with the provision of taxonomic data and measures of illegality. All this extra data is provided to help users to navigate or interpret the STN archive.

The On-line Data Interface

It is not enough to input data into a well-structured database. The database’s effectiveness as a tool also depends on how effectively data can be extracted. For that reason we have invested a great deal of time and effort in the data interface. Our basic idea has been to devise a tool that can run more or less every search query that will be needed for the project team’s printed publications. If it is capable of meeting our most complex needs, it should be sufficient for most other users, too. And we have tried to make it as intuitive as possible, despite the wide range of functions on offer. (Altogether, there are 71 functions under the Browse, Map, Query, Rank and Compare dropdown menus alone).

This range of choices has therefore been carefully organised under just five drop-down menus – each relating to a particular type of search query. In addition, there is a search button on the top right of the interface. To further aid navigation, hyperlinks are provided, leading to further data. We have also provided general Search functions and an advanced edition search: they provide 11 more baseline functions. Wherever possible visualisations and maps have been built in. Google Books images of editions in the database have also been embedded.

The most innovative feature of the On-line Data Interface, however, is a variety of ‘Options’ menus. These are provided to help users make more specialised queries or maintain data integrity. In effect the ‘Options’ are customised filters that allow users to interrogate a subset of the data in the database. They might for example wish to create statistics relating only to women bookdealers; illegal books; French editions; or non-trade clients of the STN. Users select which ‘Options’ they wish to leave on and off using a simple tick-box system. In all there are eight ‘Options’ menus available, and each offers multiple possibilities for shaping the parameters of the data interrogated. Full details of what each ‘Option’ does and why you might wish to use it appear under the ‘Options’ menu. (For further discussion of the ‘Options’ see ‘Data Integrity, Distortion and Data Tools’ below).

Hopefully all these features make the user experience pleasant and reasonably intuitive. They also make possible most of the complex searches and queries we wished to do ourselves. For skilled users, however, we have also provided a downloadable MySQL version of the database on which they can run more specialised queries. Such users are also free to customise our data in downloaded versions subject to the licensing restrictions listed in the database (under ‘Help and Resources’). But for most users of this database, it will be easier to use the online interface.

Data Integrity, Distortion and Data Tools

This document identifies numerous ways in which our data is problematic, incomplete or contains distortions. Some of these – for example compromises and simplifications in the mapping process - have to be accepted. Others – such as the existence of different sorts of clients or our reliance on different source types – can be counter-acted by some of our tools.

Options menus

The most important such tools are the eight Options to be found under the Options menu. The options are in effect ways of filtering the data which is interrogated, allowing us to examine only selected subsets. They fall into three categories. The first two are about data integrity. The 'data source' option is intended to allow users to interrogate only data drawn from certain types of source document. By doing so they limit the data supply and chronological periods of the STN’s trade on which they draw, but are assured of more consistent and comparable data outputs. The 'client data source' option does much the same for clients, limiting search queries to clients whose client data came from consistent sources and who hence interacted with the STN in certain ways (e.g. by corresponding with them).

The next two options – the ‘edition type’ and ‘client type’ options – are intended to root out data distortions stemming from the way business was structured. For example, by excluding STN editions, we can get more consistent bestseller tables, since the STN aspired to sell 100% of their own publications, while they tended to handle only about 10% of anyone else’s editions. The ‘client type’ option allows us to do something similar with clients – for example by excluding the ‘commissioning clients’ who frequently had whole editions printed and despatched to them or their agents in a single place. Such bulk sales probably confuse our picture of sales. Both menus also allow us to examine particular subsets of data, adding nuance and accuracy to certain researches.

The final four options all add to our ability to study data subsets. The first limits by client gender, allowing us to examine whether women in the book trade seem to have acted any differently to men, for example. The others allow us to isolate illegal books, translated works or works printed in a particular language, permitting similar fine analyses.
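The way such options act as combinable filters can be illustrated with a minimal sketch. It is not the interface's actual implementation: the field names, the option values and the figures below are all invented for illustration.

```python
# Each dict stands for one book trade event. Field names and values are
# hypothetical and do not reproduce the real database schema.
events = [
    {"client_gender": "female", "edition_type": "other", "illegal": False, "copies": 12},
    {"client_gender": "male", "edition_type": "STN edition", "illegal": False, "copies": 200},
    {"client_gender": "male", "edition_type": "STN edition", "illegal": False, "copies": 1000},
    {"client_gender": "female", "edition_type": "other", "illegal": True, "copies": 6},
]


def apply_options(events, **options):
    """Keep events whose value for every named field is among the ticked values."""
    return [
        e for e in events
        if all(e[field] in ticked for field, ticked in options.items())
    ]


def copies(events):
    """Total number of copies across a list of events."""
    return sum(e["copies"] for e in events)


# Exclude STN editions, as one might before building a bestseller table.
non_stn = apply_options(events, edition_type={"other"})

# Combine two options: women clients and illegal books only.
women_illegal = apply_options(events, client_gender={"female"}, illegal={True})
```

Leaving every option unticked (calling `apply_options(events)` with no keyword arguments) returns the full dataset, which mirrors the idea that the options only ever narrow the data being interrogated.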

Other tools

Other database tools compensate in other ways. The problem of comparing data across time is dealt with by the choice to select ‘comparing percentages of total’ [trade] within a given year rather than ‘raw score’. This allows us to search for trends over time. Equally the choice to examine net sales (sales minus returns) compensates for any distortion due to the relatively high number of returns on some titles. The very existence of this problem had not been foreseen, for Darnton claims that there was no system of returns.
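The arithmetic behind these two choices can be sketched as follows. This is only an illustration under stated assumptions: the titles, years and figures are invented, and the function names are hypothetical rather than taken from the database.

```python
from collections import defaultdict

# (year, title, copies sent out, copies returned); all figures are invented.
rows = [
    (1775, "Title A", 120, 20),
    (1775, "Title B", 300, 0),
    (1785, "Title A", 50, 10),
    (1785, "Title B", 60, 0),
]


def net_sales(sent, returned):
    """Net sales are sales minus returns."""
    return sent - returned


def share_of_year(rows):
    """Express each title's net sales as a percentage of that year's total."""
    totals = defaultdict(int)
    for year, _title, sent, returned in rows:
        totals[year] += net_sales(sent, returned)
    return {
        (year, title): 100 * net_sales(sent, returned) / totals[year]
        for year, title, sent, returned in rows
    }


shares = share_of_year(rows)
```

In this invented example the raw net sales of Title A fall from 100 copies in 1775 to 40 in 1785, yet its share of the year's trade rises from 25% to 40%, which is the kind of trend that raw scores alone would hide.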

Customised search queries

In an ideal world, it would have been nice to allow readers to create their own ‘basket’ of books (or authors, or clients, or places etc.) on which to run ‘query’, ‘map’, ‘rank’ or ‘compare’ functions via the interface.

This is certainly something that would be feasible from a technological point of view. However, given the complexity of our data structures and variables involved in many of the existing ways of interrogating the database, our design team felt it would be complex to execute and slow to run. We therefore chose to set this challenge to one side.

For the moment, we hope users will be content with the very wide range of options already existing in the interface. But if there is sufficient demand and resources become available, it might be possible to offer this as an upgrade in the future. In the interim, the best solution is to download the SQL version and interrogate that.
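As a rough idea of what a ‘basket’ query against a downloaded copy might look like, here is a small sketch. It makes several assumptions: the table and column names are hypothetical and will not match the real schema, the rows are invented, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# In-memory stand-in for a downloaded copy; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (superbook TEXT, place TEXT, copies INTEGER);
INSERT INTO events VALUES
  ('Work A', 'Paris', 50),
  ('Work B', 'Lyon', 30),
  ('Work A', 'Lyon', 25),
  ('Work C', 'Paris', 99);
""")

# The user's own 'basket' of works, passed as query parameters.
basket = ["Work A", "Work B"]
placeholders = ", ".join("?" for _ in basket)

# Rank places by the number of copies of basket works recorded there.
query = f"""
SELECT place, SUM(copies) AS total
FROM events
WHERE superbook IN ({placeholders})
GROUP BY place
ORDER BY total DESC, place
"""
ranking = conn.execute(query, basket).fetchall()
```

Works outside the basket (here ‘Work C’) are ignored, so the ranking reflects only the titles the user selected.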

Database Content

Books

Books, Book Sales and Bibliometrics

The STN database is a database about books and their dissemination across time and space. Its basic building blocks are single book trade ‘events’ – usually involving the transfer of books, normally by sale or purchase, between the STN and its clients, as recorded in the STN’s account books. There are also events in which no books change hands – stocktakes or the printing of a new edition, for example. Each ‘event’ involves one or more copies of a single edition of a certain work. Such events can be pinpointed to a single day, and related, in most cases, to a particular client in a particular place.
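For readers who think in terms of data structures, an ‘event’ as described above might be modelled along the following lines. This is a sketch only: the class, its field names and the set of event kinds are assumptions made for illustration, not the database's real schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BookTradeEvent:
    """One event: copies of a single edition, on a single day (hypothetical model)."""
    edition: str                  # identifier of the single edition involved
    copies: int                   # one or more copies
    day: date                     # events can be pinpointed to a single day
    kind: str                     # e.g. "sale", "purchase", "stocktake", "printing"
    client: Optional[str] = None  # known in most, but not all, cases
    place: Optional[str] = None

    def transfers_books(self) -> bool:
        """Stocktakes and printings are events in which no books change hands."""
        return self.kind not in {"stocktake", "printing"}


# Invented examples.
sale = BookTradeEvent("edition-1", 25, date(1775, 3, 1), "sale", "Client X", "Lyon")
stocktake = BookTradeEvent("edition-1", 400, date(1775, 6, 1), "stocktake")
```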

In measuring the volume of the STN’s trade, the FBTEE project assumes that on some basic level, there is some relationship between the dissemination of a book and its significance. However, here it should be noted that to discuss the data in terms of ‘sales’ and ‘purchases’ is somewhat misleading (though in places in our instructional materials and interface we have ourselves used such terms). In fact the database measures the distribution of books by, and supply of books to, the STN. For many ‘events’ we are unaware of the underlying nature of the transaction between the STN and its ‘clients’, because our records do not always give us this information. A book sent out by the STN may be a mail-order sale; but equally it might be a book swapped with a fellow bookseller or sent as a gift or to the censor. Users are therefore encouraged to think in terms of copies ‘distributed’ or ‘received’ rather than ‘bought’ or ‘sold’. Moreover, not all books received were ordered explicitly: some customers requested the STN to send them any new works they printed or acquired.

Editions and Superbooks

Advanced users of the STN database need to understand the concept of a ‘superbook’. This is a concept borrowed from the Science in the Nineteenth-Century Periodical project (SciPer).

Basically, the superbook is a conceptual container for different editions of the same work. Hence, multiple different editions of a given work are grouped together by the database into ‘superbooks’. (These are also occasionally called ‘books’ in the database; in this document, however, the term ‘book’ will be used to represent a single copy of a single work.) Thus any printed work in the database exists at two levels – the individual ‘edition’ and the ‘superbook’, which combines all relevant editions, including foreign language translations, into a single unity. This distinction is essential to an understanding of the data, because the database operates on both levels. Some data and functions are common to all editions of a superbook (e.g. categorisation, illegality, bestseller tables and other basic functions for interrogating the database through the on-line interface), whereas others are recorded and can be searched or filtered at edition level (e.g. precise title, edition type).

Broadly speaking, the types of queries that can be run through the interface operate and gather their statistics at the Superbook level. The exceptions are 'Map', 'Rank', and 'Compare' for Authors, Languages, Original Languages, Place of Publication. For all of these, it is necessary for the database to consult individual edition level data. The ‘Advanced Edition Search’ [found in the Search box] also operates at the edition level.
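The difference between gathering statistics at superbook level and at edition level can be sketched as below. The identifiers, field names and figures are invented; the point is only that the same events give different groupings depending on which level is consulted.

```python
from collections import defaultdict

# Edition-level records, each pointing to its superbook (hypothetical fields).
editions = {
    "ed1": {"superbook": "Work A", "language": "French"},
    "ed2": {"superbook": "Work A", "language": "German"},  # a translation
    "ed3": {"superbook": "Work B", "language": "French"},
}

# (edition id, copies) for a handful of invented events.
events = [("ed1", 10), ("ed2", 4), ("ed3", 7), ("ed1", 1)]


def totals_by(events, editions, key):
    """Sum copies, grouped by one attribute of the edition each event involves."""
    totals = defaultdict(int)
    for edition_id, copies in events:
        totals[editions[edition_id][key]] += copies
    return dict(totals)


# Superbook level: the translation counts towards the same work.
by_superbook = totals_by(events, editions, "superbook")

# Edition-level attribute: language has to be read from each edition.
by_language = totals_by(events, editions, "language")
```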

Defining Books

What is a book? This is a knotty problem for book historians. They have traditionally seen pamphlets, serial publications and printed ephemera in hierarchical terms, and argue about whether or not they are also books. It is also a problem in database terms.

On one level, our answer is simple. A book is whatever the STN accounted for as a book. Thus our database does indeed, for example, include pamphlets, some of them as short as 4 pages, as well as isolated volumes of journals, almanacs and other ephemera. It does not include job-printings of handbills, trade cards, proclamations and other one-off printings for the Neuchâtel authorities or local businesses. These were not usually recorded by title, nor treated as general merchandise in the accounts.

The books we count are not all equal, either. While some are four pages long, others may be multi-tome works produced over several years in 30 volumes. The sale of a copy of either would count as a unit sale in the STN accounts, and thus a single sale in our statistics. Hence in the database the sale of a lone volume of our 30 volume set would count as just 1/30 of a unit. Wherever possible our data records precisely which volumes of which editions were distributed or received by the STN in individual transactions: indeed this was necessary to arrive at our statistics.
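The unit arithmetic described here can be written out as a short sketch. The function name is hypothetical, and the use of exact fractions is simply a convenient way to show the principle rather than a statement about how the database stores its figures.

```python
from fractions import Fraction


def unit_count(volumes_supplied: int, volumes_in_edition: int) -> Fraction:
    """Units represented by a transaction, as a share of one complete copy."""
    return Fraction(volumes_supplied, volumes_in_edition)


pamphlet = unit_count(1, 1)        # a four-page pamphlet is one whole unit
full_set = unit_count(30, 30)      # a complete 30-volume set is also one unit
lone_volume = unit_count(1, 30)    # a single volume of that set is 1/30

# Three odd volumes of the 30-volume set together amount to a tenth of a unit.
three_volumes = 3 * lone_volume
```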

Defining Superbooks

A more difficult problem, at least theoretically, was defining which editions should comprise a ‘superbook’. Should radically ‘new, improved editions’ containing major textual additions or amendments be included in the same superbook as older editions? Should simple translations be included? What about loose translations? Did changes of title matter, if most of the material was the same? What about editions containing continuations or sequels? Or later editions of anthologies which contained pieces by new authors? These were serious problems for us.

However, generally, we took superbooks to be comprised of all editions that belonged to the same unitary intellectual project. Simple translations, re-editions, and moderately revised or extended editions certainly belonged under the same superbook. Short extracts from a larger work or discrete sections published separately did not. This approach dispelled most problems when they were examined on a case by case basis. In practice, however, a ‘superbook’ is whatever we say is a ‘superbook’.

This does mean that some works are counted in two places. Where Rousseau’s Emile was sold in isolation under the title Emile it counts as a single sale of the superbook Emile in our statistics. If it was sold as one volume of a 15 volume collection of his Oeuvres, it is accounted for as 1/15 of his Oeuvres and appears as such in our statistics. This problem is inherent in our data, and we could find no way around it. Fortunately, such cases are relatively rare. Where we have spotted them, they are recorded in our attribution notes.

The bibliographic detective work and effort involved in identifying precise editions and collating them into superbooks – which was necessary for the reliability of our statistics and understanding of the European book trade – has been one of the most laborious and time-consuming tasks in preparing the database. It was a major research task in its own right. But we also believe that it will prove one of the most valuable contributions to scholarship and book history.

Compiling edition data

Data on editions in the database comes from multiple sources, primarily the STN’s accounting records (where data is often partial or incomplete, and sometimes just a single-word title). This data was then cross-referenced with the printed and handwritten catalogues of the STN and allied houses (notably Louis Fauche-Borel’s catalogue of 1787, which was used to raise orders by the STN’s travelling agent Durand) and online library databases. These included the on-line collective catalogues available through RERO, the Catalogue Collectif de France (CCF), the Karlsruhe portal, and Worldcat. Data from these on-line catalogues was used to both confirm and enrich the data found in the STN’s own accounts and catalogues. These on-line catalogues are the source of much of the data for publishers, author attributions, and places of publication, as well as the number of pages, which are often lacking in the STN sources. Occasionally other bibliographic or antiquarian book trade sources also provided data during bibliometric data entry or the categorisation process.

In ideal cases the bibliographic data on books contains a vast array of information – title, authorship data, year and place of publication, publisher’s name, details of false imprints, pagination and edition format, actual and original language of publication.

In addition, each edition has been given an ‘edition type’ by place of origin, as an aid to interpreting our data. This allows users to filter our data, using the ‘Edition type’ Options menu, to remove certain biases from the dataset (see below).

Dealing with false and uncertain data

Where publication year, place of publication or publisher name appear to have been false, they are recorded in a separate database field to the (supposedly) genuine data. Search query results will default to the ‘genuine’ field (although advanced edition searches by place will bring up both false and genuine imprints).
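The behaviour described above, with false and (supposedly) genuine imprints held in separate fields, can be pictured with a small sketch. The field names, function names and records are invented for illustration and are not taken from the database itself.

```python
# Each edition keeps the (supposedly) genuine place and, separately, any false
# place given on its imprint. Field names and records are hypothetical.
editions = [
    {"title": "Work A", "place": "Neuchâtel", "false_place": "Londres"},
    {"title": "Work B", "place": "Londres", "false_place": None},
    {"title": "Work C", "place": "Genève", "false_place": None},
]


def default_search(place):
    """Ordinary queries consult only the 'genuine' field."""
    return [e["title"] for e in editions if e["place"] == place]


def advanced_search(place):
    """An advanced search by place matches genuine and false imprints alike."""
    return [e["title"] for e in editions if place in (e["place"], e["false_place"])]
```

Here a default query for ‘Londres’ finds only the edition believed to have been printed there, while the advanced search also returns the edition that merely claims a ‘Londres’ imprint.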

Where data on titles was incomplete or ambiguous, this has been noted in the attribution notes for individual editions.

Ascribing editions

Identifying which editions were traded in individual transaction events was another area of uncertainty. Although catalogues and stock books often indicated which editions were being traded at a particular period, other sources frequently did not identify the precise edition sold, purchased, given or exchanged. Nor did we always know the exact dates of printing or print-runs for STN editions, though in cases where these could both be derived approximately we have included an ‘assumed printing’ in our records of events.

This means that the edition recorded for any given ‘event’ should be taken as our suggestion for the ‘most likely’ candidate. As with much of the other data in the database, there is often room for rational doubt. This is particularly the case when more than one edition was being traded at a given moment, or around moments when the STN moved from one supplier of a work to another. In other cases, as noted in the attribution notes, we have not been able to identify a precise edition at all.

However, for a clear majority of ‘events’ we can aspire to something approaching certainty about which edition was traded. In cases where a work was printed by the STN, or acquired by them from named publishers of known editions, it is usually possible to see stock entering and leaving their stockroom. Where catalogues record the presence of a particular edition over a period of time there is also considerable certainty, and the same is often true of the records from the stockbooks. In many other cases, only a single edition of the work is extant, or only one edition exists from the relevant time period. Where doubts remain, other indications such as price, number of sheets, number of volumes or format are often given in the sources, and these too can often pin down editions. We have also established that the STN traded primarily in locally-produced editions sourced directly or indirectly from local trading partners. There were obvious economic and practical reasons for doing so, and hence there is no circularity in assuming that in cases where more than one edition was available, they are likely to have traded editions produced in the Suisse romande.

Users wishing to gauge the accuracy of edition ascriptions for individual ‘events’ are encouraged to check (1) attribution notes; (2) stock acquisition events and stocktakes for the preceding period; (3) the manuscript source; and (4) any transaction notes recorded on the event record (there are several hundred of these among the 70,000 records). In combination these will indicate the degree of certainty with which an individual ascription can be treated.

There are, of course, likely to be cases where individual users with specialised bibliographic knowledge may be able to supplement, refine or perhaps challenge our edition ascriptions. We would encourage such users to get in touch with the Principal Investigator so that our records can be upgraded. The possibility of such specialised crowd-sourcing is not the least reason for promoting a publicly available data interface.

Describing books

There are several ways of approaching the content of books in the database. Where a known writer tended to write on a single subject, it can be done by querying individual authors. Where the concern is with illegal or subversive literature, the topic can be approached by using the options menus to limit searches to the various corpuses of illegal literature. And sometimes a title or title keyword alone (identifiable via the search menu) is enough to tell the user about likely subject matter. However, for a more systematic understanding of the books sold by the STN, we have provided taxonomic data on the content of each book.

General Remarks on Taxonomic Systems

The first problem this presented was which sort of categorisation system we wished to use. Library professionals at the University of Leeds suggested that we ought to consider the Dewey system or the Library of Congress. However, Dewey was too general for our purposes, while the LoC system appeared complex to master and liable to inconsistent application. Dewey also seemed too focused on modern preoccupations. This raised the further question of whether we wanted a system that reflected the world view of eighteenth-century readers or twenty-first century librarians, bibliographers or academics. There was also a further dilemma of whether we wanted a multiple set of descriptive keywords (like LoC), or a single category for each work (as with Dewey). The latter would present dilemmas about where to place individual works but allow us to present the totality of books as amounting to 100%, so that we could use pie-charts and similar data representations.

After much reflection, it was decided that we could, in this case, ‘have our cake and eat it’, by adopting two systems, one modern and purpose built, the other an eighteenth-century system. These are described below.

The System of the Parisian Booksellers (Parisian Categories):

The eighteenth-century system we adopted was an iteration of the ‘system of the Parisian booksellers’. This was used primarily to catalogue private libraries and, more crudely, in some booksellers’ catalogues. One advantage of this system was that it was familiar to bibliographers and book historians, and has indeed been used by some of them, at a basic level, in their own works.

Developed by Parisian booksellers in the late seventeenth and early eighteenth century, the system is essentially a tree system. Each work is placed in one of five major categories (Théologie; Jurisprudence; Arts et sciences; Histoire; Belles lettres), each of which has numerous sub-branches and sub-sub-branches. As the system was flexible – probably no two catalogues included exactly the same set of sub-branches – we had still to decide precisely which sub-categories to include.

Thus, early in the project, we adopted an iteration of the Parisian system which seemed capable of describing more or less every STN work we then knew of. The main problem area, unsurprisingly perhaps, related to atheistic or materialistic attacks on Christianity. Were they to go under Théologie in the sub-category Théologie hétérodoxe or under Arts et sciences / Métaphysique under the sub-sub-categories De dieu, de son existence, de sa providence and De l’Ame, de son immortalité, de l’esprit de l’homme etc.? We knew that users of the system in the eighteenth century had placed such works in both places. But what mattered for us was consistency, so it was resolved that works that build their own philosophical systems – eg. Helvétius’s De l’Homme and d’Holbach’s Systême de la nature – were filed under Métaphysique, whereas works that primarily attack Christianity, such as d’Holbach’s Christianisme dévoilé, were placed under Théologie hétérodoxe.

For all practical purposes, the system worked so well that, save for a few works that were uncategorisable due to incomplete information, nearly every work was placed. At the end of the categorisation process, we had to add or restore just three extra categories to accommodate a rump of about 20 works – ‘Franc-maçonnerie’, ‘Jeux’ and ‘Bibliographie’. There are, however, considerable shortcomings in the Parisian system, not least its tendency to lump together all novels under the catch-all category ‘Romans’ and put both plays and poetry under the category ‘Poétique’. In addition, modern users will be surprised at the arrangement of certain topics – the catch-all Voyages appears under Histoire while ‘Astrologie’ is a branch of ‘Mathématique’. But such apparent anomalies serve to remind us that the men and women of the early enlightenment organised the world in different ways to ourselves. As such they are a tool to help us approach our archive in ways more familiar to its original creators.

To help users to navigate this unfamiliar system we decided to include several ways to access or visualise it in the data interface itself. Under the ‘Browse’ menu item ‘Books by Parisian Categories’, each of the five main categories (Belles-Lettres; Histoire; Jurisprudence; Science et Arts; Théologie) is listed with its sub-categories and sub-sub-categories laid out beneath it. Indeed, in the case of Histoire de France (only) we even have a further level of sub-categories, each covering various time periods by reign. There are also two visualisations of the system provided, one of which offers assistance with translations of these categories. It can be found in the Visualisations Gallery (under the Help and Resources menu), entitled ‘Bubbles of Parisian Categories’. If you click on any bubble in the visualisation, a translation will appear in the pane to the left of the screen.

The Project Keyword System

The keywords system used in this project is unique to the database. This gave the twin advantages of familiarity with a system we had designed and the creative flexibility to add new words and structures. On the other hand, it is not a system familiar to users, and so each keyword term has been carefully defined. Definitions can be found in the browse keywords function.

The aim of the keywords was to allow printed works to be defined as richly as possible in just a few words. We hoped, while keeping it relatively simple, to be able to capture and describe subject, theme, genre, and in some cases ideological tendencies or erotic intensity. We were very aware that scholars have been interested in all these aspects of the literature we were describing, and that literary scholars might be searching for different things in literary works than historians looking at, say, political texts.

Unlike the tree structure of the Parisian system, which has a fixed place for everything and set hierarchical structures, we hoped that our keyword system would be flexible to apply, and that no relationship between keywords would be so rigid as to be definitive. Nevertheless, as a starting point for developing the system, it was felt necessary to identify a number of key domains of knowledge or literary activity, and then list themes, topics and genres that might occur within them. Likewise, we isolated a number of ideological tendencies (eg. philosophie, works of religiosity, sceptical works, anti-clerical works, and libertine works) that might work alongside them. And because it is a key issue in the book history of our period, the system closely defined a hierarchy of works dealing with the sexual side of human relationships, ranging from ‘romantic literature’ and tame or not-so-tame treatments of [multiple] ‘amorous adventures’ through genuinely ‘erotic works’ to sexually explicit ‘pornographic works’.

In addition, there was a need for ‘tags’ to describe key people or places, whether in factual works such as history or travel literature, or works of creative literature. That said, the people in the keyword system are almost all either real persons or mythical ones from ancient legend. We did not tend to index fictional figures, and we only tended to index the main protagonists of works, trying to limit ourselves to key historical individuals or otherwise to significant persons who appeared in more than one work.

Naturally, such a system did involve implicit hierarchies – most works on ‘medicine’ tended also to be works of ‘science’. Most, but not all. A play satirising doctors, for example, treated ‘medicine’ but was not a work of ‘science’. So rather than the tree system of the Parisian booksellers, our system ended up like a tangled bush, and perhaps one wrapped in varying lengths of string at that. There are common and almost obligatory inter-relationships, but also discontinuities and unexpected connections.

One standard distinction we avoided was that between ‘fiction’ and ‘non-fiction’, which we found difficult to apply in practice at the boundaries. We preferred instead to set aside [creative] ‘literature’, which could thus accommodate historical plays and epics, and incorporates all ‘prose fiction’, ‘poetry’ and ‘drama’, which are its three main sub-branches.

Works of ‘literature’ tended to be described primarily and necessarily in terms of ‘genre’, rather than content. However, our need to establish the existence and type of erotic charge led to some time-consuming excursions into texts suspected of harbouring accounts of amorous encounters....

One of the best ways for users to access and understand the keywords system is to approach it from the ‘Rank’ Keywords function, looking at the top keywords in the table. These are mostly those that are catch-all over-arching terms. By looking at the definitions and related keywords, it is then possible to see which keywords are closely related to these major keywords.

Ascribing keywords

So how did we ascribe categories to almost four thousand works using two different systems? The short answer is, after much work, and to the best of our ability given the time and resources!

In retrospect the categorisation process for books appears the most naïve aspect of our original grant application. It estimated that we would have to categorise 3,300 superbooks (the final total was 3,600), and that 2/3 of them would be categorised using reference works. We thought that a further 1,000 would require a more thorough hour-long physical examination of title page and content-skimming. A rump of 200 would need closer reading. We suggested that a single person sitting in Paris and London would peruse [most of] these 1,200 books in the British Library and Bibliothèque nationale. Even had the books all been there, one suspects this would have been logistically challenging.

Fortunately technology came to the rescue: Gallica and Google Books saved our bacon! For by the time that I started on categorising the books, vast numbers of them were consultable on line – many hundreds of them in the very editions traded by the STN. So how was this task accomplished?

For practical purposes, working on the task for almost a whole year, I had approximately 15 minutes per work. In that time I needed to research the work, ascribe keywords, and record any supplementary data or errors found in our bibliographic records during the process. The categorisation process thus had the added function of allowing bibliographic checking.

It became clear very early in the categorisation process that it would require a special data entry tool. There was provision for the work to be done in the original data editor, but the process of identifying and clicking on every keyword in turn proved cumbersome. It was laborious and time consuming, and required constant vigilance to make sure that all intended keywords were entered. It was also difficult to maintain consistency over time or make retrospective changes that affected multiple titles.

It was therefore decided to create a specialised ‘keyword editor’ with inbuilt computer-moderated systems to help the user identify related keywords. It incorporated a scroll down list of all keywords in the database, and whichever one of them was selected a tick-box data display would appear. For every keyword it offered lists of hierarchically and conceptually related keywords; and a computer generated list of the most popular associated terms. The system also offered a short separate menu listing of all the ideological tags in the database, so that they could be consulted for each work, and facilities for creating new words and calling up all works with a given keyword. This was particularly useful for retrospective tasks, such as identifying works which needed to be ascribed a newly created keyword. The system was designed quickly to meet a specific need, but it was intuitive, efficient and proved more than adequate to the job.
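The two computer-moderated aids described above, a list of the most popular associated terms and a facility for calling up all works with a given keyword, can be illustrated with a minimal sketch. It is not the project's actual keyword editor: the function names and the sample data are hypothetical, and it assumes that 'most popular associated terms' means the keywords that most often appear on the same works as the selected keyword.

```python
from collections import Counter

# Hypothetical sample data: work title -> keywords already ascribed to it.
WORK_KEYWORDS = {
    "Work A": {"Science", "Medicine", "Surgery"},
    "Work B": {"Science", "Medicine"},
    "Work C": {"Medicine", "Drama", "Comedy"},
    "Work D": {"Science", "Astronomy"},
}


def associated_keywords(selected, work_keywords, top_n=3):
    """Rank other keywords by how often they share a work with `selected`."""
    counts = Counter()
    for keywords in work_keywords.values():
        if selected in keywords:
            counts.update(keywords - {selected})
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def works_with_keyword(selected, work_keywords):
    """Call up every work carrying a keyword, e.g. for retrospective changes."""
    return sorted(t for t, kws in work_keywords.items() if selected in kws)
```

With this sample data, selecting 'Medicine' would suggest 'Science' first, because it co-occurs on two works, which is the kind of prompt that helps keep ascriptions consistent when working at speed.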

The basic principle behind the categorisation work is that it was undertaken on a ‘best available information’ basis.

The information used in the process might therefore come from a number of sources, notably:

(a) the texts of books themselves, usually consulted in digital form. In most cases a perusal of tables of contents, prefatory material, and introductions was all that was necessary. Only occasionally was deeper reading required.

(b) contemporary review journals offering detailed reviews. These were usually turned up by on-line searches for a particular work.

(c) titles of the works themselves. Sometimes this was our only available data, but to rely on the title alone was clearly a last resort.

(d) keyword categories ascribed in on-line library catalogues. These are frequently untrustworthy, and certainly inconsistent between catalogues. But they were often indicative of what to look for in approaching a text or other sources, sometimes revealing hidden themes.

(e) other details given in on-line library catalogue entries.

(f) on-line antiquarian book dealers and auctioneers catalogues, which often describe works in considerable detail.

(g) miscellaneous bibliographic sources encountered through on line searches. These were particularly helpful for identifying works which had an erotic charge. They were much less reliable for establishing whether or not such works lived up to the project’s definition of erotica.

(h) scholarly references to works, searched out on-line through Google Scholar and Google Books. Biographical sketches of the author or lists of their publications often gave information that was at least indicative.

(i) descriptions of the content of the text in other miscellaneous on-line sources or scholarly works in my own possession.

(j) the pre-existing scholarly expertise of the project participants.

(k) Finally, the categorisation of prose literature and dramatic works in the database has been greatly facilitated by two major scholarly bibliographic resources:

(1) For novelistic works Angus Martin et al., Bibliographie du genre romanesque français, 1751-1800, which lists every novelistic work published by year, and for those published for the first time between 1751 and 1800 a brief, formulaic content breakdown. This lists genre, historical and national setting, names of principal protagonists and certain themes or aspects or stylistic features. Although the authors warn that the system was neither entirely uniform nor satisfactory, it was more consistent than most other bibliographic sources consulted, and much of the information it provided fed directly into our categorisation scheme. Thus wherever the terms ‘sensibilité’ or ‘sentimentale’ are used in the Martin bibliography, we have designated a work a ‘sentimental novel’ (except where the phrase ‘intrigue sentimentale’ is used, in which case we have treated the work as ‘romantic fiction’). The STN database only uses some of Martin et al’s categories, and then sometimes only if they are considered important – eg. if national setting seems only incidental to the plot, it is not recorded in the database; whereas any work designated as being in ‘lettres’ is always recorded as ‘epistolary fiction’. The work of Martin et al. was also invaluable for helping us to pick up allegorical fiction, historical novels and political or erotic tendencies. Impressively, hardly a novel in our database published between 1751 and 1800 was missing from their pages. The omission rate was only about 1%. For that 1% and novels first published prior to 1751, it was necessary to examine the text or find alternative bibliographic sources. For earlier works this was not usually a problem, as they were generally well known and merited a reprint. For the rare omissions this could be more difficult.

(2) For dramatic works, our standard source was the César database (http://www.cesar.org.uk/cesar2/index.php). This contains data on several thousand French dramatic works written or produced in the seventeenth and eighteenth centuries. It includes a categorisation by form and content. However, this categorisation is derived from the title or title-page of the play itself – there has been no attempt at standardisation. Fortunately, most playwrights of the time conformed to fairly standard conventions when describing their works – most are ‘drames’, ‘tragédies’, ‘comédies’, ‘opéras’, ‘ballets’ and so forth, with a few ‘parades’, ‘vaudevilles’, ‘tragi-comédies’ etc. thrown in. Most of these terms map directly on to equivalent keyword terms in the STN database. A few works also add further useful descriptors, perhaps noting that they are historically based or contain ‘ariettes’ or other songs or musical forms. The César database also gives supplementary information, including whether a play is written in verse or prose, and as a result this information is given as standard for dramatic works in the STN database.

This allowed for a highly standardised approach to these literary forms. It also meant that it was possible to categorise these works in much more detail than would have been possible otherwise given the time at our disposal and the difficulty of consulting some in textual form.
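The rules stated above for reading the Martin bibliography are regular enough to be written down as a short sketch. The function below is purely illustrative and hypothetical: it encodes only the three rules given in the text, it assumes the content breakdown is available as a plain French string, and its naive substring matching would need refining in practice. The real work was done by hand, with judgment about which of Martin et al's categories mattered for a given work.

```python
def keywords_from_martin(description):
    """Suggest project keywords from a Martin et al. content breakdown.

    Hypothetical helper encoding only the rules stated in the text.
    """
    text = description.lower()
    keywords = set()
    if "intrigue sentimentale" in text:
        keywords.add("romantic fiction")
        # Remove the phrase so it does not also trigger the rule below.
        text = text.replace("intrigue sentimentale", "")
    if "sensibilité" in text or "sentimentale" in text:
        keywords.add("sentimental novel")
    if "lettres" in text:
        keywords.add("epistolary fiction")
    return keywords
```

For instance, a breakdown reading 'Roman par lettres; sensibilité' (an invented example) would yield both 'epistolary fiction' and 'sentimental novel', whereas 'Intrigue sentimentale' alone would yield only 'romantic fiction'.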

These methods ensured that in the majority of cases, categorisation was conducted by consulting a copy of the work in question or a relevant entry in one of our key bibliographical sources (ie. César or the Bibliographie du genre romanesque). Where neither of these was possible, we consulted as many of the sources / methods described above in (b) to (j) as was feasible, in order to get the most rounded picture of the work possible given the available time and resources.

Remarks on taxonomic systems for future projects...

On what basis should the project’s taxonomic efforts be judged? And are we satisfied with them?

Certainly we are all too aware of their – and our – limitations, and the unevenness of the source material on which they were based. Some key judgments had to be made on incomplete information. In many cases categorisation is dependent on what was ‘detected’ about the text on relatively cursory inspection or secondary reports. Without reading a text cover to cover I cannot be absolutely sure that it qualifies as merely ‘erotic’ rather than ‘pornographic’ under our definition. And even if I do read it cover to cover, unless it is the precise edition the STN traded, how can I be sure that it is not a bowdlerised or rewritten version? To some extent our categories must remain tentative.

It is also clear that the system itself will not appeal to everyone. It may not be adapted to the particular queries they have in mind, to their patterns of thought or the needs of their discipline. Hopefully in such cases, the sheer breadth of keywords available will afford some compensation. And although we have all sorts of tools to help users navigate the system, it is unfamiliar and complex. Certainly it would be difficult to apply it in an exactly analogous way in another project.

But we will venture to suggest that they are at least as rigorously and consistently applied and more richly and accurately descriptive than most existing systems and comparably disparate bibliographic projects. They are also more closely and carefully defined than any other system of which we are aware. And all this is appropriate, since they are a major tool for helping users to navigate and comprehend a rich but largely unfamiliar corpus of texts. Furthermore, they stand alongside other means of navigation: the Parisian system, our markers of illegality; and author queries all provide proxy means of approaching certain groups of texts.

Do we have any advice for other scholars rash enough to attempt something similar? And what would we do differently if we had to do it all again?

Sadly, given our initial vision of how the categorisation work would be achieved, the database and the keyword editor did not have a field for categorisation notes. Nor do they have any means of recording the basis on which each work was categorised. This is a matter for regret. In future projects it would be relatively easy to create a tick box system to record whether the work was consulted, and if so in part or whole, or whether the categorisation was conducted using a range of other source types.

We would also want to look again at some of the areas of weakness in the keywords system. It did not do very well at handling ‘Emotions and Sentiments’ [an over-arching keyword added retrospectively], and it was weak for works aimed at or debating issues around women and gender and social life. That these particular areas are weak points in a system designed by eighteenth-century historians is slightly shameful: we should have known better!

In addition, too many disparate keywords were needed for those travel works which described alien peoples, their cultural and social institutions and activities, governments and beliefs. The all-embracing keyword ‘Travel and Description’ (belatedly noted in some library catalogues) would probably be a better way to handle these cases, leaving terms like ‘political institutions’, ‘social mores’ etc. for slightly more specialised works. The same applies to certain sorts of household and agricultural reference manuals, which contain miscellaneous information ranging from recipes for medical remedies or dyes to animal nutrition.

Despite the hastily created computer moderation tools, it was also a constant struggle to maintain full consistency in the application of certain keywords. This was made worse by the fact that the tick-box system for applying the keywords sometimes failed to respond, a risk while working at speed. This was a general if mundane lesson from the project: tick box systems need to ensure that it is hard to ignore ‘click failure’. And where keywords normally exist in fixed relations with one another, a system ought to enter them all at once, with an option for deleting unwanted keyword items.

The solution might be to adopt editable ‘keyword strings’, whereby instead of selecting ‘surgery’ and looking for the related keywords, the system automatically selects ‘Science – Medicine – Surgery’, with the option of deleting one or more steps in the string. This would streamline the work, and make it easier to apply. At some future stage I dream of developing such a ‘universal taxonomy’ for eighteenth-century works.
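A minimal sketch of how such editable ‘keyword strings’ might work is given here. It is a proposal in code form, not something that exists in the STN database: the parent links, function names and the `drop` parameter are all hypothetical, and it assumes each keyword has at most one usual broader term.

```python
# Hypothetical links: keyword -> its usual broader keyword (None at the top).
PARENT = {"Surgery": "Medicine", "Medicine": "Science", "Science": None}


def keyword_string(keyword, parent=PARENT):
    """Return the full string from the broadest term down to `keyword`."""
    chain = []
    current = keyword
    # Stop at the top of the hierarchy, or if the links loop back on themselves.
    while current is not None and current not in chain:
        chain.append(current)
        current = parent.get(current)
    return list(reversed(chain))


def apply_keyword_string(keyword, drop=(), parent=PARENT):
    """Enter the whole string at once, minus any steps the user deletes."""
    return [k for k in keyword_string(keyword, parent) if k not in drop]
```

Selecting ‘Surgery’ would thus enter ‘Science’, ‘Medicine’ and ‘Surgery’ together, while a play satirising doctors could be given ‘Medicine’ with the ‘Science’ step deleted, preserving the flexibility the keyword system was designed to have.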

Illegality

Illegality is a further means of navigating the database. Unfortunately, illegality is an elastic concept. What texts should be considered to be illegal? This is problematic for a variety of reasons, not least because it is individual jurisdictions that define what is outside the law, but our database is pan-European. In other words, illegality is place specific. A book freely available in England might be illegal in France.

Illegality is also time specific – a book needs to fall foul of, or fail to comply with, some measure or edict, in order to be illegal.

But illegality is a broad spectrum, too. An illegal work may nevertheless be an innocuous text. It might be a pirated edition of a permitted text; or alternatively one whose author has not thought it necessary to go through the formal channels to gain a publishing permission. Or conversely, it may be a work deeply offensive to public morals or inciting violent direct action. In France many of the protestant religious works sold by the STN were also technically illegal – including protestant versions of scripture. At times the STN’s clients were very cautious about acquiring them. Yet for the most part they do not show up as illegal by our measures.

Nor was it clear quite where full legality began, particularly in France where there was a system of tacit permissions for works not openly tolerated. As Robert Darnton has noted, even those charged with the policing of the book trade were not always aware what was legal and what was not. But at the most clearly illegal end of the spectrum there were certain works that were pursued actively by the authorities. Booksellers treated these works as contraband, fearing punishment if they were caught with them. They and their customers called these books livres philosophiques. This is the ‘libertine literature’ that Darnton catalogues in his Corpus of Clandestine Literature.

As illegality is relative and not constant, and there is no one definitive list of illegal works, the French Book Trade project prefers to think of illegality in terms of ‘markers’.

We have therefore used our ‘Options’ menus to provide users with a set of lists of illegal works (or ‘Markers of Illegality’), drawn from a variety of jurisdictions. For France we have used Darnton’s Corpus as a shorthand for seriously illegal works. We have also listed illegal works that were found impounded in the Bastille after its seizure by the mob in July 1789. We have included works from Joseph II’s index. In a future upgrade we hope to include works on the Papal Index that the STN agent Durand asterisked in his correspondence as illegal in the various jurisdictions through which he travelled.

At the start of the project it was hoped that we would also be able to indicate which superbooks had publishing permissions in France, using surviving registers. Alas this was not possible due to time and financial constraints, but again it offers possibilities for upgrading the database. However, most of the editions of French works sold by the STN were pirate editions sourced from Switzerland, so even the existence of a permission does not mean a work is legal.

Thus while we cannot provide a comprehensive overview of what was illegal anywhere, we have given users a choice of ways of defining and thinking about illegality. These can be accessed in the database through the ‘Options’ menu by selecting ‘Markers of Illegality’. As in other areas of the database, we have presented users with a range of choices to compensate for ambiguities, biases or uncertainties in the data.
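The idea of ‘markers’ rather than a single definition of illegality can be pictured as a set of independent lists, any combination of which a user may switch on. The sketch below is only an illustration of that idea, not the database's implementation: the marker names follow the lists mentioned above, but the titles are placeholders and the function names are hypothetical.

```python
# Marker names follow the text; the titles are placeholders, not real entries.
MARKERS = {
    "Darnton's Corpus": {"Title 1", "Title 2"},
    "Bastille": {"Title 2", "Title 3"},
    "Joseph II's index": {"Title 4"},
}


def markers_for(title, markers=MARKERS):
    """List every marker of illegality under which a title appears."""
    return sorted(name for name, titles in markers.items() if title in titles)


def select_illegal(titles, chosen, markers=MARKERS):
    """Keep only the titles flagged by at least one of the chosen markers."""
    flagged = set().union(*(markers[name] for name in chosen))
    return [t for t in titles if t in flagged]
```

Because a work can carry several markers or none, the same query may return different results depending on which markers are chosen, which reflects the place-specific and time-specific nature of illegality.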

People

People are encountered in the database in several separate places and guises. They appear as ‘Clients’, ‘Authors’, ‘Publishers’ or subject ‘Keywords’. As these are recorded in separate parts of the database, there has been no attempt to link the different places where an individual appears. Hence, although a considerable number of authors (e.g. Brissot, Mercier, Raynal, Rilliet) were also clients of the STN, we have not recorded this data systematically or provided hyperlinks between their entries as authors and clients.

Clients

Defining clients

The word ‘client’ as used in the STN database has a meaning slightly distinct from normal English usage.

The STN database uses the word ‘client’ to describe individuals or corporate bodies (1) who corresponded with the STN and/or left related documents in the STN archive; (2) to whom there are letters in the STN’s surviving out-letter books; (3) or who are recorded in the STN’s accounting records as playing a role in the transfer of books to and from the STN, whether as an end recipient, agent or middleman.

Because clients could be corporate bodies (usually but not always business partnerships), and because individuals could move between several such bodies over time, it has also been necessary to distinguish between ‘clients’ and the ‘named persons’ who composed them. While in most cases the ‘client’ and ‘named person’ are identical and stable, there are ‘named persons’ who appear in several guises, or corporate bodies which evolve. For example, the printer, economist and future counter-revolutionary François d’Ivernois appears in his own right as the client ‘François d’Ivernois’ and as a ‘named person’ in the partnership ‘Boin, d’Ivernois and Bassompierre’ and its successor ‘Boin and d’Ivernois’. In statistical search queries, each of these three incarnations of François d’Ivernois returns distinct and separate results.
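The distinction between ‘clients’ and ‘named persons’ can be sketched as a simple data structure. This is an illustration only and not the database's actual schema: the class, field and function names are hypothetical, and the membership of each partnership is inferred from its name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    """A trading entity: an individual or a corporate body."""
    name: str
    named_persons: tuple  # the people who compose this client


CLIENTS = [
    Client("François d'Ivernois", ("François d'Ivernois",)),
    Client("Boin, d'Ivernois and Bassompierre",
           ("Boin", "François d'Ivernois", "Bassompierre")),
    Client("Boin and d'Ivernois", ("Boin", "François d'Ivernois")),
]


def clients_involving(person, clients=CLIENTS):
    """Find every client in which a named person appears."""
    return [c.name for c in clients if person in c.named_persons]
```

A query by client returns results for one of the three entities only; gathering everything connected with d’Ivernois means looking up all the clients in which he is a named person and combining their results.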

Data sources for clients

Data for the majority of clients is taken from the BPUN’s MS 1000A, a typed handlist of ‘Société typographique correspondants: répertoire géographique’. This is now on line (http://bpun.unine.ch/pdf/BPUN_typo_correspondants_repertoire_geo.pdf). This document lists 2,290 correspondents and their professions by place, noting the number of letters in their individual files in the STN archives. This data was then cross referenced with the BPUN’s card index of correspondents, to gain supplementary information. This included the archival manuscript references for clients’ correspondence. Where the card index information contradicted the handlist, the card index data was generally preferred. The card index is also more comprehensive, providing 96 additional clients whose names do not appear in MS 1000A, either because they were later additions or because their place of residence was not known. To this list were added 409 clients whose names appear in the indexes of the STN’s surviving volumes of out-letters. (NB in one case we have the index but not the letters themselves). The names of a further 173 clients who ‘played a role in the transfer of books’ were taken from the STN’s accounting records. As a result the STN database both offers the most complete (yet still incomplete) list of STN ‘clients’ and serves as a complete index to the STN’s in and out correspondence. Hitherto, no such unitary list had existed.
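The cross-referencing rule and the arithmetic of the client list can be illustrated with a short sketch. It rests on stated assumptions: the field names and sample values are hypothetical, the card index is treated as always winning a conflict (the text says only ‘generally’), and the total assumes that the four groups of clients do not overlap.

```python
def merge_client(handlist, card_index):
    """Combine two records; card index values win where both give a value."""
    merged = dict(handlist)
    for field, value in card_index.items():
        if value is not None:  # ignore fields the card index leaves blank
            merged[field] = value
    return merged


# Client counts by data source, as given in the text.
SOURCE_COUNTS = {
    "MS 1000A handlist": 2290,
    "card index only": 96,
    "out-letter indexes": 409,
    "accounting records": 173,
}

# Assumption: the groups are additive, with no client counted twice.
total_clients = sum(SOURCE_COUNTS.values())
```

On that assumption the four sources together give 2,968 clients. Recording the data source against each client is also what makes it possible to filter by ‘client data source’ later.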

Types of clients

It follows from this discussion that there are several further conceptual distinctions to make between ‘clients’.

First we must distinguish between ‘correspondents’ (those who made written contact with the STN of which there are traces in the STN archive) and non-corresponding clients. The latter group may include individuals whose sole relationship with the STN was that the Neuchâtel publishers attempted to contact them by letter.

By contrast, the former group (comprising all those listed from the ‘Répertoire géographique’ or card index) can be said to have initiated some interaction, however brief, with the STN.

Those with an ‘active’ relationship with the STN also included the clients whose names were taken from the accounting records.

It should also be noted that many ‘clients’ did not trade in books with the STN. Some were their suppliers of raw materials (candlemakers, papermakers, type-founders etc.); others were their political contacts, lawyers or debt-collectors; and yet others were actual employees of the STN.

Finally, we should note the existence of a special corporate client, composed of all those customers who bought books anonymously over the STN’s trade counter. This composite character bought over 4,000 books for hard cash. ‘His’ purchases, more often than not in single copies, provide a valuable insight into the reading habits of the Neuchâtelois.

Fortunately users can distinguish between these generic types of clients if their analysis requires it. An Options menu [q.v.] allows users to distinguish between clients according to the ‘client data source’, and thus to exclude corresponding or non-corresponding clients from their data searches.

A further ‘Option’ allows users to limit their searches to certain sorts of ‘client’ according to their volume of trade and relationship with the book trade. This is a means of compensating for certain forms of bias within the data. We have provided more precise information on this ‘Option’ and its uses under the ‘Client type’ rubric under the drop down ‘Options’ menu.

Placing clients

Wherever possible, individual clients are linked to a place of residence, as recorded in MS 1000A, the BPUN card index, or STN accounting records. This is the default place for all trades involving that client.

On rare occasions the accounting records indicate that an order was to be sent to, or received from, a place other than that client’s usual place of residence. For such ‘events’, that data has been recorded in preference to the client’s normal place of residence. In all other cases, the client is assumed to be static in their default location.

This is clearly a necessary fiction – giving rise to one of the uncertainties in the database. A few individuals in the database certainly moved around significantly. One such person is the philosophe and revolutionary Jacques-Pierre Brissot, who spent time in Britain and America as well as in his location as recorded in the database (Paris). Although in some cases it would theoretically be possible to isolate these individuals and use their correspondence to trace their movements, this has not been attempted, not least because the data would be inconsistent. Nor does a client’s absence from their normal home mean that their orders were necessarily sent to their temporary address. Thus unless our data indicates otherwise, all book trading involving those clients is treated as if it were at their normal place of residence.

A further issue stemming from this is the assumption that clients ‘received’ books where they were ‘consumed’. In fact, many larger clients may have been entrepôt dealers who on-sold works to other locations. Indeed, a handful of clients are known to have done this on a fairly systematic basis: for example, Pierre Gosse in The Hague forwarded works on to his trading partner Boissière in London; while Malherbe in Loudun supplied a whole band of travelling colporteurs. We have provided users with a tool for excluding such clients from their analyses: the ‘Client type’ option in the ‘Options’ menu. This also allows them to exclude other types of clients who might be thought to distort the figures, such as clients who commissioned whole editions and had them sent to just one or two locations.

Professional data

The STN should be seen as a business archive as much as a publisher’s archive. It contains correspondence to and from clients based across Europe in a very large range of professions, from papermakers to publicans, pedagogues to politicians. Thus it offers a point of entry for studying many industries, not just the book trade, and it was important to us that the database should facilitate such studies by its handling of professional data.

Information on clients’ professions is usually taken from BPUN sources. These are secondary sources in the sense that the professional designation was recorded by the various compilers of the card index and MS1000A (although material from the out-letters indices is contemporary). However, the MS1000A and card index evidence was clearly derived in most cases from the correspondence files of clients (from address lines and internal evidence of their letters). It thus usually respects how clients described themselves. It should be noted that this means some clients have multiple professions.

There are exceptions to these observations. In many cases there is also evidence in the STN’s ‘day books’ and ‘order books’, where the client’s business is often listed. Where other evidence is lacking, this has been added. Equally, for clients where no profession is given in the STN sources, it has been considered legitimate to add a profession when this is discovered and unambiguous (e.g. by the discovery of a printed book catalogue showing that they were a bookseller).

Thus the professional designations are somewhat arbitrary. There are, for example, a few ‘merchants’ who were clearly significant bookdealers. A case in point is Veuve Joly at Avignon. In partnership with her son as ‘Veuve Joly et fils’, she is described in our data as a merchant. But she was in fact also one of Avignon’s handful of licensed printers and did a considerable trade with, among others, the STN. We have however respected her designation as a merchant, and it is possible that she and her son saw this as their main occupation.

Profession Groups and Economic Sectors

To aid analysis and get around the arbitrary nature of some professional designations (e.g. the stampatore granducale in Florence is, of course, a printer), clients have been assigned to broader categories. These are ‘profession groups’, defined in relation to their professional function, and ‘economic sectors’, defined according to the industry in which they were employed. Although the two were often related (e.g. those whose profession group is ‘Printer’ were invariably in the ‘Book Trade’), in other cases they were not. Hence a ‘Commis libraire’ (bookseller’s clerk) has for his profession group ‘Secretary’ but his economic sector is the ‘Book Trade’.

Both the ‘Profession Group’ and ‘Economic Sector’ designations are derived data. They are categories invented by the project and applied according to set rules as a tool to aid understanding.
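Because profession group and economic sector are two independent classifications, the same set of clients can be tallied along either axis. The sketch below is illustrative only: the `CLASSIFICATION` lookup is a hypothetical stand-in for the project's rules, and it contains just the two examples mentioned in this section.

```python
from collections import Counter

# Hypothetical lookup: profession -> (profession group, economic sector).
CLASSIFICATION = {
    "Commis libraire": ("Secretary", "Book Trade"),
    "Stampatore granducale": ("Printer", "Book Trade"),
}


def tally(professions):
    """Count clients by profession group and, separately, by economic sector."""
    groups, sectors = Counter(), Counter()
    for profession in professions:
        if profession in CLASSIFICATION:
            group, sector = CLASSIFICATION[profession]
            groups[group] += 1
            sectors[sector] += 1
    return groups, sectors


groups, sectors = tally(["Commis libraire", "Stampatore granducale"])
print(groups)   # one 'Secretary' and one 'Printer'
print(sectors)  # both fall in the 'Book Trade'
```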

Authors and authorial roles

Handling authorship also proved a challenge. From a literary point of view, defining authorship is often problematic. The question of who participates in the creation of a text is open to debate – on a theoretical level, some would doubt that any text should be attributed to a single author. Multiple roles and processes are involved before a word is written and many more before it finds its way into print. Writers, secretaries and transcribers, proof-readers, editors, type-setters, publishers, translators and commentators can all be involved in the production process. We wished to capture some of that complexity in the database, not least so users can be as clear as possible what is meant by authorship in any given case.

In database terms, we have treated as an author anyone who was, according to our bibliographic sources, significantly involved in the creation of a piece of text that is included in a work. This ‘involvement’ might include many possible roles, but in effect we have reduced them to four. The primary author is the sole or main writer of a piece. A secondary author is defined as someone who, while not the primary author, played a significant role in creating the text of a published work. They might be a contributor or collaborator in the process, a person who adds some extra material, or someone whose work is included in an anthology or collection of essays. Then we have editors of texts and finally translators.

As many authors in the database played more than one of these roles, it may be felt important to distinguish between them. Thus when using certain search queries we determined that it should be possible to restrict the search to just one or several of these categories of authorship.

Generally author attributions – including the attribution of anonymous texts – have been taken from a combination of STN sources, library catalogues and bibliographic works. Cases of ambiguity were dealt with by further research and/or by noting alternative possibilities in the attribution notes. In a few cases up-to-date research has superseded earlier attributions: the attribution of the Gazette noire to Anne-Gédéon La Fite de Pelleport after the work of Simon Burrows and Robert Darnton would be a case in point.

Publishers

Publishers present a rather different set of challenges. ‘Publisher’ is an ambiguous term in the STN database. It exists as a distinct profession (‘éditeur’) and profession group (publishers) in our client data. The trade of such people with the STN can be explored in some detail, either as individuals or as part of a group.

But it also exists as part of the bibliographic data on many editions in the database. Sadly, it is not possible using the interface to ‘query’, ‘browse’, ‘map’, ‘rank’ or ‘compare’ the outputs of individual publishers. This was a conscious decision, not least because (in contrast to our author data) the publisher data was too messy to collate – the same publishing house is often described in several different variants, or the same term is used in multiple places (e.g. ‘Société typographique’; ‘chez l’auteur’). We also wanted to keep our menu options limited, and it is possible to ‘Map’, ‘Rank’ and ‘Compare’ by publication places. It is not possible to ‘Query’ books by publication place, however, as the ‘Query’ menu operates on books (i.e. superbooks), whereas publication places, like publishers, apply to individual editions.

Users wishing to explore individual publishers’ outputs therefore have the option of entering their name (and variants) in the ‘Advanced edition search’ function’s publisher box, and exploring editions individually. They could also run their own SQL searches using the downloadable, customisable SQL version of the database.
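A search across several name variants might look something like the sketch below. It is an illustration under stated assumptions, not the project's own query: the table `editions` and its columns are hypothetical (the real schema may differ), the rows are invented placeholders, and an in-memory SQLite database stands in for the downloadable MySQL database so that the example is self-contained.

```python
import sqlite3

# In-memory stand-in with a hypothetical table and invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE editions (edition_id INTEGER PRIMARY KEY, title TEXT, publisher TEXT)"
)
conn.executemany(
    "INSERT INTO editions (title, publisher) VALUES (?, ?)",
    [
        ("Title A", "Société typographique"),
        ("Title B", "Société typographique de Neuchâtel"),
        ("Title C", "chez l'auteur"),
    ],
)


def editions_by_publisher(conn, variants):
    """Return (title, publisher) rows whose publisher contains any given variant."""
    if not variants:
        return []
    clause = " OR ".join("publisher LIKE ?" for _ in variants)
    sql = f"SELECT title, publisher FROM editions WHERE {clause} ORDER BY title"
    params = [f"%{variant}%" for variant in variants]
    return conn.execute(sql, params).fetchall()


rows = editions_by_publisher(conn, ["Société typographique", "STN"])
for title, publisher in rows:
    print(title, "|", publisher)
```

Because a term such as ‘Société typographique’ was used by more than one house, results of this kind still need to be checked edition by edition.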

Time and Events

The project covers the period from when the STN first traded (1769), to when they made their last known sales in 1794. The smallest calibration of time used in the database is the individual day.

Events

When describing time in a database such as this, one is presented with a number of problems. In particular, were we interested in individual moments or in events of longer duration? In general the French book trade project was interested in single events that occurred on a single known day. And because we dealt with accounting records, the date given is generally that under which an event is accounted for in the STN’s various ledgers. This means that users can set the parameters of their search queries by single-day increments. That said, the date of a particular event may be, on some level, an accounting fiction.

It should also be noted here that ‘Events’ are the building blocks of the STN database, and each of them can be identified in time and space. At present most ‘events’ are book trade transactions by the STN and its clients, though we also have ‘events’ such as printings and stocktakes. In future research projects, the range of such events might be widened to take in other book-related activities which can be geospatially and chronologically located – e.g. the inclusion of books in a catalogue, the seizure of books at customs, etc.

Occurrences with duration

Our dates are approximations. Events were not always recorded on and attributed to the precise day they occurred – there is occasional evidence of significant time lags. Equally books took a long while to reach distant destinations, so date of receipt (always assuming stock arrived) could be long after date of despatch. The database does not give us journey times – they took place over durations much longer than a day.

Another type of occurrence in the database also has a measurable duration longer than a day. That is the relationship between a correspondent and the STN as measured by the time elapsed between their first and last letter in the archive. This opens the possibility of mapping how the STN’s client networks changed over time. Ironically the very first data analysis attempted using the database, long before we had the interface, did just that. It painstakingly graphed the number of Darnton’s sample of booksellers in correspondence with the STN across each year of its existence using a spreadsheet. Unfortunately, because our interface design has prioritised the mapping of books, that sort of analysis still needs to be done via the SQL version of the database. But because first and last dates of correspondence are recorded separately, there is no reason why we could not develop that functionality in the interface in a later upgrade.
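The kind of year-by-year count described above can be derived from first and last dates of correspondence alone. The following is a rough sketch with invented dates; the function name is hypothetical, and it assumes that a correspondent counts as active in every calendar year between their first and last letter, even if no letter survives for some of those years.

```python
from collections import Counter
from datetime import date


def active_per_year(spans):
    """Count correspondents active in each year, given (first, last) letter dates."""
    counts = Counter()
    for first, last in spans:
        for year in range(first.year, last.year + 1):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Invented example spans, not real STN correspondents.
spans = [
    (date(1770, 3, 1), date(1772, 6, 30)),
    (date(1771, 1, 15), date(1771, 11, 2)),
]
print(active_per_year(spans))  # {1770: 1, 1771: 2, 1772: 1}
```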

Trends

The data interface does, however, have the capacity to deal with trends over time. Indeed, these are an important part of the historical dimension of the project. The ‘Compare’ function is designed to let us see how sales (etc.) changed, whether in terms of absolute numbers (raw numbers) or relatively and proportionately (percentage of total [of annual sales]). Allowing us to compare a wide range of different aspects of the database, it is a powerful interpretative tool.
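The difference between the two measures can be shown with a small calculation. This is a sketch with invented figures and a hypothetical function name, not the interface's own code: it expresses a subset of sales as a percentage of each year's total.

```python
def percentage_of_annual_total(subset_by_year, total_by_year):
    """Express a subset of sales as a percentage of each year's total sales."""
    result = {}
    for year, total in total_by_year.items():
        if total > 0:  # skip years with no recorded sales
            result[year] = 100.0 * subset_by_year.get(year, 0) / total
    return result


# Invented figures: raw sales of the subset are flat, but its share falls.
subset = {1775: 50, 1776: 50}
totals = {1775: 200, 1776: 500}
print(percentage_of_annual_total(subset, totals))  # {1775: 25.0, 1776: 10.0}
```

In this invented case the raw numbers suggest no change, while the proportional view shows the subset's share of annual sales falling from a quarter to a tenth.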

Date of publication

Dates of publication given in bibliographic data in the database are taken directly from book imprints or bibliographic (usually library) records. The STN database has shown conclusively that booksellers tended to publish books carrying the following year’s date from about September or October of the previous year, in much the way that modern magazines start publishing ‘Summer special editions’ in early spring. Real dates of publication are recorded as well as stated year in cases where we know them to have been different.

It is possible to use the data interface to do some analysis by date of publication. Under the ‘Advanced edition search’ function there is an option to search by year of publication. This will return all books carrying a particular year. Further analysis is of course possible using SQL searches in the downloadable version of the database.

It should be noted here that the ‘events’ included in the database include ‘assumed printings’. This is derived data given wherever we have strong evidence of approximate print runs and the approximate date of printing. Printing was a process that took time, of course, so the date approximates to when a given printing was finished. ‘Assumed printings’ are the only ‘events’ in the database that are derived rather than documented. We nevertheless thought it useful to include them.

Space and Mapping

Maps are a familiar means for visualising and interpreting data, but we need to remember that they contain simplifications and distortions that can mislead. In trying to map the eighteenth century, we are in addition confronted with a number of issues which do not apply when creating political maps of twenty-first century Europe. For political and geographic space in ancien régime (and early revolutionary) Europe was more complex, multi-layered and ambiguous than today. This called for project-specific solutions when representing space.

Towns and Regions (Points and Polygons)

The basic spatial location in the STN database is the ‘town’, a defined, named and nuclear community that can be pinpointed in geographic space. The database records the coordinates of 516 towns in all, each of which is (by definition) home to at least one client, though books were not sent to or through every town in the database. The majority of these towns, particularly once we move outside the STN’s regional hinterland, were sizeable urban centres.

All book sales and purchases by the STN are associated with a particular town, where the supplier or recipient of those books resided. Thus all the STN’s sales and purchases are mapped on to a fixed point or ‘town’.

The volume of trade with an individual town is represented on our maps by the size of its scaled dot. The trade of any given region (the irregular polygons within borders) is represented by the colour in which it appears on the map. That colouring is derived by adding together the trade of the towns within the region (or polygon), so all of a region’s trade is linked back to the individual ‘towns’.
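The relationship between town totals and region totals can be sketched as a simple aggregation. The names and figures below are placeholders and the function is hypothetical; it only illustrates that a region's figure is the sum of its towns' figures.

```python
from collections import defaultdict


def region_totals(town_trade, town_region):
    """Sum the trade of each region from the trade of the towns it contains."""
    totals = defaultdict(int)
    for town, copies in town_trade.items():
        totals[town_region[town]] += copies
    return dict(totals)


# Placeholder towns and regions, not real STN data.
town_trade = {"Town A": 120, "Town B": 30, "Town C": 45}
town_region = {"Town A": "Region X", "Town B": "Region X", "Town C": "Region Y"}
print(region_totals(town_trade, town_region))  # {'Region X': 150, 'Region Y': 45}
```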

This perhaps approximates well to reality. Towns were not only centres of the book trade in this period, but they were also home to the most literate sectors of the population. In so far as the polygons represent political-economic urban hinterlands, it may be broadly appropriate to associate them too with the trade.

But what regions should the polygons represent? The question hinges on how we should define political space.

Defining Geopolitical Space:

(1) Representing the ancien régime

The ancien régime had a dazzling array of overlapping and conflicting jurisdictions, both within and between states. Supranational entities such as Papal and ecclesiastical territories and the Holy Roman Empire competed with modern unitary states and dispersed international empires such as the Habsburg territories or Prussia, which ruled over Neuchâtel. Even apparently unitary states such as France had territorial exclaves without and enclaves within them, as well as different and competing sub-units.

To try to represent this complex political reality, and allow multiple levels of geographic analysis, we have used two levels of political territory. The STN database attributes ‘towns’ to an eighteenth-century sovereign territory and also a lower territory (province). What constituted the province varied from state to state, and in some cases there was a choice of sub-unit. In France, among several choices, we chose the généralités, because they were the authorities through which the book trade was policed. In Spain we used the Intendancies, in Switzerland the Cantons, in Britain the various kingdoms that comprised the UK, and so on.

There are some further refinements to this geographic model of the old regime, necessitated on the one hand by states that spanned several non-contiguous territories, and on the other by those too small to be subdivided visually. The multi-territory units are particularly problematic for our purposes, especially for mapping, because it would be misleading for Prussia, Cleves and Neuchâtel to all be coloured as a single unit. This would give the visual impression that 15% of the STN’s entire trade was heading off to Brandenburg and East Prussia, when in reality it was staying in Neuchâtel. Thus geographically separated Prussian territories are treated as the separate ‘sovereign territories’ of Prussia (that is to say Brandenburg and the other east German territories), ‘Prussia: Cleves’ and ‘Prussia: Neuchâtel’. Austria is similarly sub-divided, as are the Papal territories and Wurtemberg.

Conversely, in some cases (e.g. city-states or other small states), the sovereign and lower territories can be one and the same.

Nevertheless, it is clear that some users will want to interrogate dispersed or non-national political units as single entities. For this reason, in addition to sovereign territories and lower territories, we have recorded whether Towns formed part of certain larger units, or shared particular common features. These alternative groupings are called ‘Other place groupings’ and can be found and interrogated under the Browse menu. This allows users to look at the Holy Roman Empire en masse, or Catholic Ecclesiastical lands, as well as to investigate the STN’s trade with the totality of the territories of Prussia, Wurtemberg (including Montbeliard), Austria or the Papacy (including Avignon). For good measure we have also created place groupings for the Imperial Free Cities, European University towns and the seats of the French parlements.

(2) Modern geopolitical space

For purposes of analysis, we have also introduced three other ways of subdividing geographic space into regions. We thus included maps of 21st-century countries and EU administrative regions. These were initially included for two reasons. One was pragmatic – we had a limited budget for GIS work and feared we might not be able to create the C18 maps we needed, but knew we could get modern ones. The other was for the benefit of users who preferred to investigate space in modern-day terms – perhaps because they were investigating a modern national or regional history.

(3) Geopolitical zones

Finally, we made our own map divided into large geo-political zones, as an aid to analysis on a macro scale. We began by grouping all the territories that comprise modern Switzerland and that contemporaries also thought of as Swiss (thus including territories outside the confederation, such as Neuchâtel, Geneva and Basle bishopric) into a unitary Helvetic zone. All territories inside the main frontiers of Bourbon France became the French zone, including foreign ruled enclaves such as Avignon. Portugal and Spain were then united into an Iberian zone, Britain became the British zone; Dutch and Belgian zones were created analogous to the modern states of Belgium and the Netherlands (the former annexing Bouillon and Liège in the process). The Scandinavian states were united into one; Poland and Russia were merged into an Eastern European zone; the Austrian empire proper together with the Archbishopric of Salzburg formed our Austrian zone; while a premature risorgimento united the Italian states (including Savoy) into an Italian zone. This left a rump of territories that we labelled the German zone: it included a number of French exclaves. As each zone took over 2,000 books, and all but two over 5,000, we felt that these ‘zones’ were a potentially helpful way to look at larger data sets.

Further Mapping Issues

Border Changes:

The borders used in our eighteenth-century maps were potentially problematic. Those we have chosen were those that applied following the 1772 partition of Poland. Fortunately they remained largely unchanged throughout the heyday of the STN.

This was a major stroke of luck: major changes only occurred at the start and end of this period. There were substantial border changes in Poland at the first partition in 1772 at the start of our period and again with the second partition in 1793. French revolutionary annexations and conquests also threatened to cause us problems from 1790 onwards. There were also occasional minor internal changes to provincial borders, including some to the généralités in South-Western France. And there was a potential conceptual issue with the French occupation of Avignon in the early 1770s.

But overall, the project proved very lucky in this respect. The major change of the pre-revolutionary period – the partition of Poland – did not affect the political location of any STN clients. Equally, the border changes and French revolutionary annexations of the period 1792-1794 do not affect our statistics, as the STN did not send books into the annexed regions or war zones. Clearly they were a cautious business.

Nor, we decided, did the occupation of Avignon matter in a substantive sense. Anyone wishing to look at Avignon specifically could decide how to deal with the issue. Our place notes on Avignon note that it was occupied by the French from 1768 to 1774 and annexed to France in 1791. We have had to make similar compromises in other cases – towns ruled jointly by more than one state, for example, of which there are several in the database. They have been handled on a case-by-case basis. None influence our overall statistics significantly. For example, under the French occupation of 1768-1774, Avignon only received a single consignment of books from the STN; after the revolutionary annexation there was no further trade.

Other map issues

There remain two other map issues that should be noted. First, there are two non-contiguous (split) sovereign territories on the map – i.e. states that are made up of two non-adjacent polygons. Both are far-flung parts of the Austrian empire and appear on the C18 sovereign territories maps. They are the Austrian Netherlands and the Austrian Italian territories, each of which comprises two polygons. Since sales to both regions were generally low, and the Austrian situation was complex enough already, we chose to keep each as a single unit.

Second, two French exclaves in Germany (Landau and Saarlouis) are almost invisible due to their tiny size – they can be seen by hovering over the map or clicking to enlarge it. As they belong respectively to the lower territories of Strasbourg and Metz it is perhaps as well that they are obscured, because otherwise they would appear on many maps as islands with heavy sales, coloured the same as France, Strasbourg or Metz.

Elsewhere we have dealt with some of the smaller territories by enlarging their borders on the map. This has been done to Geneva, for example. In the German Rhineland, which was in the eighteenth century a patchwork of hundreds of tiny sovereign territories, we have also adjusted borders. We have, for example, joined all the disparate territories of Baden into one contiguous block. As some of the polygon names imply, we also merged small states in areas where there are no German STN sales or purchases. We also meddled with borders between C18 territories a little in North Italy and Belgium, to improve the readability and interpretation of our maps. Naturally, no borders were shifted across towns in the database.

A Note on Software

All software used in the FBTEE project has been non-proprietary. This was a deliberate decision, allowing us to distribute our work free of charge and by open access means. All the data collected by this project is contained in an open-source MySQL database (relational database management system). Information is displayed online with the help of phpMyAdmin, another open-source tool. All the visualisations and maps are displayed either with Protovis or d3.js. These JavaScript libraries are free and open-source under the BSD licence. Maps were adapted from Natural Earth files released in the public domain.

Simon Burrows