Overview

Why another product to explore and analyze data?
Our belief is that if something can be done in Excel, it will be done in Excel. Most people know Excel -- and if they don't, they could probably pick it up in a few days. Yet most non-trivial questions about data quickly run into its limitations, such as:

  • There is too much data ("big data")
  • The data is changing ("live data")
  • The data is not available or convenient to have locally ("remote data")
  • The data needs to be worked on collaboratively ("shared data")
Datadocs aims to solve these challenges: to enable individuals to work with their data in an Excel-like fashion, while also allowing more sophisticated analysis by the more technically-inclined.

Quick Start

Let's show a quick example to demonstrate the most basic usage of the application.

Here I have added a file into the interface, and it has been automatically ingested into a new "datadoc". I then do a quick filter and search on that data. Notice that I have not had to manually build the filters — they are automatically populated in the left panel — or specify which field(s) I want to search across.

Datadocs

A datadoc can be defined as a spreadsheet with one more tabs, with each tab being connected to a data source. In the below example, we have a datadoc named "Investments", which contains two tabs at the bottom of the page, "products" and "offers".

Quickstart example

Creating a datadoc

There are two ways to create a datadoc. The first way is by dragging and dropping a file onto the Home screen.

Add files example

A new datadoc will be created for each valid file that is added in, and will have the same name as the file that created it. Additionally, if the file added is a spreadsheet containing multiple tabs, all those tabs will be imported.

The second way to create a datadoc is by clicking the CREATE button in the top-left of the Home screen. Doing so will generate a new datadoc and prompt you to choose which source to connect the tab to.

You can also add any number of new tabs to an existing spreadsheet by clicking the "+" button at the bottom of any datadoc, next to the current tabs.

Filters

When a datadoc is first saved, a Filters Panel will be created based on what we think may be the most relevant fields for filtering. Your filters panel may be modified or even hidden depending on your preferences.

Here is an example of using the filters panel:

As shown in the example above, there are three different types of filters:

  • Datepicker - default for dates
  • Slider - default for numbers
  • Checkbox - default for text

The checkbox filter type may also be used for dates or numbers as well. The option to change between these filter types is in the options section in the upper-right of each filter section. For text fields, the user can choose between OR-ing or ANDing multiple selections together. By default, multiple selections will be OR-ed.

Additionally, a user can filter to a value directly by right-clicking on a cell (in the main data section) and selecting "Filter to value".

Quick-sort & filter

For each field, a dropdown context menu is available for sorting and filtering the data within in that column.

Add files example

Note that the number of filter values will be limited to 1000. However, you can find any filter value by searching for it within the context menu.

Datadocs allows searching across all text and number fields. There are four different search matches that Datadocs supports:

  • Off - this field will not be searched
  • Exact match - this field will only match if an exact match. This is the default setting for number fields.
  • Edge search - searching for "book" would match bookworm. This is the default setting for text fields.
  • Full inner search - searching for "book" would match mybookbookreport.mov.

While the default search settings should be sufficient in most cases, these settings can be modified in the settings gear of the search bar.

Add files example

Depending on the changes that are made to these settings, the data may need to be re-saved. If so, an estimated time to re-save the document will be shown to you. Additionally, all search settings may be changed on the Source page.

Show me

The Show-me is the control panel of the datadoc: it tells the datadoc what to display and how to display it. In other words, it generates the query that gathers the data display in the datadocs grid. There are two "modes" in the Show-me section. The first mode is the normal or "flat" mode, in which a user can build a query as they would in a spreadsheet or SQL application:

Add files example

The second mode is the "pivot" mode, where someone can create a pivot table off of their data as they would in Excel:

Add files example

The mode can be changed by toggling the PIVOT MODE button on the right side of the Show me bar. The best way to understand how the show me section works is to test it out with Auto-query on (its default selection). Here are a few notes to keep in mind when using the Show me:

  • Aggregated results are click-into-able. That is, if you are using the Pivot Mode, or have a "Broken Down By" in your query, you can double-click any of the cell values to see the raw data that comprises that value.
  • For complicated aggregations or pivots try turning auto-query off. If auto-query is selected, it may take longer to build out multiple levels of aggregation. A good rule of thumb is if the running of intermediary queries is significantly slowing you down, turn auto-query off.
  • Pivots and Aggregations may have hard limits on the number of rows or columns they return. Depending on the number of aggregations applied, the result set will have various limits. This is similar to how the number of rows are limited with a normal query, and you have to "scroll down" to view more rows. On a pivot, both the number of rows and the number of columns may be limited.

Additionally, for the more technical user, there are plans to support direct SQL querying of the datadoc in the near future.

Formatting

Number and date fields can be formatted in a variety of ways. For example, sometimes it is useful to view the number 2014 as 2014 if it is referring to a year; other times it makes more sense to format it as $2,014.00 if a dollar amount.

To change the format of a field, select that field(s) and choose the format button. This will give you the available formats that the selected field(s) supports:

Add files example

Depending on the type of field it is, not all formatting options will be available. In the case where the formatting option that you want to use is greyed out, it probably means that the field is of the wrong type, and you will need to click the link at the bottom of the dropdown shown above that says "Click to Edit Source" to change the data type.

Exports

All your data can be easily downloaded as a csv or excel file by clicking the above the datadoc. If you choose to download your data as an excel file, you can save the current tab or all tabs in the datadoc. Note that Excel exports are currently limited the first 1M rows, and csv exports will only export the first 4M rows (free user limitations may also apply). Additionally, only one export may be in progress at a time per user.

Saved views

The current view of your datadoc can be saved by clicking the button.
Saved views allow you to return to a particular view of the datadoc, or to share that view with other collaborators. For example, suppose you want to view various football stats to build a Fantasy Football team. One saved view may be the top five performing players at each position; another view may be all available players.

Here is an example of using Saved Views:

Saved views are shared across all users that can access a datadoc.

Sharing

A datadoc may be shared directly with other users, or a link may be created that will allow others to access it. To share a datadoc, click the SHARE button in the top-right of the page.

Add files example

Sharing occurs at the datadoc level and not at the tab level. In other words, if you create a datadoc with five tabs and share it with someone, they will be able to view all five tabs.

A user with whom a datadoc is shared may have one of two roles:

  • Viewer - can manipulate the datadoc but cannot alter or view the underlying data, cannot create new tabs.
  • Admin - has access to the Source Settings page, and can modify data from that page
By default, a user is added as a Viewer.

Sharing a datadoc will not share the underlying sources connected to that datadoc. If you would like to share both a datadoc and its underlying sources you will either need to do so separately, or have both of them in a Shared Folder, which can be made on the Home screen.

A datadoc may only be shared if it has one or more ingested tabs. That means, for example, that a datadoc cannot be shared until the first sheet has finished ingesting. Addititionally, a newly created tab will not be shared with others viewing the datadoc that tab's data has been ingested.

Source settings

To view your source settings or make changes to any aspect of your data, you can click the Source button in the top right of the page:

Add files example

This will provide you with basic information on your source, the ability to refresh your data if it's a live source, and a link to your source settings page, where any modifications to the data would be made.

The Source settings page is covered in more detail in the Data section below.

Additional settings

The following settings can be accessed by clicking the button. The settings covered here apply to the specific tab of the datadoc, and not the entire datadoc.

Auto-query

When auto-query is turned off, you must manually click to execute any queries. By default, auto-query is turned on. However, it may be useful to turn it off when the data source is large and you do not want to query the data until you know your query is correct. An example of this would be if your data consists of company sales, and you want to add a filter to limit the results to Country=US, and limit the sales again to Product=Hardware and then sort it to show the most expensive products first. If auto-query were enabled for this, it would execute three queries (one for each change) rather than just one time at the end.

Drill-into fields

The drill-into fields is a setting that is relevant when working with aggregated data. It allows you to define which fields will be displayed, in what order they will be display, and how they will be sorted when you view the results behind any aggregated calculation. For example, let's say you are viewing the count of children per household. The Smith's household has 3 children, and when you click-into that result, you want to see the first name and age of each child, ordered by oldest to youngest. This would be defined in the Drill-into fields.

Timezone conversion

Sometimes the timezone of the data source will be different than the current timezone that you are in. For example, a database may be in UTC time but you want to view the times in PST times, to more closely understand and purchase habits at your retail stores in California. By selecting the option Convert times to [my timezone], these times will be converted accordingly. The timezone of a source can be changed on the Source Settings page, and your own timezone can be changed on your User Settings page. Note that this option is only applicable when viewing a value in the datetime format, it will not modify values in the date format (that is, "Jan 1, 2015" will still be that same date regardless of whether timezone conversion is applied).

Adjusting field headings

By default, field headings are parsed exactly as-is when importing data. However, you can adjust your field headings to one of the following to help with readability or other considerations:

  • Off - column headings are brought in as-is
  • Human-readable - a field such as is_for_sale would be shown as Is For Sale.
  • Database-friendly - a field such as RAISED AMT would be shown as raised_amt.

Limits and limitations

While data limits are discussed in the Data section, here we will note some of the limitations that datadocs have:

  • Number of columns - 1000
  • Number of rows - no limit, however each browser will have its own limitation in terms of how many data it can load.
  • Number of tabs in a datadoc - 100 tabs

Additionally, here are some other limits that datadocs currently has in terms of the interface:

  • Columns can be dragged-and-dropped to reorder them. Currently, rows cannot.
  • Rows or columns cannot be "hidden" like they can be in Excel.
  • The data in a cell is not editable. However, column headings can be edited.
We look to support these features in the near future.

Data

An active datadoc must be connected to one or more data sources. This means that a source must exist in Datadocs before a real datadoc can be created. A datadoc can be connected to either a static source, such as a file, or a live source, such as a database.

Adding a source

We will break down the adding of sources into two sections: Adding File sources and Adding Live sources. Each are handled a bit differently.

Adding files

A file may be added in two ways. The first way is by clicking the "+" in the button, and selecting "Add Source". From here you can add your file in the Add Source modal:

Add files example

The second way is to drag and drop your files onto the home screen, and they will be automatically uploaded and ingested, for example:

When a file is added into the interface, there are three steps from when the file is uploaded to when it is ingested and viewable as a datadoc:

  1. Upload - if the file has a valid extension, it will be uploaded to storage. Currently acceptable extensions are .xls, xlsx, csv, tsv, json, xml, avro, and an empty extension. Folders and file archives are not currently accepted.
  2. Validation - the contents of the file are validated and the data is interpreted.
  3. Datadoc creation - if the validation proceeds, the data will be automatically saved to a new datadoc.

Adding DB sources

A database source is different from a file source in two main ways: (1) it needs to be connected to; and (2) it needs to be told what data to return, such as by selecting a table or writing a query.

To connect to a database, you will need to click the "+" in the button, and choose "Add Source". From here, you can choose which source you want to connect to enter in the source's credentials:

Add files example

To connect to a database, you will need to click the "+" in the button, and choose "Add Source". From here, you can choose which source you want to connect to enter in the source's credentials:

Once a database has been connected to, you can either choose a table or write a query to define the data connected to a tab in a datadoc.

Add files example

Note that you each tab in a datadoc is connected to its own source, so a datadoc can be connected to multiple databases, tables, or queries.

Source settings

The source settings page is used to manage a data source. It can be accessed by either double-clicking a file/database on the Home screen, or it can be accessed by clicking the "Source settings" button on a Datadoc. Let us review the main components of this page shown in the screenshow below:

Add files example

1. Attached source — If the current sheet in your datadoc is attached to a source, this will display what source it is connected it. If a source has not yet been selected, this will allow you to choose one. You may also click this to change your source at any time.

Add files example

2. Source settings — Settings that are applicable to the selected source will be shown here.
Add files example The possible source options include:

Fields to display — allowing you to choose which columns you want to import and in what order. By default, all columns will be imported. Note that columns that are not imported here may not be queried in your datadoc.

Start on row / Header in first row / Skip after header — these settings apply to various file types, such as excel and csv. Let us look at an example to illustrate. Suppose we have the following data in excel:
1 This data was gathered on Sept 14, 2014 by Reuters.
2 Name Age
3 # first and last name # rounded decimal
4 Tom Jones 25.4
5 Sarah Haley 46.2
In the above example, we would would to ignore the first and third row, as it's not the actual data. In this case, we would set Start on row=2, Headers in first row=Yes, and Skip row after header=1. While the above is a trivial example, it is very common to see excel and csv files that are not as clean as we would like them to be. Datadocs infers these values when ingesting the data, however they will not always be perfect, and these values will sometimes need to be manually changed.

Delimiter / Quote / Escape — these settings are used to determine how to correctly parse a csv file, and are not applicable to other sources. For example, if we had the following line of data in a file, "first"|"last"|"age", we would set the Delimiter=| and Quote=". Again, these settings are inferred when a file is uploaded, but may need to be manually altered at times.

Source timezone — this tells us if the source is associated to a specific timezone. By default, this is set to UTC. This is useful when a user's timezone is different from the source timezone and they want to view times in their local timezone.

Ingest error mode — this determines how "errors" will be handled when saving data to a datadoc. Let's go through an example to demonstrate. Suppose we have the following data:
1 Name Date of Birth
2 Martin Luther King, Jr. January 15, 1929
3 Mahatma Gandhi October 2, 1869
4 Jesus Christ [exact date unknown]
In the above example, we can see that Martin Luther King, Jr. and Mahatma Gandhi have exactly dates of birth, but Jesus Christ does not, and so we cannot import that value as a proper date. The error mode would tell us how we should deal with that piece of data. It has three options:

  • Set value to null - the offending value will be brought in as null. This is the default.
  • Skip row - the row of the offending value will be skipped
  • Stop ingest - the datadoc will stop ingest.
In most cases, such as the above, the option "Set value to null" is sufficient. If any errors are detected in the preview data, a warning icon will show in that field alerting you to the issue. Note that Datadocs, like a database, only allows one type per column (unlike Excel or Google Sheets, where any cell can have any type). For a more in-depth discussion on types, view the section on Data Types.

Refresh settings — this setting is only applicable to database sources, and specifies how frequently the data should be refreshed. The data can also be refreshed "on demand" on the datadocs page by clicking the Sources > "Refresh Now" button. Note that Datadocs currently only supports "full" refreshes. Incremental refreshes will be coming in the near future


3. Tables & Saved Views — If the attached source is a database, the left panel will display:

  • Tables — A list of all tables in the current database.
  • Saved Views - queries that have been written for any other datadocs. This provides easy access if you ever want to re-use or improve upon existing queries.

4. Query editor — If the attached source is a database, a query editor will show above the main data results, allowing the user to write and preview any query against the existing database. Note that all queries will have a limit of 1000 results applied, even if an explicit limit isn't specified in the query.


5. Data Preview — This section will preview up to the first 1000 rows of data in the selected source. Unlike the main datadocs view, these results will not be sortable or filterable, but are provided so that you can confirm your data "looks correct" before saving. Each field will you to perform the following four actions on it:

  • Rename the column
  • Remove the column
  • Change Type of the column
  • Change Search of the column
For more information on the different data types for a field, please view the Data Type section. For more information on search types for a field, please view the Datadocs Search section.


6. Save Source — To save your data to the datadoc, click the button. Once this is clicked, a percentage bar will show the progress of the Save. You do not need to stay on the Data Source page while a datadoc is saving, you may navigate elsewhere and it will continue ingesting. If you have made modifications to an existing datadoc, a button will also appear, allowing you to un-do any changes you've made that are different than the source settings of the currently-ingested datadocs sheet.

Sharing sources

Like a datadoc, a source may be shared with another user via a link or directly with another user. This may done on the Home screen by clicking the "Share with" link on the info panel on the right.

Add files example

If a user views a source that has been shared with them, that source will be copied over to their home directory and they will be able to access it from there. Note that if the owner of the source removes or un-shares that source, the recipient with whom it was previously shared with will no longer have access to it.

Disconnected sources

It is possible that an active datadoc will not have a connection to a source. This can happen for one of two reasons: either (1) the source has been deleted, or (2) the source is offline, for example, a database cannot be connected to or the query returns an error. When a live source goes offline and cannot be connected to, we will send a notification to view the disconnected source and fix it. Additionally, when a datadoc is no longer connected to an active source, a red blinking warning indicator will appear on the sources button — .

Note that a datadoc does not necessarily need to be connected to a live source -- for example, a datadoc could be connected to a query that references fields that are no longer contained in a database. In this case, the data will not be able to be refreshed, but the data that was previously saved to the datadoc will still be viewable. Additionally, the user can modify the source settings to either update the query or change or source in order to connect the sheet to an active source, if needed.

Data types

Datadocs currently uses the following three primary data types:
Type Examples
Text "Hello", "Datadocs"
Number 14, 7.2, 101
Date/Time 2014-01-01, Jan 1, 2014, 12:01:00
Additionally, datadocs also supports list fields for the above types, though all values in a list must be of the same type. An example of this would be as follows:
Name Favorite bands Favorite numbers
Tom ["The Beatles", "Rolling Stones"] [1, 17]
Samantha ["Jimy Hendrix", "Beethoven"] [99, 101]
Data types may be specified on the Source settings page. However, the "formatting" of a source field -- for example, whether a number should be displayed as 2014 or $2014.00 — is handled on the datadocs page. A boolean field (T/F, yes/no, 1/0) can be stored as either a number or text and displaying it is a formatting option for the user. If a field is left empty or has a value of NULL — regardless of its type — it will be shown as [empty value] in the datadoc display. On the preview page, if a value would result in an error, it is displayed as Ø.

Json and xml fields are not natively supported, and when importing those files, their fields will be expanded via dot notation. For example, the following json structure:


{
  "Name": "Paul",
  "Age": 12,
  "Friends": ["John", "Dave"],
  "Address": {
    "Street": "102 Elm",
    "City": "Beverly Hills",
    "State": "California",
    "Country": "USA",
    "ZipCode": 90210
  }
}
						
Would result in the following data when imported into datadocs:
Name (text) Age (number) Friends (text,list) Address.Street (text) Address.City (text) Address.State (text) Address.Country (text) Address.ZipCode (number)
Paul 12 [John, Dave] 12- Elm Beverly Hills California USA 90210
We do plan to introduce support for json fields in the near future.

Data usage

Data usage is defined and priced per month as follows:
Category Description Price
Files stored Only applies to files added to datadocs. This is the size of the actual file. $0.10 / GB
Data stored When data from a file or database is stored in datadocs, that data is saved to datadocs. The ratio between the size of the raw data to the size of the data stored in datadocs varies depending on the various data and search types, but usually range between 2-4x the size of the raw data. $0.20 / GB
Data ingested Each time a datadoc is saved or refreshed, the size of the raw data is ingested. For example, if 1GB of data is saved one time in a month, that would mean 1GB of data is ingested. If that same source was refreshed one time per day, then 1GB x 24 hours/day x 30 days/mo = 720 GB would be ingested in that month. $0.05 / GB
Data queried Each time a query is run against a datadoc, it processes a certain amount of data depending on the complexity of the query, the size of the data being queried, and the amount of data it returns. This amount is indicated each time you run a query on the datadocs page. $0.05 / GB
While estimating data usage may be challenging to gauge initially, we try to make your data usage as transparent as possible in the following ways:

  • You can view your file usage at any time on your Home screen.
  • Each time your data is saved, the amount of data
  • You can click the icon on the datadocs page at any time to see the current data usage of a given datadoc.
  • Each time you run a query, the amount of data that was queried is displayed.
  • Your currently monthly data usage is available to view at all times on your User Account page.

Users

In the Users sectin we'll cover the different user accounts, preferences, and the Home screen.

User home

The home screen is where a user lands when they first log in to the Datadocs application.

Add files example

The following is a breakdown of each component of the page.


1. My Data — The main section contains all sources, datadocs, and folders that the user has created or has access to. The following is an overview of these three types:
Icon Type Description
Add files example Folder Folders can be created by clicking the "+" in the CREATE button or right-clicking in the main section. Sources or other folders may be dragged into or out of folders.
Add files example Datadoc Discussed in more detail in the Datadocs section.
Add files example Add files example Add files example Source Discussed in more detail in the Data section.
By default, all three types are displayed. However, you may choose to narrow down to specific types by clicking the "Type" heading. The last column defaults to displaying "Date", but it can display various other fields, such as "Size" or "Added by". Double-clicking a row in the My Data section will access that item. For example, double-clicking a folder will take you to that directory, while double-clicking a datadoc will take you to its datadoc page. The My Data section is built to emulate a desktop experience, and files and folders can be dragged-and-drop, lassoed, selected, or right-clicked.


2. Item Info Panel — When an item is selected, its details will be displayed in the info panel on the right. Depending on the type of the item selected, this may show things such what type of item it is, when it was added, who it was added by, with whom it is shared, and various other details.


3. Search — At the top of the page is a search bar where you can search for any source, datadoc, or folder that you have access to. Add files example Clicking an item in the auto-suggest will take you directly to that item.


4. Create Button — The Create button allows you to create a new datadoc, add a new folder, or add a new source. Clicking the main part of the button will create a new datadoc, and clicking the "+" button will allow you to add any new item. Add files example

5. File Uploads — Any active source uploads will display in the right-hand corner of the page. This component is discussed in greater detail in the My Data section.

My account

The My Account page allows you to update your user preferences and account settings. You can access this page by clicking the "My Account" link in the left panel of your home screen, or by clicking the link in your account dropdown in the top right.

Add files example

On the My Account page, you can:

  • Update your name, email, or password.
  • Update your default time zone, date formatting, and currency.
  • View or modify your plan type or payment information.
  • Cancel your account.

If you currently have an active Datadoc plan — or have ever had one in the past — this page will also include a link to view and download all past billing statements.

Account types

There are three primary account types: free accounts, individual accounts, and business accounts. The following are the differences between these account type:
Free Individual Business
File storage limits 4GB Unlimited Unlimited
Datadoc limits (per sheet) 100K rows Unlimited Unlimited
Datadoc limits (across all) 1M rows Unlimited Unlimited
User management No No Yes
Consolidated billing No No Yes
Easy sharing within team No No Yes
Minimum users -- -- 3
A user can upgrade or downgrade their account at any time on the My Account page. If, for example, a free user uploads a 1M row file, it will get truncated to 100K rows. If that user then upgrades their account to an Individual plan, they will need to re-upload that file to ingest all 1M rows in its entirety.

Pricing

Datadocs will always have a free account type, and our goal is to make pricing as transparent as possible. For current plans and pricing, please view our Pricing page.

Future roadmap

The following is a brief, unordered list of items to expect from Datadocs in the future:

  • Visualizations
  • Calculations, both at the row- and aggregated-level
  • API to access and update data
  • More sources and direct-source access
  • ETL and improved ingest capabilities
  • Editing of data
  • Dashboards
And of course, to always, always be improving on query performance.