For a good answer, you need to provide a lot more detail in the requirements:
- What do the writes look like? If they are coming in a stream, how many writes per second do you need to support? If they are a bulk load, how large and frequent are the batches? Are the values simple numerics?
- What do the reads look like? How many queries per second do you need to support? How much data per query? How fast do the queries need to be? Will your queries be simple aggregations? Dimensional queries? Unique dimension value counts? Are approximations tolerated?
- How much history do you need to keep?
- What are your requirements for availability?
- What are your requirements for consistency?
- How fast does new data have to show up in reads?
Without more detail, you're going to get dozens of suggestions which may each be right for a particular case.
Part of the reason the question was light on details is that the project is at the very beginning and a lot of relevant things aren't locked in yet. Below are back-of-the-napkin estimates, which may well be laughably wrong.
Writes: I'm not totally sure yet how the data will be packaged before being sent, but it'll probably be more than 10 writes a second and fewer than 1000 initially(?). We also haven't decided whether we're aggregating and batching before sending or, if we are, to what degree.
Availability: Brief breaks where it just misses some data (<3 seconds?) are probably not the worst thing, but we're really trying to avoid big gaps in the data.
Reads: These will likely grab the last n records of a given set of sensors, maybe with some light math on them if the query language supports it. There might be an easier way, though: cache recent history and only go to the big list when responding to a longer-term issue. The nature of reads is also very subject to change, since there are a bunch of use cases for the data being kicked around and I haven't gone through what each one's reads would look like yet.
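To make the "cache recent history" idea concrete, here is a minimal Python sketch under some assumptions: readings are (timestamp, value) pairs, everything fits in one process's memory, and the class and method names (`RecentReadings`, `add`, `last_n`) are made up for illustration. It is not tied to any particular database.

```python
from collections import defaultdict, deque


class RecentReadings:
    """Keep only the newest `max_per_sensor` readings for each sensor in memory."""

    def __init__(self, max_per_sensor):
        self.max_per_sensor = max_per_sensor
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._by_sensor = defaultdict(lambda: deque(maxlen=max_per_sensor))

    def add(self, sensor_id, timestamp, value):
        """Record one reading (hypothetical shape: timestamp plus a single value)."""
        self._by_sensor[sensor_id].append((timestamp, value))

    def last_n(self, sensor_id, n):
        """Return up to n newest readings, oldest first.

        Returns None when n exceeds what the cache can ever hold, which is
        the signal to fall back to the full history store.
        """
        if n > self.max_per_sensor:
            return None
        if n <= 0:
            return []
        readings = self._by_sensor.get(sensor_id)
        if readings is None:
            return []
        return list(readings)[-n:]


if __name__ == "__main__":
    cache = RecentReadings(max_per_sensor=3)
    for t in range(5):
        cache.add("sensor-a", t, t * 1.5)
    print(cache.last_n("sensor-a", 2))  # the two newest readings
    print(cache.last_n("sensor-a", 10))  # None: go to the big list instead
```

With something like this in front, only the longer-term queries would have to touch the database, which might relax the read requirements quite a bit.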
New data needs to show up in reads in soft real time. The napkin estimate indicates that we might be asking for about 6-80MB returned per query as a generally large but perhaps not max query. Bigger operations that deal with legitimately huge amounts of data will probably be scheduled around lighter periods or put on different machines (I'm not sure how adding more reading machines would affect things, since I don't know what DB it will be yet).
History: Ideally keep as much as humanly possible, possibly moving it to physical archival at some point (1yr+?).
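To put rough numbers on the figures above, here is a small Python sketch of the napkin math. The big assumption is the record size: 16 bytes per record (an 8-byte timestamp plus one 8-byte float), one record per write, and no indexes, compression, or replication. None of that is decided yet, so treat the results as order-of-magnitude only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
BYTES_PER_RECORD = 16  # assumption: 8-byte timestamp + 8-byte float value


def raw_bytes_per_year(writes_per_second, bytes_per_record=BYTES_PER_RECORD):
    """Raw storage for one year of history, assuming one record per write."""
    return writes_per_second * bytes_per_record * SECONDS_PER_YEAR


def records_in_query(query_megabytes, bytes_per_record=BYTES_PER_RECORD):
    """How many records fit in a query result of the given size (1 MB = 10**6 bytes)."""
    return (query_megabytes * 1_000_000) // bytes_per_record


if __name__ == "__main__":
    for rate in (10, 1000):
        gigabytes = raw_bytes_per_year(rate) / 1e9
        print(f"{rate} writes/s -> {gigabytes:.1f} GB of raw data per year")
    for megabytes in (6, 80):
        print(f"{megabytes} MB query -> {records_in_query(megabytes):,} records")
```

Under those assumptions, a year of history is somewhere between roughly 5 GB and 505 GB of raw data, and a 6-80MB query covers about 375,000 to 5,000,000 records.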
Ideally, none of these questions should be a concern when you are looking for a database. A general-purpose database which can handle all of the above and more is AmisaDB. http://www.amisalabs.com/