09-16-2016, 07:04 PM
You're still better off using a 'proper' Linux filesystem such as ext4... you get the journaling features that make it more likely an unexpected shutdown can be recovered from. NTFS is a journaling filesystem too, but I don't think it is anywhere near as reliable as ext4 (on Linux, anyway). Plus, as tkaiser said, performance will be an issue, and since you're working with large data sets, you want the best performance you can get.
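If you're not sure what the data drive is currently formatted as, a rough Python sketch like the one below can tell you by reading /proc/mounts. The function names and the /mnt/data path are just placeholders I made up for illustration, and it doesn't handle mount points with escaped spaces. Note that an NTFS volume mounted through ntfs-3g will typically show up as "fuseblk" rather than "ntfs".

```python
import os

def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Format: device mount_point fs_type options dump pass
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries

def fs_type_for(path, entries):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type in entries:
        mp = mount_point.rstrip("/") or "/"
        prefix = mp.rstrip("/") + "/"
        if path == mp or path.startswith(prefix):
            # Later entries win ties, since later mounts shadow earlier ones.
            if best is None or len(mp) >= len(best[0]):
                best = (mp, fs_type)
    return best[1] if best else None

def fs_type_of_path(path, mounts_file="/proc/mounts"):
    """Convenience wrapper that reads the live mount table (Linux only)."""
    with open(mounts_file) as f:
        return fs_type_for(path, parse_mounts(f.read()))
```

Calling `fs_type_of_path("/mnt/data")` on the board should print something like `ext4` if the drive is set up the way I'm suggesting.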
(09-16-2016, 11:09 AM)pharris430 Wrote: I'm not so much worried about speed as proof of concept. I need to build multiple sample databases of fake companies, then build an integration platform that combines the separate databases into one Hadoop filesystem, and from there do reporting. That's the idea at least.