Extract data from MongoDB with Sqoop and write it to HDFS?


I need to extract data from MongoDB, where my application stores most of its transactional data.

I have worked with Sqoop to extract data and found that it connects an RDBMS to HDFS well. However, I have found no clear direction on how to extract data from a NoSQL database with Sqoop and load it into HDFS for processing large volumes of data. Please share your suggestions and findings.

I have already extracted static information and transaction data from MySQL: I simply used Sqoop to store the data in HDFS and then processed it. Now I have live transactions for 1 million unique email IDs per day, and this data is modelled in MongoDB. I need to move it from MongoDB to HDFS for processing/ETL. How can I achieve this using Sqoop? I know I can schedule the task, but what is the best approach for pulling data out of MongoDB via Sqoop?

Consider a cluster of 5 DataNodes with 2 TB of storage. The data size varies between 1 GB and 2 GB in peak hours.


Answers:


Sqoop is designed for importing data from relational databases (it connects through JDBC), so it is not the right tool for MongoDB. There are other ways to get data from MongoDB into Hadoop.

For example, the MongoDB Connector for Hadoop: https://docs.mongodb.com/ecosystem/tools/hadoop/

Alternatively, you can use a data flow management tool such as NiFi or StreamSets to pull data from MongoDB in real time.
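
If a scheduled batch export is enough (1 to 2 GB at a time is small for a 5-node cluster), a plain script can also do the job. Below is a minimal sketch, not a definitive implementation. It assumes the third-party pymongo driver is installed and that each document carries an `updated_at` timestamp field; that field name, the function names and the file naming are all hypothetical and not part of the question. It dumps the matching documents as newline-delimited JSON files that can then be copied to HDFS.

```python
import json
import os
from datetime import datetime


def to_jsonable(value):
    """Fallback for values json cannot encode (e.g. ObjectId, datetime)."""
    if isinstance(value, datetime):
        return value.isoformat()
    return str(value)


def write_batches(docs, out_dir, batch_size=100_000, prefix="part"):
    """Write an iterable of dict documents as newline-delimited JSON.

    A new file is started every `batch_size` documents.
    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    handle = None
    count = 0
    try:
        for doc in docs:
            if count % batch_size == 0:
                if handle:
                    handle.close()
                path = os.path.join(out_dir, f"{prefix}-{len(paths):05d}.json")
                handle = open(path, "w", encoding="utf-8")
                paths.append(path)
            handle.write(json.dumps(doc, default=to_jsonable) + "\n")
            count += 1
    finally:
        if handle:
            handle.close()
    return paths


def export_since(uri, db_name, coll_name, since, out_dir):
    """Export documents changed since `since` (a datetime).

    Assumption: documents have an `updated_at` field (hypothetical name).
    """
    from pymongo import MongoClient  # third-party: pip install pymongo

    client = MongoClient(uri)
    try:
        cursor = client[db_name][coll_name].find({"updated_at": {"$gte": since}})
        return write_batches(cursor, out_dir)
    finally:
        client.close()
```

After a run, the files can be uploaded with `hdfs dfs -put <out_dir>/*.json <target HDFS directory>`, and the whole thing can be scheduled with cron or Oozie. Filtering on a timestamp keeps each run incremental, so only the day's new transactions are moved rather than the full collection.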