If you import or export data frequently, automating the process with Scheduled Tasks (DataHub) can save you time. Using DataHub, you can import files directly from online repositories such as FTP, SCP, and HTTP sites, as well as Amazon S3. You can also export files to the same repositories or have DataHub send them to you via email.
However, several factors may affect the operation and performance of DataHub. This article describes best practices for ensuring that DataHub runs at its optimum capability.
- Schedule DataHub tasks during low activity times, both by account users and app users. Ensure that your repository is fast and is capable of high-speed data transfer.
Note: During the import process, the performance of the destination table may degrade and the table may have to be locked partially or completely. These events can affect app users that are working with the same table.
- A common problem after scheduling an import task arises when the design of future import files or of the destination table is modified.
- Do not schedule a task too frequently. Allow enough time for the task to finish completely before another recurrence of the same task begins.
- Do not run a manual import or export on a table or view while a scheduled task is using the same table or view.
- Your table may be growing and, as it accumulates more data, a task that used to take only a few minutes may gradually require more processing time. We recommend that you run the task manually to see exactly how long it takes to complete. Taking the growth of your table into consideration, schedule the task so that there is adequate time between recurrences. Also, note that the processing time of the task depends on the speed of your repository.
- Occasionally review DataHub logs to address any issues that may have occurred during the processing of tasks. Mouse-over a task on the Scheduled Tasks screen and click History. On the next screen, mouse-over the log you wish to view and click Properties. These logs are also sent to you by email in real time.
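The advice above on timing a manual run and allowing for table growth can be turned into rough arithmetic. The following is a minimal sketch, not a Caspio tool: the function name, the linear-growth assumption (run time proportional to row count), and the 1.5 safety factor are all illustrative assumptions.

```python
import math


def min_interval_minutes(measured_minutes, current_rows, projected_rows,
                         safety_factor=1.5):
    """Estimate a minimum gap, in minutes, between recurrences of a task.

    Assumption (not from Caspio): run time grows linearly with row count.
    measured_minutes is the duration of a manual run on the current table.
    """
    if measured_minutes <= 0 or current_rows <= 0:
        raise ValueError("measured_minutes and current_rows must be positive")
    # Never scale the estimate down, even if the table is expected to shrink.
    growth = max(projected_rows / current_rows, 1.0)
    return math.ceil(measured_minutes * growth * safety_factor)


# A manual run took 4 minutes on 100,000 rows; the table is expected
# to reach 250,000 rows.
print(min_interval_minutes(4, 100_000, 250_000))  # 15
```

Repeat the manual measurement from time to time, since repository speed also affects the actual duration.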
Mission Critical Import/Exports
Many factors can affect DataHub’s ability to run your tasks on time. The best practices provided in this article can help you in scheduling your tasks, but if your import/export is highly time-sensitive you have two other options:
- Use the Caspio API for real-time changes to your data.
- Sign up for a Caspio Enterprise Plan, which provides a dedicated infrastructure for your organization.
DataHub does its best to run your task at the time you have set. However, there are no guarantees that this is possible due to the factors discussed below. Since many customers schedule tasks exactly at the top of the hour (e.g., 8:00, 11:00, 1:00), we recommend that you schedule your tasks at times away from the top of the hour (e.g., 9:15, 7:45, 12:30). This increases the chance that your task starts on time and completes quickly.
Maximum Simultaneous Tasks
Currently, DataHub allows up to three tasks per account to run simultaneously. Additional tasks are queued to run after one of your tasks has finished. On Caspio Enterprise accounts, this number is flexible.
Tasks that cannot begin within 15 minutes from their scheduled time are considered expired and are skipped. This can happen when your account has too many tasks and some tasks cannot begin until more than 15 minutes from their original scheduled time has passed.
For example, suppose six tasks are scheduled to run at 11:00 AM. The system starts tasks 1, 2, and 3 at the scheduled time. Assuming tasks 1 and 2 finish at 11:05, the system then starts tasks 4 and 5. Assuming task 3 finishes at 11:16 and tasks 4 and 5 are still running, the system skips task 6 because by 11:16 it has already expired.
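The queueing and expiry rules above can be illustrated with a small simulation. This is a sketch under stated assumptions, not DataHub's actual scheduler: it assumes queued tasks start in first-come order, that a task starting exactly 15 minutes late still runs, and that tasks 4 and 5 each take 15 minutes (the example does not give their durations). Times are minutes after 11:00.

```python
import heapq

MAX_CONCURRENT = 3   # from the article: three simultaneous tasks per account
EXPIRY_MINUTES = 15  # from the article: tasks expire after 15 minutes


def simulate(tasks, max_concurrent=MAX_CONCURRENT, expiry=EXPIRY_MINUTES):
    """Return {task name: start minute, or None if the task is skipped}.

    tasks is a list of (name, scheduled_minute, duration_minutes).
    """
    # Min-heap holding the time at which each slot next becomes free.
    slots = [float("-inf")] * max_concurrent
    heapq.heapify(slots)
    starts = {}
    # sorted() is stable, so tasks with equal times keep their input order.
    for name, scheduled, duration in sorted(tasks, key=lambda t: t[1]):
        start = max(scheduled, slots[0])
        if start - scheduled > expiry:
            starts[name] = None  # expired: skipped, no slot consumed
        else:
            heapq.heapreplace(slots, start + duration)
            starts[name] = start
    return starts


tasks = [
    ("task 1", 0, 5), ("task 2", 0, 5), ("task 3", 0, 16),
    ("task 4", 0, 15), ("task 5", 0, 15),  # durations assumed
    ("task 6", 0, 5),
]
for name, start in simulate(tasks).items():
    print(name, "skipped" if start is None else f"starts at 11:{start:02d}")
```

Under these assumptions, tasks 4 and 5 start at 11:05 and task 6 is skipped, matching the example.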
Any data that goes into or out of your account counts toward data transfer, and this includes data moved by DataHub. Make sure that your Plan contains an adequate data transfer allocation for your DataHub activity.
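A back-of-the-envelope estimate can help check whether scheduled tasks fit within a data transfer allocation. The sketch below assumes that each run transfers roughly the size of the file and that a month has 30 days. How Caspio actually meters transfer is not described here, and the function name and figures are hypothetical.

```python
def monthly_transfer_mb(file_size_mb, runs_per_day, days_per_month=30):
    """Rough monthly transfer for one task, in MB.

    Assumption: every run moves about file_size_mb of data.
    """
    return file_size_mb * runs_per_day * days_per_month


# A 25 MB file exported once an hour (24 runs per day).
print(monthly_transfer_mb(25, 24))  # 18000
```

Sum the estimates for all scheduled tasks and compare the total with the allocation listed for your Plan.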
Common Error and Warning Messages
Pay close attention to the details of DataHub error and warning messages to understand each issue and how it can be resolved. Below is a list of the most common error and warning messages.
| Message | Description |
|---|---|
| Connection failure | This occurs when a connection cannot be established with the target repository. Verify that you can connect to the repository directly and that the latest access credentials are saved in DataHub. |
| Upload/Download failed | This occurs when files cannot be uploaded to or downloaded from the target server. The connection was made successfully, but, for example, the expected file was not there, Caspio had no write permission to save an export file, or the connection was terminated during transmission. |
| Task occurrence skipped | This occurs when Caspio is unable to run a task within 15 minutes of its scheduled time. Caspio will then try to run the next occurrence. See the Expired/Skipped Tasks section above for ways to address this issue. |
| Account limit reached | Your plan includes a specific number of DataHub tasks per month. Once your monthly limit is reached, you will get this message and tasks will not run until the beginning of the next billing cycle. Contact your account manager to increase your limit. |
| Export/Import issues | This message indicates that a task was completed but some data or records were not exported or imported, for the reasons listed in the details of the message. |
| Import/Export failed | This message indicates that an error occurred while importing or exporting tables or views. Read the details for more information. |