Very slow backup

Morgan    Jul 4 2:57PM 2017

You may be getting tired of me by now, but I've come across another issue...

I have been trying to back up a VM from one esxi 6.5 to another one. The actual space used on this is around 60GB, and after running for a day, it says 3 more days to go just for the first (and smaller of the two) virtual drive:

 Uploading *****---------------------------------- 694KB/s 3 days 08:26:40 13.2%

Now, to test out whether there's a problem with the connection between the machines, last night I transferred a VM of about half that size from a machine on the same internet to the same destination server (esxi 6.5 server). It completed by this morning.

The only thing I can figure out is that it seems VerticalBackup may be ignoring the thin provisioning of these drives?

The actual space used on the drive above that is taking 3+ days is about 30GB. However, at the command line, the actual flat file (with the -flat.vmdk suffix) is 221.0G. I am wondering if VerticalBackup is taking this whole file?

But that doesn't explain it all...

When I have the backup running, the disk read rate climbs to only 1-2 MB/s. That is very slow, and this is a Toshiba SSD Pro drive. So it seems that VerticalBackup is somehow being limited in its disk read speed.

I suppose in the end that a slow backup is okay, but for other purposes such as transferring VMs from one machine to another (which is what I am attempting here), this is not fast enough to be usable.

Is there some bandwidth/resource limiting built in to VerticalBackup to avoid swamping resources? If so, is there a switch to turn it off? (I did read the Guide and did not see anything obvious).

Thanks


gchen    Jul 4 4:15PM 2017

You can run 'vertical benchmark' to test the disk performance. Are you using an esxi server as the sftp storage? My experience is that they are generally very slow when served as file servers. Can you try using a mac or linux computer as the server?


Morgan    Jul 4 5:20PM 2017

Thanks - I am awaiting the benchmark to complete. It is very fast on my source machine:

Creating a 1,024M test file with random data
Write time: 5.270, speed: 194.314MB/s
Read time: 3.628, speed: 282.220MB/s
Creating another 1,024M test file with random data
Write time: 5.400, speed: 189.616MB/s
Read and hash time: 6.097, speed: 167.948MB/s

It seems to take a long time on the remote. I have been waiting a while with no obvious progress. I tried it a few times. It just pauses for a long time after here:

 Storage set to sftp://myname@myhost.com//Volumes/TimeMachine/vmbackup

I also switched from sending to an esxi host to a MacOS 10.11 Server host with a 1Gb connection. Initial results are also very slow and I am still waiting. I will report back.

Also note: I had to kill a running vertical backup to run the benchmark. I used ^C. However, it only killed the parent, not the children. I had to find those and kill them manually.


Morgan    Jul 4 8:34PM 2017

It is a few hours later, and the only progress that the benchmark made:

 Upload time: 1854.791, speed: 0.552MB/s

Eventually, the connection died with a broken pipe so I did not see other results.

This upload speed is definitely not saturating my connection.

I am now trying a backup to the Mac server, and getting the same slow result:

Uploading ---------------------------------------- 744KB/s 8 days 04:46:37 0.1%

Something seems to be going on such that VerticalBackup isn't able to use anywhere near the full bandwidth for backups..?

Thanks


gchen    Jul 4 10:29PM 2017

Can you run this command to check if your network adapter is running at 1Gbps or 100Mbps?

esxcfg-nics -l


Morgan    Jul 6 2:55PM 2017

Both sending and receiving machine are running at 1Gbps.

Even if it were only 100Mbps, that's 12.5MB/s.

I think we are looking at the wrong problem.

The transfer speed is showing right now as 1.13MB/s on this backup:

 Uploading --------------------------------------- 1.13MB/s 5 days 06:07:19 0.3%

At a rate of 1.13MB/s, that is 4.068GB/h and 97GB/day.
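The conversions above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function names are made up for illustration, and it assumes 8 bits per byte and 1 GB = 1000 MB.

```python
def mbps_to_mb_per_s(megabits_per_second: float) -> float:
    """Convert a link rate in megabits/s to megabytes/s (8 bits per byte)."""
    return megabits_per_second / 8.0


def daily_volume_gb(mb_per_s: float) -> float:
    """Data moved in 24 hours at a sustained rate, assuming 1 GB = 1000 MB."""
    return mb_per_s * 3600 * 24 / 1000.0


def days_to_transfer(size_gb: float, mb_per_s: float) -> float:
    """Days needed to move size_gb at a sustained rate of mb_per_s."""
    return size_gb / daily_volume_gb(mb_per_s)


if __name__ == "__main__":
    print(mbps_to_mb_per_s(100))          # a 100Mbps link in MB/s
    print(daily_volume_gb(1.13))          # GB per day at 1.13MB/s
    print(days_to_transfer(503.0, 1.13))  # days for a full 503G file
```

With these assumptions, a full 503G file at 1.13MB/s works out to a little over 5 days, which is roughly what the progress line reports.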

My earlier question to you is about why Vertical Backup is backing up the entire .vmdk as if it were THICK provisioned, in which case the numbers reported would make sense - my drives total 774GB.

*However, these drives are provisioned inside esxi as Thin.* Here is what shows in the header file for this disk:

ddb.thinProvisioned = "1"

Now, even though it's thin provisioned, in the filesystem, the ".flat" file shows up as full size (the full size once used):

 -rw-------    1 root     root      503.0G Jul  6 19:11 owncloudConv-flat.vmdk

From some research, this is normal. Here is what the GUI says:

[Image: provisioned vs actual]

You can see that only 99.21GB is used.

However, so far I have seen no speedup from the de-duplication in vertical backup.

I have now attempted transferring this machine > 5 times, and each time it fails. I have implemented the "nohup" command as you suggested, and I left one running since Tuesday. Today I checked it, and there was no sign of the process running on the machine, nor was the backup completed.

As shown above, it is still expecting 5 days to complete (for the 500GB drive, that does not include the 224 GB one).

I hope you can help me figure out what is wrong, or I will have to look for other solutions.


Morgan    Jul 6 2:57PM 2017

Again, about internet speeds, the real limitation is the upload speed from my local machine. It is limited to 10Mbps, or 1.25MB/s. I cannot change that.

However, when I transfer a VM using VMware Fusion, it goes across using nearly the full speed, and completes in acceptable time.

Something is causing VerticalBackup to take 5X as long.

Morgan


Morgan    Jul 6 3:11PM 2017

I found out how to determine exactly how much is used and provisioned based on this link: http://www.virten.net/2014/11/identify-disk-usage-of-a-thin-provisioned-virtual-disk/.

Here is provisioned:

ls -lh owncloudConv-flat.vmdk
-rw-------    1 root     root      503.0G Jul  6 19:11 owncloudConv-flat.vmdk

Here is actual used:

du -h owncloudConv-flat.vmdk 
49.0G   owncloudConv-flat.vmdk

So it seems clear that VerticalBackup is using the whole provisioned drive, rather than just the used portion - at least for its calculations. Since I have yet to see a backup complete after trying to leave them running for more than 24 hours, I suspect it is not just a calculation issue.
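The same provisioned-versus-used comparison can be made from a script. The sketch below is generic POSIX Python, not anything specific to VMFS or VerticalBackup: it assumes a Unix-like system where `st_blocks` is available, and the helper names are my own.

```python
import os
import tempfile


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) in bytes for a file.

    st_size is the length that `ls -l` reports; st_blocks counts the
    512-byte units actually allocated, which is what `du` sums up.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path: str, size: int) -> None:
    """Create a file of the given apparent size that is one big hole."""
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo-flat.vmdk")  # hypothetical file name
        make_sparse_file(demo, 10 * 1024 * 1024)
        apparent, allocated = apparent_and_allocated(demo)
        print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem that supports sparse files, the allocated figure stays far below the apparent one, which is the same gap as 503.0G versus 49.0G above.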

Thanks


gchen    Jul 6 4:16PM 2017

 Upload time: 1854.791, speed: 0.552MB/s

This is normal for a 10Mbps connection, considering SFTP is not the most efficient transfer protocol.

Vertical Backup doesn't treat thin-provisioned and thick-provisioned disk files differently. It will just scan the entire disk file, one chunk at a time. However, when it hits those thin parts (which are segments of consecutive zeroes) it will be much faster and won't upload them again and again due to the deduplication. So the estimate it gives you at the beginning may not be accurate.
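To illustrate the idea, here is a minimal sketch of chunk-level deduplication. It is not Vertical Backup's actual code: the fixed 1 MiB chunk size, the SHA-256 hash, and the in-memory `stored` set standing in for the remote storage are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # hypothetical fixed chunk size (1 MiB)


def backup_chunks(data: bytes, stored: set[str],
                  chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Scan data chunk by chunk and 'upload' only chunks with a new hash.

    Returns (total_chunks, uploaded_chunks). `stored` stands in for the
    storage's index of known chunk hashes and is updated in place.
    """
    total = 0
    uploaded = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        total += 1
        if digest not in stored:
            stored.add(digest)  # a real tool would send the chunk here
            uploaded += 1
    return total, uploaded
```

Every all-zero chunk hashes to the same value, so a long run of zeroes costs one upload at most. A second pass over the same data uploads nothing, which is also why chunks left behind by an interrupted backup are skipped on the next attempt.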

So maybe we should first figure out why all your backups failed. If you used nohup to run the backup command then there should be a nohup.out file that contains the logs which can tell you what went wrong. Alternatively, you can set up email notifications to send logs to an email address.

I would also suggest trying a smaller virtual machine first, so that you can run a full backup in less time and see what the real speed is.


Morgan    Jul 7 9:03AM 2017

Ok, you say this is "normal" - but it is 1/3-1/4 or less of the speed that VMWare's file transfers occur at. It makes the use of VerticalBackup very problematic in situations like this. Not everyone has a 1000Gb upload/download connection to all machines, and you're basically telling me that that's the only way to practically use this tool.


gchen    Jul 7 9:49AM 2017

First, you're comparing apples and oranges. '0.552MB/s' is the speed of uploading random data, not an actual disk file. When uploading an actual disk file, especially a thin-provisioned one, the speed will be much faster. Therefore, I highly doubt the actual upload speed will be 1/3-1/4 of a VMWare file transfer.

Second, even if the initial backup is slower than a VMWare file transfer, the subsequent backups will likely be faster, because of the advanced deduplication technique implemented by Vertical Backup. Of course, I only compared Vertical Backup with vmkfstools and I don't know if VMWare Fusion supports incremental transfer.

Third, you can use more than one uploading thread in Vertical Backup. An option of --threads 2 or --threads 4 will perhaps saturate your 10Mbps uplink.
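As a rough illustration of why extra uploading threads can help on a high-latency link, here is a toy model. It is not how Vertical Backup is implemented: `fake_upload` just sleeps to stand in for a network round trip, and the 0.05 second latency is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_upload(chunk_id: int, latency: float = 0.05) -> int:
    """Stand-in for one chunk upload; the sleep models round-trip time."""
    time.sleep(latency)
    return chunk_id


def upload_all(chunk_ids: list[int], threads: int) -> tuple[list[int], float]:
    """Upload every chunk using the given number of worker threads.

    Returns the uploaded ids (in input order) and the elapsed seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        done = list(pool.map(fake_upload, chunk_ids))
    return done, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(8))
    for n in (1, 4):
        _, elapsed = upload_all(ids, threads=n)
        print(f"{n} thread(s): {elapsed:.2f}s")
```

With one thread each upload waits for the previous one to finish; with four, the waits overlap. On a real link the gain stops once the uplink itself is saturated.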


Morgan    Jul 7 9:48PM 2017

Well, okay.

I've been working with unix since the 1970s, and though I am less experienced with Linux than BSD varieties, I am not slow at this.

Yet VerticalBackup has been very challenging to get working right. I have had bugs, problems, and slowness.

Fortunately, you have addressed the bugs quickly - that gives me hope. But your response now was not very customer-friendly to someone who was planning on starting with 2 copies and expanding to more. It gives me pause.

Your argument about comparing apples to oranges does not make sense to me. I was reporting the actual number provided by VerticalBackup.

I have measured the actual speed, and it is much slower. I recently transferred a complete VM from my local machine across the same connection to a remote (fast connection) machine. Using Vmware's tools, that 30GB transfer completed overnight.

On the other hand, leaving a Vertical Backup running the latest time to an OS X server resulted in a net of 1.6G transferred when I left it running overnight, on the same (slow 10Mbps upload) connection.

So I have actual data: 30G versus 1.6G. You can argue as much as you want - and perhaps there is something I'm missing in how to use this - but that's a very clear difference. I was being kind when I said 3-4X difference, when in my experience so far it is much greater.

As for the threads, it's clear from your previous statements and from the fact that multiple processes are launched that VerticalBackup is already doing multiple threads. How is that different than specifying it as a switch?

I have been very patient trying to get this to work for me - and I do not have a lot of time to play around. I like the theory of how this software works, and I appreciate that you have been fairly supportive so far. That's why I have given it time.

But now that you are trying to point fingers instead of getting to the bottom of what's going on, I'm not so sure.


gchen    Jul 7 11:03PM 2017

By default Vertical Backup will use only one uploading thread, even if you can see multiple processes running.

But even with a '0.552MB/s' transfer rate, 1.6G overnight is still too low. I'll run some tests on my home internet connection (at most 5Mbps I think) and report back results.


gchen    Jul 8 7:55AM 2017

Google speed test showed that my home internet connection is 12.8Mbps down and 2.23Mbps up with a latency of 472ms, about 4 times slower than yours.

Here is the output of ./vertical benchmark:

Vertical Backup 1.0.4           
Test directory: /vmfs/volumes/5308afa8-75dfaca0-3805-6805ca20cc3e
Creating a 1,024M test file with random data
Write time: 12.041, speed: 85.041MB/s
Read time: 10.223, speed: 100.168MB/s
Creating another 1,024M test file with random data
Write time: 13.216, speed: 77.485MB/s
Read and hash time: 10.295, speed: 99.469MB/s
Storage set to sftp://gchen@build.acrosync.com/homestorage
Deleting old remote test files  
Upload time: 3890.938, speed: 0.263MB/s
Download time: 627.848, speed: 1.631MB/s

The upload speed of 0.263MB/s indicates it is only twice as slow as yours.

The output of ./vertical backup 'Linux Admin' --threads 4:

 
Vertical Backup 1.0.4           
Licensed to Acrosync LLC; expires on 2018-01-25
Storage set to sftp://gchen@build.acrosync.com/homestorage
Listing all virtual machines    
Backing up Linux Admin, id: 13, vmx path: /vmfs/volumes/datastore1/Linux Admin/Linux Admin.vmx, guest os: ubuntu64Guest
No previous backup found        
Virtual machine Linux Admin is powered off
Removing all snapshots of Linux Admin
Uploaded file /vmfs/volumes/datastore1/Linux Admin/Linux Admin.vmdk
Uploading file Linux Admin-flat.vmdk
Using 4 uploading threads       
Uploaded file Linux Admin-flat.vmdk 934KB/s 04:59:28
Uploaded file Linux Admin.vmx   
Uploaded file Linux Admin.vmxf  
Backup Linux Admin@esxi55 at revision 1 has been successfully completed
Total 16391 chunks, 16385.21M bytes; 9796 new, 9791.21M bytes, 4809.73M uploaded
Total backup time: 04:59:33     

This is a Linux virtual machine running Ubuntu 16.10 on a 16 GB thin-provisioned disk. The actual data on the disk is about 9.7G, but because of compression only 4.8G data were uploaded.

The speed of 934KB/s was based on the disk size of 16G. If we use the actual data size of 9.7G then the actual speed would be 545KB/s. When taking compression into account, the real transfer speed would be 267KB/s, very close to that reported by ./vertical benchmark.
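The arithmetic can be reproduced from the figures in the log above. This is just a small sketch: the helper names are made up, and the rates are left in MB/s to sidestep the question of whether a KB is counted as 1000 or 1024 bytes.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def rate_mb_per_s(megabytes: float, hms: str) -> float:
    """Average rate in MB/s for a volume moved in the given duration."""
    return megabytes / hms_to_seconds(hms)


ELAPSED = "04:59:28"  # upload time of Linux Admin-flat.vmdk from the log

scanned = rate_mb_per_s(16385.21, ELAPSED)  # whole 16 GB disk file
new_data = rate_mb_per_s(9791.21, ELAPSED)  # chunks not seen before
on_wire = rate_mb_per_s(4809.73, ELAPSED)   # bytes sent after compression

if __name__ == "__main__":
    print(f"scanned:  {scanned:.3f} MB/s")
    print(f"new data: {new_data:.3f} MB/s")
    print(f"on wire:  {on_wire:.3f} MB/s")
```

The three rates come out to roughly 0.91, 0.54 and 0.27 MB/s, and the last one is in line with the 0.263MB/s upload speed from the benchmark.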

So I think this level of performance is acceptable. I don't know why it was so slow for you, but if you want I can do my best to help you get to the bottom of it. You are indeed very patient -- if this happened to me with new software I may have given up already.


Morgan    Jul 9 10:51PM 2017

Thanks for following up.

I tried the four thread version, and got much better results! Hooray. It completed transfer of my first drive in under a day, with a total "rate" (including compression/deduplication) of 22.49MB/s.

I did notice that even when running in the background (nohup) it is quite sensitive. At one point my connection was down for ~30-45 seconds, and the background processes quit, having to be manually restarted.

I think this had to do with some of the challenges I was previously having. The internet at my office has been having problems lately.

I do hope you can make VerticalBackup more tolerant to temporary loss of connection.

In any case, now that it seems to be working, I am more optimistic. I just hope the restore to the destination machine goes smoothly.

Thanks for the help.


gchen    Jul 10 3:32PM 2017

Glad to hear the good news! I guess the fact that many chunks had been uploaded by previously failed backups and thus were simply skipped also contributed to the unrealistic rate of 22.49MB/s.

And yes, retrying on network errors is one feature I was thinking about, but it looks more urgent than ever before.
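For reference, the general shape of such a retry is sketched below. This is a generic illustration of retrying with exponential backoff, not the planned Vertical Backup feature: the function name, the number of attempts, and the delays are all placeholder assumptions.

```python
import time


def with_retries(operation, attempts: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Call operation(); on a connection error, wait and try again.

    The delay doubles after each failure (base_delay, 2 * base_delay, ...).
    Once all attempts are used up, the last error is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In Python a broken pipe surfaces as `BrokenPipeError`, a subclass of `ConnectionError`, so a dropped connection like the one described earlier in the thread would be caught and retried instead of ending the run.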


Lee Tickett    Sep 1 5:46AM 2017

that's a hidden gem- i wonder if the --threads should default to 4? my backup went from 10.99MB/s to 27.56MB/s


Copyright © Acrosync LLC 2017