It overflowed the internal buffer. We therefore split vector x into blocks when m is very large. Thanks to @wangqian for this patch.
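
The blocking idea can be illustrated with a minimal sketch. It is not the patch itself: it assumes the operation is a transposed matrix-vector product y = A^T x in which x has length m and is copied into a fixed-size scratch buffer. The function name `gemv_t_blocked` and the constant `BUFFER_SIZE` are hypothetical and chosen only for illustration.

```python
BUFFER_SIZE = 4  # hypothetical capacity of the internal buffer, in elements


def gemv_t_blocked(a, x, buffer_size=BUFFER_SIZE):
    """Compute y = A^T x, copying at most buffer_size elements of x at a time.

    a is a list of m rows with n columns each; x has length m.
    """
    m = len(x)
    n = len(a[0]) if m else 0
    y = [0.0] * n
    buffer = [0.0] * buffer_size  # fixed-size scratch space, never grown

    for start in range(0, m, buffer_size):
        block_len = min(buffer_size, m - start)

        # Copy one block of x into the buffer.
        # block_len <= buffer_size by construction, so this cannot overflow.
        for i in range(block_len):
            buffer[i] = x[start + i]

        # Accumulate this block's contribution to every entry of y.
        for j in range(n):
            acc = 0.0
            for i in range(block_len):
                acc += a[start + i][j] * buffer[i]
            y[j] += acc

    return y
```

Because each block adds its partial sums into y, the result matches the unblocked product, while the buffer never holds more than `buffer_size` elements regardless of m.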