Init issue#230
Conversation
Codecov Report
@@ Coverage Diff @@
## main #230 +/- ##
==========================================
- Coverage 86.99% 86.97% -0.03%
==========================================
Files 89 89
Lines 4429 4436 +7
==========================================
+ Hits 3853 3858 +5
- Misses 576 578 +2
Continue to review full report at Codecov.
|
JeanKossaifi
left a comment
There was a problem hiding this comment.
Thanks @caglayantuna, left comments directly in the code.
|
|
||
| The type of initialization is set using `init`. If `init == 'random'` then | ||
| initialize factor matrices using `random_state`. If `init == 'svd'` then | ||
| initialize factor matrices with uniform distribution using `random_state`. If `init == 'svd'` then |
There was a problem hiding this comment.
I'm actually thinking of changing the init so the reconstructed tensor has 0-mean and a variance specified by users, while the factors will have a Gaussian distribution.
There was a problem hiding this comment.
Sure I think this makes more sense for unconstrained Parafac. It's not hard to compute the variance of the tensor given the variance of each iid factor entry I can help with that if you want.
However this is not great for nonnegative CP and others, for which keeping the uniform distribution may be better?
There was a problem hiding this comment.
I already made the calculations for the variance and implemented them for the torch version, I just need to port it for all backend, will try to do when I get some time.
That's a good point for the CP-NN, do you think uniform is better than a folder gaussian (abs(gaussian)) for initialization? Gaussian distribution might be more predictable when the order of the tensor grows?
There was a problem hiding this comment.
Even after working man years with nonnegativity, I don't have a strong idea on that. Maybe it's good to leave the choice open to the user.
| try: | ||
| try: | ||
| if normalize_factors is True: | ||
| warnings.warn('It is not recommended to initialize an initial tensor with normalizing. Consider normalizing the initial tensor before using this function') |
There was a problem hiding this comment.
We can probably clarify the warning -- the normalization is on the factors, by initial tensor, users might understand the input tensor.
More generally, why wouldn't we want users to request a normalized tensor while also providing their own init?
There was a problem hiding this comment.
We will clarify the warning indeed.
The behavior we wanted to avoid was
- the user inputs a tensor with non-unit weights
- the weights are pulled on the last factor
- the factors are normalized
In that case I was worried that the weights pulling was useless and that could confuse the user, therefore suggesting to normalize beforehand. Indeed a user might want to normalize his input initialization, but then is it not a better practice to do it outside the init function, so that the init function does not change the init tensor itself? To me the normalization option should only be allowed for random or svd init.
There was a problem hiding this comment.
Yes, it's good to warn the user that their normalization will be overwritten. But that will anyway right? I don't think the user expects the initialization to not change? I tend to just view (weights, factors) as a generic representation in CP form rather than a normalized CP. They may have gotten the (weights, factors) from e.g. a previous run of CP with normalize=True and just want to do warm restart.
I'm actually curious about the actual impact of normalization on stability within CP.
| kt = CPTensor((None, factors)) | ||
| else: | ||
| for i in range(len(factors)): | ||
| factors[-1] = factors[-1]*weights[i] |
There was a problem hiding this comment.
Why that factor rather than any other? At first glance, I don't have a strong intuition about that - would it affect convergence/stability? Another thing I wanted to add was to actually spread the weights in all the factors (more generally, to manipulate decomposed factors without changing reconstruction).
There was a problem hiding this comment.
Actually, there is no specific reason for that. Scaling only one factor doesn't effect the reconstruction. If you prefer, I would change to scale all the factors with related weights.
There was a problem hiding this comment.
It could even be a separate function in that case, e.g. to make all factors equal weights? Do you think it would make sense to check the numerical stability when i) absorbing all weights in one factors and ii) spreading them in all factors?
There was a problem hiding this comment.
As we agreed last week, we updated factor initialization by spreading weights in all factors. In order to avoid unstable weight values, we update all the factors with geometric mean of weight values. I hope current version is ok for merging
d3652ef to
4592da2
Compare
|
Thanks @caglayantuna, looks good to me. |
4592da2 to
4fe6014
Compare
4fe6014 to
65a9c62
Compare
|
Hey @JeanKossaifi , thanks. I squashed all the commits into one commit. But, I couldn't understand the problem with MxNet. |
|
Thanks @caglayantuna, merging! |
his pull request is related to issue #222. We suggest some updates in _cp.py file.
init docstring
Since there are 3 options to initialize tensors but only 2 of them are explained in the docstring, We added "If init is a previously initialized
cp tensor, all the weights are pulled in the last factor and then the weights are set to "1" for the output tensor." and 'cp tensor' to 'init : {'svd', 'random', cptensor}, optional'. We have also addeduniform distributioninformation to random option.Reverse normalization for weights
We added some lines to third option in order to avoid weight conflicts:
Thus, if a user puts an input weights with values different than "1", these lines will rescale the last factor accordingly.
Warning for normalize_factors
We have also added a warning as
if normalize_factors is Truewhen using an initial tensor. Because it is not useful to normalize after pulling all weights to the last factor.Mttkrp calculation
Besides, we changed the mttkrp lines which were taking into account weights before. Now, one option is enough to calculate mttkrp.