Add test for various length mask functions (code from Samsung, AI Center, Cambridge)#2894
Conversation
This tests five different functions in SpeechBrain that do essentially the same thing.
|
I think we have discussed this merge with @Adel-Moumen and @pplantinga and we agree, they should be merged. I also think that the ones found here are the most used ones. |
TParcollet
left a comment
There was a problem hiding this comment.
Thanks Rogier. I honestly don't have much to say, it looks good to me and I don't think it will impact existing models. I'd like a second opinion from @Adel-Moumen however.
|
|
||
|
|
||
| def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-8): | ||
| def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-5): |
There was a problem hiding this comment.
What is the reason for going from 1e-8 to 1e-5?
There was a problem hiding this comment.
1e-8 is less than 1 unit in the last place in float32. I chose 1e-5 without much thinking.
>>> torch.tensor(1., dtype=torch.float32) - 1e-8 - 1
tensor(0.)
>>> torch.tensor(1., dtype=torch.float32) - 1e-7 - 1
tensor(-1.1921e-07)
>>> torch.tensor(1., dtype=torch.float32) - 1e-6 - 1
tensor(-1.0133e-06)
>>> torch.tensor(1., dtype=torch.float32) - 1e-5 - 1
tensor(-1.0014e-05)
Should I make it 1e-7 instead you think?
There was a problem hiding this comment.
What is the risk here if eps is too large? Could there be an error if a tensor had a length of 1e5?
BTW at a sampling rate of 16000 thats about 6 seconds of audio, so we do actually have cases where tensors are that long. But I doubt that it ever matters if we drop the last sample
pplantinga
left a comment
There was a problem hiding this comment.
Everything LGTM. Thanks for this nice contribution beginning to harmonize the implementations.
|
|
||
|
|
||
| def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-8): | ||
| def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-5): |
There was a problem hiding this comment.
What is the risk here if eps is too large? Could there be an error if a tensor had a length of 1e5?
BTW at a sampling rate of 16000 thats about 6 seconds of audio, so we do actually have cases where tensors are that long. But I doubt that it ever matters if we drop the last sample
| # Convert relative lengths to absolute lengths, then compute boolean mask | ||
| max_len = x.size(length_dim) | ||
| abs_lengths = (lengths * max_len + eps).unsqueeze(1) | ||
| abs_lengths = (lengths * max_len - eps).unsqueeze(1) |
There was a problem hiding this comment.
Good catch here, I assume this is the cause of the failing tests?
The test fails at Float32 has 24 bits in the significand. |
Adel-Moumen
left a comment
There was a problem hiding this comment.
LGTM! You did a very nice work @rogiervd. I am quite surprised that we actually didn't had careful checks for length/masking functions. Thanks a lot!
Wouldn't have been a problem. I believe the docstring is now both correct and well-formatted. |
…ter, Cambridge) (speechbrain#2894) Co-authored-by: Rogier van Dalen <[email protected]> Co-authored-by: Adel Moumen <[email protected]>
This tests five different functions in SpeechBrain that do essentially the same thing.
What does this PR do?
Test functions that turn a list of lengths into a mask:
Are there any other functions in SpeechBrain with the same functionality? Maybe I ought to add tests for them too.
Ultimately, it would be good to merge these functions into one or two, I think; these tests would be a first step towards that.
Commit
d14ed49causes tests to fail, and6eb6828should fix them.I caused one of these failures with a mistaken comment in #2835 (comment).
Before submitting
PR review
Reviewer checklist