Skip to content

Add test for various length mask functions (code from Samsung, AI Center, Cambridge)#2894

Merged
pplantinga merged 8 commits into
speechbrain:developfrom
rogiervd:test_length_mask
May 20, 2025
Merged

Add test for various length mask functions (code from Samsung, AI Center, Cambridge)#2894
pplantinga merged 8 commits into
speechbrain:developfrom
rogiervd:test_length_mask

Conversation

@rogiervd

@rogiervd rogiervd commented May 2, 2025

Copy link
Copy Markdown
Contributor

This tests five different functions in SpeechBrain that do essentially the same thing.

What does this PR do?

Test functions that turn a list of lengths into a mask:

  • speechbrain.dataio.dataio.length_to_mask
  • speechbrain.lobes.models.transformer.Transformer.get_mask_from_lengths
  • speechbrain.nnet.losses.get_mask
  • speechbrain.nnet.losses.compute_length_mask
  • speechbrain.processing.features.make_padding_mask

Are there any other functions in SpeechBrain with the same functionality? Maybe I ought to add tests for them too.

Ultimately, it would be good to merge these functions into one or two, I think; these tests would be a first step towards that.

Commit d14ed49 causes tests to fail, and 6eb6828 should fix them.

I caused one of these failures with a mistaken comment in #2835 (comment).

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

This tests five different functions in SpeechBrain that do essentially the same thing.
@rogiervd rogiervd changed the title Add test for various length mask functions Add test for various length mask functions (code from Samsung, AI Center, Cambridge) May 2, 2025
@TParcollet

Copy link
Copy Markdown
Collaborator

I think we have discussed this merge with @Adel-Moumen and @pplantinga and we agree, they should be merged. I also think that the ones found here are the most used ones.

@TParcollet TParcollet left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Rogier. I honestly don't have much to say, it looks good to me and I don't think it will impact existing models. I'd like a second opinion from @Adel-Moumen however.

Comment thread speechbrain/processing/features.py Outdated


def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-8):
def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-5):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for going from 1e-8 to 1e-5?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1e-8 is less than 1 unit in the last place in float32. I chose 1e-5 without much thinking.

>>> torch.tensor(1., dtype=torch.float32) - 1e-8 - 1
tensor(0.)
>>> torch.tensor(1., dtype=torch.float32) - 1e-7 - 1
tensor(-1.1921e-07)
>>> torch.tensor(1., dtype=torch.float32) - 1e-6 - 1
tensor(-1.0133e-06)
>>> torch.tensor(1., dtype=torch.float32) - 1e-5 - 1
tensor(-1.0014e-05)

Should I make it 1e-7 instead you think?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the risk here if eps is too large? Could there be an error if a tensor had a length of 1e5?

BTW at a sampling rate of 16000 thats about 6 seconds of audio, so we do actually have cases where tensors are that long. But I doubt that it ever matters if we drop the last sample

@pplantinga pplantinga left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything LGTM. Thanks for this nice contribution beginning to harmonize the implementations.

Comment thread speechbrain/processing/features.py Outdated


def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-8):
def make_padding_mask(x, lengths=None, length_dim=1, eps=1e-5):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the risk here if eps is too large? Could there be an error if a tensor had a length of 1e5?

BTW at a sampling rate of 16000 thats about 6 seconds of audio, so we do actually have cases where tensors are that long. But I doubt that it ever matters if we drop the last sample

# Convert relative lengths to absolute lengths, then compute boolean mask
max_len = x.size(length_dim)
abs_lengths = (lengths * max_len + eps).unsqueeze(1)
abs_lengths = (lengths * max_len - eps).unsqueeze(1)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch here, I assume this is the cause of the failing tests?

@rogiervd

Copy link
Copy Markdown
Contributor Author

What is the risk here if eps is too large? Could there be an error if a tensor had a length of 1e5?

BTW at a sampling rate of 16000 thats about 6 seconds of audio, so we do actually have cases where tensors are that long. But I doubt that it ever matters if we drop the last sample

The test fails at 4e-7 but succeeds at 5e-7 so I'll make it 1e-6, which gives a better safety margin.

Float32 has 24 bits in the significand. 2**24 == 6e-8 so there could be an argument for doing these fractions at float64.

Comment thread tests/unittests/test_length_mask.py Outdated

@Adel-Moumen Adel-Moumen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! You did a very nice work @rogiervd. I am quite surprised that we actually didn't had careful checks for length/masking functions. Thanks a lot!

Comment thread tests/unittests/test_length_mask.py Outdated
Comment thread tests/unittests/test_length_mask.py
Comment thread tests/unittests/test_length_mask.py Outdated
@rogiervd

Copy link
Copy Markdown
Contributor Author

Indeed, good catch! Sorry (I wanted to free you from modifying the docstring by yourself since you did an amazing job already). PS: I am running the pre--commit etc.

Wouldn't have been a problem. I believe the docstring is now both correct and well-formatted.

@pplantinga pplantinga merged commit ba9364a into speechbrain:develop May 20, 2025
5 checks passed
pplantinga pushed a commit to pplantinga/speechbrain that referenced this pull request Jun 2, 2025
…ter, Cambridge) (speechbrain#2894)

Co-authored-by: Rogier van Dalen <[email protected]>
Co-authored-by: Adel Moumen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants