provides a workaround for unreasonable overhead encountered in prepro…#303
jsmcibm wants to merge 3 commits into
…cessors - specifically in datasets.map applied to the tokenizer
To verify: Examine profile stats with: after the fix:
Regarding the NQ dataset: I ran Turns out a huge amount of time was spent in I can't find the corresponding is still suspicious.
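For readers who want to reproduce this kind of measurement, the following is a minimal sketch of collecting and printing profile stats with Python's standard cProfile and pstats modules. It is not the command used in the comments above; slow_preprocess is a hypothetical stand-in for whatever preprocessing call is under investigation.

```python
import cProfile
import io
import pstats


def slow_preprocess(n):
    """Hypothetical stand-in for the preprocessing step being profiled."""
    return [sum(range(i)) for i in range(n)]


# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
slow_preprocess(2000)
profiler.disable()

# Sort by cumulative time so the most expensive call chains come first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Running the same snippet before and after a change, and comparing the cumulative times of the top entries, is one way to check whether a workaround actually removes the overhead.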
Should this PR be closed, @jsmcibm?
During code review, Bhavani was unable to replicate the problem. I suspect that there is some additional factor (Python version, etc.) that we haven't identified that is influencing the behavior in
PrimeQA Pull Request
What does this PR do?
Provides a workaround for unreasonable overhead encountered in preprocessors, specifically in datasets.map applied to the tokenizer.
Closes #(issue)
Notes:
Replace (issue) above ↑↑↑ with the issue this PR closes to automatically link the two. This must be done when the PR is created.
Closes #(issue) as needed.
Closes.
Description
Describe the changes proposed by this PR to give the reviewer context below ↓↓↓
Wraps the output of the tokenizer in a dictionary of np.arrays; datasets.map is observed to be much faster with this data structure than with the standard tokenizer output object.
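The idea can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: to_numpy_dict, fake_tokenizer, and preprocess are hypothetical names, the stand-in tokenizer replaces a real one so the snippet runs without extra dependencies, and every sequence is assumed to be padded to the same length so the arrays are rectangular.

```python
from typing import Any, Dict, Mapping

import numpy as np


def to_numpy_dict(encoding: Mapping[str, Any]) -> Dict[str, np.ndarray]:
    """Copy a mapping-like tokenizer output into a plain dict of np.arrays."""
    return {key: np.asarray(value) for key, value in encoding.items()}


def fake_tokenizer(texts):
    """Hypothetical stand-in for a real tokenizer, padded to a fixed length."""
    max_len = 4
    input_ids, attention_mask = [], []
    for text in texts:
        # Use word lengths as fake token ids; truncate, then pad with zeros.
        ids = [len(word) for word in text.split()][:max_len]
        pad = max_len - len(ids)
        input_ids.append(ids + [0] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def preprocess(batch):
    """Batched function that returns a dict of np.arrays instead of the raw output."""
    return to_numpy_dict(fake_tokenizer(batch["question"]))


features = preprocess({"question": ["who wrote hamlet", "what is primeqa"]})
print({name: array.shape for name, array in features.items()})
```

With the Hugging Face datasets library, a function shaped like preprocess would typically be passed as dataset.map(preprocess, batched=True). Whether the speedup appears may depend on the environment, as the discussion in this PR suggests.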
(description)
Request Review
Be sure to request a review from one or more reviewers (unless the PR is to an unprotected branch).
Versioning
When opening a PR to make changes to PrimeQA (i.e. primeqa/) master, be sure to increment the version following semantic versioning. The VERSION is stored here and is incremented using bump2version {patch,minor,major} as described in the development guide documentation (https://github.com/primeqa/primeqa/blob/main/docs/development.md).
primeqa package or was not into master?
After pulling in changes from master to an existing PR, ensure the VERSION is updated appropriately. This may require bumping the version again if it has been previously bumped.
If you're not quite ready yet to post a PR for review, feel free to open a draft PR.
Releases
After Merging
If merging into master and VERSION was updated, after this PR is merged:
Checklist
Review the following and mark as completed: