README.TXT

This file describes some of the more important pieces of information in the
SQLite databases stored in this directory. The plain-text metadata files
(e.g. SPEAKERS.TXT) contain a subset of the information in these databases.
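All of these files are ordinary SQLite databases, so a quick way to check what a given file actually contains is to ask SQLite itself. Below is a minimal sketch using Python's standard sqlite3 module. The list_tables helper is hypothetical and is not part of the corpus tooling.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of the user tables in a SQLite database file."""
    # Open the file read-only through a URI, so exploring it can never modify it
    # (and a mistyped path raises an error instead of creating an empty database).
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, if the description of ia.sqlite.db is accurate, list_tables("ia.sqlite.db") should include the 'meta' and 'mp3' tables.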

ia.sqlite.db
============
This file contains the metadata extracted about the LibriVox recordings hosted
on the Internet Archive (IA). The database contains two tables:
* meta - each row represents a LibriVox (LV) project stored on the Internet
         Archive and holds the URL of the project's main IA page, together
         with the title and author of the book on which the recordings are
         based.
* mp3 - each row is an audio chapter. It references the project the chapter
        is part of (i.e. a row in 'meta') and stores the URL from which the
        .mp3 file can be downloaded, its size and its checksums. For the
        64 kbit/s .mp3 recordings, the "parent_id" column points to another
        row, containing information about the 128 kbit/s version from which
        the recording was derived.
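The "parent_id" link refers back into the same 'mp3' table, so a self-join recovers the 128 kbit/s source of every derived 64 kbit/s file. Below is a minimal sketch against a toy in-memory database. Only the table names and "parent_id" come from this README. Every other column name (meta_id, url, size, md5) and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: only "meta", "mp3" and "parent_id" come from the README.
SCHEMA = """
CREATE TABLE meta (id INTEGER PRIMARY KEY, url TEXT, title TEXT, author TEXT);
CREATE TABLE mp3 (
    id        INTEGER PRIMARY KEY,
    meta_id   INTEGER REFERENCES meta(id),  -- project the chapter belongs to
    url       TEXT,
    size      INTEGER,
    md5       TEXT,
    parent_id INTEGER REFERENCES mp3(id)    -- set only for derived 64 kbit/s files
);
"""


def derived_pairs(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (64 kbit/s URL, source 128 kbit/s URL) pairs via a self-join."""
    return conn.execute(
        """
        SELECT child.url, parent.url
        FROM mp3 AS child
        JOIN mp3 AS parent ON child.parent_id = parent.id
        ORDER BY child.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO meta VALUES (1, 'https://example.org/project', 'Title', 'Author')")
conn.executemany(
    "INSERT INTO mp3 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "ch01_128kb.mp3", 2000, "aaa", None),  # original 128 kbit/s file
        (2, 1, "ch01_64kb.mp3", 1000, "bbb", 1),      # derived 64 kbit/s file
    ],
)
print(derived_pairs(conn))  # [('ch01_64kb.mp3', 'ch01_128kb.mp3')]
```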

pg.sqlite.db
============
Contains the parts of the metadata from Project Gutenberg's (PG) XML/RDF files
that are relevant to the LibriSpeech corpus. The tables are:
* books - each row represents a PG book. Has columns for the book's title,
          various classification codes, and the URLs of the ASCII and/or
          UTF-8 versions of the text.
* authors - each row represents an author. Authors have a many-to-many
            relationship with the "books" table, i.e. a book can have more
            than one author and an author can be associated with more than
            one book.
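A many-to-many relationship of this kind is normally stored in a separate link table that holds one row per (book, author) pair. The README does not name that table, so "book_authors" and all column names below are hypothetical. This is a minimal sketch of how both directions of the relationship can be queried:

```python
import sqlite3

# Hypothetical schema: "books" and "authors" come from the README; the
# "book_authors" link table and all column names are assumptions.
SCHEMA = """
CREATE TABLE books   (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE book_authors (
    book_id   INTEGER REFERENCES books(id),
    author_id INTEGER REFERENCES authors(id),
    PRIMARY KEY (book_id, author_id)
);
"""


def authors_of(conn, book_id):
    """All authors of one book (a book may have several)."""
    rows = conn.execute(
        """
        SELECT a.name FROM authors AS a
        JOIN book_authors AS ba ON ba.author_id = a.id
        WHERE ba.book_id = ?
        ORDER BY a.name
        """,
        (book_id,),
    ).fetchall()
    return [name for (name,) in rows]


def books_of(conn, author_id):
    """All books of one author (an author may have several)."""
    rows = conn.execute(
        """
        SELECT b.title FROM books AS b
        JOIN book_authors AS ba ON ba.book_id = b.id
        WHERE ba.author_id = ?
        ORDER BY b.title
        """,
        (author_id,),
    ).fetchall()
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO books VALUES (?, ?)", [(1, "Book A"), (2, "Book B")])
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Author X"), (2, "Author Y")])
conn.executemany("INSERT INTO book_authors VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])
print(authors_of(conn, 1))  # ['Author X', 'Author Y']
print(books_of(conn, 1))    # ['Book A', 'Book B']
```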

lv-annotated.sqlite.db
======================
This database is a hodge-podge collection of various things that were relevant
to the alignment process and the subsequent corpus creation.

First, there are tables with information about LibriVox projects. This was the
original content of the database; everything else was added later, as needed:
* projects - each row describes a LibriVox audio book project. Has columns for
             the project's title, the URL of the associated LibriVox page, the
             number of audio chapters, the total duration in seconds, the URL
             of the relevant Internet Archive page, and so on.
* audio_chapters - each row corresponds to an audio chapter of a LibriVox
                   project. The "project_id" column contains the foreign key
                   pointing to the parent project, and other columns hold the
                   duration of the chapter in seconds, the URL from which it
                   can be downloaded, and so on. Another foreign key links the
                   chapter to a row in the "readers" table below.
* readers - contains the ID and the name of a LibriVox volunteer.
* authors - contains the name and the dates of birth and death of a book
            author.
* genres - the ID and name of each genre under which LibriVox projects are
           classified.
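Each audio chapter carries two foreign keys, one to its project and one to its reader. Joining through both therefore breaks a book down by who read it. Below is a minimal sketch on a toy database. The table names and "project_id" come from this README. The reader foreign-key name ("reader_id"), the "duration" column and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: the table names and "project_id" come from the README;
# "reader_id", "duration" and the other columns are assumptions.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE readers  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE audio_chapters (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES projects(id),  -- parent project
    reader_id  INTEGER REFERENCES readers(id),   -- volunteer who read it
    duration   REAL                              -- seconds
);
"""


def per_reader_totals(conn):
    """(project title, reader name, chapter count, total seconds) per pair."""
    return conn.execute(
        """
        SELECT p.title, r.name, COUNT(*), SUM(c.duration)
        FROM audio_chapters AS c
        JOIN projects AS p ON c.project_id = p.id
        JOIN readers  AS r ON c.reader_id  = r.id
        GROUP BY p.id, r.id
        ORDER BY p.title, r.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO projects VALUES (?, ?)", [(1, "Book A"), (2, "Book B")])
conn.executemany("INSERT INTO readers VALUES (?, ?)", [(1, "Reader 1"), (2, "Reader 2")])
conn.executemany(
    "INSERT INTO audio_chapters VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 600.0), (2, 1, 1, 300.0), (3, 1, 2, 450.0), (4, 2, 2, 120.0)],
)
for row in per_reader_totals(conn):
    print(row)
```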

The tables that follow were used to schedule the alignment jobs:
* jobs - contains the basic information about each scheduled job, such as the
         dates when the job was scheduled, enqueued on SGE, started and
         finished, as well as a status code for the outcome of the job (e.g.
         success or failure). The 'successor_id' field points to the
         rescheduled instance of this job if it had to be restarted for some
         reason.
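Because a restarted job points to its replacement through 'successor_id', the full history of one job is a linked list. A recursive CTE can walk that list inside SQLite. Below is a minimal sketch. Only "jobs" and "successor_id" come from this README. The "status" column values and the sample rows are hypothetical, and the depth limit is an added guard against accidental cycles.

```python
import sqlite3

# Hypothetical schema: only "jobs" and "successor_id" come from the README.
SCHEMA = """
CREATE TABLE jobs (
    id           INTEGER PRIMARY KEY,
    status       TEXT,                          -- outcome, e.g. 'success'/'failure'
    successor_id INTEGER REFERENCES jobs(id)    -- rescheduled instance, if any
);
"""


def attempt_chain(conn, job_id, max_depth=100):
    """Return [(id, status), ...] from job_id through every rescheduled successor."""
    return conn.execute(
        """
        WITH RECURSIVE chain(id, status, successor_id, depth) AS (
            SELECT id, status, successor_id, 0 FROM jobs WHERE id = ?
            UNION ALL
            SELECT j.id, j.status, j.successor_id, chain.depth + 1
            FROM jobs AS j JOIN chain ON j.id = chain.successor_id
            WHERE chain.depth < ?            -- stop runaway recursion on cycles
        )
        SELECT id, status FROM chain ORDER BY depth
        """,
        (job_id, max_depth),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, "failure", 2), (2, "failure", 3), (3, "success", None), (4, "success", None)],
)
print(attempt_chain(conn, 1))  # [(1, 'failure'), (2, 'failure'), (3, 'success')]
```

The last element of the returned list is the most recent attempt, which is usually the one whose outcome matters.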

Two types of jobs were used to align the LibriSpeech audio; in the parlance of
the source code they are called "top-half" and "bottom-half" jobs. The former
are responsible for normalizing a book's text, building the phase 1 decoding
graph, and generating the g2p lexicon (for more details, search for the
yet-to-be-published paper). The "bottom-half" jobs are user-oriented, i.e.
each one processes a single reader's audio, and are responsible for the actual
alignment. This design was chosen so that the initial processing of a book
could be shared, since a book may have LibriVox audio chapters read by several
different readers. The tables below extend the definition of "jobs" through
the magic of SQLAlchemy:
* top_half_jobs - contains the location of the source text to process.
* bottom_half_jobs - contains the ID of the reader whose audio should be
                     processed by this job. It also stores a number of
                     statistics about the job, filled in after its completion:
                     the percentage of the audio successfully aligned, the
                     numbers of this reader's audio chapters that succeeded
                     and failed, the real-time factor (i.e. the ratio between
                     the useful, aligned audio and the time it took to obtain
                     it), and so on. The "worst_verification_wer" field
                     records the result of the post-processing verification of
                     the obtained utterances (a process that produces a
                     certain number of false alarms).
* bottom_half_chapters - each row contains information about the processing
                         of an individual chapter within a bottom-half job:
                         mostly the same information as above, but at
                         per-chapter granularity.
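The README does not say which SQLAlchemy mechanism is used to extend "jobs". A common one is joined-table inheritance, in which each subtype table shares its primary key with a row in the base table. The sketch below assumes that layout and shows how the real-time factor, as defined above, could be computed from it. The discriminator column, the Unix-timestamp columns, "aligned_secs" and the sample rows are all hypothetical. Only the table names and "worst_verification_wer" come from this README.

```python
import sqlite3

# Hypothetical schema modelled on joined-table inheritance: each subtype row
# shares its primary key with a row in "jobs". Column names other than
# "worst_verification_wer" are assumptions.
SCHEMA = """
CREATE TABLE jobs (
    id       INTEGER PRIMARY KEY,
    type     TEXT,              -- assumed discriminator: 'top_half' / 'bottom_half'
    started  REAL,              -- Unix timestamps, in seconds
    finished REAL,
    status   TEXT
);
CREATE TABLE top_half_jobs (
    id        INTEGER PRIMARY KEY REFERENCES jobs(id),
    text_path TEXT              -- location of the source text
);
CREATE TABLE bottom_half_jobs (
    id                     INTEGER PRIMARY KEY REFERENCES jobs(id),
    reader_id              INTEGER,
    aligned_secs           REAL,  -- useful, aligned audio produced by the job
    worst_verification_wer REAL
);
"""


def bottom_half_rtf(conn):
    """(job id, reader id, RTF), with RTF = aligned audio / wall-clock time."""
    return conn.execute(
        """
        SELECT j.id, b.reader_id,
               b.aligned_secs / (j.finished - j.started) AS rtf
        FROM jobs AS j
        JOIN bottom_half_jobs AS b ON b.id = j.id   -- subtype joined on shared PK
        WHERE j.finished > j.started                -- skip unfinished/invalid rows
        ORDER BY j.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        (1, "top_half", 0.0, 100.0, "success"),
        (2, "bottom_half", 100.0, 1900.0, "success"),
        (3, "bottom_half", 100.0, 500.0, "success"),
    ],
)
conn.execute("INSERT INTO top_half_jobs VALUES (1, 'texts/book.txt')")
conn.executemany(
    "INSERT INTO bottom_half_jobs VALUES (?, ?, ?, ?)",
    [(2, 10, 3600.0, 0.1), (3, 11, 1200.0, 0.2)],
)
print(bottom_half_rtf(conn))  # [(2, 10, 2.0), (3, 11, 3.0)]
```

With the definition given in this README, a larger real-time factor means faster processing. That is the inverse of the processing-time / audio-duration convention that is common in speech recognition, so the two should not be compared directly.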

The next set of tables contains information that was added by quickly
reviewing a tiny fraction of the audio, to make sure that there are no
multi-speaker recordings or other undesirable artifacts in the corpus. Note
that this process should not be considered infallible.
* reader_annotations - contains annotations about individual readers.
                       Perhaps the only (more or less) reliable field here
                       is the gender information.
* audio_chapter_annotations - per-chapter "noisy" and "multi-speaker" flags.
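When filtering chapters by these flags, keep in mind that most chapters were never reviewed. An outer join keeps unreviewed chapters instead of silently dropping them. Below is a minimal sketch. Only the annotation table names come from this README. The column names (chapter_id, reader_id, noisy, multi_speaker, gender), the 0/1 flag encoding, the "F"/"M" gender values and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical schema: only the annotation table names come from the README;
# the columns (chapter_id, reader_id, noisy, multi_speaker, gender) are assumptions.
SCHEMA = """
CREATE TABLE audio_chapters (id INTEGER PRIMARY KEY, reader_id INTEGER);
CREATE TABLE reader_annotations (reader_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE audio_chapter_annotations (
    chapter_id    INTEGER PRIMARY KEY REFERENCES audio_chapters(id),
    noisy         INTEGER NOT NULL DEFAULT 0,   -- 1 = flagged as noisy
    multi_speaker INTEGER NOT NULL DEFAULT 0    -- 1 = flagged as multi-speaker
);
"""


def unflagged_chapters(conn):
    """(chapter id, reader gender or None) for chapters with neither flag set.

    LEFT JOINs keep chapters that were never reviewed; only a tiny fraction of
    the audio was annotated, so "not flagged" does not mean "verified clean".
    """
    return conn.execute(
        """
        SELECT c.id, ra.gender
        FROM audio_chapters AS c
        LEFT JOIN audio_chapter_annotations AS aa ON aa.chapter_id = c.id
        LEFT JOIN reader_annotations        AS ra ON ra.reader_id  = c.reader_id
        WHERE COALESCE(aa.noisy, 0) = 0 AND COALESCE(aa.multi_speaker, 0) = 0
        ORDER BY c.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO audio_chapters VALUES (?, ?)", [(1, 1), (2, 1), (3, 2), (4, 3)])
conn.executemany("INSERT INTO reader_annotations VALUES (?, ?)", [(1, "F"), (2, "M")])
conn.executemany(
    "INSERT INTO audio_chapter_annotations VALUES (?, ?, ?)",
    [(2, 1, 0), (3, 0, 1)],  # chapter 2 is noisy, chapter 3 has several speakers
)
print(unflagged_chapters(conn))  # [(1, 'F'), (4, None)]
```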