Information Theory Meets Power Laws
Stochastic Processes and Language Models
Author: Dębowski, Łukasz
Publisher: John Wiley & Sons Inc
Publication date: 04/2021
Pages: 384
Binding: Hardcover
Language: English
ISBN: 9781119625278
Description not available.
Preface ix
Acknowledgments xiii
Basic Notations xv
1 Guiding Ideas 1
1.1 The Motivating Question 1
1.2 Further Questions About Texts 5
1.3 Zipf's and Herdan's Laws 8
1.4 Markov and Finite-State Processes 14
1.5 More General Stochastic Processes 20
1.6 Two Interpretations of Probability 23
1.7 Insights from Information Theory 25
1.8 Estimation of Entropy Rate 28
1.9 Entropy of Natural Language 30
1.10 Algorithmic Information Theory 35
1.11 Descriptions of a Random World 37
1.12 Facts and Words Related 43
1.13 Repetitions and Entropies 47
1.14 Decay of Correlations 52
1.15 Recapitulation 54
2 Probabilistic Preliminaries 57
2.1 Probability Measures 59
2.2 Product Measurable Spaces 63
2.3 Discrete Random Variables 65
2.4 From IID to Finite-State Processes 68
Problems 73
3 Probabilistic Toolbox 77
3.1 Borel σ-Fields and a Fair Coin 79
3.2 Integral and Expectation 83
3.3 Inequalities and Corollaries 87
3.4 Semidistributions 92
3.5 Conditional Probability 94
3.6 Modes of Convergence 101
3.7 Complete Spaces 103
Problems 106
4 Ergodic Properties 109
4.1 Plain Relative Frequency 111
4.2 Birkhoff Ergodic Theorem 116
4.3 Ergodic and Mixing Criteria 119
4.4 Ergodic Decomposition 125
Problems 128
5 Entropy and Information 131
5.1 Shannon Measures for Partitions 133
5.2 Block Entropy and Its Limits 139
5.3 Shannon Measures for Fields 145
5.4 Block Entropy Limits Revisited 155
5.5 Convergence of Entropy 159
5.6 Entropy as Self-Information 160
Problems 163
6 Equipartition and Universality 167
6.1 SMB Theorem 169
6.2 Universal Semidistributions 171
6.3 PPM Probability 172
6.4 SMB Theorem Revisited 178
6.5 PPM-based Statistics 180
Problems 186
7 Coding and Computation 189
7.1 Elements of Coding 191
7.2 Kolmogorov Complexity 197
7.3 Algorithmic Coding Theorems 207
7.4 Limits of Mathematics 215
7.5 Algorithmic Randomness 220
Problems 225
8 Power Laws for Information 229
8.1 Hilberg Exponents 231
8.2 Second Order SMB Theorem 238
8.3 Probabilistic and Algorithmic Facts 241
8.4 Theorems About Facts and Words 248
Problems 255
9 Power Laws for Repetitions 259
9.1 Rényi-Arimoto Entropies 261
9.2 Generalized Entropy Rates 266
9.3 Recurrence Times 268
9.4 Subword Complexity 272
9.5 Two Maximal Lengths 280
9.6 Logarithmic Power Laws 284
Problems 289
10 AMS Processes 291
10.1 AMS and Pseudo AMS Measures 293
10.2 Quasiperiodic Coding 295
10.3 Synchronizable Coding 298
10.4 Entropy Rate in the AMS Case 301
Problems 304
11 Toy Examples 307
11.1 Finite and Ultrafinite Energy 309
11.2 Santa Fe Processes and Alike 315
11.3 Encoding into a Finite Alphabet 323
11.4 Random Hierarchical Association 334
11.5 Toward Better Models 345
Problems 348
Future Research 349
Bibliography 351
Index 365
Subjects: statistical language models; power laws; Zipf's law; Herdan's law; mutual information; maximal repetition; decay of correlations; stochastic processes; non-Markovian processes; nonergodic processes; stationary processes; asymptotically mean stationary processes; ergodic theorems; information measures; universal coding; Kolmogorov complexity; Rényi entropy; recurrence time; subword complexity