.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/workflow_lisn_lda_kmeans.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_workflow_lisn_lda_kmeans.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_workflow_lisn_lda_kmeans.py:

Extracting and processing LISN data for Cartolabe (LDA projection)
======================================================================

In this example we will:

- extract entities (authors, articles, labs, words) from a collection of
  scientific articles
- project those entities in 2 dimensions
- cluster them
- find their nearest neighbors.

.. GENERATED FROM PYTHON SOURCE LINES 16-20

Download data
=============

We will first download the CSV file that contains all articles from HAL
(https://hal.archives-ouvertes.fr/) published by authors from LISN
(Laboratoire Interdisciplinaire des Sciences du Numérique) between 2000 and
2022.

.. GENERATED FROM PYTHON SOURCE LINES 20-28

.. code-block:: Python

    from download import download

    csv_url = "https://zenodo.org/record/7323538/files/lisn_2000_2022.csv"

    download(csv_url, "../datas/lisn_2000_2022.csv", kind='file',
             progressbar=True, replace=False)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Downloading data from https://zenodo.org/records/7323538/files/lisn_2000_2022.csv (6.3 MB)

    file_sizes:   0%|                          | 0.00/6.59M [00:00<?, ?B/s]
    file_sizes:  39%|██████████▏               | 2.59M/6.59M [00:00<00:00, 25.3MB/s]
    file_sizes: 100%|██████████████████████████| 6.59M/6.59M [00:00<00:00, 44.2MB/s]
    Successfully downloaded file to ../datas/lisn_2000_2022.csv

    '../datas/lisn_2000_2022.csv'

.. GENERATED FROM PYTHON SOURCE LINES 29-31

Load data into a dataframe
==========================

.. GENERATED FROM PYTHON SOURCE LINES 31-38

.. code-block:: Python

    import pandas as pd  # noqa

    df = pd.read_csv('../datas/lisn_2000_2022.csv', index_col=0)
    df.head()

..
raw:: html <div class="output_subarea output_html rendered_html output_result"> <div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>structId_i</th> <th>authFullName_s</th> <th>en_abstract_s</th> <th>en_keyword_s</th> <th>en_title_s</th> <th>structAcronym_s</th> <th>producedDateY_i</th> <th>producedDateM_i</th> <th>halId_s</th> <th>docid</th> <th>en_domainAllCodeLabel_fs</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>[2544, 92966, 411575, 441569]</td> <td>Frédéric Blanqui</td> <td>In the last twenty years, several approaches t...</td> <td>Higher-order rewriting,Termination,Confluence</td> <td>Termination and Confluence of Higher-Order Rew...</td> <td>LRI,UP11,CNRS,LISN</td> <td>2000</td> <td>7.0</td> <td>inria-00105556</td> <td>105556</td> <td>Logic in Computer Science,Computer Science</td> </tr> <tr> <th>1</th> <td>[2544, 92966, 411575, 441569]</td> <td>Sébastien Tixeuil</td> <td>When a distributed system is subject to transi...</td> <td>Self-stabilization,Distributed Systems,Distrib...</td> <td>Efficient Self-stabilization</td> <td>LRI,UP11,CNRS,LISN</td> <td>2000</td> <td>1.0</td> <td>tel-00124843</td> <td>124843</td> <td>Networking and Internet Architecture,Computer ...</td> </tr> <tr> <th>2</th> <td>[1167, 300340, 301492, 564132, 441569, 2544, 9...</td> <td>Michèle Sebag,Céline Rouveirol</td> <td>One of the obstacles to widely using first-ord...</td> <td>Bounded reasoning,First order logic,Inductive ...</td> <td>Resource-bounded relational reasoning: inducti...</td> <td>LMS,X,PSL,CNRS,LRI,UP11,CNRS,LISN</td> <td>2000</td> <td>NaN</td> <td>hal-00111312</td> <td>2263842</td> <td>Mechanics,Engineering Sciences,physics</td> </tr> <tr> <th>3</th> <td>[994, 15786, 301340, 303171, 441569, 34499, 81...</td> <td>Philippe Balbiani,Jean-François Condotta,Gérar...</td> <td>This paper organizes the topologic forms of th...</td> <td>Temporal reasoning,Constraint handling,Computa...</td> <td>Reasoning about generalized intervals : Horn r...</td> <td>LIPN,UP13,USPC,CNRS,IRIT,UT1,UT2J,UT3,CNRS,Tou...</td> <td>2000</td> <td>NaN</td> <td>hal-03300321</td> <td>3300321</td> <td>Artificial Intelligence,Computer Science</td> </tr> <tr> <th>4</th> <td>[1315, 25027, 59704, 564132, 300009, 441569, 4...</td> <td>Roberto Di Cosmo,Delia Kesner,Emmanuel Polonovski</td> <td>We refine the simulation technique introduced ...</td> <td>Linear logic,Proof nets,Lambda-calculus,Explic...</td> <td>Proof Nets and Explicit Substitutions</td> <td>LIENS,DI-ENS,ENS-PSL,PSL,Inria,CNRS,CNRS,LRI,U...</td> <td>2000</td> <td>NaN</td> <td>hal-00384955</td> <td>384955</td> <td>Logic in Computer Science,Computer Science</td> </tr> </tbody> </table> </div> </div> <br /> <br /> .. GENERATED FROM PYTHON SOURCE LINES 39-40 The dataframe that we just read consists of 4262 articles as rows. .. GENERATED FROM PYTHON SOURCE LINES 40-43 .. code-block:: Python print(df.shape[0]) .. rst-class:: sphx-glr-script-out .. code-block:: none 4262 .. GENERATED FROM PYTHON SOURCE LINES 44-45 And their authors, abstract, keywords, title, research labs and domain as columns. .. GENERATED FROM PYTHON SOURCE LINES 45-48 .. code-block:: Python print(*df.columns, sep="\n") .. rst-class:: sphx-glr-script-out .. 
code-block:: none

    structId_i
    authFullName_s
    en_abstract_s
    en_keyword_s
    en_title_s
    structAcronym_s
    producedDateY_i
    producedDateM_i
    halId_s
    docid
    en_domainAllCodeLabel_fs

.. GENERATED FROM PYTHON SOURCE LINES 49-62

Creating correspondence matrices for each entity type
======================================================

From this table of articles, we want to extract matrices that will map the
correspondence between these articles and the entities we want to use.

Authors
-------

Let's start with the authors, for example. We want to create a matrix where
the rows represent the articles and the columns represent the authors. Each
cell (n, m) contains a 1 if the *nth* article was written by the *mth*
author.

.. GENERATED FROM PYTHON SOURCE LINES 62-67

.. code-block:: Python

    from cartodata.loading import load_comma_separated_column  # noqa

    authors_mat, authors_scores = load_comma_separated_column(df,
                                                              'authFullName_s')

.. GENERATED FROM PYTHON SOURCE LINES 68-77

The `load_comma_separated_column` function takes a dataframe and the name of
a column and returns two objects:

- a sparse matrix
- a pandas `Series`

Each column of the sparse matrix `authors_mat` corresponds to an author and
each row corresponds to an article. We see that there are 7348 distinct
authors for 4262 articles.

.. GENERATED FROM PYTHON SOURCE LINES 77-80

.. code-block:: Python

    authors_mat.shape

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (4262, 7348)

.. GENERATED FROM PYTHON SOURCE LINES 81-85

The series, which we named `authors_scores`, contains the list of authors
extracted from the column `authFullName_s`, each with a score equal to the
number of rows (articles) that the author was mapped to in the
`authors_mat` matrix.

.. GENERATED FROM PYTHON SOURCE LINES 85-88

.. code-block:: Python

    authors_scores.head()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Frédéric Blanqui       4
    Sébastien Tixeuil     47
    Michèle Sebag        137
    Céline Rouveirol       2
    Philippe Balbiani      2
    dtype: int64

.. GENERATED FROM PYTHON SOURCE LINES 89-92

If we look at the *2nd* column of the matrix, which corresponds to the
author **Sébastien Tixeuil**, we can see that it has 47 non-zero rows, each
row indicating an article he authored.

.. GENERATED FROM PYTHON SOURCE LINES 92-95

.. code-block:: Python

    print(authors_mat[:, 1])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      (1, 0)    1
      (7, 0)    1
      (22, 0)   1
      (60, 0)   1
      (128, 0)  1
      (136, 0)  1
      (150, 0)  1
      (179, 0)  1
      (205, 0)  1
      (212, 0)  1
      (233, 0)  1
      (238, 0)  1
      (241, 0)  1
      (246, 0)  1
      (262, 0)  1
      (282, 0)  1
      (294, 0)  1
      (356, 0)  1
      (358, 0)  1
      (359, 0)  1
      (363, 0)  1
      (371, 0)  1
      (372, 0)  1
      (409, 0)  1
      (498, 0)  1
      (501, 0)  1
      (536, 0)  1
      (541, 0)  1
      (542, 0)  1
      (878, 0)  1
      (893, 0)  1
      (1600, 0) 1
      (1717, 0) 1
      (2037, 0) 1
      (2075, 0) 1
      (2116, 0) 1
      (2222, 0) 1
      (2373, 0) 1
      (2449, 0) 1
      (2450, 0) 1
      (2611, 0) 1
      (2732, 0) 1
      (2976, 0) 1
      (2986, 0) 1
      (3107, 0) 1
      (3221, 0) 1
      (3791, 0) 1

.. GENERATED FROM PYTHON SOURCE LINES 96-101

Labs
----

Similarly, we can create matrices for the labs by simply passing the
`structAcronym_s` column to the function.

.. GENERATED FROM PYTHON SOURCE LINES 101-107

.. code-block:: Python

    labs_mat, labs_scores = load_comma_separated_column(df,
                                                        'structAcronym_s',
                                                        filter_acronyms=True)
    labs_scores.head()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    LRI      4789
    UP11     6271
    CNRS    10217
    LISN     5203
    LMS         1
    dtype: int64

.. GENERATED FROM PYTHON SOURCE LINES 108-110

Checking the number of columns of the sparse matrix `labs_mat`, we see that
there are 1818 distinct labs.

.. GENERATED FROM PYTHON SOURCE LINES 110-113

.. code-block:: Python

    labs_mat.shape[1]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    1818
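Before moving on, we can sanity-check that a matrix and its scores line up.
The following is a minimal sketch, not part of the original workflow,
assuming the matrices are scipy sparse matrices whose columns are aligned
with the entries of the score series, as described above: for the binary
`authors_mat`, the sum of each column should equal the corresponding
author's score.

.. code-block:: Python

    import numpy as np  # noqa

    # Sum each column of the binary articles x authors matrix: this counts
    # the number of articles per author.
    column_sums = np.asarray(authors_mat.sum(axis=0)).ravel()

    # Every column sum should match the author's score.
    assert (column_sums == authors_scores.to_numpy()).all()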
.. GENERATED FROM PYTHON SOURCE LINES 114-123

Filtering low score entities
----------------------------

A lot of the authors and labs that we just extracted from the dataframe have
a very low score, which means that they are only linked to one or two
articles. To improve the quality of our data, we'll filter the authors and
labs by removing those that appear fewer than 4 times. To do this, we'll use
the `filter_min_score` function.

.. GENERATED FROM PYTHON SOURCE LINES 123-144

.. code-block:: Python

    from cartodata.operations import filter_min_score  # noqa

    authors_before = len(authors_scores)
    labs_before = len(labs_scores)

    authors_mat, authors_scores = filter_min_score(authors_mat,
                                                   authors_scores,
                                                   4)
    labs_mat, labs_scores = filter_min_score(labs_mat,
                                             labs_scores,
                                             4)

    print(f"Removed {authors_before - len(authors_scores)} authors with less "
          f"than 4 articles from a total of {authors_before} authors.")
    print(f"Working with {len(authors_scores)} authors.\n")

    print(f"Removed {labs_before - len(labs_scores)} labs with less than "
          f"4 articles from a total of {labs_before}.")
    print(f"Working with {len(labs_scores)} labs.")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Removed 6654 authors with less than 4 articles from a total of 7348 authors.
    Working with 694 authors.

    Removed 1255 labs with less than 4 articles from a total of 1818.
    Working with 563 labs.

.. GENERATED FROM PYTHON SOURCE LINES 145-153

Words
-----

For the words, it's a bit trickier because we want to extract n-grams
(groups of n terms) instead of just comma-separated values. We'll call the
`load_text_column` function, which uses scikit-learn's
`CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
to create a vocabulary and map the tokens.

.. GENERATED FROM PYTHON SOURCE LINES 153-172

.. code-block:: Python

    from cartodata.loading import load_text_column  # noqa
    from sklearn.feature_extraction import text as sktxt  # noqa

    with open('../datas/stopwords.txt', 'r') as stop_file:
        stopwords = sktxt.ENGLISH_STOP_WORDS.union(
            set(stop_file.read().splitlines()))

    df['text'] = df['en_abstract_s'] + ' ' \
        + df['en_keyword_s'].astype(str) + ' ' \
        + df['en_title_s'].astype(str) + ' ' \
        + df['en_domainAllCodeLabel_fs'].astype(str)

    words_mat, words_scores = load_text_column(df['text'],
                                               4,
                                               10,
                                               0.05,
                                               stopwords=stopwords)

.. GENERATED FROM PYTHON SOURCE LINES 173-175

Here `words_scores` contains the list of all the n-grams extracted from the
documents with their scores,

.. GENERATED FROM PYTHON SOURCE LINES 175-178

.. code-block:: Python

    words_scores.head()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    abilities     21
    ability      164
    absence       53
    absolute      19
    abstract     174
    dtype: int64

.. GENERATED FROM PYTHON SOURCE LINES 179-181

and the `words_mat` matrix counts the occurrences of each of the 4682
n-grams for all the articles.

.. GENERATED FROM PYTHON SOURCE LINES 181-184

.. code-block:: Python

    words_mat.shape

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (4262, 4682)

.. GENERATED FROM PYTHON SOURCE LINES 185-192

To get a better representation of the importance of each term, we'll also
apply a TF-IDF (term-frequency times inverse document-frequency)
normalization on the matrix. The `normalize_tfidf` function simply calls
scikit-learn's
`TfidfTransformer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html#sklearn.feature_extraction.text.TfidfTransformer>`_
class.

.. GENERATED FROM PYTHON SOURCE LINES 192-197

.. code-block:: Python

    from cartodata.operations import normalize_tfidf  # noqa

    words_mat = normalize_tfidf(words_mat)
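Since `normalize_tfidf` is described above as a thin wrapper around
scikit-learn, the call can be pictured as follows. This is a minimal sketch
assuming default `TfidfTransformer` parameters (the exact parameters
cartodata passes may differ), and `raw_words_mat` is a hypothetical name for
the counts matrix before normalization:

.. code-block:: Python

    from sklearn.feature_extraction.text import TfidfTransformer  # noqa

    # Hypothetical equivalent of the normalize_tfidf call above, starting
    # from the raw counts matrix (called raw_words_mat for illustration):
    # reweight each count so that terms frequent in one document but rare
    # in the corpus get a higher weight.
    words_mat_tfidf = TfidfTransformer().fit_transform(raw_words_mat)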
.. GENERATED FROM PYTHON SOURCE LINES 198-202

Articles
--------

Finally, we need to create a matrix that simply maps each article to itself.

.. GENERATED FROM PYTHON SOURCE LINES 202-208

.. code-block:: Python

    from cartodata.loading import load_identity_column  # noqa

    articles_mat, articles_scores = load_identity_column(df, 'en_title_s')
    articles_scores.head()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Termination and Confluence of Higher-Order Rewrite Systems                                   1.0
    Efficient Self-stabilization                                                                 1.0
    Resource-bounded relational reasoning: induction and deduction through stochastic matching   1.0
    Reasoning about generalized intervals : Horn representability and tractability               1.0
    Proof Nets and Explicit Substitutions                                                        1.0
    dtype: float64

.. GENERATED FROM PYTHON SOURCE LINES 209-231

Dimension reduction
===================

One way to see the matrices that we created is as coordinates in the space
of all articles. What we want to do is to reduce the dimensionality of this
space to make it easier to work with and visualize.

LDA projection
--------------

We use the LDA (Latent Dirichlet Allocation) technique to identify latent
topics in our data and thus reduce the number of rows in our matrices.

The `lda_projection` method takes three arguments:

- the number of dimensions (topics) you want to keep
- the index of the documents/words matrix within the list passed as the
  third argument
- the list of matrices to project

It returns a list of the same length containing the matrices projected in
the latent space. We also apply an l2 normalization to each feature of the
projected matrices.

.. GENERATED FROM PYTHON SOURCE LINES 231-240

.. code-block:: Python

    from cartodata.projection import lda_projection  # noqa
    from cartodata.operations import normalize_l2  # noqa

    lda_matrices = lda_projection(50,
                                  2,
                                  [articles_mat, authors_mat, words_mat,
                                   labs_mat])
    lda_matrices = list(map(normalize_l2, lda_matrices))

.. GENERATED FROM PYTHON SOURCE LINES 241-243

We've reduced the number of rows in each of `articles_mat`, `authors_mat`,
`words_mat` and `labs_mat` to just 50.

.. GENERATED FROM PYTHON SOURCE LINES 243-249

.. code-block:: Python

    print(f"articles_mat: {lda_matrices[0].shape}")
    print(f"authors_mat: {lda_matrices[1].shape}")
    print(f"words_mat: {lda_matrices[2].shape}")
    print(f"labs_mat: {lda_matrices[3].shape}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    articles_mat: (50, 4262)
    authors_mat: (50, 694)
    words_mat: (50, 4682)
    labs_mat: (50, 563)

.. GENERATED FROM PYTHON SOURCE LINES 250-261

This makes it easier to work with them for clustering or nearest neighbors
tasks, but we also want to project them on a 2D space to be able to map
them.

UMAP projection
---------------

`UMAP <https://github.com/lmcinnes/umap>`_ (Uniform Manifold Approximation
and Projection) is a dimension reduction technique that can be used for
visualisation, similarly to t-SNE. We use this algorithm to project our
matrices in 2 dimensions.

.. GENERATED FROM PYTHON SOURCE LINES 261-266

.. code-block:: Python

    from cartodata.projection import umap_projection  # noqa

    umap_matrices = umap_projection(lda_matrices)
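For intuition, here is a minimal sketch of what such a 2D projection looks
like with the `umap-learn` package used directly, applied to the articles
matrix alone and with default parameters. It is only an illustration:
`umap_projection` handles all four matrices and may be configured
differently.

.. code-block:: Python

    from umap import UMAP  # noqa

    # The LDA matrices store points column-wise as (50, n_points), so we
    # transpose to get one row per article before fitting.
    articles_2d = UMAP(n_components=2).fit_transform(lda_matrices[0].T)

    # articles_2d has shape (n_articles, 2): one (x, y) per article.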
.. GENERATED FROM PYTHON SOURCE LINES 267-269

Now that we have 2D coordinates for our points, we can plot them to get a
feel for the data's shape.

.. GENERATED FROM PYTHON SOURCE LINES 269-301

.. code-block:: Python

    import matplotlib.pyplot as plt  # noqa
    import numpy as np  # noqa
    import seaborn as sns  # noqa

    # %matplotlib inline

    sns.set(style='white', rc={'figure.figsize': (12, 8)})

    labels = ('article', "auth", "words", "labs")
    colors = ['g', 'r', 'b', 'y']
    markers = ['o', 's', '+', 'x']


    def plot(matrices):
        plt.close('all')
        fig, ax = plt.subplots()
        axes = []

        # one scatter plot per entity type; points are stored column-wise
        for i, m in enumerate(matrices):
            axes.append(ax.scatter(m[0, :], m[1, :],
                                   color=colors[i],
                                   marker=markers[i],
                                   label=labels[i]))
        ax.legend(axes, labels, fancybox=True, shadow=True)

        return fig, ax


    fig, ax = plot(umap_matrices)

.. image-sg:: /auto_examples/images/sphx_glr_workflow_lisn_lda_kmeans_001.png
   :alt: workflow lisn lda kmeans
   :srcset: /auto_examples/images/sphx_glr_workflow_lisn_lda_kmeans_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 302-313

On the plot above, articles are shown in green, authors in red, words in
blue and labs in yellow. Because we don't have labels for the points, the
plot is hard to interpret as is. But we can see that the data forms some
clusters which we could try to identify.

Clustering
==========

In order to identify clusters, we use the KMeans clustering technique on the
articles. We'll also try to label these clusters by selecting the most
frequent words that appear in each cluster's articles.

.. GENERATED FROM PYTHON SOURCE LINES 313-339

.. code-block:: Python

    from cartodata.clustering import create_kmeans_clusters  # noqa

    cluster_labels = []
    c_lda, c_umap, c_scores, c_knn, _, _, _ = create_kmeans_clusters(
        8,                 # number of clusters to create
        umap_matrices[0],  # the 2D matrix of articles
        umap_matrices[2],  # the 2D matrix of words
        words_mat,         # the articles to words matrix
        words_scores,      # word scores
        cluster_labels,    # a list of initial cluster labels
        lda_matrices[2])   # LDA space matrix of words

    c_scores

    fig, ax = plot(umap_matrices)
    for i in range(8):
        ax.annotate(c_scores.index[i], (c_umap[0, i], c_umap[1, i]),
                    color='red')

.. image-sg:: /auto_examples/images/sphx_glr_workflow_lisn_lda_kmeans_002.png
   :alt: workflow lisn lda kmeans
   :srcset: /auto_examples/images/sphx_glr_workflow_lisn_lda_kmeans_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 340-345

The 8 clusters that we created give us a general idea of what the big
clusters of data contain. But we'll probably want a finer level of detail if
we start to zoom in and focus on smaller areas. So we'll also create a
second, bigger group of clusters, simply by increasing the number of
clusters we ask for.

.. GENERATED FROM PYTHON SOURCE LINES 345-355

.. code-block:: Python

    mc_lda, mc_umap, mc_scores, mc_knn, _, _, _ = create_kmeans_clusters(
        32,
        umap_matrices[0],
        umap_matrices[2],
        words_mat,
        words_scores,
        cluster_labels,
        lda_matrices[2])
    mc_scores

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    fault tolerant, fault tolerance                             72
    discrete event systems, touch                              262
    approximate bayesian, adjacency                             88
    evolutionary robotics, deductive program verification      157
    natural language processing, reinforcement learning        153
    means, architectures                                       101
    verification, floating point                               152
    belief propagation, number searchers                        76
    black optimization, distributed algorithm                   86
    documents, ontologies                                      251
    internet architecture, wireless networks                   177
    genomes, materialized views                                161
    adaptation, displays                                       202
    cognitive, visualization techniques                        130
    compiler, automata                                         226
    tangible, modulo                                           174
    mobile robots, lower bounds                                100
    computational complexity, population protocols              82
    numerical simulations, fluid mechanics                     113
    large scale, sequences                                     106
    neural evolutionary computing, large hadron collider        80
    analytics, social networks                                 196
    social sciences, internet                                  180
    cloud radio access networks, cloud radio access network     24
    metabolic, ontology alignment                               92
    regulatory network, gesture                                 49
    secondary structures, cloud computing                      196
    challenge, matter                                          182
    monte carlo search, black                                  110
    interfaces, supported                                      131
    molecular biology, protein protein                          63
    query, semantics                                            90
    dtype: int64
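Under the hood, the clustering step boils down to running KMeans on the 2D
article coordinates. Here is a minimal sketch with scikit-learn directly,
under the assumption that points are stored column-wise;
`create_kmeans_clusters` additionally computes a label and a score for each
cluster on top of this.

.. code-block:: Python

    from sklearn.cluster import KMeans  # noqa

    # The 2D articles matrix is (2, n_articles); transpose so that each row
    # is one article's (x, y) position, as scikit-learn expects.
    kmeans = KMeans(n_clusters=8, n_init=10)
    article_clusters = kmeans.fit_predict(umap_matrices[0].T)

    # article_clusters[i] is the cluster index (0-7) of the ith article.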
.. GENERATED FROM PYTHON SOURCE LINES 356-373

Nearest neighbors
-----------------

Another useful way to assess the quality of our data is to look at each
point's nearest neighbors. If our data processing is done correctly, we
expect related articles, labs, words and authors to be located close to
each other.

Finding nearest neighbors is a common task, and various algorithms exist to
solve it. The `get_neighbors` method uses one of these algorithms to find
the nearest points of each type. It takes an optional weight parameter that
tweaks the distance calculation so as to select points that have a higher
score but are maybe a bit farther away, instead of just the closest
neighbors.

Because we want to find the neighbors of each type (articles, authors,
words, labs) for all of the entities, we call the `get_neighbors` method in
a loop and store its results in a list.

.. GENERATED FROM PYTHON SOURCE LINES 373-386

.. code-block:: Python

    from cartodata.neighbors import get_neighbors  # noqa

    scores = [articles_scores, authors_scores, words_scores, labs_scores]
    weights = [0, 0.5, 0.5, 0]
    all_neighbors = []

    for idx in range(len(lda_matrices)):
        all_neighbors.append(get_neighbors(lda_matrices[idx],
                                           scores[idx],
                                           lda_matrices,
                                           weights[idx]))
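As a point of comparison, a plain (unweighted) nearest-neighbor search can
be sketched with scikit-learn. This is an illustration only, not necessarily
what `get_neighbors` does internally; in particular it ignores the score
weighting described above.

.. code-block:: Python

    from sklearn.neighbors import NearestNeighbors  # noqa

    # Fit on article positions in the LDA space (points are stored
    # column-wise, so transpose first).
    nn = NearestNeighbors(n_neighbors=10).fit(lda_matrices[0].T)

    # For each author, find the 10 closest articles in the LDA space.
    distances, indices = nn.kneighbors(lda_matrices[1].T)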
.. GENERATED FROM PYTHON SOURCE LINES 387-391

Exporting
---------

We now have sufficient data to create a meaningful visualization.

.. GENERATED FROM PYTHON SOURCE LINES 391-416

.. code-block:: Python

    from cartodata.operations import export_to_json  # noqa

    natures = ['articles', 'authors', 'words', 'labs',
               'hl_clusters', 'ml_clusters']
    export_file = '../datas/lisn_workflow_lda.json'

    # add the clusters to the list of 2D matrices and scores
    matrices = list(umap_matrices)
    matrices.extend([c_umap, mc_umap])
    scores.extend([c_scores, mc_scores])

    # create a JSON export file with all the infos
    export_to_json(natures, matrices, scores, export_file,
                   neighbors_natures=natures[:4],
                   neighbors=all_neighbors)

.. GENERATED FROM PYTHON SOURCE LINES 417-420

This creates the `lisn_workflow_lda.json` file, which contains a list of
points ready to be imported into Cartolabe. Have a look at it to check that
it contains everything.

.. GENERATED FROM PYTHON SOURCE LINES 420-427

.. code-block:: Python

    import json  # noqa

    with open(export_file, 'r') as f:
        data = json.load(f)

    data[1]['position']

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [2.274456024169922, 9.282912254333496]

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (2 minutes 0.487 seconds)

.. _sphx_glr_download_auto_examples_workflow_lisn_lda_kmeans.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: workflow_lisn_lda_kmeans.ipynb <workflow_lisn_lda_kmeans.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: workflow_lisn_lda_kmeans.py <workflow_lisn_lda_kmeans.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: workflow_lisn_lda_kmeans.zip <workflow_lisn_lda_kmeans.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_