Transcript: 28 March 2022 (Scholarcy)

Chat Log: https://futuretextlab.info/2022/03/28/chat-28-march-2022-scholarcy/

TRANSCRIPT:

Phil Gooch: Great, thank you very much, Frode. Thanks for the introduction. My name is Phil Gooch, and I founded a company called Scholarcy about three years ago. But my interest in interactive text goes back a bit further than that, mainly through the field of natural language processing, which is what I did my PhD in. Really, I was trying to solve a problem I had when I was doing my PhD: discovering new materials to read wasn’t the problem. I discovered lots of papers, lots of resources. I downloaded folders full of PDFs, as I’m sure you all have sitting on your hard drives at home, in Google Drive, and in the cloud elsewhere. So I had all these documents I knew I needed to read, and I wanted to find a way of speeding up that process. At the time, the aim wasn’t necessarily connecting them together or visualizing them, but just pulling out the key information and bringing it to the forefront. So I started building some software that could try to do this, and what emerged was something called Scholarcy Library, which I will show you now. Scholarcy Library is like a document management system. You upload your documents, and they can be in any format: PDFs, Word documents, PowerPoint presentations, web pages, LaTeX documents, pretty much any format. And what it does is… if we look at this, this is the original PDF for one of the papers that I’ve got in my system. It’s a typical PDF; in this case, it’s the original author manuscript that’s been made available ahead of time, the open-access version of this paper, if you like. And as you’re obviously familiar with, PDFs that aren’t created in software such as Author, which Frode has built, don’t give you any interactivity at all. You can’t click on these citations or go anywhere from here, for example.
So the first thing I wanted to solve was: if I bring this into Scholarcy, what does this look like? Here’s the same paper. The first thing it does is try to pull out the main finding of the study and bring that to the forefront. The other thing it tries to do is take the full text and make the citations clickable. You saw that in the original PDF the citations aren’t clickable. So my first goal was to make citations clickable, so that I can click on a citation, go straight to that paper, and then read it, or pull it into my system and link it together. That was the first goal. The second goal was: once I’ve broken this PDF down, can I do things like extract the figures and so on, and again, make them first-class citizens so I can zoom in on them? And this all really then turned into a process of turning documents into structured data. So what I built was this back-end API, which is freely available to anybody. It’s not open-source code, but it’s an open API, so anyone can use it. If you go to api.scholarcy.com there are a number of endpoints, which look a little bit esoteric, so there is some documentation on GitHub: if you go to scholarcy.github.io/slate there’s a whole bunch of documentation on what this API does. Essentially, you give it a document, such as a PDF, though it doesn’t have to be a PDF, you upload it, and it turns it into JSON. And as you know, once you’ve got JSON, or XML, or any structured data, you can pretty much do what you want with it. So it has turned the paper into JSON with all the information broken down into key-value pairs: you’ve got a key for the references and all the references there, and you’ve got a key for the funding information, which has been broken down.
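Once the API has turned a paper into key-value pairs, downstream features become simple dictionary lookups. Here is a minimal sketch of that idea; the key names (`title`, `references`, `funding`) are illustrative assumptions about the shape of the JSON, not the documented Scholarcy schema:

```python
def summarize_flashcard(doc: dict) -> dict:
    """Pull a few headline fields out of a flashcard-style JSON record.

    The key names ('title', 'references', 'funding') are assumptions
    made for illustration, not the documented Scholarcy schema.
    """
    return {
        "title": doc.get("title", ""),
        "n_references": len(doc.get("references", [])),
        "funders": [f.get("funder", "") for f in doc.get("funding", [])],
    }

# A canned example of the kind of key-value structure described above.
sample = {
    "title": "Example paper",
    "references": [
        {"author": "Author A", "year": 2001},
        {"author": "Author B", "year": 2005},
    ],
    "funding": [{"funder": "Funder X"}],
}

print(summarize_flashcard(sample))
# → {'title': 'Example paper', 'n_references': 2, 'funders': ['Funder X']}
```

In practice the dict would come from the API’s JSON response to an upload request; the point is that once the document is structured data, everything downstream is just key lookups.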
So once you’ve got structured data, it’s quite easy to bring it into this nice interactive format, where everything is hyperlinked and clickable. We can go straight to the study subjects and find out there were 16 people involved in the study. Then we can look at the main contributions of the study: we can scroll down, click on one of those, and it takes us straight to that finding. So the idea was really, I suppose, not to cheat, but to speed-read this paper by highlighting the key findings and making them clickable, as we made the references clickable. All of this is in the JSON data underlying it, and it makes the paper more interactive, basically. So that’s the goal. That’s all well and good: basically, what we’ve done is turn a PDF into interactive HTML with clickable citations and expandable and collapsible sections. But obviously, you want to deal with more than one paper at a time. So what I looked at doing was building something that could turn this into linked data. Now, I didn’t want to build a new piece of software like a graph database or a triple store or anything like that. And I found that the lingua franca for a lot of new tools that try to connect stuff together is this format called Markdown. One of the tools I use for hosting Markdown, and you may be familiar with it, is called Obsidian. But there are many tools like this: there’s Roam Research, there’s Bear, there’s Logseq, a whole bunch of tools, I’m sure you’re aware of, that handle Markdown data. So here is that same paper that was once a PDF, now in Scholarcy. We can export it to Markdown, and we can do this one at a time, or, if I really want to, I can export all of them in one go. I’ve only got four here, but I could have 400. Export them all in one go as Markdown, and then I can load those into Obsidian.
Put that in my Obsidian library, and when I open it in Obsidian, it looks like this. It’s the same data, but now it’s in Markdown format and it’s editable, so if I want to edit the Markdown in here I can go and do that and visualize it. But now we’ve got the same information alongside all the other papers in my collection that were also converted to Markdown using Scholarcy, and I can connect them together. I’ve got some of the key concepts; if I click on one of those, it shows me other papers in my collection, like this one, that also talk about functional connectivity. I can go straight to that paper, see the main finding of that study, and see other things it talks about, like the medial prefrontal cortex, and then see other papers that talk about that. As you can see, we’ve got this network graph going on, but it’s kind of embedded in the text. I can read the papers in a way I couldn’t before, and I can view all the figures and zoom in on them. And they’re in this Markdown format where you get all this linking for free, which is great. In common with other tools that handle Markdown, you can do these visualizations: if I click on this graph view, it shows me how these papers are connected, in this case by the citations they have in common. And if I really want to see all the concepts they have in common, I can click the tags view, and suddenly you’ve got all these green nodes that show me where all these papers are and how they’re connected by their key concepts. As Mark and I were discussing just before this session started, there’s the issue that you’ve got too much information. This becomes a question of:
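The Markdown export that makes this linking work can be sketched in a few lines. The flashcard field names below are assumptions for illustration; the double-bracket `[[wiki-link]]` syntax is what Obsidian, Roam, and Logseq all use to connect notes that share a concept:

```python
def flashcard_to_markdown(card: dict) -> str:
    """Render a flashcard dict as Markdown with [[wiki-links]] on key concepts,
    so tools like Obsidian connect papers that share a concept."""
    lines = [
        f"# {card['title']}",
        "",
        f"**Main finding:** {card['finding']}",
        "",
        "**Key concepts:** " + ", ".join(f"[[{c}]]" for c in card["concepts"]),
    ]
    return "\n".join(lines)

card = {
    "title": "Example paper",
    "finding": "An illustrative one-sentence finding.",
    "concepts": ["functional connectivity", "medial prefrontal cortex"],
}
print(flashcard_to_markdown(card))
```

Two papers whose exports both contain `[[functional connectivity]]` get linked automatically when dropped into the same vault, which is where the graph and tags views come from.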

What do we do with these kinds of visualizations?

And I’m sure many of you here will have suggestions and ideas about how to deal with this, because once you’ve got more than 10 or 20 papers, these kinds of visualizations become a bit intractable. Obsidian lets you do that kind of graph analysis; you can write queries and so on, which I haven’t really done much with. My background is not at all in visualization; it’s in NLP and converting documents from one format to another. That’s been my motivation in building this: being able to turn documents from PDF into Markdown or other formats. And that’s what Scholarcy does. It’s document conversion software and summarization software: it gives you the key highlights of the paper, its objectives, methods, and results. It pulls all that automatically from each document using NLP and deep learning, and it makes it interactive. But it also extracts into these various different formats. One other thing it tries to do is show you how the paper relates to what’s gone before. So when an author has talked about how their work sits within the wider field of research, Scholarcy tries to pull out that information and highlight it, so it shows you where there are potential differences with previous work, and who has talked about that, in terms of counterpointing what this author is saying; and again, we can click on that and go straight to that paper if we want to. Or which studies does it build on, for example? Does it build on a study by these people? How is it different? So it’s pulling out all those citation contexts and then classifying them: is it confirming, is it contrasting, is it just building on it, how does it relate? It doesn’t always get it right, but most of the time it’s in the right ballpark, and it gives you that extra context. And again, it’s just making that information interactive. I guess the next step is:

What more could be done with this data? 

At the moment, it’s a very two-dimensional view, either in Scholarcy one paper at a time, or in Obsidian via a network graph view. But what else could be done with this data? Maybe some of you have suggestions about how it could be visualized, perhaps using virtual reality or some other means. But really, the motivation for this was to make it easier for me to read all the papers I had to read for my PhD: to put them into a friendlier format that I could, for example, read on my mobile phone. This tool called Obsidian has an iOS app, and I can actually read this paper in a nice, friendly, responsive format, including the tables, because Scholarcy also converts the tables in the PDFs to HTML. So I can get all that data out as well and read it on the go, which was the goal of doing this, really. So that’s Scholarcy in a nutshell. It exports to various other formats as well. If I wanted to, imagining this was my own paper, I could export it to PowerPoint and turn it into a presentation. That is one thing I did do with a chapter of my PhD: I exported it as a PowerPoint slide deck, and it summarized the paper and distilled it down into a series of slides. But that is the goal, really: to be able to convert and switch between different formats without having to worry about whether it started off as a PDF, or a Word document, or something else. Every document gets turned into this standardized format that we call our summary flashcard, where it’s got that same structure. I was hoping to show you the PowerPoint export, but that doesn’t seem to be playing ball today.
So that’s it, basically, in a nutshell. There’s a free demonstrator you can try out, because we do have a number of free tools, including the reference extraction component that Frode was alluding to earlier, which links all the references together; that’s a freely available tool, and so is this. If you want to try this out with any document, you just upload a paper, and it does exactly the same as what I showed you in the main document management tool, but one document at a time. You can load a paper, it breaks it down into this flashcard, and then you can download that in Markdown if you want and visualize it with all your other documents. There’s a whole sequence of other tools as well, but this is the main one; it’s what we call our Flashcard Generator. So, yeah, that’s it, really, in a nutshell. Let’s maybe bring it into a discussion and get some feedback, because it’d be good to hear how we could take the structured data and do something else with it, other than putting it into Obsidian, or Roam, or other Markdown-aware tools. Maybe there are some more interesting things that could be done there. So, I’ll pause at this point.

Discussion- https://youtu.be/pdVHOoh-EL8?t=1768

Frode Hegland: It really is a nutshell, and it’s just amazing what you have done. And you presented it like you’ve just put together a few Lego blocks, and that’s about it; it’s just British understatement. But there’s one thing, before we go into proper dialogue, I’d really like to see more of. And that is the bit where you come across a citation in a document and you can click on it to find out, to put it crudely, its value or relevance. Would you mind showing that? Because when you do these big graphs, where to go next is always a huge question, and I think this really helps navigationally.

Phil Gooch: Sure. So, let’s look at that same paper I was looking at earlier. If we’ve got a citation, we can mouse over it, click, and go to that paper. But in terms of the value of the citation, we’ve partnered with another start-up called scite.ai, which you may be familiar with, and what we can do is show the statistics that scite has gathered on every citation; I think they’ve got a huge database now, about a billion citations. What this shows me is how many other people have not only cited this study but agreed with it. So this had about 1258 citations, but of those, 18 have been confirming the results of this study. What that means is, if I click on that link there, it gives me some more background. Here’s the paper by Stam that this author cited. And we can see that it’s got 18 supporting citations, and three contrasting. Let’s see what that means. It basically means they’re saying things like, “Our results agree with previous studies, which include Stam,” and so on. Consistent with this author, for example. Basically, 18 of these citations are ones where they’re all saying, “Yeah, we found something similar.” But in three of them, they found something different. So what do they say? Well, we can just click on that. So: previous studies, blah, blah, Stam. Looks a bit ambiguous to me; I’m not sure if it’s definitely contrasting or not. This is the thing with machine learning: sometimes you don’t get it quite right, and this one is borderline as to whether it’s actually contrasting. But you can see, again, the context in which other people have cited this study that these authors have also cited. Were they positive about it? Or were they negative? So scite.ai is a really cool tool for showing you this context about how everyone else has talked about this paper. For example, for this paper here, we could find out who else has spoken about it.
Because these relationships go in two directions: we want to know what this paper says about other people’s studies, and what other people have said about those same studies, but also what other people say about this study itself that I’m reading. If I click on this link here, it should take me to what other people say about this van Lutterveld paper, and we can see that people are actually a bit neutral about it: there are 31 citations that just mention it, but none of them are contrasting and none are supporting. They’ve cited it, but they haven’t really said anything positive or negative about it. So scite.ai is a really cool tool that just lets you explore those citations, and we link to it as a matter of course. Every citation in here should have a button where you can see those stats. And then the other thing we try to do, and this doesn’t always work, is to say: well, rather than going and reading all these cited papers, can we just get the gist of them? We have a little button here that will go and find each of those papers and do a quick summary of what was done in that paper, and then we can see. It’s like a subset of the abstract, effectively. What was this paper about? Is it something that I’m actually interested in reading more about? If it is, I can click on it and go and read it. So the idea is to bring all that information, from each of those studies, into one place, along with the citation statistics from scite. Again, this looks like a reliable study: 13 people have supported it, so that looks good. But what did it say? Again, we can just click the findings button here, and it will try to pull out what the study found. And there are some of the findings there. So that’s another aspect of what Scholarcy does: citation linking and classification.
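As a toy illustration of the supporting/contrasting/mentioning classification discussed here, consider a cue-phrase sketch. Real systems such as scite and Scholarcy use trained machine-learning models over citation contexts, not keyword lists; the phrases below are made-up examples:

```python
# Illustrative cue phrases only; production systems learn these signals.
SUPPORTING = ("consistent with", "in agreement with", "our results agree")
CONTRASTING = ("in contrast to", "contrary to", "unlike")

def classify_citation(context: str) -> str:
    """Classify a citation context as supporting, contrasting, or mentioning."""
    text = context.lower()
    if any(p in text for p in SUPPORTING):
        return "supporting"
    if any(p in text for p in CONTRASTING):
        return "contrasting"
    return "mentioning"

print(classify_citation("Our results agree with previous studies (Stam, 2004)."))
# → supporting
print(classify_citation("In contrast to Stam (2004), we observed no effect."))
# → contrasting
print(classify_citation("Graph measures were computed as in Stam (2004)."))
# → mentioning
```

The sketch also shows why borderline cases arise: a sentence can hedge or mix signals, which is why Phil notes the classifier doesn’t always get it right.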

Frode Hegland: A question to everyone in the group before I do the hands-up thing: how amazing is this? It is absolutely amazing, isn’t it? And also, the way that Phil works with APIs from other services, the way these things can link together, is just so incredibly amazing. And I don’t think most academics are aware of it. Because you’re the newest in this session, Ismail, please, you go. And then, Peter.

Ismail Serageldin- https://youtu.be/pdVHOoh-EL8?t=2093

Ismail Serageldin: Thank you. You probably are very familiar with the work of David King and a few others at Oklahoma State. I was quite interested in their work a few years ago, because they had done hermeneutics of Islamic and Quranic work on 12,000 items. Phil, you had this diagram with all the nodes connected with the greens, and you said, “Well, where do you go from here?” It all looks pretty much like one big tapestry. What struck me about what David King was doing at that time was that they were able, and this was really stunning for me, to put in all the authors, and then, surprisingly, the graph tended to group authors together. So, all of a sudden, the Mu’tazilite debaters, back in the 10th century, were all in one part of the graph, and all the Ash’ari were in another part of the graph. And the schools of thought, somehow, emerged out of that. So it didn’t look exactly flat, like your diagram. Based on the diagram, they were able to group them, maybe by the citations, maybe other things would be able to assist in that, but if it did that, then you might see schools of thought emerging in the pattern in front of you.

Phil Gooch: Yeah, that’s great. I think there’s a huge amount like that that could be done. So that network I showed was in another tool, which I’m not involved in, called Obsidian; I’ve just put a link to it in the chat, it’s obsidian.md. That’s just the tool that allows you to visualize these relationships. It’s quite basic, and I don’t think it can show those levels of annotation that you mentioned David King showed, where he had the authors and so on. But there are other tools that do a bit more than this, along the lines of what you’re suggesting. One is called Connected Papers, which I’ll put in the chat, where they do try to find similar schools of thought. The idea is, you put in one seed paper and it will find other related papers, not only ones that are related by citations, but also ones with similar themes. I’ll quickly share my screen to show you. I think what they’re trying to do at Connected Papers is generalize what you were suggesting with David King’s work: here they’ve got the authors for a given paper, and they’ve tried to show related papers, where maybe other people have cited them together in a group, or they’ve got similar themes. You can click on each of those and find out more about them, and you’ve got the abstract and so on. And there are quite a few tools like this; there’s one called Research Rabbit, which is pretty cool, but unlike Connected Papers, it only works if you’ve got an academic email address, which I don’t have anymore. But those of you that have one might want to check out Research Rabbit, because it tries to do that. So in answer to your question, Ismail, there are other people doing those visualizations and trying to generalize them. It’s not something that I’m going to do myself.
My role, really, is just to build tools that convert from one format to another, so that other people can do those visualizations. But, yeah, I think it’s a great suggestion. And I think the potential of all this visualization and linking hasn’t really been fulfilled yet, partly because, when the data sets become large, it gets hard to keep track of all these nodes and edges and what they mean. I think, Mark, you’ve done some work on this with citations, showing who a paper cites and who it’s cited by, looking at alternatives to a network graph. But I think there’s still room to come up with some new type of visualization that would show all those relationships in a compact way. You all know more than me about the people doing that; I’m an NLP person, not a visualization person. So I’d love to hear more about that kind of work.

Frode Hegland: Thank you, Phil. Peter?

Peter Wasilko- https://youtu.be/pdVHOoh-EL8?t=2383

Peter Wasilko: I was just wondering, have you received any pushback from any of the scholarly publishing houses complaining about your document conversions?

Phil Gooch: No, because I think… That’s a good question. We haven’t had any complaints, because we’re not making those converted papers publicly available. It’s a tool like Dropbox or Google Drive: you drop your papers in, and you’re the only person who has access to those condensed, interactive versions of the papers. We’re not putting them out there in a massive database that everybody can access. So no one has complained about copyright breaches, because it’s really only for personal use. But I think there could be a lot of value in taking every open-access paper, putting it into this kind of structured format, and showing how they’re connected. And I think, if we were to do that, then, yes, publishers would complain. We are in discussions with some publishers about maybe doing it on a subset of their papers in some way. But it’s just a question of priorities. There’s only me and one other person working on this at the moment, so it’s about where we spend our time, and publishers are a bit of a distraction for us at the moment. We’ve had one or two conversations, but no, they haven’t complained, basically, is the short answer.

Peter Wasilko: Ah, that’s encouraging. Also, have you taken a look at the bibliometric literature?

Phil Gooch: The bibliometric literature? I’m familiar with some of it, but not massively, no. I know there’s lots of work around open citations, the idea of making every citation open; there’s the Open Citations initiative. But did you have something in mind, particularly?

Peter Wasilko: Just that there’s a whole subset of the information retrieval literature looking at co-citation relationships and term clusterings amongst sets of documents. And there’s a whole sub-community that’s been poking at those statistics for quite some time, so you might be able to find some useful connections there.

Phil Gooch: Yeah, I’ve talked to a couple of people. There’s a chap called Björn Brembs, and there’s also Bianca Kramer, and David Shotton, who’s in the Open Citations initiative. I’ve had conversations with some of them. We’ve actually created an API, which I’ll just put in here, that some of them are using within Open Citations to extract references from papers so that they can be connected together. One issue with citations, although it’s not so much of an issue now as it was, is that these citation networks were not freely available; publishers weren’t making them available unless you signed up to Web of Science or Scopus. But now more publishers are putting their citations into Crossref, so that people can do the kinds of network analyses you mentioned. We’ve also created this tool that other people can use: authors can put their own papers or preprints in there, and it will extract the references so they can be used. Some of the people at Open Citations have used this API to do some of that extraction. We’ve made it freely available for anyone to use as much as they want, until the server falls over; it’s not on a very powerful server. But, yeah, there’s a lot of work going on in this. It’s not something I’m personally involved in; I focus more on the data conversion side, and once that data is converted, I like to give it to other people to do the analysis, if that’s what they want to do with it.

Peter Wasilko: Also, have you considered applying your tool to bodies of source code? Like trawling GitHub and looking at all of the citation relationships that actually take the form of code inclusion?

Phil Gooch: No. It’s a great idea, though. No, I haven’t done that. That could be a good project.

Brandel Zachernuk- https://youtu.be/pdVHOoh-EL8?t=2636

Brandel Zachernuk: This is a really cool tool. I’m really excited by the idea of rehydrating things that are inherently already hypertext, and making them navigable in the way that they should be, based on their conceptual content. In terms of suggestions or questions about further directions, the main question I would have is the one that drives my work, so I hope it doesn’t come across as offensive:

What is the point of the functionality? What are the intentions that people have that they follow as a result of using the system? And in particular, when somebody is good at using the tool, what are the primitives that they establish mentally and procedurally that drive their behaviour and action within it? And then beyond that, what are the ways in which you can render those primitives concretely in order to make sure that the use of the tool intrinsically lends itself to understanding things in the way that an expert does? 

So, right now, you have a lot of things in it that are useful, but they’re not especially opinionated about what you do with them. And so, the suggestion I would have and the question is:

How do you ramp up that opinionation? What are the ways in which you can more strongly imply what to do with these things, and the way to read specific things? There are numbers, like the confirming citations and the contrasting results and things like that. What do those mean, and how can people understand them more directly, if they need to? As these folks have heard me bang on about, I am not from academia, so I’m not familiar with people’s relationship with academic papers and what they spend their time doing. Something I have spent time on, within the context of academia, is debugging my friend’s prose. He is a P.I. in neuroscience, and I’m sure it’s not peculiar to his discipline, but you can end up with incredibly tangled prose that is, essentially, trying to do too many things in a single sentence, because there’s a lot to get through. And the sort of approach that I take is very similar to… Have you ever heard of visual syntactic text formatting? It’s a system of breaking sentences and indenting on prepositions and conjunctions. Oh, you built it into Liquid? Right. Yeah, you did too. It’s basically taking something more like code formatting and applying it to something that I think is pretty generally the case with academic text: that it can end up pretty hard to read. It allows you to follow individual ideas, and understand the ways in which they’re nested, through that indentation. So, I guess: what is the hardest stuff to do with these academic texts? And then also, I’m sure you’ve read and reread “As We May Think,” Vannevar Bush’s article from 1945. Have you read it before? It in large part kicked off the idea of computing for everybody who does computing. And it was written by the man who was responsible for organizing American science during the war effort.
He was complaining about the impractically large body of knowledge being produced year on year, and needing some memory extension that would allow him to navigate all of the academic papers, create hyperlinks between them, and have some kind of desktop environment for doing so. It’s a really wonderful read, because he’s basically describing the modern desktop computer, except built out of gears and microfiche, because that’s what his mind was working with in the 1940s. The reason why I bring it up and belabour the point is that one of the things that was really wonderful about Bush’s conception is that the navigation of the information was just as important as the information itself. So one of the things I would be really curious about is: in terms of somebody’s use of Scholarcy to navigate the Docuverse, what are the artifacts that might be rendered from somebody’s consumption and processing of a series of documents? Because it strikes me that the browsing and interacting behaviour somebody engages in, within the context of the system and framework you have set up, could itself be a valuable artifact. Not only to the individual doing that navigation, but potentially to other people. Bush envisioned people being trailblazers, constructing specific trails for other people to navigate, where the artifact was solely the conceptual linkages and the navigation through those specific documents. Which I think is something Google is essentially able to leverage in computing PageRank: your individual trip, and the traversal of people between pages, is one of the major indicators of what are going to be good Google search results. They have the benefit of being able to make use of that data, whereas other people don’t.
But in your case, because you are particularly interested in the individual user making the connections, drawing on their actual browsing history and navigation through specific things, it strikes me as itself a very useful artifact: to see what people have missed, what people have spent their time on, and things like that. But, yeah. Really exciting work.
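The visual syntactic text formatting Brandel mentions, breaking sentences and indenting on conjunctions and prepositions, can be sketched crudely. The break-word list below is a small illustrative subset, not the full VSTF algorithm, which uses proper syntactic analysis:

```python
# A small illustrative subset of break words; real VSTF uses a parser.
BREAK_WORDS = {"and", "but", "because", "which", "where", "although", "that"}

def vstf(sentence: str) -> str:
    """Break a sentence before common conjunctions and indent each fragment,
    so nested clauses become visible, a bit like code formatting."""
    fragments, current = [], []
    for word in sentence.rstrip(".").split():
        if word.lower() in BREAK_WORDS and current:
            fragments.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        fragments.append(current)
    return "\n".join("  " * i + " ".join(f) for i, f in enumerate(fragments))

print(vstf("The method works because the data is structured and the links are explicit."))
```

Each clause lands on its own, progressively indented line, which is the effect Brandel describes for untangling dense academic sentences.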

Phil Gooch: Great. There are some really great questions there. To answer them briefly: the first motivation and use case for this was my own need to understand the literature in my PhD field, which was health informatics. The idea of linking all this stuff together wasn’t there at the time. It was, actually: can I break this single paper down into something I can read on my iPad without having to scroll through the PDF in tiny print? Can I turn this PDF into interactive HTML? So it was very much focused on what I could do with individual papers to make them easier to read and digest. And what we started hearing back from users, particularly novice academics, was that it helps. Most of our users are people doing master’s degrees, or maybe in the first year of their PhD, who may not be used to reading academic literature, and it takes them a couple of hours or longer to go through a paper and figure out what’s going on. People tell us that it helps them reduce that time by as much as 70% in terms of understanding the key ideas of the paper and being able to follow up on the citations and sources. That was the prime motivation: just to make the reading experience easier. And, in fact, just at the beginning of this year, we were awarded the status of assistive technology by the U.K. Department for Education, because we’ve got a large user base of people who have dyslexia or attention-deficit-type disorders, who have specific needs: they’re at university and find it hard to deal with an overwhelming amount of information in one go, and they really find it beneficial to have it broken down. And there’s a lot of research on this, in terms of why students generally don’t read the literature that they’re given by their lecturers or educators.
They enrol on a course, they’re given a long reading list, and then they have a lecture, and they go to the next lecture, and the lecturer says, “Okay, who’s read the material?” And most people haven’t. And educators have been tearing their hair out for years trying to figure out how to encourage people to read. And there’s some research on this, about what will encourage students to read, and basically, it’s: break the information down, make it more visual, make it more interactive, highlight some of the key points for them. Just give them a bit of hand-holding, if you like. And so that’s what the technology here tries to do. It provides that hand-holding process. But in terms of the linking of everything together, that’s a bit of a late addition, really, to Scholarcy. And it was really motivated by the fact that I noticed there was a big academic community on the Discord channel for this tool called Obsidian, where people were saying, “Well, how can I incorporate all these tools into my research workflow?” 

And the big need that most researchers, or most students, anything from master’s level onwards, have, the big task, is that they have to write a literature review that justifies the existence of their research. What have other people said about this topic? And then, when you write your thesis, or you write your essay, you’ve got to say, “Well, all these people said this. This is my contribution.” So the task of doing literature reviews is an ongoing one that every academic has to do. And so we wanted to make that process easier. Once you’ve got those papers that you’re going to write about, drop them into something like Scholarcy, and it’ll break them down, and you can export them into a tabular format. So, one of the things I didn’t show is the export of everything to what we call a literature review matrix in Excel, where, basically, you have, say, 100 papers in your review, and you want to compare them side by side. That was one of the other motivations for building it: to do that side-by-side comparison of papers, which I can quickly show you, actually, while I’m talking. So, yeah. In some parts of academia, there are whole departments that do nothing but write literature reviews. So if I’ve got all my papers here, and there are 26 of them in this case, what do they look like side by side? So, in Excel, here’s the raw format. I’m just going to make that a table in Excel, and then make this a bit bigger. And then, what I can do is slice and dice the information. Excel has this really cool functionality called slicers. So I can say, “Right, I want the authors as a slicer, I want the keywords as a slicer, and maybe the study participants.” And so, what we’ve got now is the ability to slice and dice these papers according to their keywords. Most academics are quite familiar with tools like Excel. Let’s just look at all the papers that had 112 individuals or 125 participants, for example. 
And we can just show those. Or look at all the ones that are about cerebral palsy or DNA methylation. So we can do that quick filtering of papers and compare them side by side. And obviously, I can make this look a bit prettier, but the key idea is being able to filter papers by different topics or by numbers of participants, for example. We typically want studies that have a lot of participants, and ones that have only got eight subjects, for example, maybe aren’t going to be as useful to us. So that was the other motivating factor, and this is how people use it to help with their literature review. So the whole thing about linking everything together, as I showed you in Obsidian, is a relatively new development, if you like. And so, yeah, I’m open to hearing about how people might use this. At the moment I don’t think many people are using it for this kind of linking. They’re mostly using it for reading, and they’re mostly using it for creating these matrices that they then use to help figure out the literature and what’s going on. For example, you might say, “Well, I’m only interested in papers that have open data availability.” So I can just look at the ones where that field is non-empty, for example. If I select all the ones that are not blank, then it filters those papers: the only ones that have got some open data available are the ones I’m going to look at. Or I might want to say, “I’m only interested in papers that talk about the limitations.” It’s quite important for studies to talk about their limitations, but not every paper does. So again, I can filter by the presence or absence of limitations. So this kind of literature review is one of the ways that people are using Scholarcy. But primarily it’s used as a reading tool or as a document ingestion tool. 
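The slicer-style filtering described here can be sketched in plain Python over a hypothetical literature review matrix. The column names and sample rows below are illustrative only, not Scholarcy's actual export schema:

```python
# Illustrative rows of a literature review matrix. The field names
# (keywords, participants, limitations) are hypothetical stand-ins
# for whatever columns a real matrix export would contain.
papers = [
    {"title": "Paper A", "keywords": {"cerebral palsy"},
     "participants": 112, "limitations": "Small cohort"},
    {"title": "Paper B", "keywords": {"DNA methylation"},
     "participants": 8, "limitations": ""},
    {"title": "Paper C", "keywords": {"cerebral palsy", "DNA methylation"},
     "participants": 125, "limitations": "Self-reported data"},
]

def slice_by(papers, keyword=None, min_participants=0,
             must_discuss_limitations=False):
    """Filter papers the way an Excel slicer would: by keyword,
    by participant count, and by presence of a limitations field."""
    result = []
    for p in papers:
        if keyword is not None and keyword not in p["keywords"]:
            continue
        if p["participants"] < min_participants:
            continue
        if must_discuss_limitations and not p["limitations"]:
            continue
        result.append(p["title"])
    return result

# Slicer-style queries: large cerebral palsy studies, or only papers
# that discuss their limitations.
print(slice_by(papers, keyword="cerebral palsy", min_participants=100))
print(slice_by(papers, must_discuss_limitations=True))
```

The same composable-filter idea underlies the "non-empty open data" and "mentions limitations" queries mentioned above.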
So for example, the other way I can get information in is, if I’m reading a paper in Nature, for example, and I want to get it straight in while I’m reading it, I can just run this little widget that we built for the browser, which will basically go away, read, and summarize that paper for us. And then we can click save and it’ll save it to our library, so I’ve got that Nature paper here, again with its main findings, highlights, and everything. And I can do that with a news article as well. If I’m reading a page in The Guardian, I can click on my extension button and, again, get some of the highlights, key points, and links to, you know, who’s Sophie Ridge? I can click on that: she’s a BBC journalist and newsreader, for example. So, it does all that key term extraction as well. And again, I can save that. So if I’m interested in news articles, then I can also use that. And then the other thing that people use it for is to subscribe to feeds. You’re probably all familiar with RSS feeds, which seem to be making a comeback, which is great. So, if I want to, I can subscribe to The Guardian U.K. politics feed and just point Scholarcy at that RSS feed. And then, if I go back to my library and say, let’s create Guardian politics, and put in that feed, it’s going to go away and pull in those articles and turn them into that interactive flashcard format for me. And I can do that with a journal feed too: I’m actually subscribed to a feed on neurology from a preprint server called medRxiv, and each day it’s going to pull in the latest papers. So it’s like an RSS reader as well. People are using it for that. So, yeah. They’re mainly using it as an enhanced reading tool, and there’s a tool to help with literature reviews. But the whole hypertext linking and things like that is a relatively new thing, and we’re not quite sure how many people are actually using it to create those relationships between things. 
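At its core, the feed subscription described here means fetching a feed's XML and extracting item titles and links before handing each article to the summarizer. A minimal stdlib sketch (parsing an inline RSS 2.0 sample rather than fetching The Guardian's real feed over the network; the feed content is made up):

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document standing in for a real feed such as
# The Guardian's U.K. politics feed. The items are invented.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>UK politics</title>
  <item><title>Article one</title><link>https://example.org/1</link></item>
  <item><title>Article two</title><link>https://example.org/2</link></item>
</channel></rss>"""

def feed_items(feed_xml):
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```

A real reader would poll the feed URL periodically and pass each new link on to the ingestion pipeline.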
While I was talking, it’s gone away and started to pull in those Guardian articles. So, I put in Guardian Politics, and already it has started to pull in those articles here. It doesn’t just work with PDFs; it works with news articles as well. So it tells us more about Grant Shapps: he’s the Secretary of State for Transport. People use it if they’re new to a subject. If I’m new to neurology, I want to know what some of these terms mean. We’ve got these hyperlinks to Wikipedia, so if the Akaike information criterion is unfamiliar to me, I click on that and it tells me what it means; I’ve got the Wikipedia page about it. If I don’t know what the basal ganglia are, I click on it and it tells me all about it in Wikipedia. So that level of linking is something we’ve had right from the beginning, and it is well used by people who use Scholarcy. But this kind of graph view is not really well used at the moment. And we’re trying to figure out how to make it friendlier, because we have to do this in a separate application at the moment. But the Wikipedia linking is very popular. So that basic level of surfacing key concepts and their definitions is certainly something that people use to get up to speed on a subject if they’re quite new to it.
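The simplest form of the Wikipedia linking described here is just mapping an extracted key term to the canonical article URL (a real system would presumably also handle disambiguation and redirects; this sketch only shows the URL convention of capitalizing the first letter and replacing spaces with underscores):

```python
from urllib.parse import quote

def wikipedia_link(term):
    """Build the canonical English Wikipedia URL for a key term.
    Wikipedia article titles capitalize the first letter and use
    underscores in place of spaces."""
    title = term.strip().replace(" ", "_")
    title = title[:1].upper() + title[1:]
    # quote() percent-encodes any characters unsafe in a URL path.
    return "https://en.wikipedia.org/wiki/" + quote(title)

print(wikipedia_link("Akaike information criterion"))
print(wikipedia_link("basal ganglia"))
```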

Brandel Zachernuk: That’s awesome. In terms of linking to people and concrete entities like the basal ganglia, I would love to see the direct adornment and representation of those entities within the document reinforcing the category of thing that they are. So, having a consistent representation, for example, of people, so that you have, if available, a thumbnail, but otherwise some indicator that these things are definitely people rather than concepts. I saw that you have a little bit of that: being able to pre-emptively pull a little bit more information about what you’ll find behind those things. One of the things that I really love to do is make sure that people minimize the surprises behind clicks, so that they have the ability to anticipate what kind of content they’re in for. And that helps frame their experience, because hypertext is very valuable insofar as it allows you to navigate those things. But if it’s anybody’s guess what’s behind them, then that can be very distracting, because it means that it’s difficult for them to process things in those flows. Another thing that I’m really excited by, just looking at that: natural language processing lends itself incredibly well to question answering and agent-mediated action and stuff. Have you played with the speech-to-text and text-to-speech engines within browsers in order to be able to create conversational agents and participants? It strikes me as a lot of fun to be able to do, where you could actually ask pre-formed questions of a certain kind about your corpus, in order to be able to do things like that. 

Phil Gooch: Yeah, that would be a great idea. I know there are some other tools. There’s a tool that does some similar stuff to what we do, it’s called Genei, and they have a question answering thing. We haven’t done that kind of thing, but, yeah, it’s certainly something we could add. You might type in a question like, “What is the best evidence that supports the use of this particular drug against Covid-19?” for example. And then it would go and search all those documents and show you which ones generally support the use of that drug. We could do that. And that could also be a speech-type interface. So, yeah. That’s something that we could certainly add as a future enhancement. That’s a great idea. 
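The document search step this exchange imagines can be illustrated with a deliberately crude retrieval sketch: ranking documents by word overlap with the question. This is a toy stand-in for the idea being discussed, not how Scholarcy, Genei, or any real question answering system works, and the sample documents are invented:

```python
def rank_documents(question, documents):
    """Rank documents by overlap between question words and document
    words. A crude bag-of-words stand-in for real retrieval; returns
    document ids with at least one overlapping word, best first."""
    q_words = set(question.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(q_words & set(text.lower().split()))
        scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for overlap, doc_id in scored if overlap > 0]

# Invented example corpus.
docs = {
    "trial-report": "evidence supports drug use against covid-19",
    "methylation-study": "unrelated paper about methylation",
}
print(rank_documents("what evidence supports this drug against covid-19", docs))
```

A production system would use semantic embeddings or a trained reader model rather than raw word overlap, but the retrieve-then-show-supporting-documents shape is the same.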

Brandel Zachernuk: The other benefit of a speech-primary environment is that you have the opportunity to use the visual feedback as a secondary channel, where you can say, “I’ve found these documents and they are here.” And then the documents are up here, and things like that. But, yeah. It’s super cool. One of the things, again, that strikes me about what you’re doing with it is that the academic paper format is very curious and very dense, in no small part because it’s for shipping important information on (indistinct). And so, as a general concept, being able to be a little bit more generous with the space, in order to characterize and categorize the different things that are in a paper, is a really good perspective on what you’re able to do. Because, like I said, even though an iPad is, in many regards, a smaller device than the papers that you’re going to be reading, or especially a phone, you do have the ability to renegotiate the space, the real estate, that’s devoted to those things. And, yeah. Being even more generous with the space that you use to carve out “this thing is this, that thing is that” might be a valuable way of playing with all of the different elements that you’re presenting.

Phil Gooch: Yeah, that’s right. That was one of the main motivations: to reduce that problem of squinting at PDFs on screen. Because they were meant for print, but everyone’s using them as an online distribution format as well, which was never their intended purpose. And so we try to transform that content into something that’s a bit easier to read on screen. I don’t think we always succeed. And I think, actually, within the academic community, people are trying to move away from PDFs as a means of distributing knowledge, but they’re still struggling to get away from that format for various reasons. Which is a subject for another discussion, perhaps. But, yes. 

Frode Hegland: For a long time. But since you just talked about the provocative three-letter word, PDF. It is something we’re discussing here. We use it for archival purposes, and we accept that academics use it, but as an intermediary rich format, clearly, it’s not up to snuff. Mark?

Mark Anderson- https://youtu.be/pdVHOoh-EL8?t=4036

Mark Anderson: Well, first of all, thanks so much. Fascinating to see Scholarcy again. It’s something I’ve been meaning to find some time to dive into again. Because it’s interesting, you talking about the Obsidian graph and things. So, for instance, one of the problems there is what it actually does: it shows you the links that you made. When I say you made, now this gets to the interesting part. If we begin to do automatic extraction, who made what link for what purpose? This is where we get lost. And there’s a massive, I mean, obviously, there’s Obsidian and Roam, and there’s a whole cult around Zettelkasten. But a lot of these things, unintentionally, follow the Underpants Gnomes theory of business, where if you collect enough stuff, a magical thing will happen, and success comes at the end, and no one quite knows what the magic is. I think one of the interesting challenges, but opportunities, actually, of the data set you’re now sitting across is to be able to start to surface some of the relationships. The real privilege you have with the dataset is that you know what’s there, and you can begin to make more objective study comments as to what the links mean, which many people can’t. So there’s some interesting, in a sense, research to be done there. So one way one could look at it and try to make sense of the diagramming would be to take an area that we know has been really well trampled by people, so you might say, “Well, there are few surprises in the literature.” And then play around with the visualization. Because you want to be able to then do something that’s otherwise really hard, saying, “Okay, I’ve made this wonderful-looking thing. Is it meaningful or not?” And most of the time we do this, we just don’t know. The main thing is we know it looks pretty. 
And that’s another problem, because we like to make pretty and aesthetically pleasing graphs, whereas life would tend to suggest that the messier it is, probably the closer you are to the ground truth. So I think that might be an interesting area to look at. I think, then, to make sense of what the either inferred or extracted hypertextual nature or linkage in the data is, it’s probably most meaningful, or most useful, to take a bit that’s essentially well known, for whatever reason. But one where there isn’t a great deal of contention. So, don’t pick something that’s a great topic of anxiety, or social warfare, at the moment. But I think that there ought to be places where we can see this. Which brings me on to another thought, which is the degree to which I’m guessing that papers in the sciences are more tractable to this process than the arts. Because the language is, by and large, more direct. So we’ll talk about a thing, and that’s the thing that we can go and look up, whereas, on the more pure humanities side, the reference may be just elliptical, and you have to know the subject matter quite well to know that they’re actually referring at one remove to the subject that’s actually under discussion. I think that’s just the state of where we are with the art, rather than a limitation, per se. But is it the case that you get more back from science areas?

Phil Gooch: Yeah, that’s right. It does, for the reasons that you mentioned: the structure tends to be quite standardized. They have what they call the IMRAD format: Introduction, Methods, Results, And Discussion. They’re very much about stuff that can be packaged neatly into facts, if you like, or factoids, or things that have got some evidence behind them. Well, we have tried using it on certain subjects in the social sciences, things like philosophy and biography, and, well, literature generally. As you go towards literature, and particularly fiction, it doesn’t really work at all, other than the fact that we can pull out named entities, people and places and so on. But pulling out the argumentation structures is much harder in the humanities. Interestingly, though, some of the feedback we’ve got from users is that in the more social-sciences subjects it does really well, and less well in philosophical and rhetorical-type articles. In the hard sciences, or the STEM sciences, it doesn’t do very well in engineering. And I think the reason for that is a lot of mathematics, and we don’t really handle mathematics very well. Getting decent mathematics out of a PDF is hard. And often an engineering or mathematics paper is all about the equations, and the discussion around them is maybe not peripheral, but secondary to the main maths and formulas being put forward. So, yeah. There are some subjects that are harder to apply this kind of NLP to, and certainly the humanities are among them.

Mark Anderson: I hope you don’t mind me mentioning it, because I don’t mean it as a negative at all; I was just interested to see how the coverage goes. Because another thing that occurred to me, again because you’ve got this fabulous rich data set: one of the things I always found myself worried about when I was doing my research was avoiding the stylist theory of art. Because 81 people said this was really good, it must be good. And indeed, one of the things our PhD group in Southampton was discussing was a way that you could start, for instance, to classify what’s a drive-by citation. “Oh, I have to cite that because otherwise I get shouted at.” And that was, to my mind, a meaningless citation, because it’s been done for no real good intent, as opposed to the thing you actually genuinely wanted to cite because it actually added interest. And that strikes me as a challenge when doing this extraction, not because of the sin of commission, but because you get to the next level. So, in a sense, do we need to start learning new ways to read this? As a student or a user of this rich data set, what are the new questions I need to learn to ask? Because, to a certain extent, we arrive at this technology at the moment with, sort of, “Oh, look. This number is bigger than that number.” And we don’t often stop to ask ourselves: yes, but is that a deep enough thing? There are some interesting angles to be played there as well: how one might tease apart some of the raw numbers which otherwise float up to the surface. This is what I was thinking about with bibliographies and these raw citation counts. Maybe it’s just the field I was working in, but I don’t think so: I was often surprised at how many times I went to a really highly cited paper, thinking, I just don’t see what is so special about this. And even when I put it in the context of what was known at the time, it’s still not special. It’s clearly being cited because it’s getting cited a lot. 
But no one has ever thought to say, “This actually isn’t a very interesting or useful paper.” And at a slight tangent, I’m interested to know what you see as the edge, as to how far back you can easily go with things. Because, presumably, with PDFs, you don’t get back very far before the OCR and stuff was not that hot. Or are you re-OCRing stuff, or?

Phil Gooch: No, we don’t have an OCR engine at the moment. The PDFs do need to have extractable text. We did a project a few years ago with the British Medical Journal where we were just pulling out the end-of-article references from a collection of PDFs which were only scanned images. They did the OCR themselves; they sent them off to a company to do the OCR, and we got the OCR versions of the PDFs back. And then we did all this extraction for them. The data was really noisy, but at that time we were just interested in getting the bibliography from each article. The trouble with doing OCR on demand is that we often get people uploading 200- or 300-page PDFs, and the idea of doing that on demand, having it run at scale, just fills me with fear. So we don’t do that. But, yeah. It could be done. That would be a separate standalone project, I think: a research project to go and try to text-mine that archive of, if you like, old PDFs, and do something interesting.

Mark Anderson: The reason it sticks in my mind is that there’s an almost implied temporal cliff somewhere, some distance back from us, where things start to come into easy digital focus. Which is unavoidable, but it’s perhaps something we need to start to recognize. Yeah, so that was one thought that passed through my mind, so I’ll let that be. 

Frode Hegland: So, Phil. The reason you are here, as we’ve discussed before, is you allow for analysis, for interactivity. And I’m wondering… actually, I’m going to ask the question first, not to you, but to Brandel and Fabien. I’m going to waffle on for a minute now, but if you guys have something you want to show Phil that you have worked on, or something else in VR, to help him see where this fits. I just want to highlight, for my own personal work, with my own personal software: when I look out and I see so many people doing amazing stuff, the only thing that I’m trying to contribute is simplification. Because you can make things really horrendously complex, obviously. So I’m wondering if, maybe, by making interactions with this more tangible, we can have more… Yes, here we go. I can stop waffling now, Fabien would like to show something. 

Fabien Benetou- https://youtu.be/pdVHOoh-EL8?t=4700

Fabien Benetou: Hey, everyone. So this is not actually a network analysis, graph analysis, or any scientometrics. It’s simply putting the PDFs of an upcoming conference in space; it was for a VR conference. And I think a lot of people know that struggle: a lot of papers look interesting, but then you have to start with one. At least I can’t read two or ten papers at once, so I need to find which one. And basically what I do is, I put them in space, and I set up the space to make it friendly and in keeping with the conference. Then I have a little annotation system with a 3D object, where I put a post-it note if I need to write something on a paper: whether I’m not sure if it’s interesting, or it’s really mind-blowing and I want to read it first. And, yeah, that’s the result. It’s a social space, so I can invite somebody to go through, and then we can discuss which one to read first. And then, at the bottom right, I don’t know if you can see clearly, there is a grey platform, and I can send a paper to my ink reader and writer so that I can sketch on top and update it and all that. I have a couple of other examples where it’s more the graph view, and you can go through it, but it’s a bit more abstract. So I think this was the more tangible way, and I would definitely like to have my personal annotations through this, for example. But I could very easily list, next to a paper or an article, information related to it. For example, scaling based on popularity or anything like this. Just a simple example. 

Phil Gooch: That’s great. Yeah, that looks like a really nice way of navigating and picking out which sections and which papers you want to read. What you were showing reminded me of a paper I saw years ago called Document Cards, which was one of the motivations for Scholarcy. They turn each paper into what they call Top Trumps. So, if you’ve got a lot of papers to visualize, it turns each paper into a single graphic that’s got the main image from the paper and maybe a couple of quotes. It’s a way of showing everything in a paper in a single thumbnail. And maybe there’s a way of doing something like that: instead of showing those PDFs in your virtual reality, you’re showing, maybe, a condensed version of them that has just enough information to decide whether you want to read it or not, perhaps.

Frode Hegland- https://youtu.be/pdVHOoh-EL8?t=4868

Frode Hegland: That’s definitely worth us having a good look at. Phil, on a sales pitch for the whole VR thing: how long has it been since you put on a headset? More than a year? Because, Phil, you must have done some VR at some point, right?

Phil Gooch: I’ve not done anything. I might have put on a headset in a museum once or something, but…

Frode Hegland: Because the key thing is, it’s nothing like Second Life at all. And what Fabien was showing there is, once you’re in that space, it becomes really useful and navigable. I sometimes write using Author, my own app, in VR, for the opposite reason to what it’s normally good for: it means I have a limited field of view, I have a nice background, I have a decent-size screen, the visual quality is good enough for writing, and it’s good enough for reading. You wouldn’t want to read forever, sure, absolutely. But where the whole system is now: we’ve done some experiments with a mural, and just having a single mural is absolutely amazing. It is really hard to describe. When that mural, as an image, is on a computer screen, you can move it about, yes, of course you can do that. But when you can have it huge, and then you do a pinch gesture and it comes towards you and you zoom into different things, it’s kind of not explainable why it’s so special. And one of the reasons Ismail is here is that we’re looking at doing some mural and timeline work related to Egyptian history. It is really hard for us; we only started. I mean, Fabien and Brandel have been going for a long time, but the rest of us only started, basically, in January. So I have my headset here; it goes on and off depending on what we’re doing, but it’s really hard to explain the point of it. Because sitting-down VR is one thing, but what really brought me over the edge was when Brandel said: just moving your head a little bit, as you naturally do, changes everything. When we have meetings in VR, which we sometimes do, the sense of presence and being with other people, because the audio is spatialized, so if someone’s sitting there, the sound comes from there, is absolutely phenomenal. 
So, I really think that Obsidian and all of that is nice, but even, as you saw in the beginning, when Mark has taken not that many documents, but enough documents that that’s all the system can do, into this space, it quickly becomes messy. So, I think what you contribute is the ability to change the view rapidly and intelligently. There are so many interfaces for VR, and a lot of them are about using your hands, grabbing, and moving, and that’s all well and fine. But in some of them, you can have literally buttons to press for certain things to happen. So, I could easily imagine a document space where you start with one document, and at least in the beginning you have a huge set of buttons underneath, very inelegant, obviously, so that when you come across a citation, you can do what you already showed: they can start growing the trees. But all these buttons, again, initially, can help you constrain and expand that view. It would be nice to have a spoken interface, it would be nice to pull, and that needs to be experimented with. But the reason I was so excited to have you here today was, and is, the real interactivity that you give. You take data that’s out there and you make it tangible in a whole new way. So I hope that, with what we’re trying to do, we’re trying to do some sort of a demo for the next Future of Text. We’re looking into building some workroom. And Brandel has already taken things from Author, because Author documents are .liquid files, dot Liquid, and inside them we have JSON. So we have some of those goodies already. He’s been able to take the map view, with the relationships, into VR. And, of course, it’s relatively static, but you can already touch things and see lines appear. To be able to go further with what you have made available would be really quite exciting. I mean, I could very well imagine doing the reading you’re talking about. 
You talked about making it fit on an iPad, but what about making it fit a whole room, right? Just putting one wall, to begin with. Where do you actually put the pictures? Where would you put the graphs? So many questions come up. It gets really interesting. It’s not very obvious at all. But thank you for allowing us to think with available data.

Phil Gooch: Thanks. Well, it was great to have the opportunity to chat with you all. Thanks for inviting me. I just wanted to touch on one thing that we spoke about in an email. Because at the time I had a hard time thinking what is the VR/AR angle on this. But you can imagine, in an augmented reality setting that you might have a book or a document in front of you, and you’ve got this augmented view that says, “Actually, here are the main people citing this paper. This is what they say about it. Here are the main findings of this paper” as a separate layer. So you’ve got the paper there you can read and navigate in this 3D space, and you’ve also got this layer that says “Hey, here’s the really important stuff in this paper that you need to know. And this is what other people are saying about it.” And maybe that’s one of the use cases for AR in this kind of idea. And as you know, Frode, the API is open, so if there’s anything that you want to add for your demo, just give me a shout and we can make it available to you. 

Frode Hegland: I’ll go over to Brandel, but just really briefly: the thing about how things are connected in a VR space is really up for grabs at the moment. It really is the Wild West. So one thing I think we need to do now is just dream crazy dreams. For instance, with the Egyptian opportunity, let’s say you have the mask of Tutankhamun sitting literally on your desk as you’re working on a project. You should then be able to say, “Show me how that relates to a timeline, when it was found, and when it was used. Show me that geographically.” All these things should be able to come together. And right now, other than some idiot on Zoom, me, doing it with his hands, it doesn’t really connect. And I’m hoping that, first of all, your parsing and your genius, but also your willingness to open your APIs to others, and to use other APIs, can form a really powerful knowledge-growing hub. And, yeah. Brandel, please? 

Brandel Zachernuk: Thank you. Yeah, I definitely echo everything Frode is saying. If I can characterize what virtual reality, augmented reality, spatial computing at large does: when you have a display, be it a phone or a screen, even if it’s 30 inches or whatever, it still very much performs the function of foveal vision, the central vision of what you’re looking at. And there was a lot of really neat exploration of the practical cognitive consequences of that in the 1980s, where they were saying, “It’s like browsing a newspaper through a hole the size of one column wide.” And what virtual reality does is take that filter away, so that you’re able to read those newspapers but you’re also able to see the whole space around them. And to that end, I think we, unfortunately, as a result of having 50-odd years of computing being the primary mode of interaction for at least some information and knowledge workers, and certainly 30 years of it being the absolutely dominant form, have surrendered the space that we would typically do information and knowledge work in to a small computer with an even smaller visual real estate. So we don’t have the ability to think about what the entire space is for, or can be encoded for. And to that end, I feel like we have to go back to the metaphors that spring from understanding something like a kitchen or a woodshop, where you have tools, they have places, and when you’re standing in those places, when you’re gripping things in certain ways, that means you’re doing certain things. And you might move a workpiece from one place to another in order to undertake some kind of manipulation on it. And so, my hope is that, when people can return to that, within the context of knowledge work, where you can say, “I’m looking at this thing right now. 
But I have this stuff around me.” One of the things that I showed Frode and other folks in this group was being able to have the writing that you’re doing here, and the word count over here. So you don’t have to click a button or open a menu to see that information; it’s available simply from something as reflexive as turning your head. Likewise with visual image search happening at the same time. But the other thing this increased capacity for context does is increase, by orders of magnitude, the way in which scale can be used. Think about a museum, in contrast to a book, or in contrast to an academic paper, which is even more compressed and constrained: the way that type scale can be vastly changed in order to tell you things. The exit signs and the titles over things are not just two times larger, they’re maybe a hundred times larger. When you have a big piece of writing on a wall talking about how great Van Gogh is, these are four different things, but the experiential consequences are absolutely legible, because of the fact that you can devote that space to that, and this space to this. And so, yeah. I’m really excited about seeing all of the semantic information and insight that you have in Scholarcy, and really excited thinking about how to encode that into an entire space that people can manipulate and intervene on, at that scale.

Phil Gooch: Yeah, it would be awesome to be able to do that. Our API is open, so if people want to try integrating it into other systems, they can do that. So, thank you. And also thanks for your suggestions about (indistinct) those entities: what type of thing are they? Are they a person? A place? And so on. And, like you said, clicking on those links so you know where they’re going to take you in advance, without having to wonder. Some great suggestions here, so thanks very much, Brandel. That’s great. I’m afraid I have to go. Lovely to meet you all. And, yeah, I’ll chat to you again soon. Hopefully at the next Future of Text.

Frode Hegland: Sorry I couldn’t see you on Thursday, Phil. But I’m presenting on semantics, something that I don’t know anything about. It will be a fun presentation, though, all about Visual-Meta. Anyway, we’ll have coffee soon. Thanks for your time. 

Phil Gooch: Take care. See you soon. Cheers. Bye-bye. 

Frode Hegland: I was just going to say this while he was still here, but I’ll just say it to you guys anyway. This obviously gives an opportunity to go from static documents to living, dynamic knowledge objects. Imagine we have virtual bookshelves behind us, and while we’re writing a paper, one of our citations is refuted. Wouldn’t it be nice to be told that? Maybe that citation in our own document starts to pulse red, and we have to pull it off the shelf and see what someone else has done. Those are the kinds of opportunities we have. And in terms of the dreaming that we’re doing in the group now, I think maybe we should also try to dream about having absolutely all our knowledge in VR, not just one project at a time. Just a little thing. Because, as Doug said, “Dreaming is hard work.” Fabien, I think you are next. 

Fabien Benetou: Thanks. Again, I think I mentioned once or twice why I’m going into this free space. But it’s absolutely to have all my knowledge in here. Like, 100%. So far, it’s been a mix of me not knowing how to do it and the technology not being out there, not yet. Or, again, a mix of both. But definitely. And also the bridge between what I don’t know and what I already know, so that I could, for example, go from one reference, something I have read, to another one, for example, one that you suggested. What I wanted to do, also, before your remark, was to criticize what I have shown. For example, you mentioned the manipulation aspect, and I think that’s the problem when I or others share images of VR, or VR content: people think, “Oh, it’s another form of visualization.” It’s absolutely not that. Visualization can be part of it, but then, as you do the hand waving, that part is really fundamental. Being able to have a fast-paced interaction, a tight feedback loop, being able to do it, again, with the hand, relatively naturally; it’s not perfect yet, but I think that’s the part that is really important. So I think always having the gear ready for a short video, to show that it’s not just things in space but the movement, the head movement you also describe, having it in motion as a process. Ideally, also, have a green screen where we see the actual body of the person with a headset, moving and grabbing the actual note. I think all that, especially for people who are not familiar, makes a huge difference. Of course, it’s a bit of work, having a studio set up and ready, so that everything is calibrated right for this. But people who haven’t tried it yet nod their heads thinking, “Oh, yeah. I understand.” They have no idea. So I think that also helps. It’s still, of course, not good enough, and as you say, yeah, they need to try it. But I think that’s a little bit further off still.

Frode Hegland: Yeah, I mean, even if we have a virtual camera in the room that doesn’t move, so the virtual camera records from one perspective, and then you just see it. Yeah. Mark? 

Mark Anderson: Yeah, I was actually going to follow on from that. One thing I found, returning to do some more playing around with stuff in VR, was thinking that I know in principle how I could share that with people. But one of the things I find myself really wanting to do, rather than write stuff down, is, for this state of the art, to be streaming my first-person perspective of what I’m doing. Because I think what’s more important is to be able to explain to people how some things are hard that we thought weren’t. And you actually need to be seeing it through somebody’s eyes to do that. When I can’t turn my arm to this angle to do what I thought I could do, for example. An interesting thing I found myself reflecting on, thinking back on the notes I’ve been making about the process: I can do this with my hand, it’s a 180-degree rotation, which is quite natural. But in some of the little puzzles and things I’ve been using to practice VR work, I was quite surprised that often, for some reason, it feels like I can only turn it so far. I have to let go, grab it again, and turn it again, which seems counterintuitive. So I haven’t quite bottomed out what’s happening there. I wonder if it’s a cognitive blockage on my end or whether it’s something in the way things move. And things like that, I think, are remarkably powerful to be able to show to someone as seen through your eyes, regardless of whether they’re in VR or not. Simply because, otherwise, it’s very hard to explain. 
And the thing I put my hand up to say, and forgot about when we were talking to Phil earlier, is that it strikes me, taking on board in a positive way the limitations of what we can do with structural decomposition of text in terms of natural language processing and such, that one thing we could do, even as we mull over whether we are or aren’t going to move away from PDFs, would be to push towards more structured writing. In something like an academic publishing context there are already some rules; to get published you have to obey some rules, and I know some areas, I think in health and related fields, have very much gone towards this. But there’s no reason not to make that more explicit. And it doesn’t have to be everything. Even if it were just core things like abstracts, conclusions, or end pieces: you absolutely have to pass that bar, and you don’t get to go on the ride if you don’t. It’s not impossible to do. And it raises the question: if your conclusion is so vague and woolly that you can’t really break it down into something structured, then, in fact, maybe you haven’t got any conclusions. Which goes back to the saying, “One of the joys of doing documentation is it teaches you how little you understand about the thing you thought you knew.” 

Frode Hegland: We are recording. Would you like to pause a little bit, Fabien? And now let’s just pretend we’re talking and we’re going to continue in a second and no pausing is happening at all. And nothing important happened, over to you, Fabien. 

Fabien Benetou: Yes. I think there is a big difference between having meta as a goal, let’s say, using VR to analyse new ways to work in VR, versus bringing history back into the space. We start with an empty room or something else. Bob mentioned the idea that when he starts a mural, he has his own past murals; he considers them, maybe integrates parts of them. And I think that’s very valuable. I personally have a wiki, and every page of my wiki has a history, so I can go back to any point in time. And, I think I briefly mentioned to Brandel on Twitter today, I have a personal obsession with phylogenies. I think the blank page doesn’t exist; it’s maybe a Western conception. Overall, we always have something older that it comes from. So bringing some history in, let’s say, if we do a VR room, we bring the Future of Text volumes, or whatever we want, is not just interesting; I think it has to be done, it’s valuable. Still starting from, let’s say, an empty space based on one target, one goal, for example, some points about global warming, but then, yes, bringing that history back in. 

Frode Hegland: I think we’ve run over a lot. Just, yes. First of all, Jacob is working on exporting to HTML from Author. I don’t know exactly what that’ll be, in terms of how the metadata will be handled. I’m also madly in love with the whole glossary-defined-terms thing. And I think the reason I’m so in love with it is, I know half of you will have this book, and if you look at so many of the different diagrams, things get messy so quickly. So if you have something that you yourself decide is what you really want to have in there, that becomes very useful. And in the Future of Text books, of course, we also have a text timeline, a history-of-text timeline, which we can expand upon and import. So we do have some of that data to try things with as well. But, yeah. Okay. So, shall we try to dream a little bit for Friday? And then next Friday we dream? And once we dream about, whatever, Egypt, or our own record, or climate change, or whatever it is, then we decide: okay, we settle on a topic, we settle on the beginning of a quote-unquote room. Is that kind of where we’re at? 

Brandel Zachernuk: Yeah, that sounds pretty reasonable to me.

Frode Hegland: Very good. So, I wish you luck. I wish me luck. Dreaming is, I’ll say it again, hard. Because it’s different from fantasy, obviously, right? Which is why I hate Harry Potter. Green flash, blue flash, no, no, the purple one is dangerous. Sorry, yeah. I’ve got to go. Okay. Thanks, everyone, for today. Bye for now. Take care.