“The approach is to say, ‘I think I know what the underlying physical laws are that give rise to everything that I see in the system.’ So I have a recipe for star formation, I have a recipe for how dark matter behaves, and so on. I put all of my hypotheses in there, and I let the simulation run. And then I ask: Does that look like reality?” What he’s done with generative modeling, he said, is “in some sense, exactly the opposite of a simulation. We don’t know anything; we don’t want to assume anything. We want the data itself to tell us what might be going on.”
The apparent success of generative modeling in a study like this obviously doesn’t mean that astronomers and graduate students have been made redundant – but it appears to represent a shift in the degree to which learning about astrophysical objects and processes can be achieved by an artificial system that has little more at its electronic fingertips than a vast pool of data.
“I just think we as a community are becoming far more sophisticated about how we use the data. In particular, we are getting much better at comparing data to data. But in my view, my work is still squarely in the observational mode.”
These systems can do all the tedious grunt work, he said, leaving you “to do the cool, interesting science on your own.”
Whether Schawinski is right in claiming that he’s found a “third way” of doing science, or whether, as Hogg says, it’s merely traditional observation and data analysis “on steroids,” it’s clear AI is changing the flavor of scientific discovery, and it’s certainly accelerating it.
Perhaps most controversial is the question of how much information can be gleaned from data alone – a pressing question in the age of stupendously large piles of it.
In The Book of Why, the computer scientist Judea Pearl and the science writer Dana Mackenzie assert that data are “profoundly dumb.” Questions about causality “can never be answered from data alone,” they write.
“Anytime you see a paper or a study that analyzes the data in a model-free way, you can be certain that the output of the study will merely summarize, and perhaps transform, but not interpret the data.” Schawinski sympathizes with Pearl’s position, but he described the idea of working with “data alone” as “a bit of a straw man.” He’s never claimed to deduce cause and effect that way, he said.
The original article.
Online, you’ll find people who use hashtags like “#digitalhoarder” and hang out in the 120,000-subscriber Reddit forum called /r/datahoarder, where they trade tips on building home data servers, share collections of rare files from video game manuals to ambient audio records, and discuss the best cloud services for backing up files.
By contrast, many self-proclaimed digital hoarders say they enjoy their collections, can keep them contained in a relatively small amount of physical space, and often take pleasure in sharing them with other hobbyists or anyone who wants access to the same public data.
“Data hoarder means to me simply someone who collects and curates digital data,” said the user -Archivist, one of the moderators of /r/datahoarder, in a private message on Reddit.
Many people active in the data hoarding community take pride in tracking down esoteric files of the kind that often quietly disappear from the internet – manuals for older technologies that get taken down when manufacturers redesign their websites, obscure punk show flyers whose only physical copies have long since been pulled from telephone poles and thrown in the trash, or episodes of old TV shows too obscure for streaming services to bid on – and making them available to those who want them.
Some /r/datahoarder users acknowledge they collect files that other people might not find interesting: HeloRising, a man in his mid-30s from the Pacific Northwest, said via Reddit PM that he’s built up a collection of high-quality digital copies of illuminated manuscripts, which he said he finds fascinating but has yet to find other users interested in sharing.
HeloRising, who has about 30 terabytes of data in total and spends five or six hours per week on the hobby, said the Reddit community has been a “treasure trove” of useful advice and information.
Still, problem digital hoarding, where massive collections of files, inbox messages and other digital data bring stress to their owners, isn’t unheard of, including among people who already struggle with hoarding tangible objects.
Many systems don’t make it easy to find, organize and back up valuable files, while shunting more ephemeral data to the digital trash heap.
The original article.
The data brokers quietly buying and selling your personal information.
Thanks to a new Vermont law requiring companies that buy and sell third-party personal data to register with the Secretary of State, we’ve been able to assemble a list of 121 data brokers operating in the U.S. It’s a rare, rough glimpse into a bustling economy that operates largely in the shadows, and often with few rules.
Even Vermont’s first-of-its-kind law, which went into effect last month, doesn’t require data brokers to disclose who’s in their databases, what data they collect, or who buys it.
Still, these 121 entities represent just a fraction of the broader data economy: The Vermont law only covers third-party data firms – those trafficking in the data of people with whom they have no relationship – as opposed to “first-party” data holders like Amazon, Facebook, or Google, which collect their own enormous piles of detailed data directly from users.
If you’re concerned about how a company is handling your personal data, you can file a complaint with the Federal Trade Commission, which has issued millions of dollars in penalties over unfair or unlawful behavior by credit agencies and data brokers.
A sibling of the giant U.S. credit reporting agency Experian Information Solutions and one of many subsidiaries of the Ireland-based data giant Experian PLC, the company operates Experian RentBureau, a database updated daily with millions of consumers’ “rental payment history data from property owners/managers, electronic rent payment services and collection companies.”
Data giant “Oracle Data Cloud gives marketers access to 5 billion global IDs, $3 trillion in consumer transactions, and more than 1,500 data partners available through the BlueKai Marketplace. With more than 45,000 prebuilt audiences spanning demographic, behavioral, B2B, online, offline, and transactional data, we bring together more data into a single location than any other solution.”
“Twine is a mobile data platform that works with app publishers who generate mobile data & the companies who need data for ad targeting.”
The original article.
Your phone and TV are tracking you, and political campaigns are listening in – Los Angeles Times.
Welcome to the new frontier of campaign tech – a loosely regulated world in which simply downloading a weather app or game, connecting to Wi-Fi at a coffee shop or powering up a home router can allow a data broker to monitor your movements with ease, then compile the location information and sell it to a political candidate who can use it to surround you with messages.
As a result, if you have been to a political rally, a town hall, or just fit a demographic a campaign is after, chances are good your movements are being tracked with unnerving accuracy by data vendors on the payroll of campaigns.
The RealOptions case turned out to be a harbinger for a new generation of political campaigning built around tracking and monitoring even the most private moments of people’s lives.
Just as the antiabortion organizations did around clinics, political campaigns large and small are building “geo-fences” around locations from which they can fetch the unique identifying information of the smartphones of nearly everyone who attended an event.
“I don’t think a lot of people are aware their location data is being sent to whomever,” said Justin Croxton, a managing partner at Propellant Media, an Atlanta-area digital firm that works with political campaigns.
Which political campaigns and other clients receive all that tracking information can’t be traced.
Serge Egelman, research director of the Usable Security & Privacy Group at UC Berkeley’s International Computer Science Institute, said his team can unearth which opaque data brokerages are amassing information, but not which political campaigns or interest groups buy it from them.
The original article.
Plus, the company is still wrestling with costly IT upgrades that have been necessary to pump data into Philyra from disparate record-keeping systems while keeping some of the information confidential from the perfumers themselves.
Such productivity gains are largest at the biggest and richest companies, which can afford to spend heavily on the talent and technology infrastructure necessary to make AI work well.
Last September, a data scientist named Peter Skomoroch tweeted: “As a rule of thumb, you can expect the transition of your enterprise company to machine learning will be about 100x harder than your transition to mobile.” It had the ring of a joke, but Skomoroch wasn’t kidding.
If companies don’t stop and build connections between such systems, then machine learning will work on just some of their data.
Even if a company gets data flowing from many sources, it takes lots of experimentation and oversight to be sure that the information is accurate and meaningful.
When Genpact, an IT services company, helps businesses launch what they consider AI projects, “10% of the work is AI,” says Sanjay Srivastava, the chief digital officer.
Smaller companies often require employees to delve into several technical domains, says Anna Drummond, a data scientist at Sanchez Oil and Gas, an energy company based in Houston.
Fluor, a huge engineering company, spent about four years working with IBM to develop an artificial-intelligence system to monitor massive construction projects that can cost billions of dollars and involve thousands of workers.
The original article.
As Roser is quick to note, it’s not “his” chart – it’s similar to charts many economists working on poverty have produced, such as one in Georgetown professor Martin Ravallion’s book The Economics of Poverty.
Hickel argues that focusing on data showing declines in global poverty does political work on behalf of global capitalism, defending an inherently unjust global system that has failed residents of rich and poor nations alike.
“The present rate of poverty reduction is too slow for us to end $1.90/day poverty by 2030, or $7.40/day poverty in our lifetimes. To achieve this goal, we would need to change economic policy to make it fairer for the world’s majority.”
Hickel insists that the $1.90-per-day poverty line is unacceptably low and that we should focus on absolute numbers – how many total people are living in poverty – as well as the share of people in extreme poverty.
We use poverty rates, not absolute numbers, in discussions of US poverty as well.
Sure enough, if you look again at the chart that opens this post, you’ll see that extreme poverty fell by very, very meager amounts before about 1950, gains that were concentrated in rich European countries thriving off of extracting resources from the global South.
That’s consistent with the story Roser’s telling: All of humanity was poor, barred from breaking out of bare subsistence, until industrialization; colonization prevented the mass of humanity from benefiting from the economic growth that industrialization enabled in Europe and North America, and substantially worsened conditions for its victims; but global growth from 1950 onward led to a massive poverty reduction.
Just about everyone agrees life expectancy is up, education is more common, and poverty rates are down over the past three or four decades regardless of where you set the poverty line.
The original article.
TechCrunch has found several popular iPhone apps, from hoteliers, travel sites, airlines, cell phone carriers, banks and financiers, that don’t ask or make it clear – if at all – that they know exactly how you’re using their apps.
Worse, even though these apps are meant to mask certain fields, some inadvertently expose sensitive data.
Apps like Abercrombie & Fitch, Hotels.com and Singapore Airlines also use Glassbox, a customer experience analytics firm, one of a handful of companies that allow developers to embed “session replay” technology into their apps.
These session replays let app developers record the screen and play it back to see how users interacted with the app, to figure out if something didn’t work or if there was an error.
The App Analyst, a mobile expert who writes about his analyses of popular apps on his eponymous blog, recently found Air Canada’s iPhone app wasn’t properly masking the session replays when they were sent, exposing passport numbers and credit card data in each replay session.
We asked The App Analyst to look at a sample of apps that Glassbox had listed on its website as customers.
Not every app was leaking masked data; none of the apps we examined said they were recording a user’s screen – let alone sending them back to each company or directly to Glassbox’s cloud.
Without analyzing the data for each app, it’s impossible to know whether an app is recording screens of how you’re using it.
The original article.
Notably, the Research app seemed to be a repackaging of the Onavo Protect app, a different Facebook program that Apple banned last year for violating its rules on data collection by developers.
Missouri Republican Sen. Josh Hawley tweeted, “Wait a minute. Facebook PAID teenagers to install a surveillance device on their phones without telling them it gave Facebook power to spy on them? Some kids as young as 13. Are you serious?” Connecticut Democratic Sen. Richard Blumenthal sent TechCrunch a statement noting, “Wiretapping teens is not research, and it should never be permissible.”
Many seemed perfectly aware of all the digital activity they would be giving up, and that Facebook would be benefiting from it.
In a statement to CNBC, a Facebook spokesperson claimed that “less than 5 percent of the people who chose to participate in this market research program were teens,” though it’s not clear how many underage individuals that might represent, or if it was possible that some lied about their age.
Facebook also isn’t the only tech giant in the game: Google had, until Wednesday, been running a data-hoovering program similar to Facebook Research called Screenwise Meter, which offered participants the opportunity to earn gift cards in exchange for allowing the company to track various forms of their digital activity via an app or Google-provided router.
The dollar sign that programs like Facebook Research put in front of the exchange made it easier to see the kinds of bad deals users are being offered.
As security expert Will Strafach, speaking about the Facebook Research VPN, told TechCrunch, “[M]ost users are going to be unable to reasonably consent to this regardless of any agreement they sign, because there is no good way to articulate just how much power is handed to Facebook when you do this.”
To me, this is the most startling thing about the Facebook Research VPN and many of the other digital privacy trade-offs we make.
The original article.
One constant is that machine learning teams have a hard time setting goals and setting expectations.
In the first week, the accuracy went from 35% to 65%, but then over the next several months it never got above 68%. 68% accuracy was clearly the limit on the data with the best, most up-to-date machine learning techniques.
My friend Pete Skomoroch was recently telling me how frustrating it was to do engineering standups as a data scientist working on machine learning.
Engineering projects generally move forward, but machine learning projects can completely stall.
Machine learning generally works well as long as you have lots of training data *and* the data you’re running on in production looks a lot like your training data.
Machine Learning requires lots and lots of relevant training data.
What’s Next? The original goal of machine learning was mostly around smart decision making, but more and more we are trying to put machine learning into products we use.
As we start to rely more and more on machine learning algorithms, machine learning becomes an engineering discipline as much as a research topic.
The original article.
They want to compare healthy people’s brains to those of people with mental health disorders.
Psychiatry is seeking to measure the mind, which is not quite the same thing as the brain. For the Virginia Tech team looking at my brain, computational psychiatry had already teased out new insights while they were working on a study published in Science in 2008.
The algorithm can find new patterns in our social behaviors, or see where and when a certain therapeutic intervention is effective, perhaps providing a template for preventative mental health treatment through exercises one can do to rewire the brain.
With those patterns in hand, Chiu imagines the ability to diagnose more acutely, say, a certain kind of depression, one that regularly manifests itself in a specific portion of the brain.
The fMRI has its problems: for instance, scientists are not truly looking at the brain, according to Science Alert.
There is a brain chemical composition that is associated with some depressed people, Greenberg says, but not all who meet the DSM criteria.
The lab’s approach asks what the brain is doing during a task while considering the entire brain.
As the afternoon sun slants through the windows of a common area – partitioned by a math-covered wall – Chiu and King-Casas take turns bouncing their young baby and discussing a future of psychiatry in which she may live: algorithm-driven diagnostic models, targeted therapies, and brain training methods, driven by real-time fMRI results, that shift psychiatry into the arena of preventative medicine.
The original article.